CN114897124A - Intrusion detection feature selection method based on improved grey wolf optimization algorithm

Info

Publication number
CN114897124A
CN114897124A CN202210321742.9A CN202210321742A CN114897124A CN 114897124 A CN114897124 A CN 114897124A CN 202210321742 A CN202210321742 A CN 202210321742A CN 114897124 A CN114897124 A CN 114897124A
Authority
CN
China
Prior art keywords
wolf
population
wolfs
feature
current
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210321742.9A
Other languages
Chinese (zh)
Inventor
贺敬
刘泽超
施然
李思照
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147: Distances to closest patterns, e.g. nearest neighbour classification
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an intrusion detection feature selection method based on an improved grey wolf optimization algorithm. The wolf pack is perturbed using Cauchy mutation and Levy flight so that it can escape local optima, which improves the global search capability, yields an optimal feature subset, and improves the accuracy of intrusion detection. When the grey wolf population is initialized, Logistic chaotic mapping is used, which improves the quality of the initial population and the exploitation capability of the algorithm. Perturbation is applied when the grey wolf population is updated, which mitigates the tendency to fall into local optima and improves the global search capability. Finally, an optimal feature subset containing few features is obtained through training and applied to the test set, giving a higher detection accuracy.

Description

Intrusion detection feature selection method based on improved grey wolf optimization algorithm
Technical Field
The invention relates to the grey wolf optimization algorithm and to classification and recognition techniques such as kNN, and concerns a method for finding an optimal feature subset in an intrusion detection system.
Background
As network security risks grow, more and more protection measures are being deployed. Intrusion detection can effectively detect both known and unknown attacks and provides identification and early-warning capabilities, so it has become one of the safest and most widely used protective measures recognized by the industry. Intrusion detection systems can be classified in various ways according to different criteria, but many of them suffer from problems such as low detection efficiency. As the volume of network traffic grows exponentially, the frequency and complexity of intrusions also keep increasing, which makes it difficult to accurately detect attacks or anomalies in high-dimensional network traffic; the time cost likewise rises sharply with the data volume. Analysis of network traffic data shows that it has a large number of features, a considerable share of which are invalid or redundant. Removing these redundant features reduces the dimensionality of the traffic data, which can effectively cut the time cost and increase the accuracy.
Feature selection is the method of choice for reducing the dimensionality of high-dimensional traffic data: the most representative and most valuable feature subset is selected from all features. Feature selection algorithms fall into three categories: filter, wrapper, and embedded. Filter methods first select the features and then train the classifier, so no learning algorithm takes part in choosing the feature subset. Wrapper methods use an intelligent optimization algorithm during subset selection to search the solution space formed by all features for an optimal feature subset; they are currently the most widely studied feature selection algorithms. Embedded methods combine the two approaches and make feature selection part of classifier training.
The grey wolf is an apex predator that lives in packs with a strict social hierarchy, so the strongest, highest-ranking leader wolf can be clearly identified within a pack. The grey wolf optimization algorithm simulates the pack-hunting behaviour of grey wolves, which gradually encircle the prey, i.e., the optimal fitness; each wolf represents one candidate feature subset. The algorithm converges strongly and is easy to implement because it has few parameters, but it is prone to falling into local optima. Feature selection based on the grey wolf optimization algorithm is a wrapper method: when searching the solution space for the optimal feature subset, the algorithm takes several feature subsets as base points and searches in their neighbourhood until the optimal feature subset is obtained. This reduces the dimensionality of high-dimensional traffic data, lowers cost, improves intrusion detection accuracy, and thereby improves the efficiency of network defence.
Disclosure of Invention
The invention aims to solve the low accuracy and low speed of intrusion detection caused by factors such as the large number of data features and the presence of redundant features. Conventional intrusion detection relies on existing techniques to inspect traffic data, and the grey wolf optimization algorithm has been used to select feature subsets from complex data features; however, the current algorithm tends to fall into local optima, which affects the selection of the optimal feature subset and the final accuracy. The invention provides an intrusion detection feature selection technique based on a grey wolf optimization algorithm combined with Cauchy mutation and Levy flight. Cauchy mutation and Levy flight are used to perturb the wolf pack so that it escapes local optima, which improves the global search capability, yields an optimal feature subset, and improves the accuracy of intrusion detection.
The invention is realized by the following specific steps:
Step 1: Take 80% of the KDD Cup99 data set as the training set and 20% as the test set. kNN is chosen as the intrusion detection classifier, with k = 5. Here k is the number of nearest neighbours: each sample is characterized by its k nearest neighbours.
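The split and classifier of step 1 can be illustrated with a minimal pure-Python sketch. The two-cluster toy data stands in for KDD Cup99 records, and the helper names `train_test_split` and `knn_predict`, the Euclidean distance, and the fixed seeds are assumptions made for illustration; the text only fixes the 80/20 ratio and k = 5.

```python
import math
import random
from collections import Counter

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle rows and split them into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def knn_predict(train, query, k=5):
    """Majority label among the k training rows closest to `query`.

    Each training row is a (feature_vector, label) pair; Euclidean
    distance is an assumption, the text does not name a metric.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy stand-in for KDD Cup99 records: two well-separated clusters.
rng = random.Random(1)
rows = [([rng.gauss(0, 0.3), rng.gauss(0, 0.3)], "normal") for _ in range(50)]
rows += [([rng.gauss(3, 0.3), rng.gauss(3, 0.3)], "attack") for _ in range(50)]

train, test = train_test_split(rows)
correct = sum(knn_predict(train, x) == y for x, y in test)
print(f"test accuracy: {correct / len(test):.2f}")
```

On real traffic data the 41 features would first need numeric encoding and scaling, which this sketch leaves out.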
Step 2: Set the parameters of the improved grey wolf algorithm. The population size N is 30 and the maximum number of iterations T is 50. Initialize the population: the position of each wolf is represented by a vector X whose dimension equals the number of features in the selected data set, 41. Initialization produces the first-generation wolf population.
Step 3: Convert the positions of all grey wolves to binary using 0.5 as the boundary. In each dimension, a value greater than 0.5 is set to 1, indicating that the feature represented by that dimension is selected; otherwise it is set to 0, indicating that the feature is not selected.
Step 4: Apply the feature selection scheme given by each binary grey wolf position vector to the kNN training set to select the corresponding feature data.
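Steps 3 and 4 amount to thresholding a position vector and using the result as a column mask. A minimal sketch follows; the function names and the 4-dimensional example are hypothetical, while the 0.5 boundary is taken from the text.

```python
def binarize(position, threshold=0.5):
    """Map a continuous wolf position to a 0/1 feature mask."""
    return [1 if value > threshold else 0 for value in position]

def select_features(rows, mask):
    """Keep only the columns of each row whose mask bit is 1."""
    return [[value for value, bit in zip(row, mask) if bit] for row in rows]

position = [0.91, 0.12, 0.50, 0.77]   # hypothetical 4-dimensional wolf
mask = binarize(position)             # 0.50 does not exceed the boundary
data = [[10, 20, 30, 40], [11, 21, 31, 41]]
reduced = select_features(data, mask)
print(mask, reduced)
```

A value exactly equal to 0.5 maps to 0 here because the text selects a feature only when the value exceeds the boundary.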
Step 5: Train on the feature data obtained in step 4 to obtain the accuracy
Figure BDA0003565211110000021
where TP is the number of correct detections and FP is the number of erroneous detections; the error rate is error = 1 - Accuracy.
Step 6: Calculate the fitness of the feature subset
Figure BDA0003565211110000022
where a and b are weights (typically a = 0.99 and b = 0.01), m is the number of features in the feature subset, and n is the total number of features. This formula is used as the fitness value because feature selection must take two factors into account: the accuracy and the size of the feature subset.
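The accuracy and fitness formulas appear only as figure images in this text, so the following is a sketch under stated assumptions rather than the patent's exact expressions. It assumes Accuracy = TP / (TP + FP), read from the definitions of TP and FP, and fitness = a * error + b * m / n, a weighted form commonly used in wrapper feature selection, with n read as the total number of features (41 for KDD Cup99). The function names and sample counts are illustrative.

```python
def accuracy(tp, fp):
    """Fraction of samples detected correctly (assumed form TP / (TP + FP))."""
    return tp / (tp + fp)

def fitness(error, mask, a=0.99, b=0.01):
    """Weighted sum of error rate and relative subset size; lower is better."""
    m = sum(mask)   # number of selected features
    n = len(mask)   # total number of features
    return a * error + b * m / n

acc = accuracy(tp=95, fp=5)          # hypothetical detection counts
err = 1 - acc
mask = [1] * 10 + [0] * 31           # hypothetical subset: 10 of 41 features
value = fitness(err, mask)
print(round(value, 4))
```

With a = 0.99 the error rate dominates, and the b term acts mainly as a tie-breaker in favour of smaller subsets.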
Step 7: Sort the fitness values of the feature subset schemes calculated in step 6 and designate the wolves with the three smallest values as the three leader wolves α, β and γ. The α wolf is the head wolf, i.e., the wolf with the best fitness, and corresponds to the current best feature subset. The β wolf has the second-best fitness, is subordinate to the head wolf, and corresponds to the current second-best feature subset. The γ wolf has the third-best fitness, is subordinate to the α and β wolves, and corresponds to the current third-best feature subset.
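Leader selection in step 7 is a sort by fitness followed by taking the three smallest values. A minimal sketch, in which the function name and the string placeholders for position vectors are hypothetical:

```python
def pick_leaders(population, fitness_values):
    """Return (alpha, beta, gamma): the three wolves with the smallest fitness."""
    order = sorted(range(len(population)), key=lambda i: fitness_values[i])
    alpha, beta, gamma = (population[i] for i in order[:3])
    return alpha, beta, gamma

wolves = ["w0", "w1", "w2", "w3", "w4"]   # placeholders for position vectors
scores = [0.30, 0.05, 0.22, 0.11, 0.40]
alpha, beta, gamma = pick_leaders(wolves, scores)
print(alpha, beta, gamma)
```

Sorting indices rather than the wolves themselves avoids comparing position vectors when two fitness values tie.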
Step 8: Generate a random number in [0, 1) to determine the perturbation strategy for the α, β and γ wolves. With 0.5 as the boundary, perform step 8.1 if the random number lies in [0, 0.5]; otherwise perform step 8.2.
Step 8.1: Apply Cauchy mutation perturbation to α, β and γ in turn to obtain the new positions
Figure BDA0003565211110000023
Figure BDA0003565211110000024
Step 8.2: Apply Levy flight perturbation to α, β and γ in turn to obtain the new positions
Figure BDA0003565211110000031
Figure BDA0003565211110000032
where
Figure BDA0003565211110000033
and w is a random number drawn from the standard normal distribution,
Figure BDA0003565211110000034
where $\Gamma(x) = (x-1)!$.
Step 8.3: Calculate the fitness values of the positions obtained in step 8.1 or 8.2, compare them with the values before the perturbation, and store the updated positions and fitness values of the three wolves.
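The update formulas of step 8 survive only as figure images, so the following sketch is built on assumptions. It treats step 8.2 as the Levy flight branch, as the symbols σ, Γ and the normal variates suggest, and draws Levy steps with Mantegna's algorithm using an assumed exponent β = 1.5. The additive update, the `scale` factor and the clipping to [0, 1] are illustrative choices, not taken from the source.

```python
import math
import random

def cauchy_sample(rng):
    """Standard Cauchy variate via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def levy_sigma(beta=1.5):
    """Mantegna scale factor; Gamma(x) = (x-1)! for positive integers x."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(rng, beta=1.5):
    """One Levy-distributed step s = u / |v|^(1/beta), with u = w * sigma."""
    u = rng.gauss(0, 1) * levy_sigma(beta)
    v = rng.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def perturb_leader(position, rng, scale=0.1):
    """Cauchy mutation if r <= 0.5, otherwise Levy flight (assumed update rule)."""
    draw = cauchy_sample if rng.random() <= 0.5 else levy_step
    moved = [x + scale * draw(rng) for x in position]
    return [min(1.0, max(0.0, x)) for x in moved]   # assumed bound handling

rng = random.Random(0)
print(perturb_leader([0.2, 0.8, 0.5], rng))
```

Both distributions are heavy-tailed, so an occasional draw produces a long jump, which is what lets a leader leave a local optimum. Per step 8.3, a perturbed position would then be re-evaluated and compared with its fitness before the perturbation.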
Step 9: The model of the grey wolves encircling the prey is $X(t+1) = X_p(t) - A \circ D$, where $X_p(t)$ is the position vector of the prey in the current generation, which here refers to the positions of the top three wolves, $A$ is the convergence factor, $C$ is the swing factor, and $D$ is the distance between wolves, with $A = 2a \circ r_1 - a$, $C = 2r_2$ and $D = |C \circ X_p(t) - X(t)|$. Here $\circ$ denotes the Hadamard product, and $a$ decreases nonlinearly from 2 to 0 over the iterations,
Figure BDA0003565211110000035
where $r_1$ and $r_2$ are random vectors in $[0, 1]$.
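The encircling model of step 9 can be sketched per dimension as follows. The schedule for a is given only as a figure image, so `control_parameter` uses an assumed quadratic decay that satisfies the stated endpoints (2 at the start, 0 at the end). The absolute value in D follows the form used for the three leaders in step 10.

```python
import random

def control_parameter(t, max_iter):
    """Assumed nonlinear decay of a from 2 (t = 0) to 0 (t = max_iter)."""
    return 2.0 * (1.0 - (t / max_iter) ** 2)

def encircle(x, x_p, a, rng):
    """One encircling update X(t+1) = X_p - A * D, applied per dimension."""
    new_x = []
    for xi, pi in zip(x, x_p):
        A = 2 * a * rng.random() - a   # A = 2a * r1 - a, so A lies in [-a, a)
        C = 2 * rng.random()           # C = 2 * r2
        D = abs(C * pi - xi)           # D = |C * X_p - X|
        new_x.append(pi - A * D)
    return new_x

rng = random.Random(0)
a = control_parameter(t=10, max_iter=50)
print(a, encircle([0.2, 0.8], [0.5, 0.5], a, rng))
```

When |A| is greater than 1 the wolf can move away from the prey, and as a shrinks towards 0 the update collapses onto the prey position.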
Step 10: In each iteration, keep the best three wolves of the current population and compute the positions of the candidate wolves. The positions of the next-generation grey wolf population are obtained with the following formulas: $D_\alpha = |C_1 \circ X_\alpha(t) - X(t)|$, $D_\beta = |C_2 \circ X_\beta(t) - X(t)|$, $D_\gamma = |C_3 \circ X_\gamma(t) - X(t)|$, $X_1 = X_\alpha - A_1 \circ D_\alpha$, $X_2 = X_\beta - A_2 \circ D_\beta$, $X_3 = X_\gamma - A_3 \circ D_\gamma$,
Figure BDA0003565211110000036
where $X_\alpha$, $X_\beta$ and $X_\gamma$ are the best three wolves of the current population, and $D_\alpha$, $D_\beta$ and $D_\gamma$ are the distances between the current candidate grey wolf and each of the three best wolves.
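A sketch of the step 10 update is given below. The rule that combines X_1, X_2 and X_3 is shown only as a figure image; the sketch assumes the plain average used in the standard grey wolf optimizer, which may differ from the patent's formula. Function names are illustrative.

```python
import random

def move_towards(x, leader, a, rng):
    """Candidate position X_k = X_leader - A_k * D_k for one leader."""
    out = []
    for xi, li in zip(x, leader):
        A = 2 * a * rng.random() - a
        C = 2 * rng.random()
        out.append(li - A * abs(C * li - xi))
    return out

def update_position(x, alpha, beta, gamma, a, rng):
    """Average the three leader-guided candidates (assumed combination rule)."""
    x1 = move_towards(x, alpha, a, rng)
    x2 = move_towards(x, beta, a, rng)
    x3 = move_towards(x, gamma, a, rng)
    return [(p + q + r) / 3 for p, q, r in zip(x1, x2, x3)]

rng = random.Random(0)
new_x = update_position([0.1, 0.9], [0.9, 0.0], [0.6, 0.3], [0.3, 0.6], 1.0, rng)
print(new_x)
```

Each leader gets its own random A and C, so the three candidates differ even when the leaders are close together.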
Step 11: Apply Cauchy mutation perturbation to all grey wolves,
Figure BDA0003565211110000037
Step 12: Repeat steps 3 to 12 until the maximum number of iterations is reached.
Step 13: For kNN testing, extract the test set data using the optimal feature subset obtained and perform detection and analysis.
Compared with the prior art, the invention has the following beneficial effects:
Logistic chaotic mapping is used when the grey wolf population is initialized, which improves the quality of the initial population and the exploitation capability of the algorithm. Perturbation is applied when the grey wolf population is updated, which mitigates the tendency to fall into local optima and improves the global search capability. Finally, an optimal feature subset containing few features is obtained through training and applied to the test set, giving a higher detection accuracy.
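The Logistic chaotic initialization mentioned above is not spelled out in the text. A minimal sketch, assuming the usual logistic map z_{k+1} = μ z_k (1 - z_k) with μ = 4 and a single chaotic sequence filling the whole population row by row; the function name, the value of μ and the way the sequence is seeded are assumptions.

```python
import random

def logistic_population(n_wolves=30, n_features=41, mu=4.0, seed=0):
    """Fill an n_wolves x n_features matrix with a logistic-map sequence."""
    rng = random.Random(seed)
    z = rng.uniform(0.01, 0.99)   # starting value, kept away from 0 and 1
    population = []
    for _ in range(n_wolves):
        wolf = []
        for _ in range(n_features):
            z = mu * z * (1.0 - z)   # logistic map z_{k+1} = mu * z_k * (1 - z_k)
            wolf.append(z)
        population.append(wolf)
    return population

population = logistic_population()
print(len(population), len(population[0]))
```

All values stay in [0, 1], so the positions can be binarized directly with the 0.5 boundary. The defaults N = 30 and 41 dimensions follow the parameters given in step 2.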
Drawings
FIG. 1 is a schematic flow diagram of the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and the detailed description.
Fig. 1 is a schematic flow chart of the intrusion detection feature selection method based on the Cauchy-mutation grey wolf optimization algorithm according to the present invention. The specific implementation steps are as follows:
1. Randomly draw data from KDD Cup99, using 80% as the training set and 20% as the test set.
2. Set all parameters used by the grey wolf optimization algorithm and initialize the grey wolf population using Logistic chaotic mapping.
3. Apply the feature selection schemes corresponding to the initial grey wolf population to the kNN model, and perform feature extraction and training on the training set to obtain the accuracy
Figure BDA0003565211110000041
and the error rate error = 1 - Accuracy. Combining this with the number of features contained in each feature subset,
Figure BDA0003565211110000042
calculate the fitness value of each scheme and sort the values to obtain the best 3 wolves as the α, β and γ wolves.
4. Perturb these 3 wolves by Levy flight
Figure BDA0003565211110000043
or by Cauchy mutation
Figure BDA0003565211110000044
Figure BDA0003565211110000045
then calculate the fitness values after the perturbation, compare them, and store the best result.
5. Update the positions of all grey wolves using the perturbed α, β and γ wolves,
Figure BDA0003565211110000046
6. Apply Cauchy mutation perturbation to the current grey wolf population to obtain the next-generation grey wolf population.
7. Calculate the fitness values of the population
Figure BDA0003565211110000047
and update the positions of the top 3 wolves.
8. Check whether the maximum number of iterations has been reached. If not, return to step 4 and perform the steps in sequence; once the maximum number of iterations is reached, exit the loop, completing the search for the optimal feature subset.
9. Using the optimal feature subset obtained, perform feature extraction on the data in the KDD Cup99 test set and run detection with kNN to obtain the test accuracy.
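The implementation steps above can be strung together in a simplified skeleton. This is a sketch under several assumptions, not the patented procedure: it starts from a uniform random population instead of Logistic chaotic mapping, perturbs the leaders with Cauchy mutation only, uses an assumed quadratic decay for a, averages the three leader-guided candidates, and takes a toy objective in place of the kNN-based fitness. All names and the constants 0.1 and 0.01 are illustrative.

```python
import math
import random

def binarize(position):
    """0/1 feature mask: 1 where the coordinate exceeds 0.5."""
    return [1 if v > 0.5 else 0 for v in position]

def clip(position):
    """Keep coordinates inside [0, 1] (an assumed bound handling rule)."""
    return [min(1.0, max(0.0, v)) for v in position]

def cauchy(rng):
    """Standard Cauchy variate via the inverse CDF."""
    return math.tan(math.pi * (rng.random() - 0.5))

def search(evaluate, n_features, n_wolves=30, max_iter=50, seed=0):
    """Minimise evaluate(mask); return (best_mask, best_fitness)."""
    rng = random.Random(seed)
    # Uniform start for brevity; the text initializes with Logistic chaotic mapping.
    wolves = [[rng.random() for _ in range(n_features)] for _ in range(n_wolves)]
    best_mask, best_fit = None, float("inf")
    for t in range(max_iter):
        # Evaluate every wolf and take the three smallest fitness values.
        scored = sorted(((evaluate(binarize(w)), w) for w in wolves),
                        key=lambda pair: pair[0])
        leaders = scored[:3]
        # Perturb each leader; keep the trial only if it is better.
        for i, (fit, pos) in enumerate(leaders):
            trial = clip([v + 0.1 * cauchy(rng) for v in pos])
            trial_fit = evaluate(binarize(trial))
            if trial_fit < fit:
                leaders[i] = (trial_fit, trial)
        for fit, pos in leaders:
            if fit < best_fit:
                best_fit, best_mask = fit, binarize(pos)
        a = 2.0 * (1.0 - (t / max_iter) ** 2)   # assumed nonlinear decay from 2
        new_wolves = []
        for w in wolves:
            moved = []
            for d in range(n_features):
                total = 0.0
                for _, leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    total += leader[d] - A * abs(C * leader[d] - w[d])
                # Mean of the three candidates, then a small Cauchy mutation.
                moved.append(total / 3 + 0.01 * cauchy(rng))
            new_wolves.append(clip(moved))
        wolves = new_wolves
    return best_mask, best_fit

# Toy objective standing in for the kNN-based fitness: distance to a known mask.
target = [1, 0, 1, 0, 0, 1]

def toy_fitness(mask):
    return sum(m != t for m, t in zip(mask, target)) / len(target)

mask, fit = search(toy_fitness, n_features=len(target))
print(mask, fit)
```

In an actual run, `evaluate` would train kNN on the training columns selected by the mask and return the weighted error and subset-size fitness, and the returned mask would then be applied to the test set as in step 9.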

Claims (1)

1. An intrusion detection feature selection method based on an improved grey wolf optimization algorithm, characterized by comprising the following steps:
Step 1: take 80% of the KDD Cup99 data set as the training set and 20% as the test set; kNN is chosen as the intrusion detection classifier, with k = 5; k is the number of nearest neighbours, and each sample is characterized by its k nearest neighbours;
Step 2: set the parameters of the improved grey wolf algorithm; the population size N is 30 and the maximum number of iterations T is 50; initialize the population, where the position of each wolf is represented by a vector X whose dimension equals the number of features in the selected data set, 41; initialization produces the first-generation wolf population;
Step 3: convert the positions of all grey wolves to binary using 0.5 as the boundary; in each dimension, a value greater than 0.5 is set to 1, indicating that the feature represented by that dimension is selected; otherwise it is set to 0, indicating that the feature is not selected;
Step 4: apply the feature selection scheme given by each binary grey wolf position vector to the kNN training set to select the corresponding feature data;
Step 5: train on the feature data obtained in step 4 to obtain the accuracy
Figure FDA0003565211100000011
where TP is the number of correct detections and FP is the number of erroneous detections, and the error rate is error = 1 - Accuracy;
Step 6: calculate the fitness of the feature subset
Figure FDA0003565211100000012
where a and b are weights (typically a = 0.99 and b = 0.01), m is the number of features in the feature subset, and n is the total number of features; this formula is used as the fitness value because feature selection must take two factors into account, the accuracy and the size of the feature subset;
Step 7: sort the fitness values of the feature subset schemes calculated in step 6 and designate the wolves with the three smallest values as the three leader wolves α, β and γ; the α wolf is the head wolf, i.e., the wolf with the best fitness, and corresponds to the current best feature subset; the β wolf has the second-best fitness, is subordinate to the head wolf, and corresponds to the current second-best feature subset; the γ wolf has the third-best fitness, is subordinate to the α and β wolves, and corresponds to the current third-best feature subset;
Step 8: generate a random number in [0, 1) to determine the perturbation strategy for the α, β and γ wolves; with 0.5 as the boundary, perform step 8.1 if the random number lies in [0, 0.5], otherwise perform step 8.2;
Step 8.1: apply Cauchy mutation perturbation to α, β and γ in turn to obtain the new positions
Figure FDA0003565211100000013
Figure FDA0003565211100000014
Step 8.2: apply Levy flight perturbation to α, β and γ in turn to obtain the new positions
Figure FDA0003565211100000015
Figure FDA0003565211100000016
where
Figure FDA0003565211100000017
$u = w \cdot \sigma$, and $v$ and $w$ are random numbers following the standard normal distribution,
Figure FDA0003565211100000021
where $\Gamma(x) = (x-1)!$;
Step 8.3: calculate the fitness values of the positions obtained in step 8.1 or 8.2, and store the updated positions and fitness values of the three wolves;
Step 9: the model of the grey wolves encircling the prey is
Figure FDA0003565211100000025
where $X_p(t)$ is the position vector of the prey in the current generation, which here refers to the positions of the top three wolves, $A$ is the convergence factor, $C$ is the swing factor, and $D$ is the distance between wolves,
Figure FDA0003565211100000026
$C = 2r_2$,
Figure FDA0003565211100000027
Figure FDA0003565211100000028
denotes the Hadamard product, and $a$ decreases nonlinearly from 2 to 0 over the iterations,
Figure FDA0003565211100000022
where $r_1$ and $r_2$ are random vectors in $[0, 1]$;
Step 10: in each iteration, keep the best three wolves of the current population and compute the positions of the candidate wolves; the positions of the next-generation grey wolf population are obtained with the following formulas;
Figure FDA0003565211100000029
Figure FDA00035652111000000210
Figure FDA0003565211100000023
where $X_\alpha$, $X_\beta$ and $X_\gamma$ are the best three wolves of the current population, and $D_\alpha$, $D_\beta$ and $D_\gamma$ are the distances between the current candidate grey wolf and each of the three best wolves;
Step 11: apply Cauchy mutation perturbation to all grey wolves,
Figure FDA0003565211100000024
Step 12: repeat steps 3 to 12 until the maximum number of iterations is reached;
Step 13: for kNN testing, extract the test set data using the optimal feature subset obtained and perform detection and analysis.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210321742.9A CN114897124A (en) 2022-03-25 2022-03-25 Intrusion detection feature selection method based on improved wolf optimization algorithm

Publications (1)

Publication Number Publication Date
CN114897124A true CN114897124A (en) 2022-08-12

Family

ID=82714788

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210321742.9A Pending CN114897124A (en) 2022-03-25 2022-03-25 Intrusion detection feature selection method based on improved wolf optimization algorithm

Country Status (1)

Country Link
CN (1) CN114897124A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116886398A (en) * 2023-08-03 2023-10-13 中国石油大学(华东) Internet of things intrusion detection method based on feature selection and ensemble learning
CN116886398B (en) * 2023-08-03 2024-03-29 中国石油大学(华东) Internet of things intrusion detection method based on feature selection and ensemble learning
CN117354013A (en) * 2023-10-11 2024-01-05 中国电子科技集团公司第三十研究所 Phishing attack detection method based on wolf pack hunting algorithm
CN117354013B (en) * 2023-10-11 2024-04-23 中国电子科技集团公司第三十研究所 Phishing attack detection method based on wolf pack hunting algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination