CN115296837A - SSA optimization-based sustainable integrated intrusion detection method - Google Patents


Info

Publication number
CN115296837A
Authority
CN
China
Prior art keywords
individual
ssa
intrusion detection
population
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210721435.XA
Other languages
Chinese (zh)
Other versions
CN115296837B (en)
Inventor
杨忠君
刘志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyang University of Chemical Technology
Original Assignee
Shenyang University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyang University of Chemical Technology
Priority to CN202210721435.XA
Publication of CN115296837A
Application granted
Publication of CN115296837B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Image Analysis (AREA)

Abstract

A sustainable integrated intrusion detection method based on SSA optimization relates to the field of network intrusion detection. The method comprises the following steps: a standard intrusion detection data set is selected and divided into a training set and a test set. The data are preprocessed, and SSA is used to search the preprocessed data for the feature subset that maximizes the classification performance of each model. The different models are then trained on the training set restricted to their corresponding feature subsets, and their prediction results are combined by an adaptive ensemble decision process. Finally, the method is evaluated on the test set. The invention addresses two problems of current machine-learning-based network intrusion detection methods: complex multi-class traffic data are difficult to classify, and the feature subset that optimizes a model is difficult to obtain. The invention can effectively detect complex multi-class traffic data, achieves higher detection accuracy than traditional intrusion detection methods, and is sustainably integrable: new ML models can be continuously integrated to optimize the existing model.

Description

SSA optimization-based sustainable integrated intrusion detection method
Technical Field
The invention relates to the field of network intrusion detection, in particular to a sustainable integrated intrusion detection method based on SSA optimization.
Background
With the increasing intelligence and digitization of society, network intrusion events occur from time to time and seriously affect the property security of enterprises and individuals. Intrusion detection is an active defense technology applied in intrusion detection systems; it can effectively detect network intrusion behavior by continuously monitoring the traffic of key network links.
Traditional intrusion detection technology falls into two types: signature-based (misuse) intrusion detection and anomaly-based intrusion detection. Signature-based intrusion detection performs pattern matching on the distinctive features carried by attack behavior to judge whether traffic is abnormal. However, it can only detect currently known attack types and tends to produce a high false negative rate. Anomaly-based intrusion detection judges whether traffic behavior is abnormal by building a model of normal behavior; its advantage is that unknown attacks can be discovered, but it tends to produce a high false alarm rate.
To address these problems, data-driven Machine Learning (ML) techniques have been introduced into the intrusion detection field. An ML model can directly mine the behavior patterns of normal and abnormal traffic, which solves the problems of traditional intrusion detection technology to a certain extent.
However, a single ML classification model often cannot effectively detect all classes in a multi-class problem. Ensemble Learning (EL), which combines the classification strengths of multiple ML models, can effectively alleviate this. The idea of EL is to learn multiple models from data, explicitly or implicitly, and combine them effectively to obtain more reliable and accurate predictions. Training a more reliable and accurate EL model has two preconditions: base classifiers that are both accurate and diverse, and an effective ensemble strategy.
Feature selection can remove redundant and irrelevant features, thereby improving the performance of the base classifiers. The Salp Swarm Algorithm (SSA) is a swarm optimization algorithm widely applied to feature selection and engineering optimization. Weighted hard voting is a simple and effective ensemble strategy for heterogeneous classifiers, and carefully calibrated weights are often more competitive than other ensemble strategies.
Disclosure of Invention
The invention provides a sustainable integrated intrusion detection method based on SSA optimization. The method first searches, through SSA-based wrapper feature selection, for the optimal feature subset of each machine learning model. It then trains the corresponding machine learning models on the different optimal feature subsets, and finally integrates the prediction results of the multiple machine learning models by multi-class weighted hard voting, with the voting weights optimized by SSA, so as to effectively combine the classification strengths of the different ML models and obtain more accurate and reliable predictions. In addition, the method is sustainably integrable: new ML models can be continuously integrated to optimize the existing model.
The technical scheme of the invention is as follows:
a sustainable integrated intrusion detection method based on SSA optimization, the method comprising the following steps:
step (1): inputting a reference data set; taking the NSL-KDD data set as an example, the data set comprises normal traffic and four types of attack traffic, namely DoS, Probe, U2R and R2L;
step (2): preprocessing the data set, in three parts: data cleaning, feature encoding and data normalization; data cleaning removes duplicate samples and samples containing missing or abnormal values from the reference data set; feature encoding converts the discrete character-type features of the reference data set into numerical features so that they can be fed to the subsequent machine learning models; data normalization eliminates the scale differences between features;
step (3): feature selection; searching, through an SSA-based wrapper feature selection algorithm, for the optimal feature subset of each ML model, namely the feature subset with the best fitness value;
step (4): model classification; training a plurality of heterogeneous machine learning classification models on the reference data set after feature selection;
step (5): adaptive ensemble decision making; the predictions of the multiple ML models are integrated by multi-class weighted hard voting, with the corresponding voting weights determined and optimized by an SSA-based weight optimization algorithm.
In the sustainable integrated intrusion detection method based on SSA optimization, the reference data set used in step (1) is as follows: the original NSL-KDD data set contains 148517 samples in total; 30% of the samples are drawn for testing by stratified sampling and the remaining 70% are used for training, so that the proportion of each class in the training set is consistent with that in the test set.
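The stratified 70/30 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `stratified_split`, the fixed seed and the toy label counts are assumptions made for the example and do not come from the source.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_idx, test_idx) so that each class keeps its proportion."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Draw the same fraction from every class.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (hypothetical counts, not the NSL-KDD class distribution).
labels = ["normal"] * 50 + ["DoS"] * 30 + ["Probe"] * 20
train_idx, test_idx = stratified_split(labels)
```

Because the fraction is applied per class, rare classes such as U2R stay represented in both subsets, which a plain random split does not guarantee.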
In the sustainable integrated intrusion detection method based on SSA optimization, in the feature encoding part of step (2): the original NSL-KDD data set has three discrete character-type features, namely 'protocol-type', 'service' and 'flag'; 'protocol-type' has 3 states, 'service' has 70 states and 'flag' has 11 states; the 'protocol-type' feature is one-hot encoded and thereby expanded into three features; for the 'service' and 'flag' features, which have more states, each state is replaced by its frequency count; the encoded data set contains 43 features in total.
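A minimal sketch of this encoding, assuming each record is a dictionary keyed by the feature names used in the text. The helper name `encode_records` and the four toy records are hypothetical, and computing the frequency counts over the same records that are being encoded is an assumption; the source does not say which subset the counts are taken from.

```python
from collections import Counter


def encode_records(records):
    """One-hot encode 'protocol-type'; frequency-encode 'service' and 'flag'."""
    protocols = sorted({r["protocol-type"] for r in records})
    service_freq = Counter(r["service"] for r in records)
    flag_freq = Counter(r["flag"] for r in records)

    encoded = []
    for r in records:
        # One column per protocol state (three columns on NSL-KDD).
        one_hot = [1 if r["protocol-type"] == p else 0 for p in protocols]
        # Each state is replaced by how often it occurs.
        encoded.append(one_hot + [service_freq[r["service"]], flag_freq[r["flag"]]])
    return protocols, encoded


# Toy records for illustration only.
records = [
    {"protocol-type": "tcp", "service": "http", "flag": "SF"},
    {"protocol-type": "udp", "service": "domain_u", "flag": "SF"},
    {"protocol-type": "tcp", "service": "http", "flag": "S0"},
    {"protocol-type": "icmp", "service": "ecr_i", "flag": "SF"},
]
protocols, encoded = encode_records(records)
```

Frequency encoding keeps 'service' and 'flag' as one column each, whereas one-hot encoding them would add 70 and 11 columns respectively.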
In the sustainable integrated intrusion detection method based on SSA optimization, in the data normalization part of step (2): the data are scaled to the interval [0,1] using min-max normalization:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ denotes a feature value of a sample, $x_{\max}$ and $x_{\min}$ denote the maximum and minimum values of that feature, and $x'$ denotes the normalized feature value.
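As a small illustration of min-max scaling applied to one feature column, the sketch below maps the smallest value to 0 and the largest to 1. The function name `min_max_scale` is hypothetical, and returning zeros for a constant column is an assumption, since the source does not discuss that case.

```python
def min_max_scale(column):
    """Scale a list of numbers to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant feature carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


scaled = min_max_scale([2, 4, 10])  # [0.0, 0.25, 1.0]
```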
In the sustainable integrated intrusion detection method based on SSA optimization, the modeling process of the SSA-based wrapper feature selection algorithm in step (3) is as follows:
(1) Setting a fitness function:
[Formula image not reproduced; the fitness is defined in terms of acc and F1.]
where acc and F1 denote, respectively, the mean overall accuracy and the mean weighted F1 score of the model in 5-fold cross validation on the training set;
(2) Setting parameters; the population size is set to 30, the maximum number of iterations to 200, the upper search bound to 1 and the lower search bound to 0;
(3) Initializing the population; the positions of the salp individuals in the population are randomly initialized within the search bounds;
(4) Position coding; the position of each individual in the salp population is binary coded to suit the feature selection problem, where 1 indicates that a feature is selected and 0 indicates that it is not selected. The specific coding formula is as follows:
[Formula image not reproduced; the coding maps each position component to 1 or 0.]
note that the coding is used only to calculate the fitness value; the positions of the salp individuals in the population are not changed;
(5) Determining the food location; the fitness value of each salp individual is calculated, the individual with the maximum fitness value is determined, and its position is set as the food location;
(6) Searching the population; the positions of the leader and the followers are updated according to the population update formulas; in the salp population, the first individual is the leader, and its position update formula is:
$$x_j^1 = \begin{cases} F_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 \ge 0.5 \\ F_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 < 0.5 \end{cases}$$
where $x_j^1$ denotes the position of the leader in the $j$-th dimension, $F_j$ denotes the position of the food in the $j$-th dimension, and $ub_j$ and $lb_j$ are the upper and lower bounds of the $j$-th decision variable; $c_2$ and $c_3$ are random numbers in $[0,1]$; $c_1$ is the convergence factor of the algorithm, which balances global exploration and local exploitation, and is given by
$$c_1 = 2e^{-\left(\frac{4l}{L}\right)^2}$$
where $l$ and $L$ denote the current iteration number and the maximum number of iterations, respectively;
the other individuals are followers, and their position update formula is:
$$x_j^{i\prime} = \frac{1}{2}\left(x_j^i + x_j^{i-1}\right)$$
where $x_j^{i\prime}$ denotes the updated position of the $i$-th individual, $x_j^i$ denotes its current position, and $x_j^{i-1}$ denotes the position of the previous individual;
(7) Repeating (4) - (6) until the maximum number of iterations is reached.
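Steps (1) to (7) can be sketched as the loop below, written with the standard library only. Several parts are assumptions rather than the patented procedure: the leader and follower updates follow the commonly published Salp Swarm Algorithm, the 0.5 thresholds used for binarization and for the sign choice are conventional choices, the food location is kept as the best position found so far, and `toy_fitness` is a hypothetical stand-in for the cross-validated fitness built from acc and F1.

```python
import math
import random


def binarize(position, threshold=0.5):
    """Map a continuous position in [0, 1] to a 0/1 feature mask.

    Assumption: a fixed 0.5 threshold is used for illustration.
    """
    return [1 if value > threshold else 0 for value in position]


def ssa_feature_selection(fitness, dim, pop_size=30, max_iter=200,
                          lb=0.0, ub=1.0, seed=0):
    """Maximize fitness(mask) with a basic salp swarm search."""
    rng = random.Random(seed)
    # Step (3): random initial positions inside the search bounds.
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(pop_size)]
    food, food_fit = None, -math.inf

    for iteration in range(1, max_iter + 1):
        # Steps (4)-(5): score each salp on its binary mask; the best
        # position found so far is kept as the food location.
        for salp in pop:
            fit = fitness(binarize(salp))
            if fit > food_fit:
                food, food_fit = list(salp), fit

        # Convergence factor: large early (exploration), small late.
        c1 = 2.0 * math.exp(-((4.0 * iteration / max_iter) ** 2))

        # Step (6): the first salp is the leader, the rest are followers.
        for i in range(pop_size):
            for j in range(dim):
                if i == 0:
                    c2, c3 = rng.random(), rng.random()
                    step = c1 * ((ub - lb) * c2 + lb)
                    new = food[j] + step if c3 >= 0.5 else food[j] - step
                else:
                    new = 0.5 * (pop[i][j] + pop[i - 1][j])
                # Keep every coordinate inside [lb, ub].
                pop[i][j] = min(ub, max(lb, new))

    return binarize(food), food_fit


# Hypothetical stand-in for the cross-validated fitness: agreement with a
# hidden "useful feature" mask. A real run would train and score a model.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(mask):
    return sum(int(a == b) for a, b in zip(mask, TARGET)) / len(TARGET)


best_mask, best_fit = ssa_feature_selection(toy_fitness, dim=len(TARGET),
                                            max_iter=50)
print(best_mask, best_fit)
```

In a wrapper setting, each call to `fitness` trains and validates a classifier on the selected columns, so the cost of a run is roughly the population size times the iteration count times one cross-validation.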
In the sustainable integrated intrusion detection method based on SSA optimization, the model classification part in step (4) is coupled with feature selection: the SSA-based feature selection algorithm selects a corresponding optimal feature subset for each machine learning model.
In the sustainable integrated intrusion detection method based on SSA optimization, the model classification part in step (4) can integrate multiple different ML models at the same time, and a new ML model can be added to the original ensemble to improve the classification performance of the existing model; this only requires selecting the optimal feature subset for the new ML model and re-optimizing the voting weights, which gives the method a degree of generality and extensibility.
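One way to picture the extension step: if the voting weights are held as a matrix with one row per base classifier and one column per class, adding a model appends a row, and the weight search then runs over the enlarged matrix. The sketch below is an illustration under that assumption; the helper name `add_base_classifier`, the neutral starting weight of 0.5 and the reuse of the existing weights as a starting point are all hypothetical and not stated in the source.

```python
def add_base_classifier(weights, initial_weight=0.5):
    """Return a copy of the weight matrix with one extra classifier row."""
    n_classes = len(weights[0])
    extended = [row[:] for row in weights]
    # The new model starts with a neutral weight for every class.
    extended.append([initial_weight] * n_classes)
    return extended


# Two existing classifiers, three classes (illustrative values).
weights = [[0.9, 0.2, 0.1], [0.3, 0.4, 0.5]]
extended = add_base_classifier(weights)
# The weight search now runs over (n + 1) * m values instead of n * m.
search_dim = len(extended) * len(extended[0])
```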
In the sustainable integrated intrusion detection method based on SSA optimization, in step (5): the adaptive ensemble decision process combines the predictions of multiple ML models by multi-class weighted hard voting; the specific decision process is as follows:
suppose there are $n$ different base classifiers $\{C_1, C_2, \dots, C_n\}$ and the reference data set has $m$ class labels $\{y_1, y_2, \dots, y_m\}$; the weight matrix can then be represented as
$$W = (w_{ij})_{n \times m}$$
where $w_{ij} \in [0,1]$, $i = 1, \dots, n$, $j = 1, \dots, m$. For a sample $x$ and class $j$, the weighted probability output is
$$P_j(x) = \sum_{i=1}^{n} w_{ij}\, p_{ij}(x)$$
where $P_j(x)$ denotes the weighted-sum probability of the $j$-th class and $p_{ij}(x)$ denotes the prediction of the $i$-th base classifier for class $j$; the integrated probability prediction of all base classifiers can be represented as an $m$-dimensional vector $[P_1(x), P_2(x), \dots, P_m(x)]$; the final decision can be expressed as
$$\hat{y} = \arg\max_{j \in \{1, \dots, m\}} P_j(x)$$
In the sustainable integrated intrusion detection method based on SSA optimization, the modeling process of the SSA-based weight optimization algorithm in step (5) is as follows:
a. setting a fitness function:
[Formula image not reproduced; the fitness is defined in terms of acc.]
where acc denotes the mean overall accuracy of the model in 5-fold cross validation on the training set;
b. setting parameters; the population size is set to 30, the maximum number of iterations to 200, the upper search bound to 1 and the lower search bound to 0;
c. initializing the population; the positions of the salp individuals in the population are randomly initialized within the search bounds, where the number of elements of the position vector of each salp individual equals the number of elements of the one-dimensional vector obtained by unrolling the weight matrix column by column;
d. determining the food location; the fitness values of all salp individuals are calculated, and the position of the individual with the maximum fitness value is set as the food location;
e. searching the population; the positions of the leader and the followers are updated according to the population update formulas; in the salp population, the first individual is the leader, and its position update formula is:
$$x_j^1 = \begin{cases} F_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 \ge 0.5 \\ F_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 < 0.5 \end{cases}$$
where $x_j^1$ denotes the position of the leader in the $j$-th dimension, $F_j$ denotes the position of the food in the $j$-th dimension, and $ub_j$ and $lb_j$ are the upper and lower bounds of the $j$-th decision variable; $c_2$ and $c_3$ are random numbers in $[0,1]$; $c_1$ is the convergence factor of the algorithm, which balances global exploration and local exploitation, and is given by
$$c_1 = 2e^{-\left(\frac{4l}{L}\right)^2}$$
where $l$ and $L$ denote the current iteration number and the maximum number of iterations, respectively;
the other individuals are followers, and their position update formula is:
$$x_j^{i\prime} = \frac{1}{2}\left(x_j^i + x_j^{i-1}\right)$$
where $x_j^{i\prime}$ denotes the updated position of the $i$-th individual, $x_j^i$ denotes its current position, and $x_j^{i-1}$ denotes the position of the previous individual;
f. repeating d-e until the maximum number of iterations is reached.
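To connect a salp position to the voting weights, the position vector can be folded back into the weight matrix column by column and scored by the accuracy of the resulting vote. The sketch below is a hedged illustration: the helper names are hypothetical, the votes and labels are toy data, and accuracy is computed on a single labeled set rather than as the 5-fold cross-validation mean described in the source.

```python
def unflatten_columnwise(position, n_classifiers, n_classes):
    """Rebuild the n x m weight matrix from a vector laid out column by column."""
    return [
        [position[j * n_classifiers + i] for j in range(n_classes)]
        for i in range(n_classifiers)
    ]


def voting_accuracy(position, votes, labels, n_classes):
    """Accuracy of weighted hard voting for one candidate weight vector.

    votes[s][i] is the class index predicted by classifier i for sample s.
    """
    n_classifiers = len(votes[0])
    weights = unflatten_columnwise(position, n_classifiers, n_classes)
    correct = 0
    for sample_votes, label in zip(votes, labels):
        scores = [0.0] * n_classes
        for i, voted_class in enumerate(sample_votes):
            # A hard vote adds the classifier's weight for the voted class.
            scores[voted_class] += weights[i][voted_class]
        if max(range(n_classes), key=scores.__getitem__) == label:
            correct += 1
    return correct / len(labels)


# Two classifiers, two classes; position = [w11, w21, w12, w22].
position = [0.9, 0.1, 0.2, 0.8]
votes = [[0, 1], [0, 1], [1, 0]]
labels = [0, 1, 1]
accuracy = voting_accuracy(position, votes, labels, n_classes=2)
```

A function of this shape can serve as the fitness passed to the swarm search, with the search dimension equal to the number of classifiers times the number of classes.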
The invention has the following beneficial effects:
In the sustainable integrated intrusion detection method based on SSA optimization, redundant and irrelevant features in the original data are removed by SSA-based wrapper feature selection, which enhances the classification performance of each single ML model. The decisions of the multiple ML models are then integrated by multi-class weighted hard voting, and the voting weights are continuously optimized by the SSA-based weight optimization algorithm, so that the classification strengths of the different models are fully combined and the overall classification performance of the intrusion detection model is effectively improved. The method also provides an effective way of incorporating different ML models: new ML models can be continuously integrated into the intrusion detection model, so that the model is continuously optimized and its overall detection performance improved.
Drawings
FIG. 1 is a block diagram of an overall modeling flow of an embodiment of the present invention;
FIG. 2 is a flow chart of the SSA-based wrapper feature selection process according to an embodiment of the present invention;
FIG. 3 is a pseudo-code diagram of the SSA-based wrapper feature selection algorithm according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating multi-class weighted voting modeling according to an embodiment of the present invention;
FIG. 5 is a pseudo-code diagram of the SSA-based weight optimization algorithm according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings so that those skilled in the art can refer thereto and implement the same.
The invention provides a sustainable integrated intrusion detection method based on SSA optimization, which comprises the following steps:
In step (1): the public intrusion detection data set NSL-KDD is selected as the evaluation sample. The data set contains 148517 samples in total; 30% of the samples are drawn for testing by stratified sampling and the remaining 70% are used for training, so that the proportion of each class in the training set is consistent with that in the test set.
In step (2): the original NSL-KDD data set has three discrete character-type features, namely 'protocol-type', 'service' and 'flag'. The 'protocol-type' feature is one-hot encoded and thereby expanded into three features. For the 'service' and 'flag' features, which have more states, each state is replaced by its frequency count. The encoded data set contains 43 features in total. Next, to eliminate the scale differences between features, the data are normalized so that the values of all features are scaled to the interval [0,1]. The normalization formula is as follows:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ denotes a feature value of a sample, $x_{\max}$ and $x_{\min}$ denote the maximum and minimum values of that feature, and $x'$ denotes the normalized feature value.
In step (3), the SSA-based wrapper feature selection algorithm is used to search for the optimal feature subset of each machine learning model. The steps of the SSA-based wrapper feature selection algorithm are as follows:
(1) Setting a fitness function:
[Formula image not reproduced; the fitness is defined in terms of acc and F1.]
where acc and F1 denote, respectively, the mean overall accuracy and the mean F1 score of the model in 5-fold cross validation on the training set;
(2) Setting parameters. The population size is set to 30, the maximum number of iterations to 200, the upper search bound to 1 and the lower search bound to 0.
(3) Initializing the population. The positions of the salp individuals in the population are randomly initialized within the search bounds.
(4) Position coding. The position of each individual in the salp population is binary coded to suit the feature selection problem, where 1 indicates that a feature is selected and 0 indicates that it is not selected. The specific coding formula is as follows:
[Formula image not reproduced; the coding maps each position component to 1 or 0.]
Note that the coding is used only to calculate the fitness value; the positions of the salp individuals in the population are not changed.
(5) Determining the food location. The fitness value of each salp individual is calculated, the individual with the maximum fitness value is determined, and its position is set as the food location.
(6) Searching the population. The positions of the leader and the followers are updated according to the population update formulas. In the salp population, the first individual is the leader, and its position update formula is:
$$x_j^1 = \begin{cases} F_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 \ge 0.5 \\ F_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 < 0.5 \end{cases}$$
where $x_j^1$ denotes the position of the leader in the $j$-th dimension, $F_j$ denotes the position of the food in the $j$-th dimension, and $ub_j$ and $lb_j$ are the upper and lower bounds of the $j$-th decision variable. $c_2$ and $c_3$ are random numbers in $[0,1]$; $c_1$ is the convergence factor of the algorithm, which balances global exploration and local exploitation, and is given by
$$c_1 = 2e^{-\left(\frac{4l}{L}\right)^2}$$
where $l$ and $L$ denote the current iteration number and the maximum number of iterations, respectively.
The other individuals are followers, and their position update formula is:
$$x_j^{i\prime} = \frac{1}{2}\left(x_j^i + x_j^{i-1}\right)$$
where $x_j^{i\prime}$ denotes the updated position of the $i$-th individual, $x_j^i$ denotes its current position, and $x_j^{i-1}$ denotes the position of the previous salp individual.
(7) Repeating (4) - (6) until the maximum number of iterations is reached.
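The role of the convergence factor is easier to see numerically. The sketch below evaluates the expression commonly published for the Salp Swarm Algorithm, $c_1 = 2e^{-(4l/L)^2}$, at a few iteration counts with the maximum of 200 iterations stated above; the function name `convergence_factor` is hypothetical.

```python
import math


def convergence_factor(iteration, max_iter):
    """c1 = 2 * exp(-(4 * l / L) ** 2): close to 2 at the start, near 0 at the end."""
    return 2.0 * math.exp(-((4.0 * iteration / max_iter) ** 2))


for l in (1, 50, 100, 200):
    print(l, convergence_factor(l, 200))
```

With these settings the factor drops to 2/e by iteration 50 and is almost zero in the final iterations, so the leader takes large steps around the food early on and only fine adjustments near the end.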
In step (4): the SSA-based wrapper feature selection algorithm is first used to search for the optimal feature subset of each ML model, and each ML model is then trained and evaluated on a training set that contains only its optimal feature subset.
In step (5): the adaptive ensemble decision process combines the predictions of multiple ML models by multi-class weighted hard voting, with the corresponding voting weights determined and optimized by an SSA-based weight optimization algorithm. The specific decision process is as follows:
suppose there are $n$ different base classifiers $\{C_1, C_2, \dots, C_n\}$ and the reference data set has $m$ class labels $\{y_1, y_2, \dots, y_m\}$; the weight matrix can then be represented as
$$W = (w_{ij})_{n \times m}$$
where $w_{ij} \in [0,1]$. For a sample $x$ and class $j$, the weighted probability output is
$$P_j(x) = \sum_{i=1}^{n} w_{ij}\, p_{ij}(x)$$
where $P_j(x)$ denotes the weighted-sum probability of the $j$-th class, $p_{ij}(x)$ denotes the prediction of the $i$-th base classifier for class $j$, and $w_{ij}$ is its voting weight. The integrated probability prediction of all base classifiers can be represented as an $m$-dimensional vector $[P_1(x), P_2(x), \dots, P_m(x)]$. The final decision can be expressed as
$$\hat{y} = \arg\max_{j \in \{1, \dots, m\}} P_j(x)$$
The weight matrix in the weighted hard voting process is determined and optimized through a weight optimization algorithm based on SSA, and the specific modeling process is as follows:
a. setting a fitness function:
Figure 886588DEST_PATH_IMAGE110
acc represents the average value of the overall accuracy of the model in 5-fold cross validation on the training set;
b. and setting parameters. Setting the population number to be 30, the maximum iteration number to be 200, the upper search limit to be 1 and the lower search limit to be 0;
c. and (5) initializing a population. And randomly initializing the positions of the goblet and sea squirt individuals in the population within the search limit, wherein the number of the position vector elements represented by each goblet and sea squirt individual is equal to the number of the one-dimensional vector elements generated by column extension of the weight matrix.
d. The food location is determined. Calculating the individual fitness value of all the goblet ascidians, and determining the individual position of the goblet ascidian with the maximum fitness value as the food position.
e. Search with the population. The positions of the leader and of the followers are updated according to the population update formulas. In the salp population, the first individual acts as the leader, and its position is updated as:

$$x_j^1 = \begin{cases} F_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 \ge 0.5 \\ F_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 < 0.5 \end{cases}$$

where $x_j^1$ is the position of the leader in the $j$-th dimension, $F_j$ is the position of the food in the $j$-th dimension, and $ub_j$ and $lb_j$ are the upper and lower bounds of the $j$-th decision variable. $c_2$ and $c_3$ are random numbers in $[0, 1]$, and $c_1$ is the convergence factor of the algorithm, which balances global exploration and local exploitation:

$$c_1 = 2e^{-\left(\frac{4l}{L}\right)^2}$$

where $l$ and $L$ are the current iteration number and the maximum number of iterations, respectively.
The remaining individuals act as followers, and their positions are updated as:

$$x_j^{i\prime} = \frac{1}{2}\left(x_j^i + x_j^{i-1}\right)$$

where $x_j^{i\prime}$ is the updated position of the individual, $x_j^i$ is its current position, and $x_j^{i-1}$ is the position of the preceding salp individual.
f. Repeat steps d and e until the maximum number of iterations is reached.
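Steps b to f can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the function names `salp_swarm_maximize` and `position_to_weights`, the 0.5 threshold on the random number that decides the leader's direction, the clipping of positions to the search bounds, and the in-place follower update are all assumptions. The `fitness` argument is any callable to be maximized; in the method described above it would be the 5-fold cross-validated accuracy obtained with the candidate voting weights.

```python
import math
import random


def position_to_weights(position, n_classifiers, n_classes):
    """Rebuild the weight matrix from a position vector flattened column by column."""
    return [[position[j * n_classifiers + i] for j in range(n_classes)]
            for i in range(n_classifiers)]


def salp_swarm_maximize(fitness, dim, pop_size=30, max_iter=200,
                        lb=0.0, ub=1.0, seed=0):
    """Maximize `fitness` over [lb, ub]^dim; returns (best_position, best_fitness)."""
    rng = random.Random(seed)
    # Step c: random initial positions inside the search bounds.
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(pop_size)]
    # Step d: the fittest individual marks the food position.
    food = list(max(pop, key=fitness))
    food_fit = fitness(food)
    for it in range(1, max_iter + 1):
        # Convergence factor: large early (exploration), small late (exploitation).
        c1 = 2.0 * math.exp(-((4.0 * it / max_iter) ** 2))
        for i in range(pop_size):
            for j in range(dim):
                if i == 0:
                    # Step e, leader: move around the food position.
                    c2, c3 = rng.random(), rng.random()
                    step = c1 * ((ub - lb) * c2 + lb)
                    # Assumption: 0.5 threshold chooses the direction.
                    value = food[j] + step if c3 >= 0.5 else food[j] - step
                else:
                    # Step e, follower: midpoint with the preceding salp.
                    value = 0.5 * (pop[i][j] + pop[i - 1][j])
                pop[i][j] = min(ub, max(lb, value))  # assumption: clip to bounds
        # Step d repeated: keep the best position seen so far as the food.
        for individual in pop:
            fit = fitness(individual)
            if fit > food_fit:
                food, food_fit = list(individual), fit
    return food, food_fit
```

With, say, 3 base classifiers and 5 classes, `dim` would be 15, and `position_to_weights(best, 3, 5)` turns the best position back into a 3 x 5 weight matrix.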
To verify the beneficial effects of the method, three machine learning models with default parameters, namely Decision Tree (DT), Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), are used to implement it. The method is then evaluated with metrics such as accuracy, F1 score and detection time, and finally compared with the Particle Swarm Optimization (PSO) algorithm and the Grey Wolf Optimization (GWO) algorithm.
Table 1. Performance comparison of different optimization algorithms on the NSL-KDD test set
[Table 1 is provided as an image in the original publication.]
As shown in Table 1, applying a swarm optimization algorithm to the feature selection process effectively improves the accuracy and F1 score of the ML models. Among the algorithms compared, the SSA used in the present invention achieves the highest accuracy and F1 score, outperforming PSO and GWO. After adaptive voting, the method combines the classification strengths of the different ML models and achieves still higher accuracy and F1 score. The detection time of the method is also more than 30% shorter than that of the other two methods.
In the SSA optimization-based sustainable integrated intrusion detection method, SSA is first used to independently select the optimal feature subset for each ML model, which strengthens the classification performance of the base classifiers. The classification strengths of the different ML models are then combined through adaptive decision making, which effectively improves the classification performance of the intrusion detection model.
The foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and modifications and variations of the present invention are possible for those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the technical scheme and the conception of the invention shall be included in the protection scope of the invention.

Claims (9)

1. A sustainable integrated intrusion detection method based on SSA optimization, characterized by comprising the following steps:
step (1): inputting a benchmark data set; taking the NSL-KDD data set as an example, the data set comprises normal communication traffic and four different types of attack traffic, namely DoS, Probe, U2R and R2L;
step (2): preprocessing the data set, comprising three parts: data cleaning, feature encoding and data normalization; data cleaning removes duplicate samples and samples containing missing or abnormal values from the benchmark data set; feature encoding encodes the character-type discrete features in the benchmark data set as numerical features so that they can be fed to the subsequent machine learning models; data normalization eliminates the differences in scale between features;
step (3): feature selection; searching, by means of an SSA-based wrapper feature selection algorithm, for the optimal feature subset of each ML model, namely the feature subset with the best fitness value;
step (4): model classification; training a plurality of heterogeneous machine learning classification models on the benchmark data set after feature selection;
step (5): adaptive integrated decision making; integrating the predictions of the multiple ML models by multi-class weighted hard voting, wherein the corresponding voting weights are determined and optimized by an SSA-based weight optimization algorithm.
2. The SSA optimization-based sustainable integrated intrusion detection method according to claim 1, wherein the benchmark data set used in step (1) is as follows: the original NSL-KDD data set contains 148517 samples in total; 30% of the samples are drawn by stratified sampling for testing and the remaining 70% are used for training, so that the proportions of the different classes in the training set are consistent with those in the test set.
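The 70/30 split of this claim can be illustrated with a small standard-library sketch. The function name `stratified_split`, the fixed random seed and the per-class rounding are assumptions made for illustration; the patent does not say how the split is implemented.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_ratio=0.3, seed=0):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    rng = random.Random(seed)
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Assumption: each class contributes round(30%) of its samples to the test set.
        n_test = round(len(indices) * test_ratio)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)
```

Because the ratio is applied class by class, a rare class such as U2R keeps roughly the same share in both subsets instead of ending up entirely in one of them.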
3. The SSA optimization-based sustainable integrated intrusion detection method according to claim 1, wherein in the feature encoding part of step (2): the original NSL-KDD data set has three discrete character-type features, namely 'protocol-type', 'service' and 'flag'; 'protocol-type' has 3 states, 'service' has 70 states and 'flag' has 11 states; the 'protocol-type' feature is one-hot encoded, which expands it into three features; for the 'service' and 'flag' features, which have more states, each state is replaced by its frequency count; the encoded data set contains 43 features in total.
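The two encodings of this claim can be sketched as follows. NSL-KDD has 41 input features, so replacing 'protocol-type' by three one-hot columns gives 41 - 1 + 3 = 43, which matches the count stated above. The function name `encode_features`, the dictionary-based row format and the choice to count frequencies over the rows passed in are assumptions for illustration only.

```python
from collections import Counter

CATEGORICAL = ("protocol-type", "service", "flag")


def encode_features(rows):
    """One-hot encode 'protocol-type'; frequency-encode 'service' and 'flag'."""
    protocols = sorted({row["protocol-type"] for row in rows})
    # Assumption: frequencies are counted over the rows passed in.
    service_count = Counter(row["service"] for row in rows)
    flag_count = Counter(row["flag"] for row in rows)
    encoded = []
    for row in rows:
        # Keep every non-categorical feature unchanged.
        out = {k: v for k, v in row.items() if k not in CATEGORICAL}
        # One-hot: one 0/1 column per protocol state.
        for protocol in protocols:
            out["protocol-type=" + protocol] = int(row["protocol-type"] == protocol)
        # Frequency encoding: replace each state by how often it occurs.
        out["service"] = service_count[row["service"]]
        out["flag"] = flag_count[row["flag"]]
        encoded.append(out)
    return encoded
```

Frequency encoding keeps 'service' (70 states) and 'flag' (11 states) as single columns, whereas one-hot encoding them would add 81 columns.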
4. The SSA optimization-based sustainable integrated intrusion detection method according to claim 1, wherein in the data normalization part of step (2): the data is scaled to the interval [0,1] using the min-max function, the specific normalization being:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the value of a feature for a sample, $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of that feature, respectively, and $x'$ is the normalized feature value.
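As a small worked example of min-max normalization, the sketch below scales one feature column to [0, 1]. The function name `min_max_scale` and the handling of a constant column (mapped to 0 to avoid division by zero) are assumptions that the claim does not address.

```python
def min_max_scale(column):
    """Scale a list of numeric feature values to the interval [0, 1]."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        # Assumption: a constant feature carries no information; map it to 0.
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]
```

For the column `[2, 4, 6]` the minimum is 2 and the range is 4, so the scaled values are `[0.0, 0.5, 1.0]`.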
5. The SSA optimization-based sustainable integrated intrusion detection method according to claim 1, wherein the modeling process of the SSA-based wrapper feature selection algorithm in step (3) is:
(1) Setting a fitness function:
[fitness function formula, provided as an image in the original publication]
wherein acc and F1 respectively denote the mean overall accuracy and the mean weighted F1 score of the model under 5-fold cross-validation on the training set;
(2) Setting parameters; the population size is set to 30, the maximum number of iterations to 200, the upper search bound to 1, and the lower search bound to 0;
(3) Initializing the population; the positions of the salp individuals in the population are randomly initialized within the search bounds;
(4) Position encoding; the position of each individual in the salp population is binary encoded to suit the feature selection problem, where 1 indicates that a feature is selected and 0 indicates that it is not selected; the specific encoding formula is:
[binary encoding formula, provided as an image in the original publication]
note that this encoding is used only to calculate fitness values; the positions of the salp individuals in the population are not changed;
(5) Determining the food position; the fitness value of each salp individual is calculated, the individual with the largest fitness value is identified, and its position is set as the food position;
(6) Searching with the population; the positions of the leader and of the followers are updated according to the population update formulas; in the salp population, the first individual acts as the leader, and its position is updated as:

$$x_j^1 = \begin{cases} F_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 \ge 0.5 \\ F_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 < 0.5 \end{cases}$$

where $x_j^1$ is the position of the leader in the $j$-th dimension, $F_j$ is the position of the food in the $j$-th dimension, and $ub_j$ and $lb_j$ are the upper and lower bounds of the $j$-th decision variable; $c_2$ and $c_3$ are random numbers in $[0, 1]$, and $c_1$ is the convergence factor of the algorithm, which balances global exploration and local exploitation:

$$c_1 = 2e^{-\left(\frac{4l}{L}\right)^2}$$

where $l$ and $L$ are the current iteration number and the maximum number of iterations, respectively;
the remaining individuals act as followers, and their positions are updated as:

$$x_j^{i\prime} = \frac{1}{2}\left(x_j^i + x_j^{i-1}\right)$$

where $x_j^{i\prime}$ is the updated position of the individual, $x_j^i$ is its current position, and $x_j^{i-1}$ is the position of the preceding individual;
(7) Repeating steps (4) to (6) until the maximum number of iterations is reached.
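The position encoding of step (4) and its use in the fitness evaluation of step (5) can be sketched as below. Several details are assumptions, because the corresponding formulas are not reproduced here: the 0.5 threshold in `decode_mask`, the zero fitness assigned to an empty feature subset, and the hypothetical `score_subset` callback, which stands in for whatever cross-validated score is computed from acc and F1.

```python
def decode_mask(position, threshold=0.5):
    """Binary-encode a continuous position: 1 = feature selected, 0 = not selected."""
    # Assumption: values above the threshold select the feature.
    return [1 if value > threshold else 0 for value in position]


def select_features(rows, mask):
    """Keep only the columns whose mask entry is 1."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


def make_fitness(rows, labels, score_subset):
    """Wrap a scoring callback into a fitness function over salp positions."""
    def fitness(position):
        # The mask is derived for scoring only; the position itself is untouched.
        mask = decode_mask(position)
        if not any(mask):
            return 0.0  # assumption: an empty subset gets the worst fitness
        return score_subset(select_features(rows, mask), labels)
    return fitness
```

Since the mask is computed only inside `fitness`, the continuous positions continue to evolve under the update formulas of step (6), as the note in step (4) requires.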
6. The SSA optimization-based sustainable integrated intrusion detection method according to claim 1, wherein the model classification in step (4) is coupled with feature selection: the SSA-based feature selection algorithm selects a corresponding optimal feature subset for each machine learning model.
7. The SSA optimization-based sustainable integrated intrusion detection method according to claim 1, wherein the model classification in step (4) can integrate a plurality of different ML models simultaneously, and a new ML model can also be added to the existing models to improve the existing classification performance; this requires only selecting the corresponding optimal feature subset for the new ML model and further optimizing the voting weights, so that the method has a degree of generality and extensibility.
8. The SSA optimization-based sustainable integrated intrusion detection method according to claim 1, wherein in step (5) the adaptive integrated decision process combines the predictions of multiple ML models by multi-class weighted hard voting; the specific decision process is as follows:
suppose there are $n$ different base classifiers $\{C_1, C_2, \dots, C_n\}$ and the benchmark data set has $m$ class labels $\{y_1, y_2, \dots, y_m\}$; the weight matrix can then be expressed as

$$W = \begin{bmatrix} w_{11} & \cdots & w_{1m} \\ \vdots & \ddots & \vdots \\ w_{n1} & \cdots & w_{nm} \end{bmatrix}$$

where $w_{ij} \in [0, 1]$, $i = 1, \dots, n$, $j = 1, \dots, m$; for a given sample $x$, the weighted probability output for class $y_j$ is

$$P_j(x) = \sum_{i=1}^{n} w_{ij}\, p_{ij}(x)$$

where $P_j(x)$ is the weighted probability of the $j$-th class and $p_{ij}(x)$ is the prediction of the $i$-th base classifier for class $j$; the integrated probability prediction of all base classifiers can be expressed as an $m$-dimensional vector $\left(P_1(x), P_2(x), \dots, P_m(x)\right)$; the final decision can be expressed as

$$\hat{y}(x) = \arg\max_{j \in \{1, \dots, m\}} P_j(x)$$
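The decision rule of this claim amounts to a weighted sum per class followed by an arg-max, which the following sketch illustrates. The function name `weighted_vote`, the classifier-by-class layout of the weight matrix and the use of one-hot rows to represent hard votes are assumptions for illustration.

```python
def weighted_vote(weights, predictions):
    """Combine base-classifier outputs with per-classifier, per-class weights.

    weights[i][j]     -- weight of classifier i for class j
    predictions[i][j] -- output of classifier i for class j
                         (a one-hot row when hard votes are used)
    Returns (index of the winning class, list of combined class scores).
    """
    n_classes = len(weights[0])
    combined = [
        sum(w_row[j] * p_row[j] for w_row, p_row in zip(weights, predictions))
        for j in range(n_classes)
    ]
    # The class with the largest combined score is the final decision.
    decision = max(range(n_classes), key=combined.__getitem__)
    return decision, combined
```

For example, with weights `[[1.0, 0.0], [0.0, 0.3], [0.0, 0.3]]` and hard votes `[[1, 0], [0, 1], [0, 1]]`, class 0 wins with a score of 1.0 against 0.6, although two of the three classifiers voted for class 1. This is how the learned weights let a classifier that is reliable on one class override a simple majority.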
9. The SSA optimization-based sustainable integrated intrusion detection method according to claim 1, wherein the modeling process of the SSA-based weight optimization algorithm in step (5) is:
a. setting a fitness function:
[fitness function formula, provided as an image in the original publication]
where acc denotes the mean overall accuracy of the model under 5-fold cross-validation on the training set;
b. setting parameters; the population size is set to 30, the maximum number of iterations to 200, the upper search bound to 1, and the lower search bound to 0;
c. initializing the population; the positions of the salp individuals in the population are randomly initialized within the search bounds, wherein the number of elements in each individual's position vector equals the number of elements in the one-dimensional vector obtained by expanding the weight matrix column by column;
d. determining the food position; the fitness values of all salp individuals are calculated, and the position of the individual with the largest fitness value is taken as the food position;
e. searching with the population; the positions of the leader and of the followers are updated according to the population update formulas; in the salp population, the first individual acts as the leader, and its position is updated as:

$$x_j^1 = \begin{cases} F_j + c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 \ge 0.5 \\ F_j - c_1\left((ub_j - lb_j)\,c_2 + lb_j\right), & c_3 < 0.5 \end{cases}$$

where $x_j^1$ is the position of the leader in the $j$-th dimension, $F_j$ is the position of the food in the $j$-th dimension, and $ub_j$ and $lb_j$ are the upper and lower bounds of the $j$-th decision variable; $c_2$ and $c_3$ are random numbers in $[0, 1]$, and $c_1$ is the convergence factor of the algorithm, which balances global exploration and local exploitation:

$$c_1 = 2e^{-\left(\frac{4l}{L}\right)^2}$$

where $l$ and $L$ are the current iteration number and the maximum number of iterations, respectively;
the remaining individuals act as followers, and their positions are updated as:

$$x_j^{i\prime} = \frac{1}{2}\left(x_j^i + x_j^{i-1}\right)$$

where $x_j^{i\prime}$ is the updated position of the individual, $x_j^i$ is its current position, and $x_j^{i-1}$ is the position of the preceding individual;
f. repeating steps d and e until the maximum number of iterations is reached.
CN202210721435.XA 2022-06-24 2022-06-24 Sustainable integrated intrusion detection method based on SSA optimization Active CN115296837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210721435.XA CN115296837B (en) 2022-06-24 2022-06-24 Sustainable integrated intrusion detection method based on SSA optimization


Publications (2)

Publication Number Publication Date
CN115296837A true CN115296837A (en) 2022-11-04
CN115296837B CN115296837B (en) 2023-09-15

Family

ID=83821233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210721435.XA Active CN115296837B (en) 2022-06-24 2022-06-24 Sustainable integrated intrusion detection method based on SSA optimization

Country Status (1)

Country Link
CN (1) CN115296837B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105512686A (en) * 2015-12-14 2016-04-20 深圳大学 Integrated feature selection method and system
CN112511519A (en) * 2020-11-20 2021-03-16 华北电力大学 Network intrusion detection method based on feature selection algorithm
CN113839926A (en) * 2021-08-31 2021-12-24 哈尔滨工业大学 Intrusion detection system modeling method, system and device based on gray wolf algorithm feature selection
CN114244549A (en) * 2021-08-10 2022-03-25 和安科技创新有限公司 GSSK-means abnormal flow detection method, memory and processor for industrial internet


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ALANOUD ALSALEH等: "The Influence of Salp Swarm Algorithm-Based Feature Selection on Network Anomaly Intrusion Detection", 《IEEE ACCESS》, vol. 9 *

Also Published As

Publication number Publication date
CN115296837B (en) 2023-09-15


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant