CN115175191A - Mixed model abnormal flow detection system and method based on ELM and deep forest - Google Patents
Mixed model abnormal flow detection system and method based on ELM and deep forest
- Publication number
- CN115175191A (application number CN202210783769.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- model
- elm
- deep forest
- flow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/12—Detection or prevention of fraud
- H04W12/121—Wireless intrusion detection systems [WIDS]; Wireless intrusion prevention systems [WIPS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W12/00—Security arrangements; Authentication; Protecting privacy or anonymity
- H04W12/009—Security arrangements; Authentication; Protecting privacy or anonymity specially adapted for networks, e.g. wireless sensor networks, ad-hoc networks, RFID networks or cloud networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L2209/00—Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
- H04L2209/80—Wireless
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Testing Or Calibration Of Command Recording Devices (AREA)
- Traffic Control Systems (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention provides a mixed model abnormal flow detection system and method based on ELM and deep forest. The system performs feature extraction and dimension reduction on real-time flow, and then applies mixed detection based on an ELM model and a deep forest model to the reduced flow features. In the detection method, an ELM algorithm deployed on the member nodes of a wireless sensor network performs rapid detection of the flow; the deep forest model deployed at the Sink nodes, which are the upper-level nodes of the member nodes, performs secondary detection on the flow that the member nodes detected as abnormal and gives the final abnormal flow detection result. Finally, the flow data is stored in a temporary database at a management node, and the deep forest model is optimized on the original data set and the new data set with reference to evaluation indexes such as accuracy and AUC. The invention can complete abnormal flow detection on resource-limited nodes such as wireless sensors, reduces the energy consumption of anomaly detection at each node, and can improve the accuracy of the anomaly detection result.
Description
Technical Field
The invention relates to the technical field of network security detection, in particular to a mixed model abnormal flow detection system and method based on ELM and deep forest.
Background
According to the Internet Security Threat Report (ISTR), some cryptojacking criminals in the current internet environment use cryptojacking scripts that drive CPU usage so high that some devices become unusable; network attacks occur frequently and are hard to prevent. In addition, with the popularization of Internet of Things systems, systems composed of many wireless sensors and resource-limited terminals are deployed at large scale and exposed in open environments for long periods, so they increasingly become attack targets of malicious third parties, and the demand for fast, low-power abnormal flow detection keeps growing.
Existing intrusion detection usually uses traditional machine learning and deep learning algorithms and treats anomaly detection as a binary classification problem: a supervised, semi-supervised or unsupervised learning model is built from normal and abnormal data, with good results. Deep learning algorithms, which have been particularly successful in vision tasks, have in recent years been widely tried and applied to abnormal flow detection by research organizations at home and abroad.
Deep learning trains multiple layers of nonlinear networks to discover and describe the complex structural features inside a problem, so it can express the essential features of the data and has strong generalization capability, and its accuracy in abnormal flow detection keeps rising. However, it also requires ever larger computational overhead, has too many hyper-parameters, depends heavily on careful parameter tuning, is affected by many interfering factors during training, needs a large amount of training data, converges slowly and takes a long time, and is difficult to apply to tasks with small-scale training data.
To address these problems, deep forest has fewer hyper-parameters than a deep neural network and is robust to hyper-parameter settings. Deep forest is a deep model built on non-differentiable modules, and it has three characteristics: layer-by-layer processing, in-model feature transformation, and sufficient model complexity. In most cases it achieves excellent performance with the same default settings, even on data from different domains.
The Extreme Learning Machine (ELM) requires few parameters, occupies few resources, trains quickly and learns efficiently. ELM is a learning algorithm for single-hidden-layer feedforward neural networks (SLFNs). It has strong learning ability and can approximate complex nonlinear functions. It is solved directly: the final solution reduces to computing the Moore-Penrose generalized inverse of a matrix, and the model can be trained by setting only the number of hidden-layer neurons. In application, ELM can reduce energy consumption while maintaining a high detection rate, which makes it well suited to resource-limited wireless sensor networks.
Disclosure of Invention
In view of one or more problems in the prior art, the invention provides a mixed model abnormal flow detection system and method based on ELM and deep forest, which complete abnormal flow detection on resource-limited nodes such as wireless sensors with higher detection efficiency and accuracy.
The technical solution for realizing the purpose of the invention is as follows:
A mixed model abnormal flow detection method based on ELM and deep forest, characterized by comprising the following steps:
Step 1: performing data cleaning, feature extraction and data dimension reduction on the real-time flow data collected by the bottom-level member nodes among resource-limited nodes such as wireless sensors;
Step 2: deploying an extreme learning machine (ELM) model and a deep forest model at different nodes in the wireless sensor network, performing mixed abnormal flow detection, and outputting an abnormal flow detection result;
Step 3: retraining, by the management node in the wireless sensor network, the deep forest model on the updated data set, with accuracy and AUC as evaluation indexes, where AUC is the area under the ROC curve.
Further, in the mixed model abnormal flow detection method based on ELM and deep forest of the invention, step 1 specifically comprises:
S1-1: removing dirty data such as not-a-number (NaN) values and infinite (Infinity) values from the real-time flow data;
S1-2: for the captured complete network information, selecting the 30-dimensional data that has a large influence on abnormal flow detection, with reference to the Internet-of-Things-oriented botnet data set (BOT-IOT data set) published by the Cyber Range Lab of the University of New South Wales (UNSW) Canberra, and adding 14-dimensional new features on the basis of the 30-dimensional features, the new features being mainly statistical data such as the total packet number of each source/target IP;
S1-3: carrying out normalization and standardization on the processed data to form preprocessed flow data with 54-dimensional features in total.
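The preprocessing steps S1-1 to S1-3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout (`src_ip`, `dst_ip`, `pkts`, `features`) and all function names are hypothetical, only two of the statistical features are shown, and only min-max normalization is applied (the text also mentions standardization).

```python
import math
from collections import Counter

def clean(rows):
    """S1-1: drop records whose features contain NaN or +/-Infinity."""
    return [r for r in rows if all(math.isfinite(v) for v in r["features"])]

def add_ip_packet_totals(rows):
    """S1-2 (partial): append total packet counts per source IP and per target IP."""
    src_total, dst_total = Counter(), Counter()
    for r in rows:
        src_total[r["src_ip"]] += r["pkts"]
        dst_total[r["dst_ip"]] += r["pkts"]
    return [
        {**r, "features": r["features"] + [src_total[r["src_ip"]], dst_total[r["dst_ip"]]]}
        for r in rows
    ]

def min_max_normalize(matrix):
    """S1-3 (partial): scale each column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*matrix))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in matrix
    ]

def preprocess(rows):
    """Clean, add the statistical features, then normalize the feature matrix."""
    rows = add_ip_packet_totals(clean(rows))
    return min_max_normalize([r["features"] for r in rows])
```

Computing the per-IP totals after cleaning means that dropped records do not contribute to the statistics; whether that matches the intended behaviour is not stated in the text.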
Further, in the mixed model abnormal flow detection method based on ELM and deep forest of the invention, step 2 specifically comprises:
Step 2-1: deploying the ELM model to the member nodes through the management node in the wireless sensor network;
Step 2-2: deploying the deep forest model to the sink node through the management node in the wireless sensor network;
Step 2-3: performing ELM rapid detection on the dimension-reduced real-time flow data in the member node and adding a data feature to the flow data according to the detection result: if the detection result is normal flow, the data feature is marked with the normal value and the flow passes; if the detection result is abnormal flow, the data feature is marked with the abnormal value;
Step 2-4: in the sink node, deleting the data feature mark of the abnormal flow, performing secondary detection on the abnormal flow with the deep forest model, adding a data feature to the flow data according to the secondary detection result as the final feature value of the flow data, and summarizing and fusing the data into the management node.
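The two-tier flow of steps 2-3 and 2-4 can be sketched as below. The two predictors are passed in as plain callables standing in for the ELM and deep forest models; the feature name `detection_result` and the values 0 and 1 follow the detailed description later in the document, while the function names and record layout are hypothetical.

```python
def member_node_detect(record, elm_predict):
    """Step 2-3: fast first pass on a member node; tag the record with detection_result."""
    tagged = dict(record)
    tagged["detection_result"] = 1 if elm_predict(record["features"]) else 0
    return tagged

def sink_node_detect(tagged, forest_predict):
    """Step 2-4: second pass on the sink node, only for records flagged as abnormal."""
    if tagged["detection_result"] == 0:
        return tagged  # normal flow passes through unchanged
    # Delete the first-pass mark, then re-label with the second-pass result.
    record = {k: v for k, v in tagged.items() if k != "detection_result"}
    record["detection_result"] = 1 if forest_predict(record["features"]) else 0
    return record

def detect(records, elm_predict, forest_predict):
    """Run both tiers and return the records that would be sent to the management node."""
    return [
        sink_node_detect(member_node_detect(r, elm_predict), forest_predict)
        for r in records
    ]
```

Because only flagged records reach the second tier, the heavier model runs on a fraction of the flow, which is the source of the energy saving the text claims for member nodes.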
Further, in the mixed model abnormal flow detection method based on ELM and deep forest of the invention, the pre-training process of the ELM model deployed in step 2-1 comprises:
Step 2-1-1: inputting the training samples $X_{train}$ of size $[row_0, col_0]$: the BOT-IOT data set preprocessed in step 1 is divided into a training set and a test set according to a ratio of 8 …, where $X_{train}$ is the training set, $row_0$ is the number of training samples and $col_0$ is the feature dimension of the training samples; the number of hidden-layer units is n, where n is a positive integer;
Step 2-1-2: initializing the weight w and the bias b of the ELM model, where w has size $[col_0, n]$ and b has size $[row_0, n]$;
Step 2-1-3: calculating the nonlinear mapping of the ELM model according to the formula $h = g(w \cdot X_{train} + b)$, where $g(x)$ is the activation function, $h$ is the nonlinear mapping of the ELM model and $X_{train}$ is the training samples; the output of the hidden layer is obtained by solving the inverse matrix $H$ of $h$, where $H$ has size $[n, row_0]$;
Step 2-1-4: one-hot encoding the data labels in the data set and calculating $\beta = H \cdot T$, where the data labels indicate whether the data is abnormal flow, $T$ is the one-hot encoded label data, and $\beta$ is the output weight of size $[n, 2]$;
Step 2-1-5: for the input real-time flow data $X_{test}$, calculating $h_1 = g(w \cdot X_{test} + b)$, where $h_1$ is the nonlinear mapping of the tested real-time flow data in the ELM model; the flow detection result is calculated with $\beta$ as $result = h_1 \cdot \beta$;
Step 2-1-6: classifying the obtained flow detection result to obtain abnormal flow and normal flow;
Step 2-1-7: saving the trained weight w, bias b and inverse matrix H as the parameters required for deploying the ELM model.
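Steps 2-1-1 to 2-1-6 can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the patented implementation: the activation is taken to be a sigmoid, the random weights are drawn uniformly from [-1, 1], the product is computed as `X @ w` so that the stated shapes conform, the "inverse matrix" of the non-square `h` is read as the Moore-Penrose pseudo-inverse mentioned in the Background, and the bias is a single broadcast row so that it can be reused on test data with a different number of rows.

```python
import numpy as np

def sigmoid(z):
    """Assumed activation function g."""
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X_train, labels, n_hidden, seed=0):
    """Fit an ELM for binary labels (0 = normal, 1 = abnormal)."""
    rng = np.random.default_rng(seed)
    row0, col0 = X_train.shape
    w = rng.uniform(-1.0, 1.0, size=(col0, n_hidden))  # random input weights, [col0, n]
    b = np.zeros((1, n_hidden))                        # all-zero bias, broadcast over rows
    h = sigmoid(X_train @ w + b)                       # hidden-layer mapping, [row0, n]
    H = np.linalg.pinv(h)                              # pseudo-inverse of h, [n, row0]
    T = np.eye(2)[labels]                              # one-hot labels, [row0, 2]
    beta = H @ T                                       # output weights, [n, 2]
    return w, b, beta

def predict_elm(X_test, w, b, beta):
    """Map test data through the hidden layer and pick the larger output per row."""
    h1 = sigmoid(X_test @ w + b)
    result = h1 @ beta
    return result.argmax(axis=1)
```

The sketch returns `beta` rather than `H`, since inference only needs `w`, `b` and `beta`; the text itself lists `w`, `b` and `H` as the saved parameters.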
Further, in the mixed model abnormal flow detection method based on ELM and deep forest of the invention, the pre-training process of the deep forest model deployed in step 2-2 comprises:
Step 2-2-1: selecting the deep forest classifiers, namely an XGBoost classifier, a random forest classifier and an extremely randomized trees classifier; the maximum depth of the XGBoost classifier is set to 5, its objective to multi-class classification and its learning rate to 0.1, and the random forest classifier and the extremely randomized trees classifier use default parameters;
Step 2-2-2: performing deep forest training with the processed BOT-IOT data set and storing the deep forest training result.
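The classifier settings in step 2-2-1 can be written down as a small configuration. The dictionary layout and key names are hypothetical; `max_depth`, `learning_rate` and `objective` are XGBoost parameter names, and `"multi:softprob"` is one of XGBoost's multi-class objectives, chosen here as an assumption because the text only says "multi-classification".

```python
# Hypothetical configuration for the three base classifiers of the deep forest.
DEEP_FOREST_CLASSIFIERS = {
    "xgboost": {
        "max_depth": 5,                # maximum tree depth from step 2-2-1
        "learning_rate": 0.1,          # learning rate from step 2-2-1
        "objective": "multi:softprob", # assumed multi-class objective
    },
    "random_forest": {},  # default parameters
    "extra_trees": {},    # default parameters
}
```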
Further, in the mixed model abnormal flow detection method based on ELM and deep forest of the invention, step 3 specifically comprises:
Step 3-1: deploying an optimized deep forest model in the user-facing management node and configuring the deep forest parameters;
Step 3-2: pre-storing the marked flow data from each sink node in a temporary database and setting an upper limit on data storage;
Step 3-3: when the pre-stored quantity reaches the upper storage limit, extracting the data feature as the label, mixing the data with the original data set, dividing the result into a training set and a test set, and optimizing the deep forest model with accuracy and AUC as evaluation indexes;
Step 3-4: loading, by the management node, the adjusted model into the sink node.
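Steps 3-2 and 3-3 describe a buffer that collects marked flow data until a storage limit is reached and then hands the data over for retraining. A minimal sketch follows; the class name, method names and record layout are hypothetical, and `detection_result` is the label feature named in the detailed description.

```python
class TemporaryStore:
    """Collects marked flow records until an upper storage limit is reached."""

    def __init__(self, upper_limit):
        self.upper_limit = upper_limit
        self.records = []

    def add(self, record):
        """Store one record; return True once the store has reached its limit."""
        self.records.append(record)
        return len(self.records) >= self.upper_limit

    def drain(self):
        """Return (features, labels), using detection_result as the label, and empty the store."""
        features = [r["features"] for r in self.records]
        labels = [r["detection_result"] for r in self.records]
        self.records = []
        return features, labels
```

A caller would invoke `drain()` when `add()` returns True and mix the returned data with the original data set before retraining; the detailed description sets the limit to 10000.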
Further, in the mixed model abnormal flow detection method based on ELM and deep forest of the invention, the deep forest model tuning in step 3-3 adopts a K-fold cross-validation method, which specifically comprises:
Step 3-3-1: backing up the original data set;
Step 3-3-2: dividing the original data set into a training set and a test set, taking the data feature of the data in the temporary database as the label set, and fusing that data with the training set of the original data set into a new data set;
Step 3-3-3: dividing the new data set equally into k groups, selecting one group as the validation set and using the remaining k-1 groups as the training set;
Step 3-3-4: constructing the deep forest algorithm model as a deep forest consisting of a random forest classifier, an XGBoost classifier, an extremely randomized trees classifier and a logistic regression classifier;
Step 3-3-5: in each of the k folds, searching for the model with the minimum error on the validation set, and returning to step 3-3-3 if the number of training runs is less than k;
Step 3-3-6: taking the model with the minimum error on the validation set, evaluating its error on the test set, and calculating the average of the performance obtained on the test set each time;
Step 3-3-7: using accuracy and the AUC value as the evaluation values, comparing the average performance of the new model with the performance of the original model; if the performance is improved, updating the deep forest model and the data set; otherwise, continuing to use the original deep forest model and deleting the data stored this time.
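The fold splitting, per-fold model selection and accept-or-reject decision in steps 3-3-3 to 3-3-7 can be sketched as below. The `fit` and `error` callables stand in for deep forest training and evaluation, all names are hypothetical, and "performance is improved" is interpreted here, as an assumption, to mean that neither accuracy nor AUC gets worse and at least one gets better.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) for k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def select_best_model(X, y, k, fit, error):
    """Train one model per fold and keep the one with the lowest validation error."""
    best_model, best_err = None, float("inf")
    for train, val in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        err = error(model, [X[i] for i in val], [y[i] for i in val])
        if err < best_err:
            best_model, best_err = model, err
    return best_model, best_err

def should_replace(new, old):
    """Accept the new model only if no metric regresses and at least one improves."""
    keys = ("accuracy", "auc")
    return all(new[m] >= old[m] for m in keys) and any(new[m] > old[m] for m in keys)
```

When `should_replace` returns False, the caller keeps the original model and discards the newly stored data, as step 3-3-7 describes.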
A mixed model abnormal flow detection system based on ELM and deep forest, using any of the above detection methods, comprises:
a real-time flow data feature extraction and dimension reduction module, used for performing data cleaning, feature extraction and data dimension reduction on the real-time flow data collected by the bottom-level member nodes among resource-limited nodes such as wireless sensors;
an abnormal flow detection module, used for deploying an ELM model and a deep forest model at different nodes in the wireless sensor network, performing mixed abnormal flow detection and outputting an abnormal flow detection result;
a deep forest model training module, used for retraining, by the management node in the wireless sensor network, the deep forest model on the updated data set with accuracy and AUC as evaluation indexes, where AUC is the area under the ROC curve.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects:
The mixed model abnormal flow detection method based on ELM and deep forest targets resource-limited nodes such as wireless sensors. Mixed detection based on the ELM model and the deep forest model is performed on the preprocessed flow features; the ELM model and the deep forest model are deployed in the wireless sensor network according to the division of labor among the different nodes, and the two models cooperate across multiple layers, so that the energy consumption of anomaly detection at each node is greatly reduced while the accuracy of the anomaly detection result can be improved.
Drawings
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention, and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a flow chart of the mixed model abnormal flow detection method based on ELM and deep forest of the present invention.
FIG. 2 is an architecture diagram of the mixed model abnormal flow detection system based on ELM and deep forest of the present invention.
FIG. 3 is an ELM model schematic diagram of the ELM and deep forest based mixed model abnormal flow detection method of the present invention.
FIG. 4 is a deep forest model schematic diagram of the mixed model abnormal flow detection method based on ELM and deep forest.
FIG. 5 is a k-fold cross validation diagram of the mixed model abnormal flow detection method based on ELM and deep forest.
Detailed Description
For a further understanding of the invention, reference will now be made to the preferred embodiments of the invention by way of example, and it is to be understood that the description is intended to further illustrate features and advantages of the invention, and not to limit the scope of the claims.
The description in this section is for several exemplary embodiments only, and the present invention is not limited only to the scope of the embodiments described. Combinations of the various embodiments, and substitutions of features from different embodiments, or similar prior art means, may be made to or replace some of the features of the embodiments with others, are also within the scope of the invention as described and claimed.
A mixed model abnormal flow detection method based on ELM and deep forest is realized by three modules, namely a real-time flow data feature extraction and dimension reduction module, an abnormal flow detection module and a deep forest model training module, as shown in figure 1, and specifically comprises the following steps:
S1: the real-time flow data feature extraction and dimension reduction module cleans the real-time flow data collected by the bottom-level member nodes among resource-limited nodes such as wireless sensors, and then extracts features and reduces the dimension of the data. This specifically comprises:
S11: eliminating the not-a-number (NaN) and infinite (Infinity) dirty data present in the real-time flow.
S12: for the captured complete network information, selecting, with reference to the BOT-IOT data set, the 30-dimensional data that has a large influence on abnormal flow detection, such as the text representation of the transaction protocol in the network flow, the source IP address and the target IP address; to improve the detection capability of the classifier, 14-dimensional new features are added on the basis of the 30-dimensional features, mainly statistical data such as the total packet number of each source/target IP.
S13: carrying out normalization and standardization on the processed data to form preprocessed flow data with 54-dimensional features in total.
S2: the abnormal flow detection module deploys the ELM model and the deep forest model at different nodes in the wireless sensor network, performs mixed abnormal flow detection and outputs an abnormal flow detection result. This specifically comprises:
S21: deploying the ELM model to the member nodes through the management node in the wireless sensor network.
Preferably, the pre-training process of the deployed ELM model comprises:
Step a1: inputting the training samples $X_{train}$ of size $[row_0, col_0]$, with n hidden-layer units; here the sample size is [4000000, 54], and after testing, the number n of hidden-layer units is chosen to be 109.
Step a2: initializing the weight w and the bias b of the ELM model, where w has size $[col_0, n]$ and b has size $[row_0, n]$. The entries of w are random numbers, and b is initialized as an all-zero array;
Step a3: obtaining the nonlinear mapping of the ELM according to the formula $h = g(w \cdot X_{train} + b)$, where $g(x)$ is the activation function, which can be sigmoid, Gaussian, multiquadric, etc.; the output of the hidden layer is obtained by solving the inverse matrix $H$ of $h$, where $H$ has size $[n, row_0]$;
The multiquadric activation function is a radial basis function, that is, a real-valued function whose value depends only on the distance from the origin. The inverse quadratic function and the inverse multiquadric function can be chosen, given respectively by $\varphi(r) = \frac{1}{1 + (\varepsilon r)^2}$ and $\varphi(r) = \frac{1}{\sqrt{1 + (\varepsilon r)^2}}$; the inverse multiquadric function is selected for this pre-training.
Step a4: one-hot encoding the data labels in the data set to obtain $T$ of size $[row_0, 2]$, and using the $H$ of step a3 to obtain $\beta$ by the formula $\beta = H \cdot T$, where $\beta$ has size $[n, 2]$;
Step a5: for the input real-time flow $X_{test}$, calculating $h_1 = g(w \cdot X_{test} + b)$ and calculating the flow detection result with $\beta$ as $result = h_1 \cdot \beta$.
Step a6: the obtained results are classified into abnormal flow and normal flow by result = [item.
Step a7: storing the trained w, b and H, which are the parameters required for deploying the ELM model.
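As a worked shape check for steps a1 to a4, the sizes stated in this embodiment (4000000 samples, 54 features, n = 109) can be traced through the formulas. The product is written here as $X_{train}\,w$ so that the stated shapes conform, and $H$ is read as the Moore-Penrose pseudo-inverse $h^{\dagger}$ mentioned in the Background, since $h$ is not square; both are interpretations rather than statements from the text.

```latex
\begin{aligned}
X_{train} &\in \mathbb{R}^{4000000 \times 54}, \qquad
w \in \mathbb{R}^{54 \times 109}, \qquad
b \in \mathbb{R}^{4000000 \times 109} \\
h &= g(X_{train}\, w + b) \in \mathbb{R}^{4000000 \times 109} \\
H &= h^{\dagger} \in \mathbb{R}^{109 \times 4000000} \\
T &\in \mathbb{R}^{4000000 \times 2}, \qquad
\beta = H\, T \in \mathbb{R}^{109 \times 2}
\end{aligned}
```

The resulting $\beta$ has only $109 \times 2 = 218$ entries and $w$ has $54 \times 109 = 5886$, which illustrates why the trained model is small enough for a member node.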
S22: deploying the deep forest model to the sink node through the management node in the wireless sensor network.
Preferably, the pre-training process of the deployed deep forest model comprises:
Step b1: selecting the deep forest classifiers, namely an XGBoost classifier, a random forest classifier and an extremely randomized trees classifier; the maximum depth of the XGBoost classifier is set to 5, its objective to multi-class classification and its learning rate to 0.1, and the random forest classifier and the extremely randomized trees classifier use default parameters.
Step b2: performing deep forest training with the processed BOT-IOT data set and storing the deep forest training result.
S23: first, performing ELM rapid detection on the dimension-reduced flow data in the member node and adding a data feature detection_result to the flow data according to the detection result: if the detection result is normal flow, detection_result is marked with the normal value '0' and the flow passes; if the detection result is abnormal flow, detection_result is marked with the abnormal value '1'.
S24: after the data marked as abnormal flow is identified in the sink node, deleting the detection_result feature value, performing secondary detection with the deep forest model, determining the detection_result value according to the result, and summarizing and fusing the data into the management node for processing.
S3: the management node in the wireless sensor network retrains the deep forest model on the updated data set according to evaluation indexes such as accuracy and AUC.
Preferably, the deep forest model training module specifically comprises:
S31: deploying an optimized deep forest algorithm in the user-facing management node and configuring the deep forest parameters.
S32: pre-storing the marked flow data from each sink node in a temporary database, with the upper limit of data storage set to 10000.
S33: after the pre-stored quantity reaches the set threshold, extracting the detection_result feature as the label, mixing the data with the original data set, splitting the result into a training set and a test set, and optimizing the deep forest model according to evaluation indexes such as accuracy and AUC.
Preferably, the tuning algorithm adopted in this example is a K-fold cross-validation method, as shown in fig. 5:
Step c1: backing up the original data set.
Step c2: dividing the original data set into a training set and a test set, taking the detection_result feature of the data in the temporary database as the label set, and then fusing that data with the training set of the original data set into a new data set;
Step c3: dividing the new data set into k groups, selecting one of them as the validation set and using the remaining k-1 groups as the training set, with k set to 10.
Step c4: constructing the deep forest algorithm model as a deep forest consisting of a random forest classifier, an XGBoost classifier, an extremely randomized trees classifier and a logistic regression classifier;
Step c5: in each of the k folds, searching for the model with the minimum error on the validation set, and returning to step c3 if the number of training runs is less than k;
Step c6: taking the model with the minimum error on the validation set, evaluating its error on the test set, and calculating the average of the performance obtained on the test set each time.
Step c7: using accuracy and the AUC value as the evaluation values, comparing the average performance of the new model with the performance of the original model; if the performance is improved, updating the deep forest model and the data set; otherwise, continuing to use the original deep forest model and deleting the newly stored data.
The evaluation indexes comprise:
Accuracy: $Accuracy = \frac{n_{correct}}{N_{total}}$, the ratio of the number of correctly classified samples $n_{correct}$ to the total number of samples $N_{total}$ when the model classifies the test set.
Recall: $Recall = \frac{TP}{TP + FN}$, the proportion of all correctly classified positive samples TP among all positive samples (TP + FN); positive samples are the positive-example data, that is, abnormal flows, so recall measures how many of all abnormal flows are found.
Precision: $Precision = \frac{TP}{TP + FP}$, the proportion of all correctly classified positive samples among all samples predicted as positive (TP + FP), that is, the proportion of the flagged flows that are actually abnormal.
AUC: the area under the ROC curve, which describes the relationship between the true positive rate and the false positive rate; the higher the better.
Root mean square error: $RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is the true value, $\hat{y}_i$ is the predicted value and n is the number of samples; the Euclidean distance is used.
S34: and the management node loads the adjusted model into the sink node.
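The evaluation indexes used in S33 (accuracy, recall, precision, AUC and root mean square error) can be computed as in the sketch below. The function names are hypothetical, label 1 denotes abnormal flow, and AUC is computed through its rank interpretation (the probability that a randomly chosen abnormal sample receives a higher score than a randomly chosen normal one), which is quadratic in the number of samples and is meant only for illustration.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of samples classified correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """TP / (TP + FN): share of abnormal flows that were found."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else 0.0

def precision(y_true, y_pred):
    """TP / (TP + FP): share of flagged flows that are really abnormal."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp) if tp + fp else 0.0

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```

`auc` expects continuous scores (for example predicted probabilities) and at least one sample of each class; the other functions take hard 0/1 predictions.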
The invention provides an abnormal flow detection system and method for resource-limited nodes such as wireless sensors. Mixed detection based on the ELM model and the deep forest model is performed on the preprocessed flow features; the ELM model and the deep forest model are deployed in the wireless sensor network according to the division of labor among the different nodes and cooperate across multiple layers, so that the energy consumption of anomaly detection at each node is greatly reduced and the accuracy of the anomaly detection result can be improved.
The description and applications of the invention herein are illustrative and are not intended to limit the scope of the invention to the embodiments described above. The descriptions related to the effects or advantages in the specification may not be reflected in practical experimental examples due to uncertainty of specific condition parameters or influence of other factors, and the descriptions related to the effects or advantages are not used for limiting the scope of the invention. Variations and modifications of the embodiments disclosed herein are possible, and alternative and equivalent various components of the embodiments will be apparent to those of ordinary skill in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements, proportions, and with other components, materials, and parts, without departing from the spirit or essential characteristics thereof. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.
Claims (8)
1. A mixed model abnormal flow detection method based on ELM and deep forest, characterized by comprising the following steps:
step 1: performing data cleaning, feature extraction and data dimension reduction on the real-time flow data collected by the bottom-level member nodes among resource-limited nodes such as wireless sensors;
step 2: deploying an extreme learning machine (ELM) model and a deep forest model at different nodes in the wireless sensor network, performing mixed abnormal flow detection, and outputting an abnormal flow detection result;
step 3: retraining, by the management node in the wireless sensor network, the deep forest model on the updated data set, with accuracy and AUC as evaluation indexes, where AUC is the area under the ROC curve.
2. The mixed model abnormal flow detection method based on ELM and deep forest as claimed in claim 1, wherein step 1 specifically comprises:
s1-1: removing dirty data, such as non-numerical values and infinite values, from the real-time flow data;
s1-2: with reference to the BOT-IOT data set, selecting from the captured complete network information the 30-dimensional data that have a large influence on abnormal flow detection, and adding 14-dimensional new features based on the 30-dimensional features, wherein the 14-dimensional new features are mainly statistical data such as the total packet number of each source/target IP;
s1-3: performing normalization and standardization operations on the processed data to form preprocessed flow data with 54-dimensional features in total.
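As a rough illustration of s1-1 and s1-3, the following Python sketch discards records containing non-numerical or infinite values and then rescales each column. The function names, the min-max formula and the z-score formula are assumptions made for illustration; the claim names the operations but does not specify how they are computed.

```python
import math

def clean_rows(rows):
    """s1-1: keep only rows whose fields all parse as finite numbers."""
    cleaned = []
    for row in rows:
        try:
            values = [float(v) for v in row]
        except (TypeError, ValueError):
            continue  # a non-numerical field: discard the whole record
        if all(math.isfinite(v) for v in values):  # rejects inf and NaN
            cleaned.append(values)
    return cleaned

def min_max_normalize(rows):
    """Scale each column to [0, 1]; constant columns map to 0.0 (assumed rule)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [[(v - lo) / sp if sp else 0.0
             for v, lo, sp in zip(row, lows, spans)] for row in rows]

def standardize(rows):
    """Shift each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s if s else 0.0
             for v, m, s in zip(row, means, stds)] for row in rows]
```

In a deployment the column statistics would presumably be computed once on the training data and reused on live traffic, so that member nodes scale new records consistently.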
3. The mixed model abnormal flow detection method based on ELM and deep forest as claimed in claim 1, wherein step 2 specifically comprises:
step 2-1: deploying the ELM model to the member nodes through the management node in the wireless sensor network;
step 2-2: deploying the deep forest model to the sink node through the management node in the wireless sensor network;
step 2-3: performing rapid ELM detection on the dimension-reduced real-time flow data in the member node, and adding a data feature to the flow data according to the detection result: if the detection result is normal flow, the data feature is marked with the normal value and the flow is passed; if the detection result is abnormal flow, the data feature is marked with the abnormal value;
step 2-4: deleting, in the sink node, the data feature mark of the abnormal flow, performing secondary detection on the abnormal flow using the deep forest model, adding a data feature to the flow data according to the secondary detection result as the final feature value of the flow data, and aggregating and fusing the data into the management node.
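The two-stage flow of steps 2-3 and 2-4 can be sketched as follows. This is a minimal sketch under stated assumptions: `fast_model` and `second_model` are hypothetical callables standing in for the ELM and deep forest models (each returns True for abnormal flow), and the mark is represented as one extra value appended to the record.

```python
NORMAL, ABNORMAL = 0, 1  # assumed encoding of the added data feature

def member_node_detect(record, fast_model):
    """Step 2-3: fast check at the member node; append the mark as a feature."""
    mark = ABNORMAL if fast_model(record) else NORMAL
    return record + [mark]

def sink_node_detect(marked_record, second_model):
    """Step 2-4: delete the first mark, re-check, append the final value."""
    features = marked_record[:-1]
    final = ABNORMAL if second_model(features) else NORMAL
    return features + [final]

def detect(records, fast_model, second_model):
    """Pass normal flow through; send only flagged flow to the second stage."""
    results = []
    for record in records:
        marked = member_node_detect(record, fast_model)
        if marked[-1] == ABNORMAL:
            marked = sink_node_detect(marked, second_model)
        results.append(marked)  # stands in for fusing data into the management node
    return results
```

Only records flagged by the first stage reach the heavier model, which is the point of placing the lightweight detector on the resource-constrained member nodes.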
4. The ELM and deep forest based hybrid model abnormal traffic detection method as claimed in claim 3, wherein the ELM model pre-training process deployed in step 2-1 comprises:
step 2-1-1: inputting a training sample $X_{train}$ of size $[row_0, col_0]$, wherein the BOT-IOT data set preprocessed in step 1 is divided into a training set and a testing set according to a ratio of 8, $X_{train}$ is the training set, $row_0$ is the number of training samples, and $col_0$ is the feature dimension of the training samples; the number of hidden layer units is $n$, where $n$ is a positive integer;
step 2-1-2: initializing the weight $w$ and the deviation $b$ of the ELM model, wherein $w$ has size $[col_0, n]$ and $b$ has size $[row_0, n]$;
step 2-1-3: calculating the non-linear mapping of the ELM model according to the formula $h = g(w \cdot X_{train} + b)$, where $g(x)$ is the activation function, $h$ is the non-linear mapping of the ELM model, and $X_{train}$ is the training sample; the output of the hidden layer is obtained by solving the inverse matrix $H$ of $h$, where $H$ has size $[n, row_0]$;
step 2-1-4: performing one-hot encoding on the data labels in the data set and calculating $\beta = H \cdot T$, wherein a data label indicates whether the data is abnormal flow, $T$ is the value obtained after one-hot encoding the label data, and $\beta$ is the output weight, of size $[n, 2]$;
step 2-1-5: for input real-time flow data $X_{test}$, calculating $h_1 = g(w \cdot X_{test} + b)$, where $h_1$ is the non-linear mapping of the tested real-time flow data in the ELM model, and combining it with $\beta$ to obtain the flow detection result according to the formula $result = h_1 \cdot \beta$;
step 2-1-6: classifying the obtained flow detection result to obtain abnormal flow and normal flow;
step 2-1-7: saving the trained weight w, deviation b and inverse matrix H as the parameters required for deploying the ELM model.
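The training and detection steps of claim 4 can be sketched in NumPy as below. Several choices here are assumptions rather than part of the claim: the sigmoid activation for g, random normal initialization, the use of the Moore-Penrose pseudo-inverse (`np.linalg.pinv`) as the inverse matrix H, and a bias with one value per hidden unit that is broadcast over the rows so that it also applies to test data with a different number of samples. The product is written as `X @ w` so that the stated sizes `[row_0, col_0]` and `[col_0, n]` are compatible.

```python
import numpy as np

def sigmoid(x):
    """Assumed activation function g."""
    return 1.0 / (1.0 + np.exp(-x))

def train_elm(X_train, labels, n_hidden, rng):
    """Steps 2-1-1 to 2-1-4: random hidden layer, then a closed-form output weight."""
    row0, col0 = X_train.shape
    w = rng.normal(size=(col0, n_hidden))  # input weights, size [col0, n]
    b = rng.normal(size=(1, n_hidden))     # one bias per hidden unit (assumption)
    h = sigmoid(X_train @ w + b)           # hidden-layer mapping, size [row0, n]
    H = np.linalg.pinv(h)                  # pseudo-inverse of h, size [n, row0]
    T = np.eye(2)[labels]                  # one-hot labels, size [row0, 2]
    beta = H @ T                           # output weights, size [n, 2]
    return w, b, beta

def predict_elm(X_test, w, b, beta):
    """Steps 2-1-5 and 2-1-6: map test data, score it, pick the larger score."""
    h1 = sigmoid(X_test @ w + b)
    result = h1 @ beta
    return result.argmax(axis=1)  # 0 = normal, 1 = abnormal (assumed encoding)
```

The sketch returns `beta` because inference needs it. Training involves no iterative optimization, only one pseudo-inverse, which is what makes the model cheap enough for member nodes.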
5. The ELM and deep forest based hybrid model abnormal traffic detection method as recited in claim 1, wherein the pre-training process of the deep forest model deployed in step 2-2 comprises:
step 2-2-1: selecting the deep forest classifiers, namely an XGBoost classifier, a random forest classifier and an extremely randomized trees classifier; setting the maximum depth of the XGBoost classifier to 5, its objective to multi-classification and its learning rate to 0.1; and using default parameters for the random forest classifier and the extremely randomized trees classifier;
step 2-2-2: performing deep forest training using the preprocessed BOT-IOT data set, and storing the deep forest training result.
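One way to picture the classifier selection in step 2-2-1 is a single cascade layer built with scikit-learn, sketched below. This is an illustration under assumptions, not the claimed implementation: `GradientBoostingClassifier` is swapped in as a stand-in for the XGBoost classifier (keeping the maximum depth of 5 and the learning rate of 0.1), the cascade arrangement in which each model's class probabilities are appended to the input features follows the usual deep forest design rather than anything stated in the claim, and the function names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_predict

def make_layer(seed=0):
    """Three classifiers for one layer; only the boosted model is tuned."""
    return [
        # Stand-in for XGBoost with max depth 5 and learning rate 0.1.
        GradientBoostingClassifier(max_depth=5, learning_rate=0.1,
                                   random_state=seed),
        RandomForestClassifier(random_state=seed),  # default parameters
        ExtraTreesClassifier(random_state=seed),    # default parameters
    ]

def fit_cascade_layer(X, y, seed=0):
    """Fit one layer and return (fitted models, features augmented with probabilities)."""
    models, probas = make_layer(seed), []
    for model in models:
        # Out-of-fold probabilities, so no sample is scored by a model that saw it.
        probas.append(cross_val_predict(model, X, y, cv=3,
                                        method="predict_proba"))
        model.fit(X, y)  # final fit on all data for later inference
    return models, np.hstack([X] + probas)
```

A further layer would be trained on the augmented features, and layers would typically be added until validation performance stops improving.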
6. The mixed model abnormal flow detection method based on ELM and deep forest as claimed in claim 1, wherein step 3 specifically comprises:
step 3-1: deploying an optimized deep forest model in a management node facing a user, and configuring deep forest parameters;
step 3-2: pre-storing the marked flow data from each sink node into a temporary database, and setting a data storage upper limit;
step 3-3: when the pre-stored quantity reaches the storage upper limit, extracting the data features as labels, mixing the labeled data with the original data set, dividing the mixed data into a training set and a testing set, and tuning the deep forest model with accuracy and AUC as evaluation indexes;
step 3-4: loading, by the management node, the tuned model into the sink node.
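The buffering behavior of steps 3-2 and 3-3 can be sketched as a small store that signals when retraining is due. The class and function names are hypothetical, and an in-memory list stands in for the temporary database.

```python
class TemporaryStore:
    """Step 3-2: hold marked flow records until an upper limit is reached."""

    def __init__(self, upper_limit):
        self.upper_limit = upper_limit
        self.records = []

    def add(self, marked_record):
        """Store one record; return True once retraining should be triggered."""
        self.records.append(marked_record)
        return len(self.records) >= self.upper_limit

    def drain(self):
        """Step 3-3: split the stored marks off as labels and empty the store."""
        features = [r[:-1] for r in self.records]
        labels = [r[-1] for r in self.records]
        self.records = []
        return features, labels


def merge_with_original(original_X, original_y, new_X, new_y):
    """Mix the newly labeled data with the original data set."""
    return original_X + new_X, original_y + new_y
```

A management node would call `add` for each record arriving from a sink node and, when it returns True, call `drain`, merge the result with the original data set, and start the tuning run.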
7. The ELM and deep forest based hybrid model abnormal flow detection method as claimed in claim 6, wherein the deep forest model tuning in step 3-3 employs a K-fold cross-validation method, specifically comprising:
step 3-3-1: backing up the original data set;
step 3-3-2: dividing the original data set into a training set and a test set, taking the data features of the data in the temporary database as a label set, and fusing the label set with the training set of the original data set into a new data set;
step 3-3-3: dividing the new data set equally into k groups, selecting one group as the verification set, and taking the remaining k-1 groups as the training set;
step 3-3-4: constructing a deep forest algorithm model using a deep forest composed of a random forest classifier, an XGBoost classifier, an extremely randomized trees classifier and a logistic regression classifier;
step 3-3-5: searching, over the k-1 training folds, for the model with the smallest error on the verification set, and if the number of training rounds is less than k, returning to step 3-3-3;
step 3-3-6: taking the model with the smallest error on the verification set, evaluating its error on the test set, and calculating the average of the test-set performance over all rounds;
step 3-3-7: using accuracy and the AUC value as evaluation values, comparing the average performance of the new model with the performance of the original model; if the performance is improved, updating the deep forest model and the data set; otherwise, continuing to use the original deep forest model and deleting the data stored in this round.
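The fold construction of step 3-3-3 and the evaluation in step 3-3-7 can be sketched in plain Python as follows. The function names are hypothetical, and the acceptance rule in `should_update` is an assumption: the claim says only that the model is updated "if the performance is improved" and does not state how accuracy and AUC are combined.

```python
def k_fold_splits(n_samples, k):
    """Step 3-3-3: yield (train_idx, val_idx); each group validates exactly once."""
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking; ties count one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def should_update(new, old):
    """Step 3-3-7 (assumed rule): neither metric gets worse, at least one improves."""
    no_worse = new["accuracy"] >= old["accuracy"] and new["auc"] >= old["auc"]
    better = new["accuracy"] > old["accuracy"] or new["auc"] > old["auc"]
    return no_worse and better
```

The indices are split in order here; in practice the data would normally be shuffled first so that each group contains both normal and abnormal flow.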
8. A mixed model abnormal flow detection system based on ELM and deep forest according to any one of claims 1-7, comprising:
a real-time flow data feature extraction and dimension reduction module, used for performing data cleaning, feature extraction and data dimension reduction on the real-time flow data collected by the bottom-level member nodes among the resource-constrained nodes of the wireless sensor network;
an abnormal flow detection module, used for deploying the ELM model and the deep forest model at different nodes in the wireless sensor network, performing mixed abnormal flow detection and outputting abnormal flow detection results;
a deep forest model training module, used for retraining, by the management node in the wireless sensor network, the deep forest model on the updated data set, with accuracy and AUC as evaluation indexes, wherein AUC is the area under the ROC curve.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210783769.XA CN115175191A (en) | 2022-06-28 | 2022-06-28 | Mixed model abnormal flow detection system and method based on ELM and deep forest |
JP2023528213A JP2024526480A (en) | 2022-06-28 | 2022-10-24 | System and method for detecting anomalies in a mixture model based on ELM and deep forest |
PCT/CN2022/126962 WO2024000944A1 (en) | 2022-06-28 | 2022-10-24 | Elm- and deep-forest-based hybrid model traffic anomaly detection system and method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210783769.XA CN115175191A (en) | 2022-06-28 | 2022-06-28 | Mixed model abnormal flow detection system and method based on ELM and deep forest |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115175191A (en) | 2022-10-11 |
Family
ID=83490458
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210783769.XA Pending CN115175191A (en) | 2022-06-28 | 2022-06-28 | Mixed model abnormal flow detection system and method based on ELM and deep forest |
Country Status (3)
Country | Link |
---|---|
JP (1) | JP2024526480A (en) |
CN (1) | CN115175191A (en) |
WO (1) | WO2024000944A1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024000944A1 (en) * | 2022-06-28 | 2024-01-04 | 南京邮电大学 | Elm- and deep-forest-based hybrid model traffic anomaly detection system and method |
CN117408537A (en) * | 2023-12-15 | 2024-01-16 | 安徽科派自动化技术有限公司 | Electric energy quality monitoring system capable of realizing real-time risk prediction |
CN117574296A (en) * | 2023-11-23 | 2024-02-20 | 清远市信和实业有限公司 | Plating bath liquid flow distribution detection system and method thereof |
CN118316723A (en) * | 2024-05-11 | 2024-07-09 | 山东慧贝行信息技术有限公司 | Network security assessment method and system based on network risk detection |
CN118337534A (en) * | 2024-06-13 | 2024-07-12 | 山东网驰信息技术有限公司 | Data monitoring system for determining abnormal flow |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118013201B (en) * | 2024-03-07 | 2024-10-01 | 暨南大学 | Flow anomaly detection method and system based on improved BERT fusion contrast learning |
CN118631589B (en) * | 2024-08-09 | 2024-10-11 | 四川云互未来科技有限公司 | Network traffic supervision abnormality identification early warning method and system |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI783229B (en) * | 2020-05-22 | 2022-11-11 | 國立臺灣大學 | Anomaly flow detection device and anomaly flow detection method |
CN112784881B (en) * | 2021-01-06 | 2021-08-27 | 北京西南交大盛阳科技股份有限公司 | Network abnormal flow detection method, model and system |
CN114553591B (en) * | 2022-03-21 | 2024-02-02 | 北京华云安信息技术有限公司 | Training method of random forest model, abnormal flow detection method and device |
CN115175191A (en) * | 2022-06-28 | 2022-10-11 | 南京邮电大学 | Mixed model abnormal flow detection system and method based on ELM and deep forest |
2022
- 2022-06-28 CN CN202210783769.XA patent/CN115175191A/en active Pending
- 2022-10-24 JP JP2023528213A patent/JP2024526480A/en active Pending
- 2022-10-24 WO PCT/CN2022/126962 patent/WO2024000944A1/en unknown
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024000944A1 (en) * | 2022-06-28 | 2024-01-04 | 南京邮电大学 | Elm- and deep-forest-based hybrid model traffic anomaly detection system and method |
CN117574296A (en) * | 2023-11-23 | 2024-02-20 | 清远市信和实业有限公司 | Plating bath liquid flow distribution detection system and method thereof |
CN117574296B (en) * | 2023-11-23 | 2024-04-23 | 清远市信和实业有限公司 | Plating bath liquid flow distribution detection system and method thereof |
CN117408537A (en) * | 2023-12-15 | 2024-01-16 | 安徽科派自动化技术有限公司 | Electric energy quality monitoring system capable of realizing real-time risk prediction |
CN117408537B (en) * | 2023-12-15 | 2024-05-07 | 安徽科派自动化技术有限公司 | Electric energy quality monitoring system capable of realizing real-time risk prediction |
CN118316723A (en) * | 2024-05-11 | 2024-07-09 | 山东慧贝行信息技术有限公司 | Network security assessment method and system based on network risk detection |
CN118337534A (en) * | 2024-06-13 | 2024-07-12 | 山东网驰信息技术有限公司 | Data monitoring system for determining abnormal flow |
Also Published As
Publication number | Publication date |
---|---|
JP2024526480A (en) | 2024-07-19 |
WO2024000944A1 (en) | 2024-01-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||