CN112507332A - Artificial intelligence network security attack flow retrieval method - Google Patents

Artificial intelligence network security attack flow retrieval method

Info

Publication number
CN112507332A
CN112507332A (application CN202011361014.8A)
Authority
CN
China
Prior art keywords
classifier
attack
class
classifiers
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011361014.8A
Other languages
Chinese (zh)
Inventor
张秋余
董瑞洪
袁晖
胡颖杰
王春霞
赵金雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electric Power Research Institute of State Grid Gansu Electric Power Co Ltd
Lanzhou University of Technology
Original Assignee
Electric Power Research Institute of State Grid Gansu Electric Power Co Ltd
Lanzhou University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electric Power Research Institute of State Grid Gansu Electric Power Co Ltd and Lanzhou University of Technology
Priority to CN202011361014.8A
Publication of CN112507332A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides an artificial intelligence network security attack flow retrieval method comprising the following steps. S1, selecting different groups of feature values, wherein different groups yield different detection accuracies for the attack types. S2, for each attack type, selecting the group of features with the highest detection accuracy for that type (close to 100%) and training a classifier with that group; repeating this for every type so that each type has its own dedicated classifier, and combining the classifiers for detection, so that if the data set contains i types, feature value selection and model training produce i binary classifiers. Each classifier has a very high recognition rate for its own class, and a classifier is said to "hit" when it judges a sample to belong to the class it targets. For each group of data, the corresponding feature values are extracted and fed to the i classifiers respectively, and the classification results are observed. The method identifies new attacks while detecting fixed attacks and automatically adjusts the model or strategy as attacks or the environment change.

Description

Artificial intelligence network security attack flow retrieval method
Technical Field
The invention relates to the technical field of network security, and in particular to an artificial intelligence network security attack flow retrieval method.
Background
Most existing research focuses on improving the accuracy of intrusion detection systems and reducing the false alarm rate. Adaptability remains an open problem and a shortcoming: most intrusion detection systems can only detect fixed attacks, cannot identify new attacks, and cannot adjust their models or strategies automatically when attacks or the environment change. Adaptability means that an IDS should be able to adapt to the needs of a new environment without administrator feedback. This implies that the IDS adaptation process cannot rely on labeled data and must use methods that work with unlabeled data.
Disclosure of Invention
The invention provides an artificial intelligence network security attack flow retrieval method to solve the technical problems described in the background.
The technical solution adopted by the invention to solve these technical problems is as follows:
The artificial intelligence network security attack flow retrieval method comprises the following steps:
S1, selecting different groups of feature values, wherein different groups yield different detection accuracies for the attack types.
S2, for each attack type, selecting the group of features with the highest detection accuracy for that type (close to 100%) and training a classifier with that group; repeating this for every type so that each type has its own dedicated classifier, and combining the classifiers for detection. If the data set contains i types, feature value selection and model training produce i binary classifiers. Each classifier has a very high recognition rate for its own class, and a classifier is said to "hit" when it judges a sample to belong to the class it targets. For each group of data, the corresponding feature values are extracted and fed to the i classifiers respectively, and the classification results fall into the following cases:
Case 1: exactly one classifier hits and all the others miss. The sample is assigned to the class targeted by the hitting classifier. This is the simplest case: the sample is a known attack or normal traffic;
Case 2: all classifiers miss. The sample may belong to a new class: it is either an unknown attack or a new form of a known attack (i.e., its attack signature differs from the known signatures), and a new form of a known attack is treated as a new attack. All such samples are therefore grouped into a single class regarded as a new attack type. When enough of these samples have accumulated for training, a new classifier is trained to recognize the class;
Case 3: more than one classifier hits. The sample may belong to a new class and is treated as an unknown attack; samples with the same hit pattern are treated as the same class, so in theory 2^i - 1 - i unknown-attack classes can be distinguished (C(i,2) patterns with two hits, C(i,3) with three hits, ..., and C(i,i) = 1 with i hits). When enough such samples have accumulated for training, the model is updated and a new binary classifier is added. Thus, once a sufficient number of "unknown" samples has triggered a model update, those samples become "known" and can be hit by the newly added classifier. A minimal sketch of this decision logic is given below.
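The sketch assumes each binary classifier exposes a scikit-learn style predict() method returning 1 for "this is my class"; the pooling scheme and the retraining threshold are illustrative assumptions rather than part of the claimed method.

```python
# Sketch of the per-sample decision logic for i binary classifiers (cases 1-3).
from collections import defaultdict

def classify_sample(features, classifiers):
    """classifiers: mapping of class name -> trained binary classifier.
    Returns a (verdict, detail) pair implementing cases 1-3."""
    hits = [name for name, clf in classifiers.items()
            if clf.predict([features])[0] == 1]   # 1 means "this is my class"
    if len(hits) == 1:
        return "known", hits[0]                   # case 1: known attack or normal traffic
    if not hits:
        return "new_attack", "all-miss"           # case 2: pooled into a single new class
    # case 3: the hit pattern identifies the class; with i classifiers there are
    # 2**i - 1 - i possible multi-hit patterns.
    return "unknown_attack", tuple(sorted(hits))

# Samples that are not "known" are pooled; once a pool is large enough, the model
# is updated by training a new binary classifier for that pool.
RETRAIN_THRESHOLD = 1000                          # assumed value
pools = defaultdict(list)

def process(features, classifiers):
    verdict, detail = classify_sample(features, classifiers)
    if verdict != "known":
        pools[(verdict, detail)].append(features)
        if len(pools[(verdict, detail)]) >= RETRAIN_THRESHOLD:
            # train a new binary classifier on the pooled samples here and
            # register it in `classifiers`, so these samples become "known"
            pass
    return verdict, detail
```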
Further, the selected classical machine learning algorithm may be one of SVM, random forest (RF), and AdaBoost.
Further, the Libsvm algorithm is used to train the multi-class classifier, and the classification accuracy is compared with and without data standardization.
Compared with the prior art, the invention has the following beneficial effects:
The artificial intelligence network security attack flow retrieval method can identify new attacks while detecting fixed attacks, and automatically adjusts the model or strategy when attacks or the environment change.
The present invention is explained in detail below with reference to the drawings and specific embodiments.
Drawings
FIG. 1 is a schematic diagram of the overall process architecture of the present invention;
FIG. 2 is a schematic diagram showing that adding data normalization greatly improves model accuracy;
FIG. 3 shows the precision and recall of the various classification algorithms on the validation set;
FIG. 4 shows the 40 most important features of the BRF classifier;
FIG. 5 shows the classifier training data;
FIG. 6 shows the precision of the binary classifiers for a specific class under different feature counts;
FIG. 7 shows the recall of the binary classifiers for a specific class under different feature counts;
FIG. 8 shows the minimum effective feature set for each class.
Detailed Description
To facilitate an understanding of the invention, it is described more fully below with reference to the accompanying drawings, in which several embodiments are shown. The invention may, however, be embodied in different forms and is not limited to the embodiments described herein; these embodiments are provided so that the disclosure is thorough and complete.
Example 1:
The artificial intelligence network security attack flow retrieval method comprises the following steps:
S1, selecting different groups of feature values, wherein different groups yield different detection accuracies for the attack types.
S2, for each attack type, selecting the group of features with the highest detection accuracy for that type (close to 100%) and training a classifier with that group; repeating this for every type so that each type has its own dedicated classifier, and combining the classifiers for detection. If the data set contains i types, feature value selection and model training produce i binary classifiers. Each classifier has a very high recognition rate for its own class, and a classifier is said to "hit" when it judges a sample to belong to the class it targets. For each group of data, the corresponding feature values are extracted and fed to the i classifiers respectively, and the classification results fall into the following cases:
Case 1: exactly one classifier hits and all the others miss. The sample is assigned to the class targeted by the hitting classifier. This is the simplest case: the sample is a known attack or normal traffic;
Case 2: all classifiers miss. The sample may belong to a new class: it is either an unknown attack or a new form of a known attack (i.e., its attack signature differs from the known signatures), and a new form of a known attack is treated as a new attack. All such samples are therefore grouped into a single class regarded as a new attack type. When enough of these samples have accumulated for training, a new classifier is trained to recognize the class;
Case 3: more than one classifier hits. The sample may belong to a new class and is treated as an unknown attack; samples with the same hit pattern are treated as the same class, so in theory 2^i - 1 - i unknown-attack classes can be distinguished (C(i,2) patterns with two hits, C(i,3) with three hits, ..., and C(i,i) = 1 with i hits). When enough such samples have accumulated for training, the model is updated and a new binary classifier is added. Thus, once a sufficient number of "unknown" samples has triggered a model update, those samples become "known" and can be hit by the newly added classifier.
The Libsvm algorithm is used to train the multi-class classifier. Comparing the classification accuracy with and without data standardization (FIG. 2) shows that standardizing the data greatly improves model performance; the data in all experiments in this report were standardized before training.
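As an illustration of this comparison, the following sketch trains an SVM (scikit-learn's SVC, which wraps libsvm) with and without standardization; the synthetic data set and parameters are assumptions for demonstration only, not the traffic data used in the experiments.

```python
# Sketch: compare SVM accuracy with and without data standardization.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the preprocessed traffic feature vectors.
X, y = make_classification(n_samples=5000, n_features=40, n_informative=15,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw = SVC().fit(X_tr, y_tr)                                      # no standardization
scaled = make_pipeline(StandardScaler(), SVC()).fit(X_tr, y_tr)  # standardized

print("without standardization:", raw.score(X_te, y_te))
print("with standardization:   ", scaled.score(X_te, y_te))
```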
After partitioning and preprocessing, the data can be used to train a multi-class classifier. The classical machine learning algorithms that can be selected include SVM, random forest (RF), and AdaBoost. The precision (P) and recall (R) of the various classification models on the validation set are shown in FIG. 3.
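A minimal sketch of such a comparison, reporting per-class precision and recall with scikit-learn, is shown below; the synthetic data set, class weights, and hyperparameters are illustrative assumptions.

```python
# Sketch: per-class precision/recall of SVM, RF and AdaBoost on a validation set.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Imbalanced synthetic data standing in for the traffic classes.
X, y = make_classification(n_samples=8000, n_features=40, n_informative=15,
                           n_classes=4, n_clusters_per_class=1,
                           weights=[0.7, 0.2, 0.07, 0.03], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RF": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name)
    print(classification_report(y_val, model.predict(X_val)))  # precision/recall per class
```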
Although the average accuracy of each classification algorithm on the validation set is above 95%, the sample distribution is uneven (Table 2) and the three categories "Web Attack", "Botnet ARES", and "Infiltration" contain few samples. The precision and recall of the SVM algorithm on these three categories are clearly worse than on the other categories, and for the other algorithms the precision and recall on these three small-sample categories are also somewhat lower than on the categories with more samples.
The BRF algorithm is the balanced random forest classification algorithm (BalancedRandomForestClassifier) provided by the imblearn library and is suited to class-imbalanced data. It is combined with SMOTETomek (provided by imblearn.combine), a variant of the synthetic minority oversampling technique (SMOTE), to resample the training samples so that the classes are balanced. All subsequent experiments in this report use the BRF + SMOTETomek algorithm combination.
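A minimal sketch of the BRF + SMOTETomek combination using the imblearn library follows; the synthetic imbalanced data stands in for the traffic set and the parameters are assumptions.

```python
# Sketch: BalancedRandomForestClassifier combined with SMOTETomek resampling.
from imblearn.combine import SMOTETomek
from imblearn.ensemble import BalancedRandomForestClassifier
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, n_features=40, n_informative=15,
                           n_classes=4, n_clusters_per_class=1,
                           weights=[0.85, 0.10, 0.04, 0.01], random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, stratify=y, random_state=0)

clf = Pipeline([
    ("resample", SMOTETomek(random_state=0)),                 # balance the classes
    ("brf", BalancedRandomForestClassifier(random_state=0)),
])
clf.fit(X_tr, y_tr)
print(classification_report(y_val, clf.predict(X_val)))
```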
The SMOTE algorithm brings no obvious improvement for the RF and AdaBoost algorithms, and no improvement for the SVM algorithm either, although it makes the model accuracy more stable. The essential reason is that the "Infiltration" category has too few samples and limited room for synthetic sampling, so the resampled samples of this category are highly homogeneous and insufficiently diverse.
With the BRF + SMOTETomek algorithm combination, the classification performance on all classes is at a high level. The purpose of feature selection is therefore to explore how few features can be selected while keeping the drop in recognition performance for each class small, i.e., to find a minimum effective feature subset.
Referring to FIG. 4, for the current classification task the BRF classifier can output the percentage contribution of each of the 79 features to the classification result; this contribution reflects the importance of the feature for classifying the samples (only the 40 most important features are shown in the figure).
To obtain the minimum effective feature subset of each class, 7 classifiers must be trained, each distinguishing one class from all the others. When training each classifier, the labels of all samples outside the target class are set to 0; the data statistics are shown in FIG. 5.
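A sketch of this one-vs-rest label construction is shown below; the class names are placeholders, since the exact class list is not reproduced here.

```python
# Sketch: train one binary BRF classifier per class by setting the labels of
# all samples outside the target class to 0 (class names are placeholders).
import numpy as np
from imblearn.ensemble import BalancedRandomForestClassifier

CLASSES = ["Benign", "DoS", "PortScan", "Brute Force",
           "Web Attack", "Botnet ARES", "Infiltration"]   # assumed 7 classes

def train_binary_classifiers(X, y):
    """y holds the original class names; returns {class name: trained BRF}."""
    classifiers = {}
    for cls in CLASSES:
        y_bin = np.where(y == cls, 1, 0)                  # non-class samples -> 0
        clf = BalancedRandomForestClassifier(random_state=0)
        clf.fit(X, y_bin)
        classifiers[cls] = clf
    return classifiers
```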
After the feature-contribution percentages of the classifiers are obtained, the k most important features are selected and the 7 classifiers are retrained. The value of k (the number of selected features) is gradually reduced, and the resulting drop in precision and recall for each class is observed, as shown in FIGS. 6 and 7.
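This selection-and-retraining loop could look like the following sketch, where the feature ranking comes from the trained classifier's feature_importances_ attribute; the choice of k values is an assumption.

```python
# Sketch: retrain a per-class binary BRF on its k most important features and
# record precision/recall for each k.
import numpy as np
from imblearn.ensemble import BalancedRandomForestClassifier
from sklearn.metrics import precision_score, recall_score

def top_k_sweep(full_clf, X_tr, y_tr, X_val, y_val, ks=(20, 15, 10, 5)):
    """full_clf: BRF already trained on all features; y_* are binary labels."""
    order = np.argsort(full_clf.feature_importances_)[::-1]   # most important first
    results = {}
    for k in ks:
        cols = order[:k]
        small = BalancedRandomForestClassifier(random_state=0)
        small.fit(X_tr[:, cols], y_tr)
        pred = small.predict(X_val[:, cols])
        results[k] = (precision_score(y_val, pred), recall_score(y_val, pred))
    return results   # {k: (precision, recall)} for the targeted class
```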
Across the 7 classifiers, recall remains high for all classes when k is between 5 and 20. At k = 20 the performance of all binary classifiers is closest to that of the model using all features, and at k = 10 the precision of the "Infiltration" class drops sharply.
In summary, the binary classifiers trained on the 15 most important features of each class perform best; the minimum effective feature combination for each class is shown in FIG. 8.
The invention has been described above with reference to the accompanying drawings. The invention is obviously not limited to the above embodiments; adopting insubstantial modifications of the inventive method concept and technical solution, or applying the inventive concept and solution directly to other applications without modification, both fall within the scope of the invention.

Claims (3)

1. An artificial intelligence network security attack flow retrieval method, characterized by comprising the following steps:
S1, selecting different groups of feature values, wherein different groups yield different detection accuracies for the attack types.
S2, for each attack type, selecting the group of features with the highest detection accuracy for that type (close to 100%) and training a classifier with that group; repeating this for every type so that each type has its own dedicated classifier, and combining the classifiers for detection. If the data set contains i types, feature value selection and model training produce i binary classifiers. Each classifier has a very high recognition rate for its own class, and a classifier is said to "hit" when it judges a sample to belong to the class it targets. For each group of data, the corresponding feature values are extracted and fed to the i classifiers respectively, and the classification results fall into the following cases:
Case 1: exactly one classifier hits and all the others miss. The sample is assigned to the class targeted by the hitting classifier. This is the simplest case: the sample is a known attack or normal traffic;
Case 2: all classifiers miss. The sample may belong to a new class: it is either an unknown attack or a new form of a known attack (i.e., its attack signature differs from the known signatures), and a new form of a known attack is treated as a new attack. All such samples are therefore grouped into a single class regarded as a new attack type. When enough of these samples have accumulated for training, a new classifier is trained to recognize the class;
Case 3: more than one classifier hits. The sample may belong to a new class and is treated as an unknown attack; samples with the same hit pattern are treated as the same class, so in theory 2^i - 1 - i unknown-attack classes can be distinguished (C(i,2) patterns with two hits, C(i,3) with three hits, ..., and C(i,i) = 1 with i hits). When enough such samples have accumulated for training, the model is updated and a new binary classifier is added. Thus, once a sufficient number of "unknown" samples has triggered a model update, those samples become "known" and can be hit by the newly added classifier.
2. The artificial intelligence network security attack flow retrieval method according to claim 1, characterized in that the selected classical machine learning algorithm is one of SVM, random forest (RF), and AdaBoost.
3. The artificial intelligence network security attack flow retrieval method according to claim 1, characterized in that a Libsvm algorithm is used to train the multi-class classifier, and the classification accuracy is compared with and without data standardization.
CN202011361014.8A 2020-11-27 2020-11-27 Artificial intelligence network security attack flow retrieval method Pending CN112507332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011361014.8A CN112507332A (en) 2020-11-27 2020-11-27 Artificial intelligence network security attack flow retrieval method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011361014.8A CN112507332A (en) 2020-11-27 2020-11-27 Artificial intelligence network security attack flow retrieval method

Publications (1)

Publication Number Publication Date
CN112507332A true CN112507332A (en) 2021-03-16

Family

ID=74967041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011361014.8A Pending CN112507332A (en) 2020-11-27 2020-11-27 Artificial intelligence network security attack flow retrieval method

Country Status (1)

Country Link
CN (1) CN112507332A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10063582B1 (en) * 2017-05-31 2018-08-28 Symantec Corporation Securing compromised network devices in a network
CN108023876A (en) * 2017-11-20 2018-05-11 西安电子科技大学 Intrusion detection method and intruding detection system based on sustainability integrated study
CN108650194A (en) * 2018-05-14 2018-10-12 南开大学 Net flow assorted method based on K_means and KNN blending algorithms

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG, Jie; YANG, Lili; YANG, Min: "Malicious network traffic detection based on ensemble classifiers", Journal on Communications, no. 10, 25 October 2018 (2018-10-25), pages 159 - 169 *

Similar Documents

Publication Publication Date Title
CN108632279B (en) Multilayer anomaly detection method based on network traffic
Gao et al. An adaptive ensemble machine learning model for intrusion detection
CN109768985B (en) Intrusion detection method based on flow visualization and machine learning algorithm
Eltanbouly et al. Machine learning techniques for network anomaly detection: A survey
CN110351301B (en) HTTP request double-layer progressive anomaly detection method
WO2020159439A1 (en) System and method for network anomaly detection and analysis
CN111740971A (en) Network intrusion detection model SGM-CNN based on class imbalance processing
CN106817248A (en) A kind of APT attack detection methods
CN112100614A (en) CNN _ LSTM-based network flow anomaly detection method
CN113489685B (en) Secondary feature extraction and malicious attack identification method based on kernel principal component analysis
Peng et al. Evaluating deep learning based network intrusion detection system in adversarial environment
CN112560596B (en) Radar interference category identification method and system
CN115622806B (en) Network intrusion detection method based on BERT-CGAN
Zheng Intrusion detection based on convolutional neural network
CN114254691A (en) Multi-channel operation wind control method based on active identification and intelligent monitoring
Hagar et al. Deep Learning for Improving Attack Detection System Using CSE-CICIDS2018
CN112507332A (en) Artificial intelligence network security attack flow retrieval method
CN116707992A (en) Malicious traffic avoidance detection method based on generation countermeasure network
Yerong et al. Intrusion detection based on support vector machine using heuristic genetic algorithm
Tun et al. Network anomaly detection using threshold-based sparse
Thanh et al. An approach to reduce data dimension in building effective network intrusion detection systems
CN113887633B (en) Malicious behavior identification method and system for closed source power industrial control system based on IL
CN113468555A (en) Method, system and device for identifying client access behavior
Amir et al. Efficient & Sustainable Intrusion Detection System Using Machine Learning & Deep Learning for IoT
Komadina et al. Detecting anomalies in firewall logs using artificially generated attacks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination