CN111556016B - Network flow abnormal behavior identification method based on automatic encoder - Google Patents

Info

Publication number
CN111556016B
Authority
CN
China
Legal status
Active
Application number
CN202010217930.8A
Other languages
Chinese (zh)
Other versions
CN111556016A (en)
Inventor
蹇诗婕
姜波
卢志刚
刘玉岭
杜丹
刘宝旭
Current Assignee
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date
2020-03-25
Filing date
2020-03-25
Publication date
2021-02-26
Application filed by Institute of Information Engineering of CAS
Priority to CN202010217930.8A
Publication of CN111556016A
Application granted
Publication of CN111556016B
Active legal status
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Computer Security & Cryptography (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Algebra (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides an autoencoder-based method for identifying abnormal behavior in network traffic, in the interdisciplinary field of machine learning and information security. The method uses the synthetic minority oversampling technique (SMOTE) to balance the class distribution of normal and abnormal records in traffic data and combines it with an autoencoder, so that nonlinear structural information can be extracted effectively from massive data and abnormal behavior in network traffic can be identified.

Description

Network flow abnormal behavior identification method based on automatic encoder
Technical Field
The invention provides an effective method for identifying abnormal behavior in network traffic. The method combines the synthetic minority oversampling technique (SMOTE) with an autoencoder-based classification algorithm and belongs to the interdisciplinary field of machine learning and information security.
Background
With the rapid development of the information age, the Internet has become an indispensable part of daily life. At the same time, network attacks are becoming more frequent and larger in scale. They cause heavy economic losses and seriously threaten social stability and national security, so securing cyberspace has become an urgent problem. To protect cyberspace, keep network resources available, and prevent attacks, intrusion detection, as an active defense technique, has become a major research topic. An intrusion detection system is an active security protection technology: it monitors how data is transmitted in a network and, when it finds a suspicious transmission, raises an alarm or interrupts the abnormal transmission.
The concept of intrusion detection was first proposed by James Anderson in 1980 for monitoring attack behavior. Research on network intrusion detection falls into two categories: misuse-based intrusion detection systems (MIDS) and anomaly-based intrusion detection systems (AIDS). MIDS, also called signature-based intrusion detection, detects attacks using existing knowledge. Although MIDS offers high accuracy and a low false-alarm rate, it cannot detect unknown attacks that are absent from its signature database. AIDS, in contrast, can detect unknown attacks by distinguishing normal from abnormal behavior, and has therefore attracted growing attention. Most AIDS work uses traditional feature-based machine learning methods such as decision trees, random forests, and naive Bayes. However, intrusion detection based on conventional machine learning relies heavily on feature engineering and is a form of shallow learning. As networks produce ever more high-dimensional data and bandwidth keeps increasing, data complexity and feature diversity grow steadily, and shallow learning struggles to meet the goals of analysis and prediction.
In recent years, deep neural networks have achieved great success in image recognition, natural language processing, speech recognition, and other fields. A deep neural network performs representation learning on data: by building a nonlinear structure with multiple hidden layers, it can learn the intrinsic patterns of the data and meet the demands of high-dimensional learning and prediction. Deep learning approaches to intrusion detection, including autoencoders, deep belief networks, recurrent neural networks, convolutional neural networks, and gated recurrent units, are also promising and have achieved some success. However, these methods still have several problems.
First, because of class imbalance, many studies ignore the overall distribution of the traffic data. The decision function is then biased toward the majority class, and low-frequency attack samples are treated as noise and ignored, so the model struggles to capture effective features and low-frequency attacks are hard to detect. Second, some studies do not handle the high dimensionality that arises when symbolic data is converted into numerical data, which lowers training efficiency, consumes memory, and degrades detection performance. Reducing the dimensionality of the traffic data can therefore improve both the efficiency and the accuracy of intrusion detection.
Disclosure of Invention
To address these problems in the prior art, the invention provides a novel deep neural network intrusion detection method, namely an autoencoder-based method for identifying abnormal behavior in network traffic. It uses the synthetic minority oversampling technique (SMOTE) to balance the class distribution of normal and abnormal records in traffic data and combines it with an autoencoder, thereby extracting nonlinear structural information effectively from massive data.
To achieve this purpose, the invention adopts the following technical scheme:
an autoencoder-based method for identifying abnormal behavior in network traffic comprises the following steps:
1) constructing a sparse abnormal intrusion detection model, SAIDS, using an autoencoder;
2) training the SAIDS model, which comprises:
the SAIDS model preprocessing the original training data, then applying the synthetic minority oversampling technique (SMOTE) to the preprocessed training data to balance the class distribution of normal and abnormal traffic, yielding balanced data;
classifying normal and abnormal traffic from the balanced data, calculating the loss value, and taking the model parameters that give the minimum loss value as the trained SAIDS model;
3) using the trained SAIDS model to examine the network traffic to be identified, which comprises:
the SAIDS model preprocessing the network traffic to be identified, classifying the preprocessed traffic as normal or abnormal, and thereby identifying abnormal behavior.
Further, the original training data carries class labels for normal traffic and abnormal traffic.
Further, preprocessing yields normalized data; it comprises converting symbolic data into numerical data with one-hot encoding and normalizing the numerical data.
Further, normalization means scaling the numerical data into the range [0, 1] with the Min-Max normalization method.
Further, SMOTE uses linear interpolation: a new sample is generated by multiplying the difference between a minority-class sample and a randomly selected nearest-neighbor sample by a random number between 0 and 1 and adding the result to the minority-class sample.
Further, apart from the network structure that preprocesses the original data and produces balanced data with SMOTE, the SAIDS model consists mainly of a dropout (discarding) layer and an autoencoder. The dropout layer processes the balanced data to prevent overfitting. The autoencoder comprises an input layer, an encoding layer, and a decoding layer: the input layer receives the balanced data processed by the dropout layer, the encoding layer maps it to low-dimensional features, and the decoding layer reconstructs the input data from those features; the result is classified as normal or abnormal traffic.
Further, the dropout layer computes the element-wise product of the input balanced data and a random vector whose entries follow a Bernoulli distribution.
Further, the SAIDS model is trained with the ReLU activation function and the Adam optimizer, and the loss value is calculated with the mean squared error.
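The data flow through the model described above (dropout layer, then encoding layer, then decoding layer, all with ReLU) can be sketched as a NumPy forward pass. This is a minimal illustration under stated assumptions, not the patented implementation: the text gives no layer sizes and does not say how the decoded output becomes a normal/abnormal decision, so the sizes and the sigmoid output head (`W_out`, `b_out`) are hypothetical, and training with Adam on the MSE loss is omitted.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_in, n_code, rng, scale=0.1):
    """Small random weights; layer sizes and the output head are illustrative assumptions."""
    return {
        "W_enc": rng.normal(0.0, scale, (n_in, n_code)), "b_enc": np.zeros(n_code),
        "W_dec": rng.normal(0.0, scale, (n_code, n_in)), "b_dec": np.zeros(n_in),
        # Hypothetical output head: one score in (0, 1) per record (1 = abnormal).
        "W_out": rng.normal(0.0, scale, (n_in, 1)), "b_out": np.zeros(1),
    }

def forward(x, params, keep_prob, rng, training=True):
    """Dropout -> encoding layer -> decoding layer -> hypothetical score head."""
    if training:
        x = x * rng.binomial(1, keep_prob, size=x.shape)  # Bernoulli dropout mask
    else:
        x = x * keep_prob                                  # expected value at inference
    code = relu(x @ params["W_enc"] + params["b_enc"])      # low-dimensional features
    recon = relu(code @ params["W_dec"] + params["b_dec"])  # reconstruction of the input
    score = sigmoid(recon @ params["W_out"] + params["b_out"]).ravel()
    return code, recon, score
```

For example, `forward(rng.random((5, 10)), init_params(10, 4, rng), 0.8, rng)` returns 4-dimensional codes, 10-dimensional reconstructions, and five scores. The encoding layer is narrower than the input, which is what makes the autoencoder a dimensionality-reduction step.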
The invention uses SMOTE to balance the class distribution of the traffic data because it offers two advantages: (1) undersampling obtains a balanced data set by deleting majority-class data and may lose important information, so oversampling generally performs better and is used more often than undersampling; (2) SMOTE is based on linear interpolation, which effectively reduces overfitting and eases the limitations of the sampling process.
Overly high data dimensionality makes training inefficient. Reducing the dimensionality lowers storage requirements, speeds up computation, removes redundant features, and yields a better representation of the data. Traditional linear dimensionality reduction methods such as principal component analysis (PCA) have difficulty capturing nonlinear information in the data, while kernel-based nonlinear methods such as kernel PCA are computationally expensive and hard to apply to large data sets. As a deep learning dimensionality reduction method, the autoencoder can effectively extract nonlinear structural information from massive data sets and obtain higher-level features. The invention therefore builds its intrusion detection system on an autoencoder to improve detection on massive high-dimensional data.
Compared with the prior art, the invention has the following positive effects:
the invention was evaluated experimentally on several real network traffic data sets, using overall accuracy, precision, recall, and F1 score as metrics. The results show that the proposed model outperforms existing baseline methods such as decision trees, random forests, and gated recurrent unit networks.
Drawings
Fig. 1 is a flowchart of the overall method for identifying abnormal network traffic behavior according to this embodiment.
FIGS. 2A-2B are distribution diagrams of the NSL-KDD data set used in this embodiment; FIG. 2A shows the original training data set and FIG. 2B the data set after SMOTE processing.
FIGS. 3A-3B are distribution diagrams of the UNSW-NB15 data set used in this embodiment; FIG. 3A shows the original training data set and FIG. 3B the data set after SMOTE processing.
FIGS. 4A-4B are charts comparing the performance of deep learning methods; FIG. 4A shows the evaluation on the NSL-KDD data set and FIG. 4B the evaluation on the UNSW-NB15 data set.
Detailed Description
To help those skilled in the art better understand the technical solutions in the embodiments and to make the objects, features, and advantages of the invention clearer, the technical core of the invention is described in further detail below with reference to the accompanying drawings. The specific embodiments described here merely illustrate the invention and are not intended to limit it.
This embodiment provides an effective method for identifying abnormal behavior in network traffic. The general idea is as follows: first preprocess the network traffic data, which has two parts, digitizing symbolic features and normalizing numerical data; then change the class distribution of the traffic data with SMOTE; finally build a model with an autoencoder so that attack behavior in the traffic data can be detected.
The overall flow chart of the method is shown in fig. 1, and the details of the steps of the method are described as follows:
(1) Data preprocessing
The method is validated on the NSL-KDD and UNSW-NB15 data sets. The NSL-KDD data set is a subset of the KDD Cup 1999 data set, which was built from data provided by the US Defense Advanced Research Projects Agency (DARPA) and contains several weeks of attack data for evaluating intrusion detection performance. However, the KDD Cup 1999 data set has problems such as redundant records, which bias classifiers toward frequent records. The NSL-KDD data set resolves these problems of redundant and duplicate records, and its training and test sets are of reasonable size. NSL-KDD consists of TCP/IP connection records, each with 41 features (basic, content, and traffic features) plus a class label and a difficulty label. The UNSW-NB15 data set was created in the Cyber Range Lab of the Australian Centre for Cyber Security; it captures 16 hours of traffic on 22 January 2015 and 15 hours on 17 February 2015, reflects contemporary real network traffic, and covers a broader range of attack activity. The UNSW-NB15 data set contains 49 features, including flow, basic, content, and time features.
Preprocessing the network traffic data removes redundant data, improves detection efficiency, and reduces time consumption. It has two parts: digitizing symbolic features and normalizing numerical data.
Digitizing symbolic features: intrusion detection data sets usually contain symbolic features, which models cannot process directly, so this step converts them into numerical data with a one-hot encoder. For example, the protocol_type feature of the NSL-KDD data set takes three values: TCP, UDP, and ICMP. One-hot encoding maps them to three binary vectors, [1,0,0], [0,1,0], and [0,0,1], respectively. All symbolic features are mapped in the same way. For the class label, normal traffic is labeled 0 and abnormal traffic is labeled 1.
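The one-hot mapping and the 0/1 labeling described above can be sketched in plain Python. This is an illustration, not the patent's encoder: the function names are hypothetical, the category order defaults to order of first appearance, and the raw label string "normal" for benign records is an assumption about the input format.

```python
def one_hot_encode(values, categories=None):
    """Map each symbolic value to a binary vector (one-hot encoding).

    If `categories` is not given, their order is taken from first appearance
    in `values` (an illustrative choice; any fixed order works).
    """
    if categories is None:
        categories = list(dict.fromkeys(values))
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1  # raises KeyError for a value not seen in `categories`
        vectors.append(vec)
    return vectors, categories

def encode_label(label):
    """Binary class label: 0 for normal traffic, 1 for any abnormal traffic."""
    return 0 if label == "normal" else 1
```

With `categories=["TCP", "UDP", "ICMP"]`, the values TCP, UDP, and ICMP map to `[1, 0, 0]`, `[0, 1, 0]`, and `[0, 0, 1]`, matching the example in the text. Fixing the category list from the training data keeps the vector layout identical when new traffic is encoded at detection time.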
Normalizing numerical data: features can differ greatly in scale, and normalization removes these differences, which is why it is widely used in data preprocessing. To ensure reliable detection results, the numerical data in both data sets is normalized, that is, every feature is scaled into the range [0, 1]. The method uses Min-Max normalization, with the conversion formula:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where $x$ is the value of a feature, $x_{\max}$ and $x_{\min}$ are the maximum and minimum values of that feature, and $x'$ is the normalized value of $x$.
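The Min-Max formula can be applied column by column with NumPy, as in the minimal sketch below. Two choices here are assumptions not covered by the text: constant features (where $x_{\max} = x_{\min}$) are mapped to 0 instead of dividing by zero, and values seen at detection time that fall outside the training range are clipped into [0, 1].

```python
import numpy as np

def fit_min_max(train):
    """Record per-feature minimum and maximum from the training data."""
    train = np.asarray(train, dtype=float)
    return train.min(axis=0), train.max(axis=0)

def min_max_transform(x, x_min, x_max):
    """Apply x' = (x - x_min) / (x_max - x_min) to every column."""
    x = np.asarray(x, dtype=float)
    span = x_max - x_min
    # Constant features would divide by zero; use a span of 1 instead (assumption).
    safe_span = np.where(span == 0, 1.0, span)
    scaled = (x - x_min) / safe_span
    # Values outside the training range are clipped into [0, 1] (assumption).
    return np.clip(scaled, 0.0, 1.0)
```

The minimum and maximum are fitted once on the training data and reused for the traffic to be identified, so both are scaled consistently.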
(2) Changing data class distribution
In network traffic data, abnormal records are usually far fewer than normal ones, so the decision function is biased toward the majority class and low-frequency attack samples are treated as noise and ignored. To improve detection performance, the minority-class data therefore needs special handling. There are two general approaches: algorithm-level and data-level. Algorithm-level solutions typically modify the classifier or optimize the learning algorithm, adjusting the importance of each class during learning or decision making. Common data-level solutions are undersampling and oversampling, which balance the class distribution by resampling: undersampling reduces the amount of majority-class data, while oversampling increases the amount of minority-class data.
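The two data-level approaches can be contrasted in their simplest random form. This sketch is not SMOTE and is not part of the patented method; it assumes binary labels and returns row indices. Random undersampling discards majority-class rows, which is the information loss the text warns about, while random oversampling duplicates minority-class rows.

```python
import random
from collections import Counter

def random_undersample(labels, seed=0):
    """Keep every minority-class index plus an equal-sized random subset of the majority class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    idx_min = [i for i, y in enumerate(labels) if y == minority]
    idx_maj = [i for i, y in enumerate(labels) if y == majority]
    return sorted(idx_min + rng.sample(idx_maj, len(idx_min)))

def random_oversample(labels, seed=0):
    """Keep every index and repeat random minority-class indices until the classes are equal."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority = min(counts, key=counts.get)
    idx_min = [i for i, y in enumerate(labels) if y == minority]
    extra = [rng.choice(idx_min) for _ in range(max(counts.values()) - len(idx_min))]
    return list(range(len(labels))) + extra
```

With 8 normal and 2 abnormal records, undersampling keeps 4 rows (2 per class) while oversampling yields 16 rows (8 per class). Exact duplicates add no new information, which motivates SMOTE's interpolated samples.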
Because undersampling obtains a balanced data set by deleting majority-class data, important information may be lost; oversampling therefore generally works better and is used more often. The method adopts a well-known oversampling technique, the synthetic minority oversampling technique (SMOTE), to handle the minority-class data and improve detection performance. SMOTE randomly generates new samples between minority-class samples and their nearest neighbors, improving the class distribution by adding minority-class data. Because it is based on linear interpolation, SMOTE effectively reduces overfitting and eases the limitations of the sampling process. SMOTE generates synthetic data with the following formula:
$$y_{new} = y_i + (y_j - y_i) \times \delta \qquad (1)$$
where $y_{new}$ is the newly generated synthetic sample, $y_i$ is a minority-class sample, $y_j$ is a randomly selected nearest neighbor of $y_i$, and $\delta$ is a random number between 0 and 1. FIGS. 2A-2B show the distribution of the original NSL-KDD training set and of the training set after SMOTE processing. FIGS. 3A-3B show the same for the UNSW-NB15 training set.
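Formula (1) can be implemented directly: for each synthetic sample, pick a random minority sample $y_i$, pick $y_j$ among its $k$ nearest minority-class neighbors, and move a random fraction $\delta$ of the way from $y_i$ toward $y_j$. The NumPy sketch below is a minimal illustration, not the patent's code; the function name and the default `k=5` (a common choice) are assumptions, and the brute-force distance matrix is only suitable for small inputs.

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic rows from minority-class rows by SMOTE interpolation."""
    rng = np.random.default_rng(seed)
    X = np.asarray(minority, dtype=float)
    n = X.shape[0]
    k = min(k, n - 1)
    if k < 1:
        raise ValueError("SMOTE needs at least two minority samples")
    # Pairwise Euclidean distances between minority samples (O(n^2) memory).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)                  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]    # k nearest neighbours per sample
    synthetic = np.empty((n_new, X.shape[1]))
    for t in range(n_new):
        i = rng.integers(n)                  # random minority sample y_i
        j = neighbours[i, rng.integers(k)]   # random neighbour y_j of y_i
        delta = rng.random()                 # delta in [0, 1)
        synthetic[t] = X[i] + (X[j] - X[i]) * delta
    return synthetic
```

Because every synthetic point lies on the segment between two minority samples, points drawn from collinear inputs stay on that line. For production use, the imbalanced-learn package provides a tested `SMOTE` class with a `fit_resample` method.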
(3) Model training
Overly high data dimensionality makes training inefficient. Reducing the dimensionality lowers storage requirements, speeds up computation, removes redundant features, and yields a better representation of the data. Traditional linear dimensionality reduction methods such as principal component analysis (PCA) have difficulty capturing nonlinear information in the data, while kernel-based nonlinear methods such as kernel PCA are computationally expensive and hard to apply to large data sets. As a deep learning dimensionality reduction method, the autoencoder can effectively extract nonlinear structural information from massive data sets and obtain higher-level features. The autoencoder is therefore well suited to dimensionality reduction and feature learning, and the method builds its intrusion detection system on an autoencoder to improve detection on massive high-dimensional data.
The autoencoder is a three-layer neural network with an input layer, an encoding layer, and a decoding layer; it is an unsupervised learning structure composed of an encoder and a decoder. After data preprocessing and class-distribution balancing, the autoencoder reduces the dimensionality of the training data and the model is trained. To avoid overfitting, a dropout (discarding) layer is added to process the balanced data, and its output serves as the input to the autoencoder. In the autoencoder, the encoder maps the input data to low-dimensional features and the decoder reconstructs the input from those features. By learning to reconstruct its input, the hidden layer learns a representation of the input data, which effectively reduces the feature dimensionality of the data set while preserving the feature information. The formulas for the dropout layer are as follows:
$$r \sim \mathrm{Bernoulli}(p) \qquad (2)$$
$$\lambda(n) = r \odot \alpha(n) \qquad (3)$$
where $r$ is a vector of independent and identically distributed Bernoulli random variables, each equal to 1 with probability $p$, with the same shape as $\alpha(n)$; $\odot$ denotes the element-wise product; $\alpha(n)$ is the output of the current layer; and $\lambda(n)$ is the output of the dropout layer.
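Formulas (2) and (3) translate into a few lines of NumPy. In this minimal sketch, $p$ is the probability that a unit is kept, as in formula (2); the inference branch, which scales by $p$ so that the expected activation matches training, is an assumption, since the text does not describe inference. Note that framework layers differ: Keras `Dropout(rate)` and PyTorch `nn.Dropout(p)` take the drop probability (that is, $1 - p$ here) and rescale surviving units during training instead.

```python
import numpy as np

def dropout(alpha, p, rng, training=True):
    """Dropout per formulas (2)-(3): lambda(n) = r * alpha(n) with r ~ Bernoulli(p).

    p is the probability that each unit is KEPT.
    """
    alpha = np.asarray(alpha, dtype=float)
    if not training:
        # No random mask at inference; scale by p to match the training-time expectation.
        return alpha * p
    r = rng.binomial(1, p, size=alpha.shape)  # i.i.d. Bernoulli(p), same shape as alpha
    return r * alpha                          # element-wise product
```

Applied to an all-ones input with `p=0.8`, roughly 80% of the entries survive as 1 and the rest become 0.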
(4) Abnormal behavior detection
Finally, the ReLU activation function and the Adam optimizer are used, and the loss value is calculated with the mean squared error (MSE):
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2$$
where $y_i$ is the true value of the $i$-th sample, $\hat{y}_i$ is its predicted value, and $m$ is the total number of samples. After training, the model with the minimum loss value is selected to classify the test data, and the detection results are evaluated with the evaluation metrics.
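The loss formula and the "keep the model with the minimum loss" rule can be sketched in plain Python. The function names, the `(name, loss)` checkpoint pairs, and the 0.5 decision threshold in `to_labels` are hypothetical illustrations; the text does not specify how scores are turned into class labels.

```python
def mean_squared_error(y_true, y_pred):
    """MSE = (1/m) * sum over i of (y_i - y_hat_i)^2."""
    m = len(y_true)
    if m == 0 or m != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / m

def select_best(checkpoints):
    """Return the (name, loss) pair with the smallest loss value."""
    return min(checkpoints, key=lambda item: item[1])

def to_labels(scores, threshold=0.5):
    """Map scores to 0 (normal) or 1 (abnormal); the 0.5 threshold is an assumption."""
    return [1 if s >= threshold else 0 for s in scores]
```

For labels `[0, 1, 1, 0]` and predictions `[0.1, 0.9, 0.6, 0.2]`, the squared errors are 0.01, 0.01, 0.16, and 0.04, giving an MSE of 0.055.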
(5) Comparison of results
The invention was evaluated on several real network traffic data sets, using overall accuracy, precision, recall, and F1 score. To verify the effectiveness of the proposed method (SMOTE + AE), comparison experiments were run against both machine learning and deep learning methods. Seven commonly used machine learning methods were compared: decision tree, random forest, Gaussian naive Bayes, multinomial naive Bayes, Bernoulli naive Bayes, AdaBoost, and extreme gradient boosting. The deep learning baselines were a gated recurrent unit network (GRU), SMOTE + GRU, a sparse autoencoder (SAE), SMOTE + SAE, and a plain autoencoder.
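The four evaluation metrics follow from the confusion-matrix counts. This minimal sketch treats abnormal traffic (label 1, per the labeling in the preprocessing step) as the positive class, which is an assumption because the text does not name the positive class; the function name is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F1 with abnormal traffic (label 1) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

With 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, all four metrics equal 0.75. On imbalanced traffic data, precision, recall, and F1 are more informative than accuracy alone, because a model that labels everything normal can still score high accuracy.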
1) Performance comparison with machine learning methods
Table 1 compares performance on the NSL-KDD and UNSW-NB15 data sets. As Table 1 shows, among the machine learning methods, tree-based methods generally detect better than probability-based ones, because naive Bayes has difficulty handling correlated features. Ensemble methods outperform single classifiers, because a single classifier can hardly summarize all the characteristics of a given data set, whereas an ensemble captures more information. Table 1 also shows that the SAIDS method reaches accuracies of 91.08% and 97.58% on the two data sets, respectively, outperforming the traditional machine learning methods and demonstrating the effectiveness of the proposed method. This is because the shallow learning methods do not handle imbalanced traffic data, so their models are biased toward the majority normal class; as data complexity grows, the learning ability of shallow methods becomes limited.
TABLE 1 Performance comparison of machine learning methods on the NSL-KDD and UNSW-NB15 data sets
2) Performance comparison with deep learning methods
Deep learning methods can learn the intrinsic patterns of data and are therefore better suited to fitting and predicting network traffic data. FIGS. 4A-4B compare the proposed SAIDS model with state-of-the-art deep learning models. The figures show that SAIDS outperforms the other five methods and can effectively detect network intrusion data. Models trained on SMOTE-processed data generally detect better than those trained without SMOTE, which demonstrates the importance of balancing the class distribution. Because the invention handles imbalanced data and redundant features together, its detection results are better than those of a standalone GRU network or a sparse autoencoder.
Comparisons with both machine learning and deep learning methods show that the proposed SAIDS model achieves better prediction accuracy for network traffic detection and has potential for practical application.
The above embodiments express only some implementations of the invention. Although described specifically and in detail, they should not be construed as limiting the scope of the invention. Those skilled in the art can make variations and improvements without departing from the inventive concept, and these all fall within the scope of protection of the invention. The scope of protection of this patent is therefore defined by the appended claims.

Claims (6)

1. An autoencoder-based method for identifying abnormal behavior in network traffic, comprising the following steps:
1) constructing a sparse abnormal intrusion detection model SAIDS by using an automatic encoder;
2) training the SAIDS model, comprising the following steps:
the SAIDS model preprocessing original training data, wherein the original training data comprises the NSL-KDD data set and the UNSW-NB15 data set, and obtaining normalized data through preprocessing, wherein the preprocessing comprises converting symbolic data into numerical data with one-hot encoding and normalizing the numerical data;
balancing the class distribution of normal and abnormal traffic in the traffic data by applying the synthetic minority oversampling technique (SMOTE) to the preprocessed training data, to obtain balanced data;
classifying normal and abnormal traffic from the balanced data, calculating a loss value, and finding the model parameters corresponding to the minimum loss value to obtain a trained SAIDS model;
3) detecting the network traffic to be identified by using the trained SAIDS model, wherein the steps comprise:
the SAIDS model preprocesses the network traffic to be identified, classifies the preprocessed network traffic into normal traffic and abnormal traffic, and identifies abnormal behavior;
wherein the SAIDS model comprises a dropout layer and an autoencoder; the dropout layer processes the balanced data to prevent overfitting; the autoencoder comprises an input layer, an encoding layer, and a decoding layer, wherein the input layer receives the processed balanced data, the encoding layer maps the balanced data to low-dimensional features, the decoding layer reconstructs the input data from the low-dimensional features, and the input data is classified as normal or abnormal traffic; and the SAIDS model is trained with a ReLU activation function and an Adam optimizer, using the mean squared error to calculate the loss value.
2. The method of claim 1, wherein the raw training data carries class labels for normal traffic and abnormal traffic.
3. The method of claim 1, wherein the normalization scales the numerical data into the range [0, 1].
4. The method of claim 3, wherein the normalization is performed using a Min-Max normalization method.
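The preprocessing of claims 1, 3 and 4 (one-hot encoding of symbolic fields, then Min-Max scaling with x' = (x - min) / (max - min)) might look like the following NumPy sketch. The helper names `one_hot` and `min_max` are hypothetical, and the sample protocol values are illustrative rather than a description of the NSL-KDD or UNSW-NB15 feature schemas.

```python
import numpy as np

def one_hot(column):
    """Map each symbolic value (e.g. a protocol name) to a 0/1 indicator vector."""
    categories = sorted(set(column))
    index = {c: i for i, c in enumerate(categories)}
    out = np.zeros((len(column), len(categories)))
    for row, value in enumerate(column):
        out[row, index[value]] = 1.0
    return out, categories

def min_max(x):
    """Scale each numeric column to [0, 1] using (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero on constant columns
    return (x - lo) / span
```

For example, `one_hot(["tcp", "udp", "icmp", "tcp"])` returns the categories `["icmp", "tcp", "udp"]` with a 4×3 indicator matrix, and `min_max([[0, 10], [5, 20], [10, 30]])` returns `[[0, 0], [0.5, 0.5], [1, 1]]`. When scoring new traffic, one would normally reuse the minimum and maximum computed on the training data; the claims do not state how this is handled.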
5. The method of claim 1, wherein the synthetic minority over-sampling technique (SMOTE) uses linear interpolation: new data is generated by multiplying the difference between a minority-class sample and a randomly selected nearest-neighbor sample by a random number between 0 and 1, and adding the product to the minority-class sample.
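The interpolation rule of claim 5, x_new = x + r * (x_nn - x) with r drawn uniformly from [0, 1), can be sketched in NumPy as below. The function name `smote`, the brute-force neighbour search and the default of k = 5 neighbours are assumptions; k = 5 is also the default `k_neighbors` of the `SMOTE` class in the imbalanced-learn library, which would be the usual choice in practice.

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style linear interpolation."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    k = min(k, len(minority) - 1)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]    # indices of the k nearest neighbours
    synthetic = np.empty((n_new, minority.shape[1]))
    for i in range(n_new):
        a = rng.integers(len(minority))          # pick a minority sample
        b = neighbours[a, rng.integers(k)]       # and one of its nearest neighbours at random
        r = rng.random()                         # random number in [0, 1)
        synthetic[i] = minority[a] + r * (minority[b] - minority[a])
    return synthetic
```

Because each synthetic point lies on the segment between two real minority samples, the new data stays within the region the minority class already occupies. Calling `smote` on the minority-class rows with `n_new` equal to the class-size gap would equalize the two classes.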
6. The method of claim 1, wherein the dropout layer computes the element-wise product of the input balanced data and a mask vector whose elements follow a Bernoulli distribution.
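The dropout layer of claim 6, an element-wise product of the input with a Bernoulli mask, can be sketched as follows. The helper name `dropout`, the `keep_prob` value of 0.8 and the `training` flag are assumptions. The function applies the mask exactly as the claim describes and, unlike the inverted dropout of frameworks such as PyTorch's `nn.Dropout`, does not rescale the surviving elements.

```python
import numpy as np

def dropout(x, keep_prob=0.8, training=True, seed=0):
    """Element-wise product of x with a Bernoulli(keep_prob) mask (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    if not training:
        return x                      # no masking when the model is used for detection
    rng = np.random.default_rng(seed)
    # each mask entry is 1 with probability keep_prob and 0 otherwise
    mask = rng.binomial(1, keep_prob, size=x.shape)
    return x * mask
```

With this unscaled mask, the activations (or downstream weights) would be multiplied by `keep_prob` when `training=False`, so that expected values match those seen during training.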

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010217930.8A CN111556016B (en) 2020-03-25 2020-03-25 Network flow abnormal behavior identification method based on automatic encoder

Publications (2)

Publication Number Publication Date
CN111556016A CN111556016A (en) 2020-08-18
CN111556016B true CN111556016B (en) 2021-02-26

Family

ID=72003821

Country Status (1)

Country Link
CN (1) CN111556016B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112039903B (en) * 2020-09-03 2022-03-08 中国民航大学 Network security situation assessment method based on deep self-coding neural network model
CN112165464B (en) * 2020-09-15 2021-11-02 江南大学 Industrial control hybrid intrusion detection method based on deep learning
CN112134875B (en) * 2020-09-18 2022-04-05 国网山东省电力公司青岛供电公司 IoT network abnormal flow detection method and system
CN112702329B (en) * 2020-12-21 2023-04-07 四川虹微技术有限公司 Traffic data anomaly detection method and device and storage medium
CN113158076B (en) * 2021-04-05 2022-07-22 北京工业大学 Social robot detection method based on variational self-coding and K-nearest neighbor combination

Citations (1)

Publication number Priority date Publication date Assignee Title
CN107103332A (en) * 2017-04-07 2017-08-29 武汉理工大学 A relevance vector machine classification method for large-scale datasets

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
GB2555192B (en) * 2016-08-02 2021-11-24 Invincea Inc Methods and apparatus for detecting and identifying malware by mapping feature data into a semantic space
CN106973057B (en) * 2017-03-31 2018-12-14 浙江大学 A classification method suitable for intrusion detection
CN108768946B (en) * 2018-04-27 2020-12-22 中山大学 Network intrusion detection method based on random forest algorithm
CN109344888A (en) * 2018-09-19 2019-02-15 广东工业大学 An image recognition method, device and equipment based on convolutional neural networks
CN110163261A (en) * 2019-04-28 2019-08-23 平安科技(深圳)有限公司 Imbalanced data classification model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant