CN111314353A - Network intrusion detection method and system based on hybrid sampling - Google Patents

Network intrusion detection method and system based on hybrid sampling

Info

Publication number
CN111314353A
Authority
CN
China
Prior art keywords
samples
sampling
sample
network intrusion
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010103246.7A
Other languages
Chinese (zh)
Other versions
CN111314353B (en)
Inventor
熊炫睿
陈高升
熊炼
张媛
程占伟
付明凯
刘敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications
Priority to CN202010103246.7A
Publication of CN111314353A
Application granted
Publication of CN111314353B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of network intrusion detection, and in particular to a network intrusion detection method and system based on hybrid sampling. The method comprises: converting the symbolic attributes in a historical network intrusion data set into numeric attributes; normalizing the historical network intrusion data set to the interval [0, 1]; sampling the historical network intrusion data set with a hybrid sampling algorithm to obtain a training set in which all classes are balanced; training a BP neural network classifier on the resulting training set; and inputting real-time network intrusion data into the trained BP neural network classifier, which outputs the class of that data. The invention discards fewer majority-class samples, thereby reducing the loss of information that is valuable for building the classifier; compared with intrusion detection based on SMOTE oversampling, it introduces less noise when generating new minority-class samples, so the algorithm classifies unbalanced data better.

Description

Network intrusion detection method and system based on hybrid sampling
Technical Field
The invention relates to the technical field of network intrusion detection, and in particular to a network intrusion detection method and system based on hybrid sampling.
Background
Machine learning methods have been increasingly applied in recent years to network intrusion detection, which is treated as a classification problem. Some attack types occur frequently while others occur rarely, so intrusion detection is a typical application with unbalanced data. When trained on unbalanced data, machine learning classifies majority-class intrusion samples well but minority-class intrusion samples poorly, even though detecting minority-class intrusions is equally important. Existing approaches to unbalanced data in network intrusion detection systems include techniques based on the SMOTE oversampling algorithm and techniques based on clustering-based undersampling.
Yan 26170, Hao, Korea and the like use an improved SMOTE algorithm to generate new minority-class samples, increasing the number of minority-class samples, and train a deep recurrent neural network classifier on the resulting balanced data set for network intrusion detection. The intrusion detection method proposed by chenhong, xiaoyue, xiaojiulong and the like combines a SMOTE algorithm based on maximum-dissimilarity-coefficient density with a deep belief network and a gradient boosting decision tree: the SMOTE variant oversamples the minority-class samples, and a gradient boosting decision tree classifier is then trained on the preprocessed balanced data set. The anomaly detection method based on SMOTE and deep belief networks proposed by Shenshuli, Shuzewain, et al. uses the SMOTE algorithm to add minority-class samples and then trains a deep belief network classifier on the resulting balanced data set.
However, when the data are extremely unbalanced, plain SMOTE oversampling must generate a large number of new minority-class samples, which introduces too much noise and degrades classification performance.
In "Improving Detection accuracy for Improving Network Intrusion Detection using Cluster-based Under-sampling with Random Forests", proposed by Miah M O, Khan S, Shatabda S, et al., a clustering-based undersampling method reduces the majority-class samples, and a random forest classifier then performs network intrusion detection. In "Multi-level hybrid support vector machine and extreme learning machine based on modified K-means for intrusion detection system", proposed by Al-Yaseen W L, Othman Z A, Nazri M Z A, et al., an improved K-means clustering algorithm generates a smaller, abstracted data set, which reduces the class imbalance to some extent, and SVM and ELM are then used for network intrusion detection.
However, after clustering the majority classes, these clustering-based undersampling techniques select samples cluster by cluster without considering the information of all the samples in each cluster, so the selected majority-class samples may not be representative enough.
Disclosure of Invention
When existing machine-learning-based network intrusion detection techniques process extremely unbalanced intrusion data, balancing the data with plain undersampling requires discarding a large number of majority-class samples, which loses much potential information that is valuable for building the classifier, while balancing it with plain SMOTE requires generating a large number of new minority-class samples, which introduces serious noise. To address these problems, the invention provides a network intrusion detection method and system based on hybrid sampling. The method, shown in Fig. 1, comprises the following steps:
S1, converting the symbolic attributes in the historical network intrusion data set into numeric attributes;
S2, normalizing the historical network intrusion data set to the interval [0, 1];
S3, sampling the historical network intrusion data set with a hybrid sampling algorithm to obtain a training set in which all classes are balanced;
S4, training a BP neural network classifier on the resulting training set;
and S5, inputting real-time network intrusion data into the trained BP neural network classifier, which outputs the class of the real-time network intrusion data.
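Steps S1 and S2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `encode_symbolic` and `min_max_normalize` are hypothetical, and the integer-code mapping is an assumption, since the text does not say how symbolic values are mapped to numbers.

```python
def encode_symbolic(rows, symbolic_cols):
    """S1: replace each symbolic value with an integer code (assumed mapping)."""
    mappings = {c: {} for c in symbolic_cols}
    encoded = []
    for row in rows:
        new_row = list(row)
        for c in symbolic_cols:
            codes = mappings[c]
            # Codes are assigned in order of first appearance.
            new_row[c] = codes.setdefault(row[c], len(codes))
        encoded.append(new_row)
    return encoded, mappings


def min_max_normalize(rows):
    """S2: scale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Toy records (made up for illustration): duration, protocol, byte count.
records = [
    [0, "tcp", 181],
    [2, "udp", 105],
    [4, "tcp", 29],
]
encoded, mappings = encode_symbolic(records, symbolic_cols=[1])
normalized = min_max_normalize(encoded)
```

In a deployed system the minimum and maximum of each column would be computed on the historical data and reused for the real-time data, so that both are scaled identically.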
Further, the process of sampling the historical network intrusion data set with the hybrid sampling algorithm and training the BP neural network classifier comprises the following steps:
S101, in historical data containing N classes of intrusion attacks, assigning each class whose number of samples is greater than the balanced sampling number m to the majority classes and every other class to the non-majority classes, wherein the non-majority classes comprise the minority classes, whose number of samples is less than m, and the classes whose number of samples equals m;
S102, oversampling each minority-class sample set with SMOTE so that the number of minority-class samples approaches the balanced sampling number m;
S103, clustering the sample set of each class separately with K-means to generate z clusters per class, extracting the representative sample of each cluster without replacement, and taking the resulting N × z samples as the initial balanced sample set;
S104, training an initial BP neural network classifier on the initial balanced sample set, setting the number of sampling iterations T, and letting t = 1;
S105, extracting z samples from each majority-class sample set without replacement, using undersampling based on the average classification error rate of the samples in each cluster;
S106, randomly extracting z samples without replacement from the remaining samples of each non-majority class and adding them to the balanced sample set;
S107, training the BP neural network classifier again on the balanced sample set;
and S108, judging whether t equals T − 1; if so, ending the iteration and outputting the trained BP neural network classifier; otherwise, letting t = t + 1 and returning to S105.
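The class partition of step S101 can be illustrated with a short sketch. The function name `split_classes` and the toy labels are assumptions made for illustration; only the comparison against the balanced sampling number m comes from the text.

```python
from collections import Counter


def split_classes(labels, m):
    """S101: partition classes by comparing their sample counts with m."""
    counts = Counter(labels)
    majority = sorted(c for c, n in counts.items() if n > m)
    minority = sorted(c for c, n in counts.items() if n < m)
    equal = sorted(c for c, n in counts.items() if n == m)
    # The non-majority classes are the minority classes plus the classes
    # whose sample count equals m.
    return majority, minority, equal


# Toy labels, not real data set counts.
labels = ["dos"] * 6 + ["normal"] * 4 + ["probe"] * 3 + ["u2r"] * 1
majority, minority, equal = split_classes(labels, m=3)
```

Only the minority classes are passed to SMOTE in S102; classes with exactly m samples need neither oversampling nor undersampling.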
Further, the process of undersampling the majority-class samples based on the average classification error rate of the samples in each cluster comprises:
clustering again with K-means, within each majority class, the samples that have not yet been sampled into the balanced sample set, generating m clusters per class;
calculating the average classification error rate of each cluster, extracting the representative sample of each of the z clusters with the highest average classification error rate, adding these samples to the balanced sample set, and deleting them from the majority-class samples that have not been sampled into the balanced sample set.
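One round of this selection can be sketched as below, assuming the clusters, their centers and their average error rates have already been computed. The function name `pick_representatives` is hypothetical, and the representative is taken to be the sample closest to the cluster center, as the detailed description states.

```python
import math


def pick_representatives(clusters, centers, error_rates, z):
    """Take the representative of each of the z clusters with the highest error."""
    # Rank non-empty clusters by average classification error rate, highest first.
    ranked = sorted(
        (k for k in range(len(clusters)) if clusters[k]),
        key=lambda k: error_rates[k],
        reverse=True,
    )
    picked = []
    for k in ranked[:z]:
        rep = min(clusters[k], key=lambda x: math.dist(x, centers[k]))
        clusters[k].remove(rep)  # sampling without replacement
        picked.append(rep)
    return picked


# Toy clusters of 2-D samples with made-up error rates.
clusters = [
    [(0.0, 0.0), (0.1, 0.0)],
    [(1.0, 1.0), (0.9, 1.0), (1.0, 0.8)],
    [(0.5, 0.5)],
]
centers = [(0.05, 0.0), (1.0, 0.95), (0.5, 0.5)]
error_rates = [0.1, 0.6, 0.4]
picked = pick_representatives(clusters, centers, error_rates, z=2)
```

Removing the picked samples from their clusters is what keeps later iterations from selecting the same sample twice.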
The invention provides a network intrusion detection system based on hybrid sampling, which comprises a historical data storage module, an attribute conversion module, a normalization module, a sampling module, a BP neural network classifier training module and a real-time prediction module, wherein:
the historical data storage module is used for storing the classified network intrusion data;
the attribute conversion module is used for converting the symbolic attributes in the network intrusion data into numeric attributes;
the normalization module is used for normalizing the attribute-converted network intrusion data to the interval [0, 1];
the sampling module is used for sampling the network historical data to ensure the data volume balance of the training data;
the BP neural network classifier training module is used for training the BP neural network according to training data to obtain a BP neural network classifier;
and the real-time prediction module is used for inputting real-time network intrusion data into the BP neural network classifier to obtain the type of the network intrusion.
Further, the sampling module comprises a data classification unit, a minority class sampling unit, a sample primary selection unit and a majority class sampling unit, wherein:
the data classification unit is used for dividing the attack types in the historical data into majority classes and non-majority classes according to the balanced sampling number m, wherein the non-majority classes comprise the network intrusion attack types whose number of samples is less than m and those whose number of samples equals m;
the minority class sampling unit is used for oversampling with SMOTE so that the number of minority-class samples approaches the balanced sampling number m;
the sample primary selection unit is used for clustering with K-means so that each network intrusion attack type generates z clusters, extracting the representative sample of each cluster without replacement, and taking the resulting N × z samples as the initial balanced sample set;
and the majority class sampling unit is used for clustering again with K-means, within each majority class, the samples not selected by the sample primary selection unit, generating m clusters per class, calculating the average classification error rate of each cluster, and extracting without replacement the representative points of the z clusters with the highest average classification error rate.
While converting an extremely unbalanced data set into a balanced one, the technique discards fewer majority-class samples than intrusion detection based on clustering undersampling, thereby reducing the loss of information that is valuable for building the classifier; it also uses the average classification error rate of the samples in a cluster to capture the overall classification information of all samples in that cluster, so as to select more representative majority-class samples.
Drawings
Fig. 1 is a schematic flow chart of a network intrusion detection method based on hybrid sampling according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a network intrusion detection method based on hybrid sampling, which specifically comprises the following steps:
S1, converting the symbolic attributes in the historical network intrusion data set into numeric attributes;
S2, normalizing the historical network intrusion data set to the interval [0, 1];
S3, sampling the historical network intrusion data set with a hybrid sampling algorithm to obtain a training set in which all classes are balanced;
S4, training a BP neural network classifier on the resulting training set;
and S5, inputting real-time network intrusion data into the trained BP neural network classifier, which outputs the class of the real-time network intrusion data.
In the invention, the process of sampling the historical network intrusion data set with the hybrid sampling algorithm and training the BP neural network classifier comprises the following steps:
S101, in historical data containing N classes of intrusion attacks, assigning each class whose number of samples is greater than the balanced sampling number m to the majority classes and every other class to the non-majority classes, wherein the non-majority classes comprise the minority classes, whose number of samples is less than m, and the classes whose number of samples equals m;
S102, oversampling each minority-class sample set with SMOTE so that the number of minority-class samples approaches the balanced sampling number m;
S103, clustering the sample set of each class separately with K-means to generate z clusters per class, extracting the representative sample of each cluster without replacement, and taking the resulting N × z samples as the initial balanced sample set;
S104, training an initial BP neural network classifier on the initial balanced sample set, setting the number of iterations T of the BP neural network classifier, and letting t = 1;
S105, undersampling the majority-class samples based on the average classification error rate of the samples in each cluster;
S106, randomly extracting z samples without replacement from the remaining samples of each non-majority class and adding them to the balanced sample set;
S107, training the BP neural network classifier again on the balanced sample set;
and S108, judging whether t equals T − 1; if so, ending the iteration and outputting the trained BP neural network classifier; otherwise, letting t = t + 1 and returning to S105.
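The control flow of steps S104 to S108 can be sketched as a loop over pluggable callables. Everything named here (`iterative_training`, `train`, `sample_majority`, `sample_non_majority`) is hypothetical. The stopping test reads the comparison with T − 1 in S108 as a comparison of a running counter t against T − 1, which is an interpretation of the translated text; `>=` is used instead of `==` only so that the sketch also terminates when T is 1.

```python
def iterative_training(initial_set, T, train, sample_majority, sample_non_majority):
    """Grow the balanced sample set round by round and retrain each time."""
    balanced = list(initial_set)
    clf = train(balanced)                   # S104: initial classifier
    t = 1
    while True:
        balanced += sample_majority(clf)    # S105: error-rate-driven undersampling
        balanced += sample_non_majority()   # S106: random draw from the other classes
        clf = train(balanced)               # S107: retrain on the grown set
        if t >= T - 1:                      # S108: stop after T - 1 rounds
            return clf, balanced
        t += 1


# Stub callables that only show the control flow; a real system would train
# a BP neural network and draw z samples per class.
clf, balanced = iterative_training(
    initial_set=["a", "b"],
    T=3,
    train=len,
    sample_majority=lambda clf: ["M"],
    sample_non_majority=lambda: ["n"],
)
```

Passing the current classifier to `sample_majority` reflects the dependency in S105: the cluster error rates are computed with the classifier trained in the previous round.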
In this embodiment, each minority-class sample set is oversampled with SMOTE, and the oversampling rate of minority class i is set as:
Figure BDA0002387577540000061
where
Figure BDA0002387577540000062
is the sampling rate used when oversampling minority class i with SMOTE; $S_i$ is the sample set of the i-th class of intrusion attack; and $|S_i|$ is the number of samples in $S_i$.
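The SMOTE step can be sketched as below. Because the rate formula is not reproduced in this text, the sketch does not use it: it simply generates synthetic samples until the class reaches a target count (for example m), which is an assumption. The function name `smote` and the parameters `k` and `seed` are likewise illustrative; the interpolation between a sample and one of its nearest neighbours is the standard SMOTE idea.

```python
import math
import random


def smote(samples, target, k=5, seed=0):
    """Add interpolated synthetic samples until the class has `target` samples."""
    if len(samples) < 2:
        raise ValueError("SMOTE needs at least two samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    while len(samples) + len(synthetic) < target:
        x = rng.choice(samples)
        # k nearest neighbours of x among the original samples.
        neighbours = sorted(
            (s for s in samples if s is not x),
            key=lambda s: math.dist(s, x),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()
        # New point on the line segment between x and the chosen neighbour.
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return list(samples) + synthetic


# Toy minority class with three normalized 2-D samples.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
oversampled = smote(minority, target=10, k=2)
```

Every synthetic point lies between two real minority samples, which is also why heavy oversampling of a very small class can blur class boundaries and add noise.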
The sample set of each class is clustered separately with K-means, each class generating z clusters, expressed as:
Figure BDA0002387577540000063
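The clustering and representative extraction of step S103 can be sketched with a plain Lloyd-style K-means. This is a simplified illustration: the function names are hypothetical, the centers are initialized from the first z points (an assumption made to keep the example deterministic), and the representative of a cluster is taken to be the sample closest to its center.

```python
import math


def kmeans(points, z, iters=20):
    """Lloyd's algorithm with the first z points as initial centers."""
    centers = list(points[:z])
    clusters = [[] for _ in range(z)]
    for _ in range(iters):
        clusters = [[] for _ in range(z)]
        for p in points:
            nearest = min(range(z), key=lambda c: math.dist(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster; keep the old
        # center if a cluster ends up empty.
        centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[k]
            for k, cl in enumerate(clusters)
        ]
    return clusters, centers


def representatives(clusters, centers):
    """Sample closest to the center of each non-empty cluster."""
    return [
        min(cl, key=lambda x: math.dist(x, c))
        for cl, c in zip(clusters, centers)
        if cl
    ]


# Toy class with two well-separated groups of samples.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 0.2), (0.2, 0.0), (1.0, 0.8), (0.8, 1.0)]
clusters, centers = kmeans(points, z=2)
reps = representatives(clusters, centers)
```

Running this once per class and collecting z representatives from each of the N classes gives the N × z samples of the initial balanced sample set.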
Preferably, in this embodiment, the balanced sampling number m is a number between the sample count of the attack class with the fewest samples in the historical data and the sample count of the attack class with the most samples.
In this embodiment, the process of undersampling the majority-class samples based on the average classification error rate of the samples in each cluster comprises:
clustering again with K-means, within each majority class, the samples that have not yet been sampled into the balanced sample set, generating m clusters per class;
calculating the average classification error rate of each cluster, extracting the representative point of each of the z clusters with the highest average classification error rate, adding these samples to the balanced sample set, and deleting them from the majority-class samples that have not been sampled into the balanced sample set.
The sample closest to the cluster center is taken as the representative of each cluster. Given a classifier f and a cluster C whose sample labels are known, the average classification error rate V(C) of the samples in cluster C is defined as:
$$V(C) = \frac{1}{|C|} \sum_{x_j \in C} I\big(y_j \neq f(x_j)\big)$$
where V(C) is the average classification error rate of the samples in cluster C; $|C|$ is the number of samples in C; $x_j$ is the j-th sample in cluster C; I is the indicator function, which returns 1 if its argument is true and 0 otherwise; $y_j$ is the true label of sample $x_j$; and $f(x_j)$ is the label that classifier f predicts for sample $x_j$.
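The definition of V(C) amounts to the fraction of misclassified samples in a cluster, which a few lines of code make concrete. The function name `average_error_rate` and the threshold classifier are hypothetical stand-ins; in the method itself the classifier would be the current BP neural network.

```python
def average_error_rate(cluster, labels, classifier):
    """V(C): share of samples in the cluster whose predicted label is wrong."""
    wrong = sum(1 for x, y in zip(cluster, labels) if classifier(x) != y)
    return wrong / len(cluster)


def toy_classifier(x):
    # Stand-in for the trained classifier f: a simple threshold rule.
    return "attack" if x[0] > 0.5 else "normal"


cluster = [(0.2,), (0.7,), (0.9,), (0.4,)]
labels = ["normal", "normal", "attack", "attack"]
v = average_error_rate(cluster, labels, toy_classifier)
```

Here two of the four samples are misclassified, so V(C) is 0.5.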
V(C) summarizes the classification information of all samples in the cluster. The larger V(C) is, the higher the classifier's average error rate on the samples in cluster C, which indicates that the classifier lacks information about that cluster. The representative point closest to the cluster center carries a large amount of information about the cluster, so the classifier needs to learn that representative point to improve its performance. Conversely, a smaller V(C) means the classifier already classifies the samples in cluster C accurately and has enough information about the cluster.
Specifically, this embodiment uses KDD99, a data set commonly used in network intrusion detection, which contains 5 classes: Normal and 4 attack classes, DoS, Probe, U2R and R2L. The number of samples in each class and the maximum imbalance of the data set are shown in Table 1. The maximum imbalance is defined as the ratio of the number of samples in the largest class to the number of samples in the smallest class and represents the degree of imbalance of the data set. In KDD99 the largest class is DoS and the smallest is U2R; the maximum imbalance is very large, so KDD99 is an extremely unbalanced data set.
TABLE 1
Figure BDA0002387577540000072
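The maximum imbalance defined above can be computed directly from the class labels. The sketch below uses made-up toy counts, not the KDD99 figures of Table 1, and the function name `max_imbalance` is hypothetical.

```python
from collections import Counter


def max_imbalance(labels):
    """Ratio of the largest class size to the smallest class size."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy labels for illustration only.
labels = ["dos"] * 8 + ["normal"] * 4 + ["u2r"] * 2
ratio = max_imbalance(labels)
```

A ratio of 1.0 means a perfectly balanced data set; the larger the ratio, the more unbalanced the data.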
The parameter settings in the present invention are shown in table 2.
TABLE 2
Figure BDA0002387577540000081
In a specific implementation, the symbolic attributes in the KDD99 training set are first converted into numeric attributes;
the KDD99 training set is normalized to the interval [0, 1];
the KDD99 training set is sampled with the hybrid sampling algorithm proposed herein to obtain a new training set in which all classes are balanced;
the neural network is trained on the new training set;
and intrusion data are input online, and the neural network outputs the intrusion type.
Historical data that the system has classified successfully can also be fed back to the system as training data.
The invention provides a network intrusion detection system based on hybrid sampling, which comprises a historical data storage module, an attribute conversion module, a normalization module, a sampling module, a BP neural network classifier training module and a real-time prediction module, wherein:
the historical data storage module is used for storing the classified network intrusion data;
the attribute conversion module is used for converting the symbolic attributes in the network intrusion data into numeric attributes;
the normalization module is used for normalizing the attribute-converted network intrusion data to the interval [0, 1];
the sampling module is used for sampling the network historical data to ensure the data volume balance of the training data;
the BP neural network classifier training module is used for training the BP neural network according to training data to obtain a BP neural network classifier;
and the real-time prediction module is used for inputting real-time network intrusion data into the BP neural network classifier to obtain the type of the network intrusion.
Further, the sampling module comprises a data classification unit, a minority class sampling unit, a sample primary selection unit and a majority class sampling unit, wherein:
the data classification unit is used for dividing the attack types in the historical data into majority classes and non-majority classes according to the balanced sampling number m, wherein the non-majority classes comprise the network intrusion attack types whose number of samples is less than m and those whose number of samples equals m;
the minority class sampling unit is used for oversampling with SMOTE so that the number of minority-class samples approaches the balanced sampling number m;
the sample primary selection unit is used for clustering with K-means so that each network intrusion attack type generates z clusters, extracting the representative sample of each cluster without replacement, and taking the resulting N × z samples as the initial balanced sample set;
the majority class sampling unit is used for clustering again with K-means, within each majority class, the samples not selected by the sample primary selection unit, generating m clusters per class, calculating the average classification error rate of each cluster, and extracting without replacement the representative points of the z clusters with the highest average classification error rate; preferably, the representative point of a cluster in this embodiment is the sample closest to the cluster center.
Further, the BP neural network classifier training module trains an initial BP neural network classifier on the samples selected by the sample primary selection unit and then sets the number of iterations. In each iteration, the majority class sampling unit is called to select new majority-class samples to add to the sample set, and the BP neural network classifier is retrained; when the set number of iterations is reached, the trained BP neural network classifier is output.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (9)

1. A network intrusion detection method based on hybrid sampling, characterized by comprising the following steps:
S1, converting the symbolic attributes in the historical network intrusion data set into numeric attributes;
S2, normalizing the historical network intrusion data set to the interval [0, 1];
S3, sampling the historical network intrusion data set with a hybrid sampling algorithm to obtain a training set in which all classes are balanced;
S4, training a BP neural network classifier on the resulting training set;
and S5, inputting real-time network intrusion data into the trained BP neural network classifier, which outputs the class of the real-time network intrusion data.
2. The method according to claim 1, wherein sampling the historical network intrusion data set with the hybrid sampling algorithm and training the BP neural network classifier comprises:
S101, setting a balanced sampling number m and, in historical data containing N classes of intrusion attacks, assigning each class whose number of samples is greater than m to the majority classes and every other class to the non-majority classes, wherein the non-majority classes comprise the minority classes, whose number of samples is less than m, and the classes whose number of samples equals m;
S102, oversampling each minority-class sample set with SMOTE so that the number of minority-class samples approaches the balanced sampling number m;
S103, clustering the sample set of each class separately with K-means to generate z clusters per class, extracting the representative sample of each cluster without replacement, and taking the resulting N × z samples as the initial balanced sample set;
S104, training an initial BP neural network classifier on the initial balanced sample set, setting the number of iterations T of the BP neural network classifier, and letting t = 1;
S105, undersampling the majority-class samples based on the average classification error rate of the samples in each cluster;
S106, randomly extracting z samples without replacement from the remaining samples of each non-majority class and adding them to the balanced sample set;
S107, training the BP neural network classifier again on the balanced sample set;
and S108, judging whether t equals T − 1; if so, ending the iteration and outputting the trained BP neural network classifier; otherwise, letting t = t + 1 and returning to S105.
3. The method according to claim 2, wherein the sampling rate used when oversampling each minority-class sample set with SMOTE is expressed as:
Figure FDA0002387577530000021
where
Figure FDA0002387577530000022
is the sampling rate used when oversampling minority class i with SMOTE; $S_i$ is the sample set of the i-th class of intrusion attack; and $|S_i|$ is the number of samples in $S_i$.
4. The method of claim 2, wherein undersampling the majority-class samples based on the average classification error rate of the samples in each cluster comprises:
clustering again with K-means, within each majority class, the samples that have not yet been sampled into the balanced sample set, generating m clusters per class;
calculating the average classification error rate of each cluster, extracting the representative sample of each of the z clusters with the highest average classification error rate, adding these samples to the balanced sample set, and deleting them from the majority-class samples that have not been sampled into the balanced sample set.
5. The method of claim 4, wherein the representative sample of each cluster is the sample closest to the cluster center.
6. The method of claim 4, wherein the average classification error rate of the samples in a cluster is expressed as:
$$V(C) = \frac{1}{|C|} \sum_{x_j \in C} I\big(y_j \neq f(x_j)\big)$$
wherein V(C) denotes the average classification error rate of the samples within cluster C; x_j denotes the j-th sample within cluster C; I denotes the indicator function, which returns 1 if its input is true and 0 otherwise; y_j is the true label of sample j; and f(x_j) is the label predicted by classifier f for sample j.
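Read from the symbol definitions in claim 6, the average classification error rate is the fraction of a cluster's samples whose predicted label differs from the true label. A minimal sketch, assuming the classifier is available as a callable `predict` (a hypothetical name):

```python
def average_error_rate(samples, labels, predict):
    """Fraction of samples in one cluster that the classifier gets wrong."""
    if not samples:
        raise ValueError("cluster must contain at least one sample")
    # The indicator function contributes 1 for each misclassified sample
    errors = sum(1 for x, y in zip(samples, labels) if predict(x) != y)
    return errors / len(samples)
```

For example, a cluster of four samples with two misclassified yields a rate of 0.5.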
7. A network intrusion detection system based on hybrid sampling, characterized by comprising a historical data storage module, an attribute conversion module, a normalization module, a sampling module, a BP neural network classifier training module, and a real-time prediction module, wherein:
the historical data storage module is used for storing the classified network intrusion data;
the attribute conversion module is used for converting the symbolic attributes in the network intrusion data into numeric attributes;
the normalization module is used for normalizing the attribute-converted network intrusion data to an interval;
the sampling module is used for sampling the historical network data so that the training data is balanced across classes;
the BP neural network classifier training module is used for training the BP neural network on the training data to obtain a BP neural network classifier;
and the real-time prediction module is used for inputting real-time network intrusion data into the BP neural network classifier to obtain the type of the network intrusion.
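The attribute conversion and normalization modules can be illustrated with the sketch below. The claim does not name the encoding scheme or the target interval, so integer codes assigned in order of first appearance and min-max scaling to [0, 1] are assumptions; the function names are hypothetical as well.

```python
def encode_symbolic(values):
    """Map each distinct symbolic value to an integer code (first seen = 0)."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


def min_max_normalize(column, low=0.0, high=1.0):
    """Linearly rescale a numeric column into the interval [low, high]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to the lower bound
        return [low for _ in column]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in column]
```

With these helpers, a symbolic column such as a protocol field is first turned into integers and then scaled like any other numeric attribute.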
8. The system according to claim 7, wherein the sampling module comprises a data classification unit, a minority class sampling unit, a sample primary selection unit, and a majority class sampling unit, wherein:
the data classification unit is used for dividing the attack types in the historical data into majority classes and non-majority classes according to the balanced sampling number m, wherein the non-majority classes comprise the network intrusion attack types whose sample number is smaller than m and those whose sample number is equal to m;
the minority class sampling unit is used for oversampling with SMOTE so that the number of samples in each minority class approaches the balanced sampling number m;
the sample primary selection unit is used for clustering with K-means so that each network intrusion attack type generates z clusters, extracting the representative sample of each cluster without replacement, and taking the N x z extracted samples as the initial balanced sample set;
and the majority class sampling unit is used for clustering again, with K-means, the majority-class samples not selected by the sample primary selection unit, generating m clusters in each class, calculating the average classification error rate of each cluster, and extracting without replacement the representative samples of the z clusters with the highest average classification error rate.
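The split performed by the data classification unit can be sketched as a simple count comparison against m. The function name `split_attack_types` and the example labels used when calling it are hypothetical.

```python
from collections import Counter


def split_attack_types(labels, m):
    """Partition attack types into majority (count > m) and non-majority."""
    counts = Counter(labels)
    majority = sorted(t for t, n in counts.items() if n > m)
    # Non-majority covers both "fewer than m" and "exactly m" samples
    non_majority = sorted(t for t, n in counts.items() if n <= m)
    return majority, non_majority
```

Only the types with fewer than m samples then need SMOTE oversampling, while the majority types are handled by the cluster-based undersampling.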
9. The system of claim 8, wherein the BP neural network classifier training module trains an initial BP neural network classifier on the samples selected by the sample primary selection unit and sets the number of iterations once that training is complete; in each iteration it calls the majority class sampling unit to select new majority-class samples, adds them to the sample set, and retrains the BP neural network classifier; when the set number of iterations is reached, it outputs the trained BP neural network classifier.
CN202010103246.7A 2020-02-19 2020-02-19 Network intrusion detection method and system based on hybrid sampling Active CN111314353B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010103246.7A CN111314353B (en) 2020-02-19 2020-02-19 Network intrusion detection method and system based on hybrid sampling

Publications (2)

Publication Number Publication Date
CN111314353A (en) 2020-06-19
CN111314353B (en) 2022-09-02

Family

ID=71160546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010103246.7A Active CN111314353B (en) 2020-02-19 2020-02-19 Network intrusion detection method and system based on hybrid sampling

Country Status (1)

Country Link
CN (1) CN111314353B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030207278A1 (en) * 2002-04-25 2003-11-06 Javed Khan Methods for analyzing high dimensional data for classifying, diagnosing, prognosticating, and/or predicting diseases and other biological states
CN103716204A (en) * 2013-12-20 2014-04-09 中国科学院信息工程研究所 Abnormal intrusion detection ensemble learning method and apparatus based on Wiener process
JP2018067304A (en) * 2016-10-21 2018-04-26 ニューソフト コーポレーションNeusoft Corporation Method and device for detecting network intrusion
US20180115568A1 (en) * 2016-10-21 2018-04-26 Neusoft Corporation Method and device for detecting network intrusion
US20180210944A1 (en) * 2017-01-26 2018-07-26 Agt International Gmbh Data fusion and classification with imbalanced datasets
CN106973038A (en) * 2017-02-27 2017-07-21 同济大学 Network inbreak detection method based on genetic algorithm over-sampling SVMs
CN107506783A (en) * 2017-07-07 2017-12-22 广东科学技术职业学院 A kind of COMPLEX MIXED intrusion detection algorithm
CN110138784A (en) * 2019-05-15 2019-08-16 重庆大学 A kind of Network Intrusion Detection System based on feature selecting

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
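M. FINE et al.: "Differentiated Services Quality of Service Policy Information Base", IETF *
Zhang Yang et al.: "Network intrusion detection based on SMOTE and machine learning" (基于SMOTE和机器学习的网络入侵检测), Transactions of Beijing Institute of Technology (《北京理工大学学报》) *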

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111967502A (en) * 2020-07-23 2020-11-20 电子科技大学 Network intrusion detection method based on conditional variation self-encoder
CN112633319A (en) * 2020-11-23 2021-04-09 贵州大学 Multi-target detection method for incomplete data set balance input data category
CN112395558A (en) * 2020-11-27 2021-02-23 广东电网有限责任公司肇庆供电局 Improved unbalanced data hybrid sampling method suitable for historical fault data of intelligent electric meter
CN113518063A (en) * 2021-03-01 2021-10-19 广东工业大学 Network intrusion detection method and system based on data enhancement and BiLSTM
CN113518063B (en) * 2021-03-01 2022-11-22 广东工业大学 Network intrusion detection method and system based on data enhancement and BiLSTM
CN113542241A (en) * 2021-06-30 2021-10-22 杭州电子科技大学 Intrusion detection method and device based on CNN-BiGRU mixed model
CN113656796B (en) * 2021-08-31 2024-02-27 杭州安恒信息技术股份有限公司 Oversampling method, device, equipment and storage medium
CN113656796A (en) * 2021-08-31 2021-11-16 杭州安恒信息技术股份有限公司 Oversampling method, device, equipment and storage medium
CN114222300A (en) * 2022-02-23 2022-03-22 南京理工大学 Method and equipment for detecting local area network intrusion of vehicle-mounted controller
WO2023160600A1 (en) * 2022-02-23 2023-08-31 南京理工大学 In-vehicle controller area network intrusion detection method and device
CN115545111B (en) * 2022-10-13 2023-05-30 重庆工商大学 Network intrusion detection method and system based on clustering self-adaptive mixed sampling
CN115545111A (en) * 2022-10-13 2022-12-30 重庆工商大学 Network intrusion detection method and system based on clustering self-adaptive mixed sampling
CN116015932A (en) * 2022-12-30 2023-04-25 湖南大学 Intrusion detection network model generation method and data flow intrusion detection method

Also Published As

Publication number Publication date
CN111314353B (en) 2022-09-02

Similar Documents

Publication Publication Date Title
CN111314353B (en) Network intrusion detection method and system based on hybrid sampling
Gong et al. Psla: Improving audio tagging with pretraining, sampling, labeling, and aggregation
CN112069310B (en) Text classification method and system based on active learning strategy
Tan et al. Application of Self-Organizing Feature Map Neural Network Based on K-means Clustering in Network Intrusion Detection.
CN111556016B (en) Network flow abnormal behavior identification method based on automatic encoder
CN107908642B (en) Industry text entity extraction method based on distributed platform
CN109257383B (en) BGP anomaly detection method and system
CN113762377B (en) Network traffic identification method, device, equipment and storage medium
CN112529638B (en) Service demand dynamic prediction method and system based on user classification and deep learning
CN111327480B (en) Method for monitoring multiple QoS of Web service under mobile edge environment
CN116467451A (en) Text classification method and device, storage medium and electronic equipment
CN111343165B (en) Network intrusion detection method and system based on BIRCH and SMOTE
US11514233B2 (en) Automated nonparametric content analysis for information management and retrieval
Tahayna et al. A novel weighting scheme for efficient document indexing and classification
CN115545111A (en) Network intrusion detection method and system based on clustering self-adaptive mixed sampling
CN113111855B (en) Multi-mode emotion recognition method and device, electronic equipment and storage medium
Yin et al. An improved bayesian algorithm for filtering spam e-mail
CN112463964B (en) Text classification and model training method, device, equipment and storage medium
CN114511747A (en) Unbalanced load data type identification method based on VAE preprocessing and RP-2DCNN
CN116955600A (en) Work order clustering method, device, electronic equipment and storage medium
CN114861004A (en) Social event detection method, device and system
CN114860931A (en) Relay protection defect text grading method based on Voting Classifier model
CN112632229A (en) Text clustering method and device
Gao et al. Network traffic classification based on domain adaptive migration for multimedia services in smart city networks
CN117235137B (en) Professional information query method and device based on vector database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant