CN113904801B - Network intrusion detection method and system - Google Patents

Info

Publication number
CN113904801B
Authority
CN
China
Prior art keywords
network
model
data
detection
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111030340.5A
Other languages
Chinese (zh)
Other versions
CN113904801A (en
Inventor
徐凤振
寿增
汪明
高明慧
赵航
卢楷
马力
张志军
董昱
许洪强
周劼英
詹雄
张晓�
李新鹏
崔旭东
何纪成
王洋
郭乃豪
王浩
赵宇
沈鹏
宁志言
高英健
冯思博
佟志鑫
付广宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Beijing Kedong Electric Power Control System Co Ltd
State Grid Liaoning Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Beijing Kedong Electric Power Control System Co Ltd
State Grid Liaoning Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Beijing Kedong Electric Power Control System Co Ltd, State Grid Liaoning Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202111030340.5A priority Critical patent/CN113904801B/en
Publication of CN113904801A publication Critical patent/CN113904801A/en
Application granted granted Critical
Publication of CN113904801B publication Critical patent/CN113904801B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Medical Informatics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)

Abstract

The invention discloses a network intrusion detection method, which is characterized by comprising the following steps: the intercepted network data packets are arranged to obtain a network data set; performing feature engineering processing on the network data set to obtain a network detection data set; performing dimension reduction on the network detection data set by adopting a trained denoising self-coding neural network model; performing intrusion detection on the network detection data after the dimension reduction by adopting a trained XGBoost network intrusion detection model; constructing an intrusion database according to the network data of which the detection result is intrusion and the disclosed intrusion data; the denoising self-coding neural network model and the XGBoost network intrusion detection model are retrained according to the intrusion database at regular intervals, and the network data is subjected to intrusion detection according to the retrained model.

Description

Network intrusion detection method and system
Technical Field
The invention belongs to the technical field of network security, and particularly relates to a network intrusion detection method.
Background
With the arrival of the Internet era, networks have penetrated every aspect of daily life, and constantly evolving network intrusion techniques cause security problems such as personal information leakage, theft of confidential documents and account theft, resulting in immeasurable losses. How to build an effective network intrusion detection model has therefore received increasing attention from researchers.
In recent years, various machine learning algorithms have been applied to network intrusion detection, including classical algorithms such as KNN, decision trees and SVM. In practice, however, these algorithms suffer from problems such as low detection efficiency and a high false detection rate. One such method achieves a lower false detection rate but has the obvious drawback of a long prediction time. To improve the detection rate and reduce the false detection rate, Shougarmy et al. reduce features and eliminate samples during data processing according to the correlation between features and the similarity between samples of the same class, and then build a model with the SVM algorithm; however, directly eliminating features and samples inevitably loses information, which limits the detection accuracy. Chen Hong et al. reduce the data dimension with a deep belief network (DBN) and perform intrusion detection with multiple gradient boosting trees, which works well on imbalanced intrusion data, but the processing is complex and the experiments take a long time. Zhang Yang et al. introduce the XGBoost algorithm into intrusion detection and obtain a good detection rate and a low false detection rate, but high-dimensional data remains difficult to process and the detection effect on attack types with few samples is unsatisfactory.
Network data grows continuously, and a small number of intrusions are often hidden within massive volumes of normal network behavior. The network environment is also complex and subject to many influencing factors, so the data handled during intrusion detection is large in scale, high-dimensional and imbalanced. Although existing research has made progress, it still has difficulty processing high-dimensional data and detects attack types with few samples poorly.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a network intrusion detection method which can accurately detect network intrusion data.
The technical problems to be solved by the invention are realized by the following technical scheme:
in a first aspect, a network intrusion detection method is provided, including: the intercepted network data packets are arranged to obtain a network data set;
performing feature engineering processing on the network data set to obtain a network detection data set;
performing dimension reduction on the network detection data set by adopting a trained denoising self-coding neural network model;
performing intrusion detection on the network detection data after the dimension reduction by adopting a trained XGBoost network intrusion detection model;
constructing an intrusion database according to the network data of which the detection result is intrusion and the disclosed intrusion data;
and periodically retraining the denoising self-coding neural network model and the XGBoost network intrusion detection model according to the intrusion database, and performing intrusion detection on network data according to the retrained model.
With reference to the first aspect, further, the step of organizing the intercepted network data packet to obtain a network data set specifically includes: and obtaining a network data set ND according to the basic characteristic attribute of the TCP connection, the content attribute of the TCP connection, the time-based network traffic characteristic attribute and the content of the host-based network traffic statistical characteristic in the intercepted network data packet.
With reference to the first aspect, further, the performing feature engineering processing on the network data set to obtain a network detection data set specifically includes:
digitizing character-type data in the network data set by one-hot encoding, normalizing the numerical data in the network data set, and obtaining a network detection data set D from the digitized and normalized data.
With reference to the first aspect, further, the trained denoising self-coding neural network training process includes:
manually labeling historical data from the same network environment, and forming a training data set from the labeled historical data and public network attack data;
digitizing character-type data in the training data set by one-hot encoding, normalizing the numerical data in the training data set, and dividing the processed training data set into a training set T_1 and a test set T_2;
inputting the training set T_1 into the denoising self-coding neural network model to train the model;
inputting the test set T_2 into the trained denoising self-coding neural network model to test the model, and, if the model does not reach the standard, adjusting the model parameters and continuing training until the model reaches the standard.
With reference to the first aspect, further, the performing dimension reduction on the network detection data set by using the trained denoising self-coding neural network model includes:
inputting the training set T_1 and the test set T_2 into the encoder part of the trained denoising self-coding neural network model that has reached the standard, and outputting the dimension-reduced training set T_1' and test set T_2'.
With reference to the first aspect, further, the performing dimension reduction on the network detection data set by using the trained denoising self-coding neural network model includes:
the network detection data set D is input into an encoder part of the trained denoising self-coding neural network model, and the output of the encoder is the network detection data set D' after the dimensionality reduction.
With reference to the first aspect, further, the training process of the trained XGBoost network intrusion detection model includes:
training the XGBoost network intrusion detection model with the dimension-reduced training set T_1';
testing the trained model with the dimension-reduced test set T_2', and, if the model does not reach the standard, adjusting the model parameters and continuing training until the test result reaches the standard.
With reference to the first aspect, further, building an intrusion database includes:
inputting the dimension-reduced network detection data set D' into the XGBoost network intrusion detection model, putting network data whose detection result is intrusion into a data set D_p, and putting network data whose detection result is normal into a data set D_n;
constructing an intrusion database IDB from D_p and the network intrusion data published on netlab.
In a second aspect, a network intrusion detection system is provided, comprising:
the data processing module is used for sorting the intercepted network data packets to obtain a network data set;
performing feature engineering processing on the network data set to obtain a network detection data set;
the intrusion detection module is used for reducing the dimension of the network detection data set by adopting a trained denoising self-coding neural network model;
performing intrusion detection on the network detection data after the dimension reduction by adopting a trained XGBoost network intrusion detection model;
constructing an intrusion database according to the network data of which the detection result is intrusion and the disclosed intrusion data;
and periodically retraining the denoising self-coding neural network model and the XGBoost network intrusion detection model according to the intrusion database, and performing intrusion detection on network data according to the retrained model.
The beneficial effects of the invention are as follows. Before detection with the XGBoost model, the network data is input into the DAE model to reduce the dimension of the data. During training of the self-coding neural network model, some features of a portion of the samples are randomly masked or replaced before the samples are input into the neural network, so that effective information can be extracted from massive, complex network data while its dimension is reduced, which improves the information extraction capability and robustness of the model. In addition, the invention adds a sample weighting factor to the loss function of the XGBoost model to influence the tree construction process, so that rare attack types receive more weight during modeling and the model's ability to detect rare network intrusions improves. By combining the DAE's ability to extract information from large-scale, high-dimensional data with the ability of the weighted XGBoost model to detect imbalanced intrusion types, the invention builds a DAE-XGBoost intrusion detection model that addresses the difficulty of processing high-dimensional data and the poor detection of rare attack types in network intrusion detection.
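The random masking of input features described above can be illustrated with a minimal sketch. The function name `corrupt`, the masking probability and the choice of zero as the mask value are assumptions made for illustration; the text does not specify the noise scheme or its parameters.

```python
import random

def corrupt(sample, mask_prob=0.2, rng=None):
    """Return a copy of `sample` in which each feature is zeroed with probability `mask_prob`."""
    rng = rng or random.Random()
    return [0.0 if rng.random() < mask_prob else value for value in sample]

# Hypothetical normalized feature vector.
clean = [0.16, 0.37, 0.15, 1.0, 0.24, 0.19]
noisy = corrupt(clean, mask_prob=0.3, rng=random.Random(0))

# A denoising autoencoder would be trained to reconstruct `clean` from `noisy`.
print(noisy)
```

Because the network only sees the corrupted vector but is scored against the clean one, it has to learn dependencies between features, which is the usual motivation for this kind of corruption.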
Drawings
Fig. 1 is a flowchart of a network intrusion detection method provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to better understand the present invention, the following describes related technologies in the technical solution of the present invention.
Example 1
The invention provides a network intrusion detection method, which comprises the following steps:
Step one, collecting network data with a network analysis tool, and selecting from the data packets the basic characteristic attributes of the TCP connection (TCP_B), the content attributes of the TCP connection (TCP_C), the time-based network traffic characteristic attributes (TCP_TF) and the host-based network traffic statistical characteristics (HOST_NF) to form a network data set ND. The specific content of each attribute group is shown in Table 1:
Table 1. Details of attributes in network packets
Step two, performing data preprocessing and feature engineering on the collected network data set ND. The feature engineering is as follows: character-type features of the network data are converted by one-hot encoding into numerical vectors that a computer can process; for example, the protocol type takes the three values TCP, UDP and ICMP, which become [1,0,0], [0,1,0] and [0,0,1] respectively after one-hot encoding. Numerical features are normalized according to formula (1), which constrains the values to the range [0,1] so as to reduce the influence of differing scales among features on the modeling process. The network detection data set D is obtained at this point.
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
where x' is the normalized sample; x is the original sample; x_max is the maximum value of the sample; and x_min is the minimum value of the sample.
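The two feature engineering operations of step two can be sketched in a few lines. The helper names `one_hot` and `min_max`, the category order and the handling of a constant column are assumptions for illustration only.

```python
def one_hot(value, categories):
    """Encode `value` as a 0/1 vector with a single 1 at its position in `categories`."""
    return [1 if value == c else 0 for c in categories]

def min_max(column):
    """Scale a numeric column to [0, 1] using x' = (x - x_min) / (x_max - x_min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information and maps to zeros.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

PROTOCOLS = ["tcp", "udp", "icmp"]  # assumed category order
print(one_hot("udp", PROTOCOLS))    # [0, 1, 0]
print(min_max([0, 5, 10]))          # [0.0, 0.5, 1.0]
```

In a deployed system the minimum and maximum would normally be computed on the training data and reused at detection time, so that the same raw value always maps to the same normalized value.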
Step three, using the network data collected by the network analyzer and network intrusion data published on the Internet, manually labeling all data as one of five types, namely normal, DoS attack, Probe attack, U2R attack and R2L attack, to form a pre-training data set T;
Step four, dividing the pre-training data set T into a training set T_1 and a test set T_2 at a ratio of 8:2. The DAE model (denoising self-coding neural network model) is trained with T_1, and training stops when the loss function value falls below 0.0001. The training effect of the DAE model is then tested with the test set T_2: if the loss function value given by formula (6) is below 0.0001 in the test, training of the DAE model is complete and the model can be used in the network intrusion detection process; if it is above 0.0001, the number of neurons and the number of layers of the neural network are adjusted manually and the DAE model is retrained. After training, the training set T_1 and the test set T_2 are input into the encoder part of the trained DAE model, and the encoder outputs a low-dimensional training set T_1' and a low-dimensional test set T_2' for the subsequent training of the XGBoost model.
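As a rough illustration of the 8:2 division and the loss criterion in step four, the sketch below shuffles a labeled data set and checks a loss value against the 0.0001 threshold. The names `split_8_2` and `loss_ok`, the shuffling and the fixed seed are assumptions; the text does not say how records are assigned to T_1 and T_2.

```python
import random

def split_8_2(records, seed=0):
    """Shuffle `records` reproducibly and split them 8:2 into (T_1, T_2)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 8 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

def loss_ok(loss, threshold=0.0001):
    """True when the reconstruction loss is below the stopping threshold."""
    return loss < threshold

# Hypothetical pre-training data set T of 100 labeled records.
T = [{"id": i} for i in range(100)]
T_1, T_2 = split_8_2(T)
print(len(T_1), len(T_2))  # 80 20
```

With classes as imbalanced as those in intrusion data, a stratified split that preserves the class proportions in both parts may be preferable; that variation is not described in the text.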
Step five, training the XGBoost model with the low-dimensional training set T_1', and stopping training when the recall given by formula (16) is greater than or equal to 0.75. The intrusion detection effect of the XGBoost model is then tested with the low-dimensional test set T_2': if the recall in the test is greater than or equal to 0.75, training is complete and the model can be used in the actual network intrusion detection process; if the recall is less than 0.75, the parameters of the XGBoost model are adjusted and the model is retrained.
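The adjust-and-retrain loop of step five can be sketched as a search over candidate parameter settings that stops at the first model meeting the recall threshold. Everything except the 0.75 threshold is hypothetical here: `train_until_recall`, `fake_train` and `fake_eval` are placeholders standing in for real XGBoost training and evaluation on T_1' and T_2', and the recall values are made up for the demonstration.

```python
def train_until_recall(train_fn, eval_fn, param_grid, threshold=0.75):
    """Try parameter settings in order; return the first model whose test recall meets `threshold`."""
    for params in param_grid:
        model = train_fn(params)
        recall = eval_fn(model)
        if recall >= threshold:
            return model, params, recall
    raise RuntimeError("no parameter setting reached the recall threshold")

# Hypothetical stand-ins for training on T_1' and testing on T_2'.
FAKE_RECALL = {2: 0.62, 3: 0.71, 4: 0.78}

def fake_train(params):
    return {"max_depth": params["max_depth"]}

def fake_eval(model):
    return FAKE_RECALL[model["max_depth"]]

grid = [{"max_depth": depth} for depth in (2, 3, 4)]
model, params, recall = train_until_recall(fake_train, fake_eval, grid)
print(params, recall)
```

Passing the training and evaluation steps in as functions keeps the acceptance rule separate from the model library, so the same loop could gate the DAE on its loss threshold as well.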
Step six, inputting the network detection data set D into the encoder part of the DAE model trained in step four, the encoder outputting a low-dimensional network detection data set D'.
Step seven, inputting the low-dimensional network detection data set D' into the XGBoost network intrusion detection model trained in step five; constructing a data set D_p and putting network data whose detection result is intrusion into D_p; and constructing a data set D_n and putting network data whose detection result is normal into D_n.
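The routing of detection results into D_p and D_n in step seven amounts to a partition by predicted label. In the sketch below, `partition` and `toy_predict` are hypothetical names, and the toy rule only stands in for the trained XGBoost model.

```python
def partition(records, predict):
    """Split records into (D_p, D_n): predicted intrusions and predicted normal traffic."""
    d_p, d_n = [], []
    for record in records:
        if predict(record) == "normal":
            d_n.append(record)
        else:
            d_p.append(record)
    return d_p, d_n

def toy_predict(record):
    # Hypothetical rule used only for the demonstration.
    return "dos" if record["count"] > 100 else "normal"

D_prime = [{"id": 1, "count": 1}, {"id": 2, "count": 300}, {"id": 3, "count": 12}]
D_p, D_n = partition(D_prime, toy_predict)
print([r["id"] for r in D_p], [r["id"] for r in D_n])  # [2] [1, 3]
```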
Step eight, constructing an intrusion database IDB, and storing D_p and the network intrusion data published on netlab in the IDB.
Step nine, retraining the DAE and XGBoost models every two months with the whole IDB and 5000 records from D_n; training ends when the loss function value is below 0.0001 and the recall is greater than or equal to 0.75, so that the DAE and XGBoost models keep a good detection effect on new intrusion techniques.
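A minimal sketch of assembling the retraining data in step nine follows. The text does not say how the 5000 records are chosen from D_n, so uniform random sampling is an assumption, as are the function name `build_retraining_set` and the toy record format.

```python
import random

def build_retraining_set(idb, d_n, n_normal=5000, seed=0):
    """Combine the whole IDB with up to `n_normal` records sampled from D_n."""
    k = min(n_normal, len(d_n))  # guard against a D_n smaller than the request
    normal_sample = random.Random(seed).sample(d_n, k)
    return list(idb) + normal_sample

# Hypothetical contents of the intrusion database and the normal data set.
idb = [("intrusion", i) for i in range(300)]
d_n = [("normal", i) for i in range(20000)]
retrain = build_retraining_set(idb, d_n)
print(len(retrain))  # 5300
```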
The DAE model and the XGBoost model obtained through the preceding steps can be combined to detect network intrusion in a network environment. The invention aims to overcome the shortcomings of the prior art by providing a network intrusion detection model based on the DAE neural network and the XGBoost ensemble learning algorithm, which addresses the difficulty of processing high-dimensional data and the unsatisfactory detection of imbalanced attack types in network intrusion detection, and achieves a good detection effect on the various intrusion types.
In the data preprocessing stage, the invention constructs a data structure consisting of 26 fields. Network data collected by the network analysis tool is stored in this structure in a csv file after data preprocessing and feature engineering. The structure is the following 26-tuple, which holds the features of each record in the network data set ND; the specific meaning of each field is shown in Table 1:
<“duration”,“protocol_type”,“service”,“flag”,“src_bytes”,“dst_bytes”,“land”,“wrong_fragment”,“urgent”,“hot”,“num_failed_logins”,“logged_in”,“num_compromised”,“root_shell”,“su_attempted”,“num_root”,“num_file_creations”,“num_shells”,“num_access_files”,“num_outbound_cmds”,“is_hot_logins”,“is_guest_login”,“count”,“srv_count”,“dst_host_count”,“dst_host_srv_count”>
A record collected by the network analysis tool is, for example:
<“2”,“tcp”,“smtp”,“SF”,“1684”,“363”,“0”,“0”,“0”,“0”,“0”,“1”,“0”,“0”,“0”,“0”,“0”,“0”,“0”,“0”,“0”,“0”,“1”,“1”,“104”,“66”>
After one-hot encoding and normalization, the record takes the following form:
<“0.16”,“0,0,1”,“1,0(69)”,“1,0(10)”,“0.37”,“0.15”,“0”,“0”,“0”,“0”,“0”,“1”,“0”,“0”,“0”,“0”,“0”,“0”,“0”,“0”,“0”,“0”,“1”,“1”,“0.24”,“0.19”>
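The encoded record uses a compact notation such as "1,0(69)". Assuming this abbreviates a one followed by 69 zeros (the text does not define the notation), the sketch below expands the three categorical fields into full vectors. The helper name `expand` is hypothetical.

```python
import re

def expand(field):
    """Expand shorthand like '1,0(69)' into [1, 0, 0, ...]; plain comma lists pass through."""
    values = []
    for token in field.split(","):
        match = re.fullmatch(r"(\d+)\((\d+)\)", token.strip())
        if match:
            # 'v(n)' is read as the value v repeated n times (assumed meaning).
            values.extend([int(match.group(1))] * int(match.group(2)))
        else:
            values.append(int(token))
    return values

protocol = expand("0,0,1")
service = expand("1,0(69)")
flag = expand("1,0(10)")
print(len(protocol), len(service), len(flag))  # 3 70 11
```

Under this reading, the 26-field record expands to 23 numeric values plus 3 + 70 + 11 one-hot values, or 107 inputs to the DAE.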
In the DAE dimension reduction stage, the model parameters trained in advance in this embodiment are stored in a pkl file, and dimension reduction only requires loading these parameters during intrusion detection.
In the XGBoost intrusion detection stage, this embodiment likewise stores the model parameters trained in advance in a pkl file. At detection time the file is loaded to restore the model, and the low-dimensional network detection data produced by DAE dimension reduction is input into the XGBoost model to obtain the detection result. Data whose result is intrusion is saved in D_p and stored in the intrusion database IDB together with newly published network intrusion data from the Internet; data whose result is normal is saved in D_n.
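Saving and restoring parameters through a pkl file can be sketched with Python's standard `pickle` module. The parameter dictionary, its keys and the file name `dae_params.pkl` are hypothetical; the embodiment does not describe the file contents.

```python
import os
import pickle
import tempfile

# Hypothetical encoder parameters standing in for a trained model.
params = {
    "encoder_weights": [[0.1, -0.2], [0.3, 0.4]],
    "encoder_bias": [0.0, 0.1],
}

path = os.path.join(tempfile.mkdtemp(), "dae_params.pkl")

# Save once after training.
with open(path, "wb") as f:
    pickle.dump(params, f)

# Load at detection time to restore the model state.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == params)  # True
```

Unpickling can execute arbitrary code, so pkl files should only be loaded from a trusted location.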
In the periodic model update stage, the DAE model and the XGBoost model are retrained with the data in the intrusion database IDB and part of D_n, which maintains their ability to detect new intrusion techniques.
The experimental results and analysis are given below:
The experimental environment is a computer with an Intel(R) Core(TM) i5-6300 CPU @ 2.30 GHz, 8 GB of memory, a 1 TB hard disk and the Windows 10 operating system. The experiments were run in Jupyter Notebook under Anaconda using the Python language.
The test data set selected for the experiments is as follows:
The data set selected by the invention is the KDD99 data set. It originates from an intrusion detection evaluation project carried out in 1998 at MIT Lincoln Laboratory for the Defense Advanced Research Projects Agency (DARPA) of the United States Department of Defense, which simulated various user types, network traffic and attack techniques to resemble a real network environment.
Performance analysis:
To measure how well each intrusion detection model classifies each type of attack, the evaluation indexes precision, recall, false detection rate and F1 score are used, computed with the macro-average method commonly applied to multi-class problems.
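The macro-averaged indexes can be sketched as follows. The formulas used by the patent are not reproduced in this section, so the definitions here are assumptions: the false detection rate is taken as FP / (FP + TN) per class, and the F1 score as the harmonic mean of macro precision and macro recall. The function name `macro_metrics` is hypothetical.

```python
def macro_metrics(y_true, y_pred, labels):
    """Macro-average precision, recall and false detection rate over `labels`, plus F1."""
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, false_rates = [], [], []
    for c in labels:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        tn = len(pairs) - tp - fp - fn
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
        false_rates.append(fp / (fp + tn) if fp + tn else 0.0)
    n = len(labels)
    macro_p = sum(precisions) / n
    macro_r = sum(recalls) / n
    denom = macro_p + macro_r
    macro_f1 = 2 * macro_p * macro_r / denom if denom else 0.0
    return {"precision": macro_p, "recall": macro_r,
            "f1": macro_f1, "false_rate": sum(false_rates) / n}

# Tiny hypothetical example with three classes.
y_true = ["normal", "dos", "dos", "probe"]
y_pred = ["normal", "dos", "normal", "probe"]
print(macro_metrics(y_true, y_pred, ["normal", "dos", "probe"]))
```

Macro averaging gives every class the same weight regardless of its sample count, which is why it is sensitive to performance on rare attack types such as U2R.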
To evaluate the model comprehensively, verification experiments are designed at two levels. The first is binary classification into normal and attack, in which all attack types other than normal are merged into a single attack class. The second is multi-class classification, in which each attack type is detected according to the labels in the data set.
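The binary level of the experiment only needs a relabeling of the five classes. A minimal sketch, with the label strings and the name `to_binary` assumed for illustration:

```python
def to_binary(label):
    """Merge every non-normal class into a single 'attack' class."""
    return "normal" if label == "normal" else "attack"

multi_labels = ["normal", "dos", "probe", "r2l", "u2r", "normal"]
binary_labels = [to_binary(label) for label in multi_labels]
print(binary_labels)
# ['normal', 'attack', 'attack', 'attack', 'attack', 'normal']
```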
The number of sample categories in the training set and the test set are shown in tables 2 and 3 respectively. In evaluating the model, four test sets were constructed in order to test the generalization ability of the model for different data sets.
Table 2. Number of training samples per attack category
Attack type   Normal   DoS      Probe   R2L   U2R   Total
Quantity      47278    191458   1607    500   180   241023
Table 3. Number of test samples per attack category
Test data   Normal   DoS     Probe   R2L   U2R   Total
DATA1       12634    5362    1032    26    35    19159
DATA2       2836     78326   262     323   6     81753
DATA3       32048    86343   569     148   32    119140
DATA4       2482     29969   637     129   27    33244
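The training counts in Table 2 show how imbalanced the classes are. The exact sample weighting factor added to the XGBoost loss function is not given in this section; as a stand-in, the sketch below computes inverse-frequency class weights, a common heuristic and not necessarily the patent's formula. The function name and the lowercase class labels are assumptions.

```python
# Training counts taken from Table 2.
counts = {"normal": 47278, "dos": 191458, "probe": 1607, "r2l": 500, "u2r": 180}

def inverse_frequency_weights(counts):
    """Weight each class by total / (number_of_classes * class_count)."""
    total = sum(counts.values())
    k = len(counts)
    return {label: total / (k * n) for label, n in counts.items()}

weights = inverse_frequency_weights(counts)
for label, w in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{label:7s} {w:8.3f}")
```

Weights of this kind could be supplied per sample when training, for example through the `sample_weight` argument of `XGBClassifier.fit` in the XGBoost Python package, so that errors on U2R and R2L records cost more than errors on DoS records.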
To compare detection effects, the invention designs four groups of comparison experiments: the DAE model combined with random forest (RF), with k-nearest neighbors (KNN) and with a support vector machine (SVM), and XGBoost used directly without the DAE model. The evaluation indexes of each model are obtained.
(1) In the two-classification experiment, the average value of the evaluation indexes obtained in the test data set in each experiment is shown in table 4:
Table 4. Average evaluation index for each test set under binary classification
Model         Precision   Recall   F1 score   False detection rate
DAE-knn       0.9795      0.9685   0.974      0.00214
DAE-SVM       0.9562      0.9488   0.9524     0.00162
DAE-RF        0.9626      0.9528   0.9577     0.00174
XGBoost       0.9842      0.9786   0.9813     0.00186
DAE-XGBoost   0.9921      0.9823   0.9871     0.0008
Taken together, the evaluation indexes in Table 4 show that every method detects well in the binary classification case, and the method designed by the invention improves only slightly on the comparison methods.
(2) For the case of multi-class testing, the average evaluation index for each test set is shown in Table 5:
Table 5. Average evaluation index for each test set under multi-class classification
Model         Macro-P   Macro-R   Macro-f1   Macro-F
DAE-knn       0.7594    0.7562    0.7578     0.0427
DAE-SVM       0.7847    0.7684    0.7765     0.0254
DAE-RF        0.8126    0.8042    0.8084     0.0204
XGBoost       0.8394    0.8094    0.8241     0.0214
DAE-XGBoost   0.8785    0.8572    0.8677     0.00809
Example 2
There is provided a network intrusion detection system comprising:
the data processing module is used for sorting the intercepted network data packets to obtain a network data set;
performing feature engineering processing on the network data set to obtain a network detection data set;
the intrusion detection module is used for reducing the dimension of the network detection data set by adopting a trained denoising self-coding neural network model;
performing intrusion detection on the network detection data after the dimension reduction by adopting a trained XGBoost network intrusion detection model;
constructing an intrusion database according to the network data of which the detection result is intrusion and the disclosed intrusion data;
and periodically retraining the denoising self-coding neural network model and the XGBoost network intrusion detection model according to the intrusion database, and performing intrusion detection on network data according to the retrained model.
As can be seen from the average evaluation indexes of the test sets in Table 5, when the attack types are distributed unevenly, the method designed by the invention detects the various attack types with higher precision and a lower false detection rate than the other methods.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Claims (7)

1. A method for network intrusion detection, comprising:
sorting the intercepted network data packets to obtain a network data set;
performing feature engineering processing on the network data set to obtain a network detection data set;
performing dimension reduction on the network detection data set by adopting a trained denoising self-coding neural network model;
performing intrusion detection on the network detection data after the dimension reduction by adopting a trained XGBoost network intrusion detection model;
constructing an intrusion database according to the network data of which the detection result is intrusion and the disclosed intrusion data;
periodically retraining the denoising self-coding neural network model and the XGBoost network intrusion detection model according to the intrusion database, and performing intrusion detection on network data according to the retrained models;
wherein the training process of the trained denoising self-coding neural network model comprises:
manually marking the historical data in the same network environment, and forming a training data set by the marked historical data and the public network attack data;
digitizing the character data in the training data set by one-hot encoding, normalizing the numerical data in the training data set, and dividing the processed training data set into a training set T_1 and a test set T_2;
inputting the training set T_1 into the denoising self-coding neural network model to train the model;
inputting the test set T_2 into the trained denoising self-coding neural network model to test the model, and if the model does not reach the standard, adjusting the model parameters and continuing training until the model reaches the standard;
the step of adopting the trained denoising self-coding neural network model to reduce the dimension of the network detection data set comprises the following steps:
inputting the training set T_1 and the test set T_2 into the encoder part of the trained denoising self-coding neural network model that has reached the standard, and outputting, by the encoder, the dimension-reduced training set T'_1 and test set T'_2;
and inputting the network detection data set D into the encoder part of the trained denoising self-coding neural network model, and outputting, by the encoder, the dimension-reduced network detection data set.
2. The network intrusion detection method according to claim 1, wherein sorting the intercepted network data packets to obtain the network data set comprises: obtaining a network data set ND according to the basic feature attributes of the TCP connections, the content attributes of the TCP connections, the time-based network traffic feature attributes, and the host-based network traffic statistical features in the intercepted network data packets.
3. The network intrusion detection method according to claim 2, wherein performing feature engineering processing on the network data set to obtain the network detection data set specifically comprises:
digitizing the character data in the network data set by one-hot encoding, normalizing the numerical data in the network data set, and obtaining the network detection data set D from the digitized and normalized data.
4. The network intrusion detection method according to claim 1, wherein the training process of the trained XGBoost network intrusion detection model comprises:
training the XGBoost network intrusion detection model with the dimension-reduced training set T'_1;
testing the trained model with the dimension-reduced test set T'_2, and if the model does not reach the standard, adjusting the model parameters and continuing training until the test result reaches the standard.
5. The network intrusion detection method according to claim 1, wherein constructing the intrusion database comprises:
inputting the dimension-reduced network detection data set D' into the XGBoost network intrusion detection model, putting network data whose detection result is intrusion into a data set D_p, and putting network data whose detection result is normal into a data set D_n;
constructing the intrusion database IDB from D_p and the network intrusion data published by netlab, wherein netlab is a network laboratory.
6. The network intrusion detection method of claim 1 wherein the XGBoost network intrusion detection model incorporates a sample tradeoff factor.
7. A network intrusion detection system, comprising:
the data processing module is used for sorting the intercepted network data packets to obtain a network data set;
performing feature engineering processing on the network data set to obtain a network detection data set;
the intrusion detection module is used for reducing the dimension of the network detection data set by adopting a trained denoising self-coding neural network model;
performing intrusion detection on the network detection data after the dimension reduction by adopting a trained XGBoost network intrusion detection model;
constructing an intrusion database according to the network data of which the detection result is intrusion and the disclosed intrusion data;
periodically retraining the denoising self-coding neural network model and the XGBoost network intrusion detection model according to the intrusion database, and performing intrusion detection on network data according to the retrained models;
wherein the training process of the trained denoising self-coding neural network model comprises:
manually marking the historical data in the same network environment, and forming a training data set by the marked historical data and the public network attack data;
digitizing the character data in the training data set by one-hot encoding, normalizing the numerical data in the training data set, and dividing the processed training data set into a training set T_1 and a test set T_2;
inputting the training set T_1 into the denoising self-coding neural network model to train the model;
inputting the test set T_2 into the trained denoising self-coding neural network model to test the model, and if the model does not reach the standard, adjusting the model parameters and continuing training until the model reaches the standard;
the step of adopting the trained denoising self-coding neural network model to reduce the dimension of the network detection data set comprises the following steps:
inputting the training set T_1 and the test set T_2 into the encoder part of the trained denoising self-coding neural network model that has reached the standard, and outputting, by the encoder, the dimension-reduced training set T'_1 and test set T'_2;
and inputting the network detection data set D into the encoder part of the trained denoising self-coding neural network model, and outputting, by the encoder, the dimension-reduced network detection data set.
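For illustration only, the dimension-reduction step recited in the claims above may be sketched with a small denoising autoencoder written in NumPy. The class name, layer sizes, Gaussian corruption, sigmoid encoder, linear decoder, learning rate, epoch count and synthetic data are all assumptions chosen for this example; the claims do not fix any of them.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class DenoisingAutoencoder:
    """One-hidden-layer autoencoder trained to rebuild clean input from noisy input."""

    def __init__(self, n_features, n_hidden, noise_std=0.1, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W1 = self.rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = self.rng.normal(0.0, 0.1, (n_hidden, n_features))
        self.b2 = np.zeros(n_features)
        self.noise_std = noise_std  # assumed corruption level

    def encode(self, X):
        # Encoder part: maps each record to the lower-dimensional code.
        return sigmoid(X @ self.W1 + self.b1)

    def decode(self, H):
        return H @ self.W2 + self.b2

    def loss(self, X):
        # Mean squared reconstruction error on clean input.
        return float(np.mean((self.decode(self.encode(X)) - X) ** 2))

    def fit(self, X, epochs=500, lr=1.0):
        for _ in range(epochs):
            # Corrupt the input, but compare the output with the clean data.
            X_noisy = X + self.rng.normal(0.0, self.noise_std, X.shape)
            H = self.encode(X_noisy)
            out = self.decode(H)
            d_out = 2.0 * (out - X) / X.size
            d_W2 = H.T @ d_out
            d_b2 = d_out.sum(axis=0)
            d_pre = (d_out @ self.W2.T) * H * (1.0 - H)
            d_W1 = X_noisy.T @ d_pre
            d_b1 = d_pre.sum(axis=0)
            self.W2 -= lr * d_W2
            self.b2 -= lr * d_b2
            self.W1 -= lr * d_W1
            self.b1 -= lr * d_b1
        return self


# Synthetic stand-in for a normalized training set T_1 (200 records, 8 features).
rng = np.random.default_rng(42)
T1 = rng.random((200, 3)) @ rng.random((3, 8))
T1 = (T1 - T1.min(axis=0)) / (T1.max(axis=0) - T1.min(axis=0))

dae = DenoisingAutoencoder(n_features=8, n_hidden=3)
loss_before = dae.loss(T1)
dae.fit(T1)
loss_after = dae.loss(T1)

# Only the encoder is used for dimension reduction: 8 features become 3.
T1_reduced = dae.encode(T1)
```

In this sketch the same `encode` call would be applied to the test set and to the network detection data set D, and the reduced outputs would then be passed to the XGBoost network intrusion detection model.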

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111030340.5A CN113904801B (en) 2021-09-03 2021-09-03 Network intrusion detection method and system

Publications (2)

Publication Number Publication Date
CN113904801A CN113904801A (en) 2022-01-07
CN113904801B true CN113904801B (en) 2024-02-06

Family

ID=79188352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111030340.5A Active CN113904801B (en) 2021-09-03 2021-09-03 Network intrusion detection method and system

Country Status (1)

Country Link
CN (1) CN113904801B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018077285A1 (en) * 2016-10-31 2018-05-03 腾讯科技(深圳)有限公司 Machine learning model training method and apparatus, server and storage medium
CN110472817A (en) * 2019-07-03 2019-11-19 西北大学 A kind of XGBoost of combination deep neural network integrates credit evaluation system and its method
CN113206860A (en) * 2021-05-17 2021-08-03 北京交通大学 DRDoS attack detection method based on machine learning and feature selection

Also Published As

Publication number Publication date
CN113904801A (en) 2022-01-07

Similar Documents

Publication Publication Date Title
CN111882446B (en) Abnormal account detection method based on graph convolution network
CN103870751B (en) Method and system for intrusion detection
CN106973038B (en) Network intrusion detection method based on genetic algorithm oversampling support vector machine
CN111614491B (en) Power monitoring system oriented safety situation assessment index selection method and system
CN112491796B (en) Intrusion detection and semantic decision tree quantitative interpretation method based on convolutional neural network
CN102291392B (en) Hybrid intrusion detection method based on Bagging algorithm
CN111901340B (en) Intrusion detection system and method for energy Internet
CN111598179B (en) Power monitoring system user abnormal behavior analysis method, storage medium and equipment
CN112910859B (en) Internet of things equipment monitoring and early warning method based on C5.0 decision tree and time sequence analysis
CN117081858B (en) Intrusion behavior detection method, system, equipment and medium based on multi-decision tree
CN111126820A (en) Electricity stealing prevention method and system
CN112134862A (en) Coarse-fine granularity mixed network anomaly detection method and device based on machine learning
CN111507385A (en) Extensible network attack behavior classification method
CN113067798A (en) ICS intrusion detection method and device, electronic equipment and storage medium
CN114448657B (en) Distribution communication network security situation awareness and abnormal intrusion detection method
Harbola et al. Improved intrusion detection in DDoS applying feature selection using rank & score of attributes in KDD-99 data set
CN114662405A (en) Rock burst prediction method based on few-sample measurement and ensemble learning
CN111047428A (en) Bank high-risk fraud client identification method based on small amount of fraud samples
CN117278314A (en) DDoS attack detection method
CN113904801B (en) Network intrusion detection method and system
CN116611003A (en) Transformer fault diagnosis method, device and medium
Chao et al. Research on network intrusion detection technology based on dcgan
CN115842645A (en) UMAP-RF-based network attack traffic detection method and device and readable storage medium
CN115296851A (en) Network intrusion detection method based on mutual information and gray wolf promotion algorithm
CN114969761A (en) Log anomaly detection method based on LDA theme characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant