CN111797997A - Network intrusion detection method, model construction method, device and electronic equipment - Google Patents

Info

Publication number
CN111797997A
Authority
CN
China
Prior art keywords
feature
attribute
randomness
intrusion detection
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010655063.6A
Other languages
Chinese (zh)
Inventor
薛智慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd
Priority to CN202010655063.6A
Publication of CN111797997A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application provides a network intrusion detection method, a model construction method, a device and electronic equipment, and relates to the technical field of network security. The network intrusion detection model construction method comprises the following steps: acquiring network traffic data; converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute; determining a training feature vector based on the randomness of each feature attribute; and performing machine learning model training with the training feature vector to obtain a network intrusion detection model. Used together with the corresponding network intrusion detection method, the model construction method optimizes the feature attributes according to their uncertainty and eliminates low-quality feature attributes, which avoids discarding high-quality feature vectors at later stages because of interference from low-quality feature vectors. Higher-quality feature vectors can therefore be extracted during dimensionality reduction, and the accuracy of network intrusion detection is improved.

Description

Network intrusion detection method, model construction method, device and electronic equipment
Technical Field
The present application relates to the field of network security technologies, and in particular, to a network intrusion detection method, a model construction method, an apparatus, and an electronic device.
Background
With the rapid development of informatization and IT, network technologies are being applied ever more widely and deeply. While the openness of networks brings great convenience, it also gives rise to many network security problems, and network attacks from both inside and outside have become one of the main threats that enterprises face. Network-based intrusion detection systems have therefore become an important tool with which network administrators, system operation and maintenance personnel, and others maintain enterprise network security. Network intrusion detection systems have been under development for more than two decades; in the past, intrusion detection relied primarily on known signatures and anomaly detection techniques. Now, with the rapid rise of machine learning and artificial intelligence, intrusion detection systems based on machine learning, deep learning, and related techniques have become a research hotspot in both academia and industry.
At present, research on network intrusion detection systems mainly relies on machine learning or deep learning techniques, and these methods fall into two categories: supervised and unsupervised. Supervised methods generally treat intrusion detection as a classification task solved with machine learning and deep learning algorithms: features are manually selected on a labeled data set, feature selection or dimensionality reduction of various kinds is applied to the large number of feature attributes, models are trained and optimized with different algorithms, and a supervised model is finally formed. Unsupervised methods usually approach intrusion detection from the perspective of anomaly detection, using a traditional machine learning algorithm when features are selected manually and a deep learning algorithm when features are selected automatically.
Each detection technique is effective under specific conditions but has shortcomings. Methods based on machine learning with manually determined features depend on feature selection and feature screening: using too many features easily leads to the curse of dimensionality, and relevance is easily lost in a high-dimensional or even ultra-high-dimensional space, while using too few features tends to miss critical features. Methods based on deep learning avoid manual feature selection by relying on the feature self-learning capability of neural networks, but the network traffic data can only be fed into a deep learning algorithm after preprocessing, which loses some attributes and features of the original traffic data and reduces the accuracy of network intrusion detection.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a network intrusion detection method, a model construction method, a device and an electronic device, so as to solve the problem in the prior art that the accuracy of network intrusion detection is reduced.
The embodiment of the application provides a network intrusion detection model construction method, which comprises the following steps: acquiring network flow data; converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute; determining a training feature vector based on the randomness of each feature attribute; and performing machine learning model training by adopting the training characteristic vector to obtain a network intrusion detection model.
In this implementation, feature attributes are optimized according to their uncertainty before the feature vectors are processed: the high-dimensional feature vectors used in network intrusion detection are first screened by feature attribute, low-quality feature attributes are excluded, and feature selection and subsequent processing are then performed on the screening result. Machine learning model training then yields a network intrusion detection model for network intrusion detection. This prevents high-quality feature vectors from being discarded due to interference from low-quality feature vectors during feature selection and subsequent processing, so that higher-quality feature vectors can be extracted during dimensionality reduction. The algorithm is also computationally simple and performs well, which improves the accuracy and efficiency of the generated network intrusion detection model.
Optionally, the converting the network traffic data into at least one feature vector includes: converting the network traffic data into the at least one feature vector in units of sessions; and marking the characteristic attribute and the attack category of each characteristic vector in the at least one characteristic vector.
In this implementation, labeling the feature vectors with feature attributes and attack categories provides a basis for the subsequent screening of feature attributes and can improve the efficiency and accuracy of feature screening.
Optionally, the determining a training feature vector based on the randomness of the feature attributes includes: respectively determining attribute randomness of each characteristic attribute corresponding to each attack category; and determining a training feature vector based on the attribute randomness of all the feature vectors corresponding to each feature attribute.
In this implementation, the relevance between each feature attribute and the attack categories is determined from the randomness of the feature attribute with respect to the attack categories, and the training feature vector is then determined based on that randomness. Feature vectors with high feature attribute quality can thus be accurately extracted for model training, which improves the detection accuracy of the generated network intrusion detection model.
Optionally, the determining the attribute randomness of each feature attribute corresponding to the attack category respectively includes: respectively determining first randomness of each feature vector corresponding to each attack category; dividing the feature vector corresponding to each feature attribute into a plurality of first randomness sets based on the value of the first randomness; determining second randomness, corresponding to each attack category, of each characteristic attribute based on the plurality of first randomness sets; and determining the attribute randomness of each characteristic attribute corresponding to different attack categories based on the value of the second randomness.
In this implementation, the feature vectors are divided into first randomness sets based on the randomness of the feature vectors with respect to the attack categories, and the second randomness of each feature attribute with respect to the attack categories is then determined within each first randomness set. The second randomness reflects how strongly the feature attribute influences the determination of the attack category when the randomness of the feature vectors with respect to the attack categories is the same. The attribute randomness is then determined from it, so that the attribute randomness reflects the quality of the feature attributes more accurately, which improves the detection accuracy of the generated network intrusion detection model.
Optionally, the determining a training feature vector based on the attribute randomness comprises: dividing all the characteristic attributes into a plurality of second randomness sets based on the values of the attribute randomness; the second randomness sets are arranged in a descending order according to the values of the attribute randomness, and a preset number of characteristic attributes are selected from the second randomness sets after the second randomness sets are arranged in the descending order as characteristic attribute vectors; converting the network traffic data into a feature vector to be trained based on the feature attribute vector; and sequentially carrying out dimensionality reduction processing and standard normal transformation processing on the characteristic vector to be trained to obtain the training characteristic vector.
In this implementation, the second randomness sets whose attribute randomness values rank highest are selected to determine the feature attribute vector, so that the high-quality feature attributes with a large influence on network intrusion determination can be screened out quickly and accurately. Feature extraction and processing are then performed again based on the feature attribute vector, so that the resulting training feature vector has higher feature attribute quality, which improves the detection accuracy of the generated network intrusion detection model.
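As a rough illustration of the selection and normalization steps, the following Python sketch ranks feature attributes by a given attribute randomness score, keeps a preset number of them, and applies a standard normal (z-score) transformation to one column of values. The function names, the example scores, and the direct ranking of individual attributes (rather than of second randomness sets) are assumptions made for illustration only; the dimensionality reduction step is omitted.

```python
import statistics

def select_top_attributes(attr_randomness, preset_number):
    """Rank attribute names by randomness (descending) and keep the first few.

    attr_randomness: dict mapping a hypothetical attribute name to its score.
    """
    ranked = sorted(attr_randomness, key=attr_randomness.get, reverse=True)
    return ranked[:preset_number]

def standardize(column):
    """Z-score transform: subtract the mean, divide by the population std."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]

# Hypothetical scores, not taken from the application.
scores = {"uplink_packets": 0.2, "downlink_bytes": 0.9, "avg_uplink_len": 0.5}
selected = select_top_attributes(scores, 2)
z = standardize([1.0, 2.0, 3.0])
```

In a fuller pipeline, the selected attribute names would drive a second pass over the traffic data, and every retained column would be standardized with statistics computed on the training set.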
The embodiment of the application also provides a network intrusion detection method, which comprises the following steps: acquiring network flow data; converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute; converting the at least one feature vector into an input feature vector; inputting the input feature vector into the network intrusion detection model obtained by the network intrusion detection model construction method, and determining whether network intrusion exists in the network traffic data based on an output result of the network intrusion detection model.
In this implementation, feature attributes are optimized according to their uncertainty before the feature vectors are processed: the high-dimensional feature vectors used in network intrusion detection are first screened by feature attribute, low-quality feature attributes are excluded, and feature selection and subsequent processing are then performed on the screening result. Machine learning model training then yields a network intrusion detection model for network intrusion detection. This prevents high-quality feature vectors from being discarded due to interference from low-quality feature vectors during feature selection and subsequent processing, so that higher-quality feature vectors can be extracted during dimensionality reduction. Network intrusion detection is then performed with a network intrusion detection model trained on the same principle; the algorithm is computationally simple and performs well, which improves the accuracy and efficiency of network intrusion detection.
The embodiment of the present application further provides a device for constructing a network intrusion detection model, the device includes: the training data acquisition module is used for acquiring network flow data; the first characteristic acquisition module is used for converting the network traffic data into at least one characteristic vector, and each characteristic vector corresponds to at least one characteristic attribute; the training feature vector acquisition module is used for determining a training feature vector based on the randomness of each feature attribute; and the model training module is used for performing machine learning model training by adopting the training characteristic vectors to obtain a network intrusion detection model.
In this implementation, feature attributes are optimized according to their uncertainty before the feature vectors are processed: the high-dimensional feature vectors used in network intrusion detection are first screened by feature attribute, low-quality feature attributes are excluded, and feature selection and subsequent processing are then performed on the screening result. Machine learning model training then yields a network intrusion detection model for network intrusion detection. This prevents high-quality feature vectors from being discarded due to interference from low-quality feature vectors during feature selection and subsequent processing, so that higher-quality feature vectors can be extracted during dimensionality reduction. The algorithm is also computationally simple and performs well, which improves the accuracy and efficiency of the generated network intrusion detection model.
Optionally, the first feature obtaining module is specifically configured to: converting the network traffic data into the at least one feature vector in units of sessions; and marking the characteristic attribute and the attack category of each characteristic vector in the at least one characteristic vector.
In this implementation, labeling the feature vectors with feature attributes and attack categories provides a basis for the subsequent screening of feature attributes and can improve the efficiency and accuracy of feature screening.
Optionally, the training feature vector obtaining module is specifically configured to: respectively determining attribute randomness of each characteristic attribute corresponding to each attack category; and determining a training feature vector based on the attribute randomness of all the feature vectors corresponding to each feature attribute.
In this implementation, the relevance between each feature attribute and the attack categories is determined from the randomness of the feature attribute with respect to the attack categories, and the training feature vector is then determined based on that randomness. Feature vectors with high feature attribute quality can thus be accurately extracted for model training, which improves the detection accuracy of the generated network intrusion detection model.
Optionally, the training feature vector obtaining module is specifically configured to: respectively determining first randomness of each feature vector corresponding to each attack category; dividing the feature vector corresponding to each feature attribute into a plurality of first randomness sets based on the value of the first randomness; determining second randomness, corresponding to each attack category, of each characteristic attribute based on the plurality of first randomness sets; and determining the attribute randomness of each characteristic attribute corresponding to different attack categories based on the value of the second randomness.
In this implementation, the feature vectors are divided into first randomness sets based on the randomness of the feature vectors with respect to the attack categories, and the second randomness of each feature attribute with respect to the attack categories is then determined within each first randomness set. The second randomness reflects how strongly the feature attribute influences the determination of the attack category when the randomness of the feature vectors with respect to the attack categories is the same. The attribute randomness is then determined from it, so that the attribute randomness reflects the quality of the feature attributes more accurately, which improves the detection accuracy of the generated network intrusion detection model.
Optionally, the training feature vector obtaining module is specifically configured to: dividing all the characteristic attributes into a plurality of second randomness sets based on the values of the attribute randomness; the second randomness sets are arranged in a descending order according to the values of the attribute randomness, and a preset number of characteristic attributes are selected from the second randomness sets after the second randomness sets are arranged in the descending order as characteristic attribute vectors; converting the network traffic data into a feature vector to be trained based on the feature attribute vector; and sequentially carrying out dimensionality reduction processing and standard normal transformation processing on the characteristic vector to be trained to obtain the training characteristic vector.
In this implementation, the second randomness sets whose attribute randomness values rank highest are selected to determine the feature attribute vector, so that the high-quality feature attributes with a large influence on network intrusion determination can be screened out quickly and accurately. Feature extraction and processing are then performed again based on the feature attribute vector, so that the resulting training feature vector has higher feature attribute quality, which improves the detection accuracy of the generated network intrusion detection model.
An embodiment of the present application further provides a network intrusion detection apparatus, where the apparatus includes: the detection data acquisition module is used for acquiring network flow data; the second characteristic acquisition module is used for converting the network flow data into at least one characteristic vector, and each characteristic vector corresponds to at least one characteristic attribute; an input feature vector obtaining module, configured to convert the at least one feature vector into an input feature vector; and the intrusion detection module is used for inputting the input characteristic vectors into the network intrusion detection model obtained by the network intrusion detection model construction device and determining whether network intrusion exists in the network flow data or not based on the output result of the network intrusion detection model.
In this implementation, feature attributes are optimized according to their uncertainty before the feature vectors are processed: the high-dimensional feature vectors used in network intrusion detection are first screened by feature attribute, low-quality feature attributes are excluded, and feature selection and subsequent processing are then performed on the screening result. Machine learning model training then yields a network intrusion detection model for network intrusion detection. This prevents high-quality feature vectors from being discarded due to interference from low-quality feature vectors during feature selection and subsequent processing, so that higher-quality feature vectors can be extracted during dimensionality reduction. Network intrusion detection is then performed with a network intrusion detection model trained on the same principle; the algorithm is computationally simple and performs well, which improves the accuracy and efficiency of network intrusion detection.
An embodiment of the present application further provides an electronic device, where the electronic device includes a memory and a processor, where the memory stores program instructions, and the processor executes steps in any one of the above implementation manners when reading and executing the program instructions.
The embodiment of the present application further provides a readable storage medium, in which computer program instructions are stored, and the computer program instructions are read by a processor and executed to perform the steps in any of the above implementation manners.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of a method for constructing a network intrusion detection model according to an embodiment of the present application.
Fig. 2 is a flowchart illustrating an attribute randomness determining step according to an embodiment of the present application.
Fig. 3 is a schematic flowchart of a training feature vector determining step according to an embodiment of the present disclosure.
Fig. 4 is a flowchart illustrating a network intrusion detection method according to an embodiment of the present application.
Fig. 5 is a schematic block diagram of a network intrusion detection model building apparatus according to an embodiment of the present application.
Fig. 6 is a schematic block diagram of a network intrusion detection device according to an embodiment of the present disclosure.
Reference numerals: 30-network intrusion detection model construction device; 31-training data acquisition module; 32-first feature acquisition module; 33-training feature vector acquisition module; 34-model training module; 40-network intrusion detection device; 41-detection data acquisition module; 42-second feature acquisition module; 43-input feature vector acquisition module; 44-intrusion detection module.
Detailed Description
The technical solution in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
The applicant's research has found that machine learning methods with manually selected features tend to use many features: without dimensionality reduction, performance degrades severely and overfitting may occur, whereas applying dimensionality reduction directly to the feature attributes can lose the attributes' weight characteristics and discard originally important attributes, so that the attribute feature vectors used in model training are not accurate enough and the accuracy of the model ultimately suffers. Deep learning methods, on the other hand, learn features automatically, but the network traffic must first be converted into images or data sequences; this conversion can lose original traffic information, which degrades the quality of the training data and ultimately the accuracy of the model.
In order to solve the above problems in the prior art, an embodiment of the present application provides a method for constructing a network intrusion detection model. Referring to fig. 1, fig. 1 is a schematic flow chart of a method for building a network intrusion detection model according to an embodiment of the present application, where the method for building a network intrusion detection model includes the following specific steps:
Step S12: Acquire network traffic data.
Since the network intrusion detection model construction method belongs to the training and construction stage of the network intrusion detection model, offline data labeled with common categories can be used as the input network traffic data in this embodiment.
Specifically, the network traffic data may be a data set in packet capture (pcap) format labeled with attack categories. Optionally, the attack categories in this embodiment may be denoted $Label_0, Label_1, Label_2, \dots, Label_{L-1}$, where $Label_0$ denotes the normal (non-attack) category, for a total of $L$ attack categories.
Optionally, the attack categories may include common vulnerability overflow, scanning, extraction, and distributed denial-of-service (DDoS) attacks.
Step S14: Convert the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute.
All network traffic data in the pcap is read and parsed, and converted into at least one feature vector in units of sessions.
Further, to allow subsequent feature attribute screening, this embodiment labels the feature attributes and the category of each feature vector. The feature vector corresponding to each session may be represented as $V = \{V_1, V_2, V_3, \dots, V_v, Label_i\}$, where $v$ denotes the number of original feature attributes, $V_i$ denotes feature attribute $i$, and $Label_i$ denotes attack category $i$.
Commonly used feature attributes span more than one hundred dimensions. Optionally, the feature attributes may include: the number of uplink packets, the number of downlink packets, the number of uplink bytes, the number of downlink bytes, the average uplink packet length, the average downlink packet length, the distribution of the first 50 bytes, N-gram values, and the like.
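The session-based conversion can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the `Packet` record, the direction-independent session key, and the rule that the sender of the first packet is the uplink side are all assumptions, and only six of the listed feature attributes are computed.

```python
from collections import defaultdict
from typing import NamedTuple

class Packet(NamedTuple):
    # Hypothetical, already-parsed packet record (not a pcap library type).
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    length: int

def session_key(p):
    # Direction-independent key so both directions map to the same session.
    a, b = (p.src, p.sport), (p.dst, p.dport)
    return (p.proto,) + (a + b if a <= b else b + a)

def session_features(packets):
    sessions = defaultdict(list)
    for p in packets:
        sessions[session_key(p)].append(p)
    vectors = {}
    for key, pkts in sessions.items():
        # Assumption: the sender of the first packet is the uplink side.
        client = (pkts[0].src, pkts[0].sport)
        up = [p.length for p in pkts if (p.src, p.sport) == client]
        down = [p.length for p in pkts if (p.src, p.sport) != client]
        vectors[key] = [
            len(up), len(down),                       # packet counts
            sum(up), sum(down),                       # byte counts
            sum(up) / len(up) if up else 0.0,         # mean uplink length
            sum(down) / len(down) if down else 0.0,   # mean downlink length
        ]
    return vectors

packets = [
    Packet("10.0.0.1", 1234, "10.0.0.2", 80, "tcp", 100),
    Packet("10.0.0.2", 80, "10.0.0.1", 1234, "tcp", 300),
    Packet("10.0.0.1", 1234, "10.0.0.2", 80, "tcp", 50),
]
vectors = session_features(packets)
```

Each resulting vector would then be extended with the attack category label of its session before feature attribute screening.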
Step S16: a training feature vector is determined based on the randomness of each feature attribute.
In step S16, the feature attributes are optimized for each feature attribute $V_i$ of the feature vectors generated in step S14, based on the uncertainty of the value range of each feature attribute over the whole data set. The basic principle of the optimization is as follows: given a feature attribute $V_i$, obtain the attack category values of all data in the data set on this feature attribute. The greater the randomness (that is, the greater the uncertainty) of the attack category values of all data in the data set on the attribute, the better the feature attribute performs in the classification model, so the feature attribute is judged to be a high-quality feature attribute; if all values on the attribute are very similar or have little randomness, the attribute contributes little to the classification model, so the feature attribute is judged to be a low-quality feature attribute.
Specifically, step S16 may include the following sub-steps:
Step S161: Determine, for each feature attribute, the attribute randomness corresponding to each attack category.
Further, referring to fig. 2, fig. 2 is a schematic flowchart of a step of determining attribute randomness provided in an embodiment of the present application, where the step of determining attribute randomness specifically includes the following steps:
Step S1611: respectively determine the first randomness of each feature vector corresponding to each attack category.
In this embodiment, D denotes the data set of all feature vectors, D = {D_1, D_2, …, D_N}, where D_i is the i-th feature vector in the sample set and N is the total amount of data in the data set; d_i denotes the number of feature vectors in D whose attack category is i; and g_i denotes the first randomness of attack category i in D:
Figure BDA0002576049160000101
wherein,
Figure BDA0002576049160000102
Then G{g_0, g_1, …, g_{L-1}} denotes the randomness of the data set over the different attack categories g_0, g_1, …, g_{L-1}:
Figure BDA0002576049160000103
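The formula images for the first randomness and the data set randomness are not reproduced in this text. Purely as an assumption, the sketch below treats the randomness as Shannon entropy over the attack categories, with the proportion of category i taken as d_i / N. The function name `dataset_randomness` is hypothetical.

```python
from collections import Counter
from math import log2


def dataset_randomness(labels):
    """Shannon entropy (in bits) of the attack-category labels.

    Assumption: the randomness G is the entropy
    -sum_i (d_i / N) * log2(d_i / N); the embodiment's formula images
    are not available, so this is one plausible reading only.
    """
    n = len(labels)
    counts = Counter(labels)   # d_i for each attack category i
    return -sum((d / n) * log2(d / n) for d in counts.values())


labels = ["normal", "normal", "dos", "dos"]
G = dataset_randomness(labels)   # two equally likely categories
```

Under this reading, a data set containing a single attack category has zero randomness, and the value grows as the categories become more evenly mixed.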
Step S1612: divide the feature vectors corresponding to each feature attribute into a plurality of first randomness sets based on the value of the first randomness.
T_k denotes the set of feature vectors in the data set whose first randomness has the value k, and T_{i,k} denotes the set of feature vectors for which the first randomness of feature attribute V_i has the value k. T_{i,k} is one of the first randomness sets.
Step S1613: determine the second randomness of each feature attribute corresponding to each attack category based on the plurality of first randomness sets.
G_{i,k} denotes the second randomness of feature attribute V_i with respect to each attack category within the set T_{i,k}. It is calculated in the same manner as the randomness described above.
Step S1614: determine the attribute randomness of each feature attribute corresponding to the different attack categories based on the value of the second randomness.
G_i denotes the randomness of feature attribute V_i with respect to the attack categories:
Figure BDA0002576049160000111
Figure BDA0002576049160000112
where G = G{g_0, g_1, …, g_{L-1}}.
E(i) denotes the attribute randomness of feature attribute V_i with respect to the attack categories: E(i) = G - G_i.
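The relation E(i) = G - G_i has the same shape as information gain. The sketch below follows that reading under two stated assumptions: the randomness is Shannon entropy, and G_i is the size-weighted sum of the per-set values G_{i,k}. Neither assumption is confirmed by the text, since the formula images for G_i are not reproduced, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of attack-category labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def attribute_randomness(values, labels):
    """E(i) = G - G_i for one feature attribute.

    values[j] is the (discrete) value of the attribute in feature vector j,
    labels[j] is the attack category of that vector.
    """
    groups = defaultdict(list)   # T_{i,k}: labels of the vectors with value k
    for v, y in zip(values, labels):
        groups[v].append(y)
    n = len(labels)
    # Assumed weighting: G_i = sum_k (|T_{i,k}| / N) * G_{i,k}
    g_i = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - g_i


labels = ["normal", "normal", "dos", "dos"]
informative = attribute_randomness([0, 0, 1, 1], labels)   # separates the categories
useless = attribute_randomness([7, 7, 7, 7], labels)       # constant attribute
```

In this toy data the attribute that separates the two categories receives the maximum score and the constant attribute receives zero, which matches the description of high-quality and low-quality feature attributes. Continuous attributes would need to be discretized before grouping by value.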
Step S162: determine the training feature vector based on the attribute randomness of all the feature vectors corresponding to each feature attribute.
Referring to fig. 3, fig. 3 is a schematic flowchart illustrating a training feature vector determining step according to an embodiment of the present disclosure. Specifically, the training feature vector determination step S162 may include the following sub-steps:
Step S1621: divide all the feature attributes into a plurality of second randomness sets based on the values of the attribute randomness.
The second randomness sets may be denoted as e = {e_1, e_2, …, e_v}.
Step S1622: arrange the plurality of second randomness sets in descending order of attribute randomness, and select a preset number of feature attributes from the sorted second randomness sets as the feature attribute vector.
The feature attribute set in descending order is defined as E = {E_1, E_2, …, E_v}.
It should be understood that the value of the preset number may be adjusted according to the specific network intrusion detection requirement, and in this embodiment, it may be, but is not limited to, H, where H is a positive integer.
The feature attribute vector may be expressed as NV = {NV_1, NV_2, …, NV_H}, 1 < H < v.
Step S1623: convert the network traffic data into a feature vector to be trained based on the feature attribute vector.
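Steps S1621 to S1623 amount to ranking the feature attributes by attribute randomness, keeping the top H, and projecting each session's feature vector onto the kept attributes. A minimal sketch follows; the attribute names and the E(i) scores are made-up illustrative values, and the helper names are hypothetical.

```python
def select_top_attributes(scores, h):
    """Return the names of the h attributes with the largest E(i)."""
    ranked = sorted(scores, key=scores.get, reverse=True)   # descending E(i)
    return ranked[:h]


def project(vector, selected):
    """Keep only the selected attributes of one feature vector (a dict)."""
    return [vector[name] for name in selected]


# Illustrative E(i) values only; they are not results from the embodiment.
scores = {"up_bytes": 0.82, "down_bytes": 0.40, "up_packets": 0.05, "ttl": 0.61}
nv = select_top_attributes(scores, h=2)
row = project({"up_bytes": 160, "down_bytes": 1500, "up_packets": 2, "ttl": 64}, nv)
```

The order of `nv` fixes the column order of every projected vector, so the same list needs to be reused at detection time.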
Step S1624: sequentially perform dimensionality reduction and standard normal transformation on the feature vector to be trained to obtain the training feature vector.
Optionally, in this embodiment, the dimensionality of the feature vector to be trained may be reduced by Principal Component Analysis (PCA). PCA applies a linear transformation that maps the original data onto a set of mutually uncorrelated dimensions (the principal components). It can be used to extract the main feature components of the data, is commonly used for dimensionality reduction of high-dimensional data, and reduces dimensionality while retaining most of the information in the data.
Standard Normal Variate (SNV) transformation, also called standard normal variate correction or normalization, is then applied. After the standard normal transformation, the feature attributes in the feature vector to be trained are screened again to generate the training feature vector SNV = {SNV_1, SNV_2, …, SNV_P}, 1 < P < H.
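The text does not spell out the SNV formula. The sketch below assumes the usual standard normal variate definition, in which each vector is centred on its own mean and divided by its own sample standard deviation. Column-wise z-scoring across the data set would be an alternative reading. The input is a stand-in for a PCA-reduced vector, and the PCA step itself is omitted.

```python
from statistics import mean, stdev


def snv(vector):
    """Standard normal variate: centre and scale one vector by its own statistics.

    Assumption: row-wise SNV with the sample standard deviation (n - 1);
    the embodiment does not state which variant it uses.
    """
    m = mean(vector)
    s = stdev(vector)                      # needs at least two components
    if s == 0:
        return [0.0 for _ in vector]       # constant vector: nothing to scale
    return [(x - m) / s for x in vector]


reduced = [2.0, 4.0, 6.0]                  # stand-in for a PCA-reduced vector
transformed = snv(reduced)
```

After the transform each vector has zero mean and unit sample standard deviation, which puts vectors of different magnitudes on a comparable scale before training.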
Step S18: train a machine learning model with the training feature vector to obtain a network intrusion detection model.
The training feature vectors SNV are processed according to the SNV feature attributes to generate an SNV-based training feature vector set, which is fed into the machine learning model for training to obtain the network intrusion detection model.
Optionally, the machine learning model in this embodiment may be based on a supervised machine learning algorithm such as random forest, Support Vector Machine (SVM), or Gradient Boosting Decision Tree (GBDT), and the corresponding learning model.
After the network intrusion detection model is obtained, real-time network intrusion detection requires inputting network traffic data into the model. This embodiment therefore provides a network intrusion detection method. Referring to fig. 4, fig. 4 is a schematic flowchart of a network intrusion detection method provided in an embodiment of the present application. The specific steps of the network intrusion detection method may be as follows:
Step S21: acquire network traffic data.
Step S22: convert the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute.
Step S23: convert the at least one feature vector into an input feature vector.
Optionally, the feature vector conversion may be vectorization processing based on the SNV feature vector.
Step S24: input the input feature vector into the network intrusion detection model, and determine whether network intrusion exists in the network traffic data based on the output result of the network intrusion detection model.
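The detection flow of steps S22 to S24 can be sketched as a small function that reuses the artefacts produced during model construction. Everything concrete here is an assumption for illustration: the attribute names, the scaling function, the toy decision rule standing in for a trained model, and the convention that the category "normal" means no intrusion.

```python
def detect(session_vector, selected, transform, model):
    """Steps S22 to S24 for one session, given artefacts from training.

    session_vector: dict of raw feature attributes for the session
    selected:       attribute names kept during training
    transform:      callable applying the same transformation as in training
    model:          callable returning an attack category for an input vector
    """
    kept = [float(session_vector[name]) for name in selected]   # same attributes as training
    x = transform(kept)                                         # input feature vector
    category = model(x)
    return category != "normal", category


# Stand-ins for the trained artefacts (purely illustrative).
selected = ["up_bytes", "down_bytes"]
scale = lambda v: [x / 1000.0 for x in v]
toy_model = lambda x: "dos" if x[0] > 10 * x[1] else "normal"

intrusion, category = detect(
    {"up_bytes": 50000, "down_bytes": 120, "up_packets": 400},
    selected, scale, toy_model,
)
```

Passing the selection list and the transformation in from training keeps the detection-time input consistent with what the model saw during construction.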
With the network intrusion detection model construction method and the network intrusion detection method provided by the embodiments of the application, the feature attributes are screened and filtered by feature attribute randomness before the dimensionality of the high-dimensional network traffic data is reduced: low-value feature attributes are discarded and high-value feature attributes are retained. Feature selection can therefore be performed quickly, efficiently and accurately, the feature attribute relevance of the original traffic data is preserved, and the online detection performance of a machine-learning-based intrusion detection system is improved while the accuracy of the detection model is maintained.
In order to cooperate with the method for constructing a network intrusion detection model provided in this embodiment, an apparatus 30 for constructing a network intrusion detection model is also provided in this embodiment. Referring to fig. 5, fig. 5 is a schematic block diagram of a network intrusion detection model building apparatus according to an embodiment of the present disclosure.
The network intrusion detection model building apparatus 30 includes:
a training data obtaining module 31, configured to obtain network traffic data;
a first feature obtaining module 32, configured to convert the network traffic data into at least one feature vector, where each feature vector corresponds to at least one feature attribute;
a training feature vector obtaining module 33, configured to determine a training feature vector based on randomness of each feature attribute;
a model training module 34, configured to perform machine learning model training by using the training feature vectors to obtain a network intrusion detection model.
Optionally, the first feature obtaining module 32 is specifically configured to: converting the network traffic data into at least one feature vector by taking a session as a unit; and marking the characteristic attribute and the attack category of each characteristic vector in the at least one characteristic vector.
Optionally, the training feature vector obtaining module 33 is specifically configured to: respectively determining attribute randomness of each characteristic attribute corresponding to each attack category; and determining training feature vectors based on the attribute randomness of all the feature vectors corresponding to each feature attribute.
Optionally, the training feature vector obtaining module 33 is specifically configured to: respectively determining first randomness of each feature vector corresponding to each attack category; dividing the feature vector corresponding to each feature attribute into a plurality of first randomness sets based on the value of the first randomness; determining second randomness, corresponding to each attack category, of each characteristic attribute based on the plurality of first randomness sets; and determining attribute randomness of each characteristic attribute corresponding to different attack categories based on the value of the second randomness.
Optionally, the training feature vector obtaining module 33 is specifically configured to: divide all the feature attributes into a plurality of second randomness sets based on the values of the attribute randomness; arrange the second randomness sets in descending order of attribute randomness, and select a preset number of feature attributes from the sorted second randomness sets as the feature attribute vector; convert the network traffic data into a feature vector to be trained based on the feature attribute vector; and sequentially perform dimensionality reduction and standard normal transformation on the feature vector to be trained to obtain the training feature vector.
In order to cooperate with the network intrusion detection method provided in this embodiment, an embodiment of the present application further provides a network intrusion detection device 40. Referring to fig. 6, fig. 6 is a schematic block diagram of a network intrusion detection device according to an embodiment of the present disclosure.
The network intrusion detection device 40 includes:
a detection data obtaining module 41, configured to obtain network traffic data;
a second feature obtaining module 42, configured to convert the network traffic data into at least one feature vector, where each feature vector corresponds to at least one feature attribute;
an input feature vector obtaining module 43, configured to convert at least one feature vector into an input feature vector;
an intrusion detection module 44, configured to input the input feature vector into a network intrusion detection model obtained by the network intrusion detection model building apparatus, and determine whether network intrusion exists in the network traffic data based on an output result of the network intrusion detection model.
The embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores program instructions, and when the processor reads and runs the program instructions, the processor executes the steps in any one of the method for building a network intrusion detection model and the method for detecting network intrusion provided by this embodiment.
It should be understood that the electronic device may be a Personal Computer (PC), a tablet PC, a smart phone, a Personal Digital Assistant (PDA), or other electronic device having a logical computing function.
The embodiment of the application also provides a readable storage medium, wherein computer program instructions are stored in the readable storage medium, and the computer program instructions are read by a processor and run to execute the steps in the network intrusion detection model building method or the network intrusion detection method.
To sum up, the embodiment of the present application provides a network intrusion detection method, a model construction method, an apparatus and an electronic device, wherein the network intrusion model construction method includes: acquiring network flow data; converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute; determining a training feature vector based on the randomness of each feature attribute; and performing machine learning model training by adopting the training characteristic vector to obtain a network intrusion detection model.
In this implementation, feature attributes are optimized according to their uncertainty before the feature vectors are processed: low-quality feature attributes are excluded from the high-dimensional feature vectors used in network intrusion detection, feature selection and subsequent processing are performed on the screening result, and a machine learning model is then trained to obtain the network intrusion detection model. Excluding low-quality feature attributes first prevents high-quality feature vectors from being discarded during feature selection and subsequent processing because of interference from low-quality ones, so that higher-quality feature vectors can be extracted by dimensionality reduction. The algorithm is also simple to compute and performs well, which improves the accuracy and efficiency of the generated network intrusion detection model.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. The apparatus embodiments described above are merely illustrative, and for example, the block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of devices according to various embodiments of the present application. In this regard, each block in the block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams, and combinations of blocks in the block diagrams, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Therefore, the present embodiment further provides a readable storage medium in which computer program instructions are stored; when the computer program instructions are read and executed by a processor, they perform the steps of the network intrusion detection model construction method or the network intrusion detection method. Based on such understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for constructing a network intrusion detection model is characterized by comprising the following steps:
acquiring network flow data;
converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute;
determining a training feature vector based on the randomness of each feature attribute;
and performing machine learning model training by adopting the training characteristic vector to obtain a network intrusion detection model.
2. The method of claim 1, wherein converting the network traffic data into at least one feature vector comprises:
converting the network traffic data into the at least one feature vector in units of sessions;
and marking the characteristic attribute and the attack category of each characteristic vector in the at least one characteristic vector.
3. The method of claim 1 or 2, wherein the determining a training feature vector based on the randomness of the feature attributes comprises:
respectively determining attribute randomness of each characteristic attribute corresponding to each attack category;
and determining a training feature vector based on the attribute randomness of all the feature vectors corresponding to each feature attribute.
4. The method of claim 3, wherein the separately determining the attribute randomness of each feature attribute corresponding to the attack category comprises:
respectively determining first randomness of each feature vector corresponding to each attack category;
dividing the feature vector corresponding to each feature attribute into a plurality of first randomness sets based on the value of the first randomness;
determining second randomness, corresponding to each characteristic attribute, of each attack category based on the plurality of first randomness sets;
and determining the attribute randomness of each characteristic attribute corresponding to different attack categories based on the value of the second randomness.
5. The method of claim 4, wherein the determining training feature vectors based on the attribute randomness comprises:
dividing all the characteristic attributes into a plurality of second randomness sets based on the values of the attribute randomness;
the second randomness sets are arranged in a descending order according to the value of the attribute randomness, and a preset number of characteristic attributes are selected from the second randomness sets after the second randomness sets are arranged in the descending order as characteristic attribute vectors;
converting the network traffic data into a feature vector to be trained based on the feature attribute vector;
and sequentially carrying out dimensionality reduction processing and standard normal transformation processing on the characteristic vector to be trained to obtain the training characteristic vector.
6. A method for network intrusion detection, the method comprising:
acquiring network flow data;
converting the network traffic data into at least one feature vector, wherein each feature vector corresponds to at least one feature attribute;
converting the at least one feature vector into an input feature vector;
inputting the input feature vector into a network intrusion detection model obtained by the network intrusion detection model construction method according to any one of claims 1 to 5, and determining whether network intrusion exists in the network traffic data based on an output result of the network intrusion detection model.
7. A network intrusion detection model building apparatus, the apparatus comprising:
the training data acquisition module is used for acquiring network flow data;
the first characteristic acquisition module is used for converting the network traffic data into at least one characteristic vector, and each characteristic vector corresponds to at least one characteristic attribute;
the training feature vector acquisition module is used for determining a training feature vector based on the randomness of each feature attribute;
and the model training module is used for performing machine learning model training by adopting the training characteristic vectors to obtain a network intrusion detection model.
8. A network intrusion detection device, the device comprising:
the detection data acquisition module is used for acquiring network flow data;
the second characteristic acquisition module is used for converting the network flow data into at least one characteristic vector, and each characteristic vector corresponds to at least one characteristic attribute;
an input feature vector obtaining module, configured to convert the at least one feature vector into an input feature vector;
an intrusion detection module, configured to input the input feature vector into the network intrusion detection model obtained by the network intrusion detection model construction apparatus according to claim 7, and determine whether network intrusion exists in the network traffic data based on an output result of the network intrusion detection model.
9. An electronic device comprising a memory and a processor, the memory having stored therein program instructions which, when executed by the processor, perform the steps of the method of any of claims 1-6.
10. A storage medium having stored thereon computer program instructions for executing the steps of the method according to any one of claims 1 to 6 when executed by a processor.
CN202010655063.6A 2020-07-08 2020-07-08 Network intrusion detection method, model construction method, device and electronic equipment Pending CN111797997A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010655063.6A CN111797997A (en) 2020-07-08 2020-07-08 Network intrusion detection method, model construction method, device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010655063.6A CN111797997A (en) 2020-07-08 2020-07-08 Network intrusion detection method, model construction method, device and electronic equipment

Publications (1)

Publication Number Publication Date
CN111797997A true CN111797997A (en) 2020-10-20

Family

ID=72809753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010655063.6A Pending CN111797997A (en) 2020-07-08 2020-07-08 Network intrusion detection method, model construction method, device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111797997A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100115618A1 (en) * 2008-11-03 2010-05-06 Korea University Industry And Academy Collaboration Foundation Method and device for detecting unknown network worms
CN102158486A (en) * 2011-04-02 2011-08-17 华北电力大学 Method for rapidly detecting network invasion
CN103944887A (en) * 2014-03-24 2014-07-23 西安电子科技大学 Intrusion event detection method based on hidden conditional random field
CN104219253A (en) * 2014-10-13 2014-12-17 吉林大学 Multi-step attack alarm associated network service interface development method
CN106656981A (en) * 2016-10-21 2017-05-10 东软集团股份有限公司 Network intrusion detection method and device
CN109962909A (en) * 2019-01-30 2019-07-02 大连理工大学 A kind of network intrusions method for detecting abnormality based on machine learning
CN110188883A (en) * 2019-04-22 2019-08-30 中国移动通信集团河北有限公司 Failure analysis methods, calculate equipment and computer storage medium at device
CN110602120A (en) * 2019-09-19 2019-12-20 国网江苏省电力有限公司信息通信分公司 Network-oriented intrusion data detection method
CN111314329A (en) * 2020-02-03 2020-06-19 杭州迪普科技股份有限公司 Traffic intrusion detection system and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张克君等: ""基于DBN和TSVM的混合入侵检测模型研究"", 《计算机应用与软件》, vol. 35, no. 5, pages 313 - 317 *
朱文杰等: ""基于信息熵的SVM入侵检测技术"", 《计算机工程与科学》, pages 47 - 51 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210374239A1 (en) * 2019-02-15 2021-12-02 Sophos Limited Augmented security recognition tasks
US11681800B2 (en) * 2019-02-15 2023-06-20 Sophos Limited Augmented security recognition tasks
CN113553589A (en) * 2021-07-30 2021-10-26 江苏易安联网络技术有限公司 Extraction method, device and application of malicious software propagation characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination