CN111800389A - Port network intrusion detection method based on Bayesian network - Google Patents

Info

Publication number
CN111800389A
Authority
CN
China
Prior art keywords
network
bayesian
probability
data
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010519908.9A
Other languages
Chinese (zh)
Inventor
王成
汤文韬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN202010519908.9A
Publication of CN111800389A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the field of the industrial internet and provides a port network intrusion detection method based on a Bayesian network, comprising the following steps: S1: collecting and preprocessing an abnormal-traffic data set of the port network to obtain a network traffic feature set; S2: constructing a Bayesian network model from the network traffic feature set; S3: inputting a training set and the training parameters of the Bayesian network model, and obtaining the conditional probability parameters of the Bayesian network model using Bayes' theorem; S4: detecting an input prediction set using the conditional probability parameters and Bayes' theorem to obtain a detection result. The method performs network intrusion detection by modeling network traffic behavior and feature attributes with a Bayesian network model, and the detection model can be adjusted dynamically online to cope with changes in the network environment, which improves both the accuracy of intrusion detection and protection and the robustness of the model.

Description

Port network intrusion detection method based on Bayesian network
Technical Field
The invention relates to the field of intrusion detection of industrial internet network security.
Background
In recent years, intrusion detection has become a research hotspot in industry and academia, and many new technologies, algorithms and systems related to intrusion detection have appeared and will continue to appear. According to an analysis of the 2016 ICS-CERT industrial internet security situation report, more than 80% of national critical infrastructure depends on the industrial internet for production process automation, yet existing industrial internet intrusion detection has many problems. With the rise of intelligent electronic terminal devices, network traffic is growing explosively. This huge volume of traffic promotes the convergence of the internet economy and the real economy, but the benefits of the internet come with a series of network security challenges. In the industrial internet field in particular, network security intrusion detection also features prominently in national guidelines. Because the TCP/IP protocol suite widely used on today's internet was not designed with security specifically in mind, security incidents on the internet keep emerging, and intrusion detection, as an active security technology, has become a focus of research. Some anomaly detection models based on machine learning, and even deep learning, already exist; most of these learning models are discriminative models based on expectation maximization. For online network intrusion detection, methods that use deep learning and similar models outperform other methods on numerical indicators, but a deep learning model is a typical black-box model: its process is difficult to visualize and interpret, and its results lack good interpretability and sufficient credibility.
Disclosure of Invention
In view of the defects in the prior art, the invention provides a port network intrusion detection method based on a Bayesian network model. The method detects intrusion attacks by integrating and modeling port network traffic data, allows the detection model to be adjusted dynamically online, and improves both the accuracy of intercepting intrusion attacks and the robustness of the model.
To achieve the above object, the invention provides a port network intrusion detection method based on a Bayesian network model, comprising the following steps:
S1: collecting and preprocessing the network data packets at a port data interface by means of packet capture software (such as tcpdump or Wireshark) to obtain a network traffic feature set;
S2: constructing a Bayesian network model from the port network traffic feature set;
S3: inputting a training set and the training parameters of the Bayesian network model, and obtaining the conditional probability parameters of the Bayesian network model using Bayes' theorem;
S4: performing intrusion detection on an input prediction set using the conditional probability parameters and Bayes' theorem to obtain a detection result.
Preferably, step S1 further comprises the following steps:
S11: a data cleaning step:
resolving data inconsistency by filling missing values, discretizing, and converting symbols to numbers in the network traffic data, so as to format the data, remove abnormal data, correct errors, and remove duplicate data;
S12: a data integration step:
unifying and storing the network traffic data packets captured by the plurality of port data receiving interfaces to form a database;
S13: normalizing the port network traffic data set in the database to form the network traffic feature set.
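Steps S11 to S13 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names `integrate` and `clean_and_normalize`, the mean-filling rule for missing values, and min-max scaling as the normalization are all hypothetical choices, since the text does not fix them.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def integrate(sources: List[List[Record]]) -> List[Record]:
    """S12: merge the records captured on several interfaces into one store."""
    return [record for source in sources for record in source]


def clean_and_normalize(records: List[Record], fields: List[str]) -> List[Dict[str, float]]:
    """S11 and S13: fill missing values, drop duplicates, min-max normalize."""
    # Assumption: a missing value is filled with the mean of the observed values.
    means = {}
    for f in fields:
        observed = [r[f] for r in records if r.get(f) is not None]
        means[f] = sum(observed) / len(observed) if observed else 0.0
    filled = [
        {f: (r[f] if r.get(f) is not None else means[f]) for f in fields}
        for r in records
    ]
    # Remove exact duplicates while keeping the first occurrence.
    seen, unique = set(), []
    for r in filled:
        key = tuple(r[f] for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    if not unique:
        return []
    # Assumption: min-max scaling to [0, 1]; constant fields map to 0.0.
    for f in fields:
        lo = min(r[f] for r in unique)
        hi = max(r[f] for r in unique)
        for r in unique:
            r[f] = (r[f] - lo) / (hi - lo) if hi > lo else 0.0
    return unique
```

Calling `clean_and_normalize(integrate([interface_a, interface_b]), ["src_bytes", "duration"])` on two lists of records yields one de-duplicated list whose fields all lie in [0, 1].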
Preferably, step S2 further comprises the following steps:
S21: acquiring the port network traffic feature set S, and inputting a candidate feature set S', a relation set R, a label attribute Y and a threshold λ; wherein S' = ∅ and R = ∅, with ∅ denoting the empty set;
S22: calculating the mutual information $I(X_i; Y)$ between each feature $X_i$ of the port network traffic feature set S and the label attribute Y according to formula (1):
$$I(X_i; Y) = \sum_{x \in X_i} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \tag{1}$$
where $X_i$ denotes the i-th feature; i is a natural number greater than or equal to 1; Y denotes the label attribute; x denotes a value of $X_i$; y denotes a value of Y; p(x, y) denotes the joint probability of x and y; p(x) denotes the marginal probability of x; p(y) denotes the marginal probability of y; and $I(X_i; Y)$ denotes the mutual information between $X_i$ and Y;
S23: determining whether the mutual information $I(X_i; Y)$ is greater than or equal to the preset threshold λ; if so, continuing with the subsequent steps;
S24: updating the candidate feature set S' according to formula (2):
S' := S' + X_i (2);
updating the network traffic feature set S according to formula (3):
S := S - X_i (3);
S25: obtaining the dependency r: X_i → Y (4);
S26: updating the relation set R according to formula (4);
S27: determining whether the number of features in the current candidate feature set S' is greater than or equal to 2; if so, continuing with the subsequent steps; otherwise returning to step S23;
S28: calculating the mutual information between every two features in the current candidate feature set S' according to formula (5):
$$I(X_i; X_j) = \sum_{x \in X_i} \sum_{x' \in X_j} p(x, x') \log \frac{p(x, x')}{p(x)\,p(x')} \tag{5}$$
where $X_i$ denotes the i-th feature in S' and $X_j$ denotes the j-th feature in S'; i and j are natural numbers greater than 0; x denotes a value of $X_i$; x' denotes a value of $X_j$; p(x, x') denotes the joint probability of x and x'; p(x) denotes the marginal probability of x; p(x') denotes the marginal probability of x'; and $I(X_i; X_j)$ denotes the mutual information between $X_i$ and $X_j$; updating the relation set R by formula (4);
S29: assigning the current S' to S and clearing the set S'; calculating the mutual information between every two features of the set S by formula (5); if $I(X_i; X_j) \geq \lambda$, determining the dependency r between the two features based on prior knowledge, and updating the relation set R by formula (4);
S30: repeating step S28 until S is empty or $I(X_i; X_j) \leq \lambda$ for all features, and obtaining the Bayesian network model from the current relation set R.
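As a worked illustration of the mutual information test in formula (1), consider a hypothetical binary feature X and a binary label Y with the joint distribution below. The numbers, the natural logarithm, and the threshold value λ = 0.1 are assumptions chosen for the example; the text does not specify a logarithm base or a threshold.

```latex
% Assumed joint distribution: p(0,0) = p(1,1) = 0.4,  p(0,1) = p(1,0) = 0.1,
% so every marginal is p(x) = p(y) = 0.5 and p(x)p(y) = 0.25.
\begin{aligned}
I(X; Y) &= \sum_{x}\sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \\
        &= 2 \cdot 0.4 \log \frac{0.4}{0.25} + 2 \cdot 0.1 \log \frac{0.1}{0.25} \\
        &= 0.8 \log 1.6 + 0.2 \log 0.4 \\
        &\approx 0.376 - 0.183 = 0.193 \ \text{nats}.
\end{aligned}
```

Because 0.193 ≥ 0.1, this feature would pass the threshold check and be moved into the candidate set S' with the dependency X → Y recorded in R. A feature that is independent of Y has p(x, y) = p(x)p(y) everywhere, so every logarithm is zero and the feature is not selected.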
Preferably, step S3 further comprises the following steps:
S31: inputting a training set, the training set comprising feature attributes and label attributes;
S32: calculating the conditional probability parameters according to formula (6):
$$p_{train}(A_i \mid B) = \frac{p(A_i)\,p(B \mid A_i)}{\sum_{j} p(A_j)\,p(B \mid A_j)} \tag{6}$$
where $A_i$ denotes the i-th parent node of the Bayesian network model; B denotes a child node of $A_i$; $p_{train}(A_i \mid B)$ denotes the conditional probability parameter between $A_i$ and B; $p(A_i)$ denotes the marginal probability of $A_i$; $p(B \mid A_i)$ denotes the probability that event B occurs given $A_i$; $A_j$ denotes the j-th parent node; $p(A_j)$ denotes the marginal probability of $A_j$; and $p(B \mid A_j)$ denotes the probability that event B occurs given $A_j$;
S33: determining whether formula (6) has converged; if so, continuing with the subsequent steps; otherwise returning to step S31.
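A small numeric instance may help in reading formula (6). The two parent states and all probabilities below are hypothetical values chosen for illustration: A_1 stands for "normal host", A_2 for "attacking host", and B for the observation "very high request frequency".

```latex
% Assumed inputs: p(A_1) = 0.9, p(A_2) = 0.1, p(B | A_1) = 0.05, p(B | A_2) = 0.8
\begin{aligned}
p_{train}(A_2 \mid B)
  &= \frac{p(A_2)\,p(B \mid A_2)}{p(A_1)\,p(B \mid A_1) + p(A_2)\,p(B \mid A_2)} \\
  &= \frac{0.1 \cdot 0.8}{0.9 \cdot 0.05 + 0.1 \cdot 0.8}
   = \frac{0.08}{0.125} = 0.64 .
\end{aligned}
```

Although attacking hosts are rare a priori (10%), observing B raises the probability of A_2 to 64%, because B is sixteen times more likely under A_2 than under A_1.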
Preferably, step S4 further comprises the following steps:
S41: inputting a test set, the test set comprising a feature attribute Y';
S42: calculating the posterior probability according to formula (7), and outputting the prediction result according to the posterior probability:
$$p(Y' \mid X_1, \ldots, X_n) = \frac{p(X_1, \ldots, X_n \mid Y')\,p(Y')}{p(X_1, \ldots, X_n)} \tag{7}$$
where $p(Y' \mid X_1, \ldots, X_n)$ denotes the probability that event Y' occurs given $X_1, \ldots, X_n$; $p(X_1, \ldots, X_n \mid Y')$ denotes the joint probability that events $X_1, \ldots, X_n$ occur given Y'; $p(Y')$ denotes the marginal probability of event Y'; and $p(X_1, \ldots, X_n)$ denotes the joint probability of events $X_1, \ldots, X_n$.
Preferably, step S4 is followed by the step of:
S5: verifying the prediction result.
Preferably, step S5 further comprises the following steps:
S51: counting, from the detection results of the model of formula (7), the number TP of positive samples judged positive, the number FP of negative samples judged positive, the number FN of positive samples judged negative, and the number TN of negative samples judged negative;
S52: calculating the precision according to formula (8):
$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{8}$$
calculating the recall according to formula (9):
$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{9}$$
calculating the disturbance rate according to formula (10):
$$\mathrm{disturbance} = \frac{FP}{FP + TN} \tag{10}$$
S53: evaluating the detection result according to the precision, the recall and the disturbance rate.
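The evaluation in S51 to S53 can be sketched as follows. The function names `confusion_counts` and `evaluate` are hypothetical, labels are assumed to be encoded as 1 (attack, positive class) and 0 (normal, negative class), and the disturbance rate is assumed to be the false positive rate FP / (FP + TN), that is, the share of normal traffic that is wrongly flagged.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """S51: count TP, FP, FN, TN for binary labels (1 = attack, 0 = normal)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """S52: precision, recall and (assumed) disturbance rate; 0.0 on empty denominators."""
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # Assumption: disturbance rate = false positive rate.
        "disturbance": fp / (fp + tn) if fp + tn else 0.0,
    }
```

For example, `evaluate(*confusion_counts(y_true, y_pred))` returns all three indicators for one batch of detection results.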
By adopting the above technical scheme, the invention has the following beneficial effects:
Compared with the black-box nature of deep learning, a Bayesian network model based on Bayes' theorem offers strong interpretability and reasoning ability when predicting on port network data. The Bayesian network model is trained on the training set to obtain the conditional probability parameters; when predicting on the test set, prior knowledge and the conditions in the test set are used to obtain the conditional probabilities and finally infer the posterior probability, so the result is credible and persuasive. The Bayesian network model can also handle situations in which hidden variables are present. Using the Bayesian network model as the basis improves the interpretability and reasoning ability of the model, allows the model parameters to be adjusted dynamically as the specific network environment changes, and better supports abnormal traffic detection, intrusion detection, and the protection of network security for industrial internet enterprises.
Given the data characteristics of port network traffic, the Bayesian network model is also a well-suited, highly robust model with strong reasoning and interpretation capabilities.
Drawings
Fig. 1 is a general flowchart of a network intrusion detection method based on a Bayesian network model according to an embodiment of the present invention;
Fig. 2 is a diagram illustrating a Bayesian network model obtained by modeling network traffic data according to an embodiment of the present invention.
Detailed Description
A preferred embodiment of the present invention is described in detail below with reference to Figs. 1 and 2 so that its functions and features can be better understood. In this embodiment, the algorithm environment is based on Python, the pgmpy Bayesian network library, the pandas data analysis library, and the NumPy library.
In this embodiment, the multiple data sources are the data sets obtained from multiple port data receiving interfaces, and the network traffic data are the network data packets captured from the port data interface with tcpdump and Wireshark.
Referring to Figs. 1 and 2, a port network intrusion detection method based on a Bayesian network according to an embodiment of the present disclosure includes the following steps:
S1: collecting and preprocessing the network data packets at a port data interface by means of packet capture software (such as Wireshark) to obtain a network traffic feature set S.
Step S1 further comprises the following steps:
S11: a data cleaning step: formatting the data, removing abnormal data, correcting errors, and removing duplicate data by filling missing values, smoothing noise, and identifying and resolving data inconsistencies in the network traffic data;
S12: a data integration step: storing the network traffic data from the multiple data sources in a unified form to build a database;
S13: normalizing the network traffic data in the database to form the required network traffic feature set.
Traffic in the real network environment must first be captured, which is done with tcpdump. tcpdump is a tool that intercepts network packets and outputs their contents; its powerful functions and flexible capture strategies have made it a preferred tool for network analysis and troubleshooting on UNIX-like systems. Details are available on the official website, https://www.tcpdump.org/. Although a large number of network traffic packets can be collected through the port's network traffic monitoring devices and packet capture software, data from a real port network environment is mostly incomplete, inconsistent dirty data that lacks the features and data formats needed to build a Bayesian network model. The raw data therefore cannot be used directly in Bayesian network computation and must be preprocessed as follows:
(1) Data cleaning: the data is cleaned by filling in missing values, smoothing noisy data, and identifying or resolving inconsistencies. The main goals are to standardize data formats (such as time), remove abnormal data, correct errors, and remove duplicate data.
(2) Data integration: data from the multiple port data receiving interfaces is merged, stored in a unified form, and used to build a database.
(3) Feature extraction: the features required by the learning model are extracted from the raw network packet data by means such as data calculation and feature extraction. After this simple preprocessing, the port traffic data is used to train a model that classifies traffic and determines whether it is abnormal. Since the data set used to train the model must have certain standard features, the captured traffic information is converted to the standard format of the KDD99 data set. Here the open-source software package kdd99_feature_extractor is used; its GitHub address is https://github.com/AI-IDS/kdd99_feature_extractor. Passing the file name of a pcap file produced by a tcpdump capture to kdd99_feature_extractor as a parameter outputs traffic information in the KDD99 data set format. Notably, kdd99_feature_extractor has a built-in libpcap-based traffic capture tool, so tcpdump need not be run separately, which simplifies the workflow.
(4) Data transformation: the data is converted into the form required by the learning model by means such as smoothing, aggregation, data generalization, normalization, symbol-to-number encoding, and discretization.
The data set fields after port traffic feature extraction, and their types after preprocessing, are shown in Table 1.
TABLE 1 Data set fields after port traffic feature extraction
Field name       Data type    Field description                          Type after preprocessing
Duration         Continuous   Duration of the connection                 Discrete
protocol_type    Discrete     Type of protocol                           Discrete
service          Discrete     Service type of the network                Discrete
flag             Discrete     Normal or error status of the connection   Discrete
src_bytes        Continuous   Number of source bytes                     Discrete
dst_bytes        Continuous   Number of destination bytes                Discrete
wrong_fragment   Continuous   Number of erroneous fragments              Discrete
IP               Discrete     Whether the IP is on the whitelist         Discrete
urgent           Continuous   Label for transaction                      Discrete
As can be seen from Table 1, many of the available original fields are continuous variables, whereas the Bayesian network model can only process discrete variables. In addition to data cleaning and data integration, the preprocessing therefore converts continuous floating-point values into discrete variables that the Bayesian network model can compute with during data transformation. The result is the network traffic feature set, which is divided into a training set S and a test set T.
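The conversion from continuous and symbolic fields to discrete integer codes can be sketched as follows. This is only an illustration: equal-width binning and first-seen integer codes are assumed choices (the text does not name a discretization rule), and `equal_width_bins` and `encode_symbols` are hypothetical helper names.

```python
from typing import Dict, List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map continuous values (e.g. src_bytes) to bin indices 0 .. n_bins - 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; put everything in bin 0.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def encode_symbols(values: Sequence[str]) -> List[int]:
    """Map symbolic values (e.g. protocol_type) to integers in first-seen order."""
    codes: Dict[str, int] = {}
    return [codes.setdefault(v, len(codes)) for v in values]
```

With these helpers, `equal_width_bins([0.0, 1.5, 3.0, 9.0, 12.0], 4)` gives `[0, 0, 1, 3, 3]` and `encode_symbols(["tcp", "udp", "tcp", "icmp"])` gives `[0, 1, 0, 2]`. Equal-width bins are sensitive to outliers such as very large byte counts, so quantile-based bins may be a better fit for heavy-tailed fields.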
S2: constructing a Bayesian network model from the port network traffic feature set.
A complete Bayesian network is constructed by analyzing the dependence and independence among the features. Constructing the Bayesian network amounts to constructing a joint distribution over the random variables that represent the data features, and dependence and independence are the two main properties of that distribution. The independence property is important when answering queries and can greatly reduce the computational cost of inference.
Step S2 further comprises the following steps:
S21: acquiring the network traffic feature set S from S1, and inputting a candidate feature set S', a relation set R, a label attribute Y and a threshold λ; wherein S' = ∅ and R = ∅, with ∅ denoting the empty set;
S22: calculating the mutual information $I(X_i; Y)$ between each feature $X_i$ of the port network traffic feature set S and the label attribute Y according to formula (1):
$$I(X_i; Y) = \sum_{x \in X_i} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \tag{1}$$
where $X_i$ denotes the i-th feature; i is a natural number greater than or equal to 1; Y denotes the label attribute; x denotes a value of $X_i$; y denotes a value of Y; p(x, y) denotes the joint probability of x and y; p(x) is the marginal probability of x; p(y) is the marginal probability of y; and $I(X_i; Y)$ denotes the mutual information between $X_i$ and Y;
S23: determining whether $I(X_i; Y)$ is greater than or equal to the preset threshold λ; if so, continuing with the subsequent steps;
S24: updating the candidate feature set S' according to formula (2):
S' := S' + X_i (2);
updating the network traffic feature set S according to formula (3):
S := S - X_i (3);
S25: obtaining the dependency r: X_i → Y (4);
S26: updating the relation set R according to formula (4);
S27: determining whether the number of features in the current candidate feature set S' is greater than or equal to 2; if so, continuing with the subsequent steps; otherwise returning to step S23;
S28: calculating the mutual information between every two features in the current candidate feature set S' according to formula (5):
$$I(X_i; X_j) = \sum_{x \in X_i} \sum_{x' \in X_j} p(x, x') \log \frac{p(x, x')}{p(x)\,p(x')} \tag{5}$$
where $X_i$ denotes the i-th feature in S' and $X_j$ denotes the j-th feature in S'; i and j are positive integers; x denotes a value of $X_i$; x' denotes a value of $X_j$; p(x, x') denotes the joint probability of x and x'; p(x) denotes the marginal probability of x; p(x') denotes the marginal probability of x'; and $I(X_i; X_j)$ denotes the mutual information between $X_i$ and $X_j$; updating the relation set R by formula (4);
S29: assigning the current S' to the set S and clearing the set S'; calculating the mutual information between every two features of the set S by formula (5); if $I(X_i; X_j) \geq \lambda$, determining the dependency r between the two features according to prior knowledge, and updating the relation set R by formula (4);
S30: repeating step S28 until the set S is empty or $I(X_i; X_j) \leq \lambda$ for all features, at which point the Bayesian network graph model is obtained from the current relation set R.
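The selection logic of S21 to S30 can be sketched as a single pass in Python. This is a simplified illustration, not the patent's exact iteration: `mutual_information` and `build_relations` are hypothetical names, probabilities are estimated by counting, the natural logarithm is assumed, and the direction of an edge between two features, which the method takes from prior knowledge, is replaced here by the placeholder rule "earlier feature points to later feature".

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Estimate I(X; Y) from two aligned columns of discrete values."""
    n = len(xs)
    joint, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


def build_relations(
    features: Dict[str, Sequence[Hashable]], label: Sequence[Hashable], lam: float
) -> List[Tuple[str, str]]:
    """Return the relation set R as (source, target) edges."""
    relations: List[Tuple[str, str]] = []
    candidates: List[str] = []
    # S22-S26: keep features whose mutual information with Y reaches the threshold.
    for name, column in features.items():
        if mutual_information(column, label) >= lam:
            candidates.append(name)
            relations.append((name, "Y"))
    # S27-S29: link pairs of selected features that are mutually informative.
    for a, b in combinations(candidates, 2):
        if mutual_information(features[a], features[b]) >= lam:
            # Placeholder direction; the method decides it from prior knowledge.
            relations.append((a, b))
    return relations
```

Because every feature-to-feature edge follows the insertion order of the features and every other edge ends at `Y`, the resulting edge list is acyclic and could be handed to a Bayesian network library as the graph structure.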
S3: inputting a training set and the training parameters of the Bayesian network model, and obtaining the conditional probability parameters of the Bayesian network graph model using Bayes' theorem.
The main purpose of this step is to train the parameters of the model. In essence, training counts the marginal probability of each feature in the training set, computes the joint probability of the features (the posterior probability) with these marginals as conditions, and then uses Bayes' theorem to derive the conditional probabilities of the Bayesian network, which are the parameters of the model.
Step S3 further comprises the following steps:
S31: inputting the training set S provided by S1, the training set comprising feature attributes and label attributes;
S32: calculating the conditional probability parameters according to formula (6):
$$p_{train}(A_i \mid B) = \frac{p(A_i)\,p(B \mid A_i)}{\sum_{j} p(A_j)\,p(B \mid A_j)} \tag{6}$$
where $A_i$ denotes the i-th parent node of the Bayesian network model; B denotes a child node of $A_i$; $p_{train}(A_i \mid B)$ denotes the conditional probability parameter between $A_i$ and B; $p(A_i)$ denotes the marginal probability of $A_i$; $p(B \mid A_i)$ denotes the probability that event B occurs given $A_i$; $A_j$ denotes the j-th parent node; $p(A_j)$ denotes the marginal probability of $A_j$; and $p(B \mid A_j)$ denotes the probability that event B occurs given $A_j$;
S33: determining whether formula (6) has converged; if so, continuing with the subsequent steps; otherwise returning to step S31.
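For a single parent-child pair, the parameter estimation of S31 and S32 can be sketched by counting. The sketch assumes that p(A_i) and p(B | A_i) are estimated as relative frequencies in the training set and then combined through formula (6); `fit_parent_posterior` is a hypothetical helper name, not a pgmpy function.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def fit_parent_posterior(
    parent: Sequence[Hashable], child: Sequence[Hashable]
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Return p_train(a | b) for every parent state a and observed child state b."""
    n = len(parent)
    count_a = Counter(parent)
    count_ab = Counter(zip(parent, child))
    # Relative-frequency estimates of p(A_i) and p(B | A_i).
    prior = {a: c / n for a, c in count_a.items()}
    likelihood = {(a, b): c / count_a[a] for (a, b), c in count_ab.items()}
    posterior: Dict[Tuple[Hashable, Hashable], float] = {}
    for b in set(child):
        # Denominator of formula (6): sum over all parent states A_j.
        evidence = sum(prior[a] * likelihood.get((a, b), 0.0) for a in prior)
        for a in prior:
            posterior[(a, b)] = prior[a] * likelihood.get((a, b), 0.0) / evidence
    return posterior
```

Because these estimates come from plain counts, one pass over the training set is enough in this sketch; when new traffic arrives, the counts can be updated and the table recomputed, which is one simple way to adjust the parameters online.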
S4: detecting an input test set using the Bayesian network model trained in S3 to obtain a detection result.
The main purpose of this step is to judge unknown records: for a real-time access record, the model should give a detection result, that is, decide whether the data packet is normal or belongs to an abnormal attack type. The detection process relies mainly on Bayes' theorem: the features in the access record are taken as conditions, and the conditional probabilities in the model are combined through Bayes' theorem to infer the posterior probability of the record.
Step S4 further comprises the following steps:
S41: inputting the test set T obtained in S1, the test set comprising a feature attribute Y';
S42: performing inference with the Bayesian network, that is, inferring the posterior probability from the conditional probability distribution obtained during training and the conditions in the test set; calculating the posterior probability according to formula (7), and outputting the prediction result according to the posterior probability:
$$p(Y' \mid X_1, \ldots, X_n) = \frac{p(X_1, \ldots, X_n \mid Y')\,p(Y')}{p(X_1, \ldots, X_n)} \tag{7}$$
where $p(Y' \mid X_1, \ldots, X_n)$ denotes the probability that event Y' occurs given $X_1, \ldots, X_n$; $p(X_1, \ldots, X_n \mid Y')$ denotes the joint probability that events $X_1, \ldots, X_n$ occur given Y'; $p(Y')$ denotes the marginal probability of Y'; and $p(X_1, \ldots, X_n)$ denotes the joint probability of $X_1, \ldots, X_n$.
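Formula (7) can be made concrete with a short sketch. To keep it self-contained, the example assumes the simplest possible factorization, p(X_1, ..., X_n | Y') = Π p(X_i | Y'), which corresponds to a network with no feature-to-feature edges; a learned structure with such edges would factorize differently. The names `train` and `posterior` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple

Row = Tuple[Tuple[Hashable, ...], str]  # (feature values, label)


def train(rows: Sequence[Row]):
    """Count label frequencies and per-feature value frequencies per label."""
    label_counts = Counter(label for _, label in rows)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    for features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(i, label)][value] += 1
    priors = {y: c / len(rows) for y, c in label_counts.items()}  # p(Y')
    return priors, feature_counts, label_counts


def posterior(features, priors, feature_counts, label_counts) -> Dict[str, float]:
    """Evaluate formula (7) under the assumed factorization."""
    joint = {}
    for y, prior in priors.items():
        likelihood = 1.0
        for i, value in enumerate(features):
            likelihood *= feature_counts[(i, y)][value] / label_counts[y]
        joint[y] = likelihood * prior  # numerator of formula (7)
    evidence = sum(joint.values())  # p(X_1, ..., X_n)
    if evidence == 0.0:
        # Unseen combination: fall back to the prior instead of dividing by zero.
        return dict(priors)
    return {y: v / evidence for y, v in joint.items()}
```

The predicted class is the label with the largest posterior, for example `max(result, key=result.get)`. The zero-evidence fallback is a design choice for this sketch; smoothing the counts is a common alternative.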
S5: verifying the detection result.
Step S5 further comprises the following steps:
S51: counting, from the detection results, the number TP of positive samples judged positive, the number FP of negative samples judged positive, the number FN of positive samples judged negative, and the number TN of negative samples judged negative;
S52: calculating the precision according to formula (8):
$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{8}$$
calculating the recall according to formula (9):
$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{9}$$
calculating the disturbance rate according to formula (10):
$$\mathrm{disturbance} = \frac{FP}{FP + TN} \tag{10}$$
S53: evaluating the detection result according to the precision, the recall and the disturbance rate.
For example, the method is run on the data set collected by the port traffic detection device, the recall (interception rate) is obtained at disturbance rates below 1%, 0.5%, 0.1% and 0.05%, and the performance of the method is evaluated accordingly. According to the verification results on the Yangshan Port network traffic data set, the Bayesian network-based port network intrusion detection and classification method performs well.
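Reading off the recall at a given disturbance budget can be sketched as a threshold sweep over the model's attack scores. This sketch assumes that each record has a score (for instance the posterior probability of the attack class), that labels are 1 for attack and 0 for normal, and that the disturbance rate is the false positive rate; `recall_at_disturbance` is a hypothetical helper name.

```python
from typing import Sequence


def recall_at_disturbance(
    scores: Sequence[float], labels: Sequence[int], max_disturbance: float
) -> float:
    """Best recall over all thresholds whose disturbance rate stays below the budget."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    best = 0.0
    for threshold in sorted(set(scores)):
        # A record is flagged as an attack when its score reaches the threshold.
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        if fp / negatives < max_disturbance:
            best = max(best, tp / positives)
    return best
```

Calling the function with budgets of 0.01, 0.005, 0.001 and 0.0005 produces the four figures described above. The sweep is quadratic in the number of records, which is acceptable for a sketch; sorting once and accumulating counts would scale better.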
Referring to the bayesian network model example constructed based on the port traffic data set in fig. 2, in practical use, the method of this embodiment describes a joint probability model between an intrusion type and different data packet attribute characteristics in different port network environments, and when a port network data interface is accessed by different device ip addresses, the request times of the device ip may be a fixed number of times, so that different device ip addresses may present different request modes, and if the protocol and the request times of a certain socket are very frequent, the device ip may be determined as DDOS (denial of service attack); and the access frequency and the requested packet size may exhibit a relatively high correlation (with the label of the DDOS); meanwhile, the host side close to the ip address may also present higher correlation; whether the access process is a common IP also represents the fixed distribution correlation formed by the user when making a network request. In the port network environment, the request habits of different equipment hosts form the network behavior distribution of the different hosts, and if a behavior pattern which is not matched with the previous port equipment request behavior occurs, the port equipment request behavior is judged to be a network attack with a high probability. Compared with the black box property of the traditional deep learning model, the method of the embodiment combines the knowledge related to network security, intrusion detection and abnormal traffic and the assumption of similar network connection to construct the Bayesian network model for describing traffic behavior distribution, and the model has very good interpretable logic.
In addition, as a prediction model the Bayesian network model handles features that contain hidden variables well. Prior knowledge of the port network environment can be used to give conventional prior assumptions, so that when the model has unobserved variables a reasonable estimate can still be obtained through Bayesian estimation, which makes the method more robust.
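How an unobserved variable can be handled is illustrated by the sketch below, which sums a hidden variable H out of the joint distribution before normalising. The three-node structure (Y → H, and Y, H → X), the function name and all probability values in the usage are assumptions made for illustration, not taken from the patent.

```python
def posterior_marginalising_hidden(p_y, p_h_given_y, p_x_given_yh, x):
    """p(Y | X = x) when H is unobserved: sum H out of the joint p(Y, H, X).

    p_y:          dict y -> p(y)
    p_h_given_y:  dict y -> dict h -> p(h | y)
    p_x_given_yh: dict (y, h) -> dict x -> p(x | y, h)
    """
    unnormalised = {}
    for y, prior in p_y.items():
        # p(y) * sum_h p(h | y) * p(x | y, h)
        unnormalised[y] = prior * sum(
            p_h * p_x_given_yh[(y, h)][x] for h, p_h in p_h_given_y[y].items()
        )
    evidence = sum(unnormalised.values())  # p(X = x)
    return {y: v / evidence for y, v in unnormalised.items()}
```

For instance, with H standing for "request comes from a new IP" and X for the observed request pattern, the posterior over the intrusion label can still be computed when the IP history is missing.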
In the Bayesian-network-based port network intrusion detection method, the Bayesian network model, being built on Bayes' theorem, generally offers strong interpretability and confidence when making predictions, and in particular provides a good prior judgment in application scenarios with a stable network environment, such as a port. The model is trained on the feature-extracted port network traffic training set to obtain the conditional probability parameters. When a test set is examined, the stability of the port network environment allows the posterior probability to be inferred from the prior probability and the conditional probabilities given the test-set evidence, and the result is persuasive. The Bayesian network model can handle hidden variables and cope with new attacks in an unknown network environment, which existing methods based on discriminative models cannot achieve. The Bayesian-network-based port network intrusion detection method of this embodiment therefore has advantages for the port Internet that existing discriminative models lack. The method overcomes the weaknesses in reasoning and interpretability of traditional deep-learning-based anomaly detection, improves the interpretability, reasoning ability and robustness of the model, and better supports traffic anomaly detection, intrusion detection and network security protection for port machinery enterprises.
While the present invention has been described in detail and with reference to the embodiments thereof as illustrated in the accompanying drawings, it will be apparent to one skilled in the art that various changes and modifications can be made therein. Therefore, certain details of the embodiments are not to be interpreted as limiting, and the scope of the invention is to be determined by the appended claims.

Claims (7)

1. A network intrusion detection method based on a Bayesian network model applied to an industrial Internet is characterized by comprising the following steps:
S1: collecting network data packets at a port data interface through packet capturing software (such as Wireshark) and preprocessing them to obtain a network traffic feature set;
S2: establishing a Bayesian network model by using the network traffic feature set;
S3: inputting a training set, training the parameters of the Bayesian network model, and obtaining conditional probability parameters of the Bayesian network model by using Bayes' theorem;
S4: carrying out intrusion detection on an input prediction set by using the conditional probability parameters and Bayes' theorem to obtain a detection result.
2. The bayesian network model-based network intrusion detection method for industrial internet according to claim 1, wherein the step of S1 further comprises the steps of:
S11: cleaning the data set: filling missing values, discretizing the network traffic data and converting symbolic values to numeric values so as to format the data, removing or correcting abnormal data, and removing duplicate data, thereby resolving data inconsistency;
S12: integrating data: unifying and storing the network traffic data from a plurality of data sources to form a database;
S13: normalizing the network traffic data set in the database to form the network traffic feature set.
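The cleaning and normalization steps of this claim can be sketched roughly as below. The sketch assumes packet records are dictionaries, fills missing numeric values with the column mean, and uses min-max scaling; these choices, the function name `preprocess` and the field names in the usage are illustrative assumptions. Discretization and multi-source integration (S12) are omitted.

```python
def preprocess(records, numeric_fields, symbolic_fields):
    """Clean, encode, de-duplicate and min-max normalise packet records (list of dicts)."""
    # Fill missing numeric values with the column mean (assumed fill strategy).
    means = {}
    for f in numeric_fields:
        present = [r[f] for r in records if r.get(f) is not None]
        means[f] = sum(present) / len(present) if present else 0.0
    cleaned = []
    for r in records:
        row = {f: (r[f] if r.get(f) is not None else means[f]) for f in numeric_fields}
        row.update({f: (r.get(f) or "unknown") for f in symbolic_fields})
        cleaned.append(row)
    # Map symbolic values (e.g. protocol names) to integer codes.
    codes = {f: {} for f in symbolic_fields}
    for row in cleaned:
        for f in symbolic_fields:
            row[f] = codes[f].setdefault(row[f], len(codes[f]))
    # Drop exact duplicate rows while keeping first-seen order.
    fields = list(numeric_fields) + list(symbolic_fields)
    seen, unique = set(), []
    for row in cleaned:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    # Min-max normalise numeric fields to [0, 1].
    for f in numeric_fields:
        lo = min(row[f] for row in unique)
        hi = max(row[f] for row in unique)
        span = hi - lo
        for row in unique:
            row[f] = (row[f] - lo) / span if span else 0.0
    return unique, codes
```

The returned `codes` mapping records which integer stands for which symbolic value, so the same encoding can be reused on later data.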
3. The bayesian network model-based network intrusion detection method for industrial internet according to claim 1, wherein the step of S2 further comprises the steps of:
S21: acquiring the network traffic feature set S, and inputting a candidate feature set S', a relation set R, a label attribute Y and a threshold λ, wherein initially S' = ∅ and R = ∅, and ∅ denotes the empty set;
S22: calculating the mutual information $I(X_i; Y)$ between each feature $X_i$ of the network traffic feature set S and the label attribute Y according to formula (1):

$$I(X_i; Y) = \sum_{x \in X_i} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)\,p(y)} \tag{1}$$

wherein $X_i$ denotes the i-th feature; i is a natural number greater than or equal to 1; Y denotes the label attribute; x denotes a value of $X_i$; y denotes a value of Y; p(x, y) denotes the joint probability of x and y; p(x) denotes the marginal probability of x; p(y) denotes the marginal probability of y; and $I(X_i; Y)$ denotes the mutual information between $X_i$ and Y;
S23: determining whether the mutual information $I(X_i; Y)$ is greater than or equal to the preset threshold λ; if yes, continuing with the subsequent steps;
S24: updating the candidate feature set S' according to formula (2):

$$S' := S' + X_i \tag{2}$$

and updating the network traffic feature set S according to formula (3):

$$S := S - X_i \tag{3}$$

S25: obtaining the dependency relationship r according to formula (4):

$$r: X_i \rightarrow Y \tag{4}$$

S26: updating the relation set R according to formula (4);
S27: judging whether the number of features in the current candidate feature set S' is greater than or equal to 2; if yes, continuing with the subsequent steps; otherwise returning to step S23;
S28: calculating the mutual information between every two features in the current candidate feature set S' according to formula (5):

$$I(X_i; X_j) = \sum_{x \in X_i} \sum_{x' \in X_j} p(x, x') \log \frac{p(x, x')}{p(x)\,p(x')} \tag{5}$$

wherein $X_i$ denotes the i-th feature in S'; $X_j$ denotes the j-th feature in S'; i and j are natural numbers greater than 0; x denotes a value of $X_i$; x' denotes a value of $X_j$; p(x, x') denotes the joint probability of x and x'; p(x) denotes the marginal probability of x; p(x') denotes the marginal probability of x'; and $I(X_i; X_j)$ denotes the mutual information between $X_i$ and $X_j$; and updating the relation set R through formula (4);
S29: assigning the current S' to S and clearing the set S'; calculating the mutual information between every two features of the set S through formula (5); if $I(X_i; X_j)$ is greater than the threshold λ, determining the dependency relationship r between the two features according to prior knowledge, and updating the relation set R through formula (4);
S30: repeating step S28 until S is empty or $I(X_i; X_j) \le \lambda$ for all features, and obtaining the Bayesian network model according to the current relation set R.
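The structure-learning steps of this claim rest on an empirical mutual information estimate. The following is a simplified sketch under stated assumptions: features are already discretized, probabilities are estimated by counting, the iterative set bookkeeping of S24 to S30 is collapsed into two passes, and edge directions between features are placeholders for the prior knowledge the claim refers to. The names `mutual_information` and `learn_structure` are hypothetical.

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical I(X;Y) in nats for two equal-length sequences of discrete values."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

def learn_structure(features, labels, lam):
    """features: dict name -> list of discrete values; returns (selected, edges)."""
    selected, edges = [], []
    # Keep features whose mutual information with the label reaches lam (formula (1)).
    for name, values in features.items():
        if mutual_information(values, labels) >= lam:
            selected.append(name)
            edges.append((name, "Y"))  # dependency r: X_i -> Y, formula (4)
    # Link pairs of selected features that are strongly dependent (formula (5)).
    for i, a in enumerate(selected):
        for b in selected[i + 1:]:
            if mutual_information(features[a], features[b]) > lam:
                edges.append((a, b))  # direction is a placeholder for prior knowledge
    return selected, edges
```

Features that carry no information about the label are dropped, and the returned edge list plays the role of the relation set R.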
4. The bayesian network model-based network intrusion detection method for industrial internet according to claim 1, wherein the step of S3 further comprises the steps of:
S31: inputting a training set, wherein the training set comprises feature attributes and label attributes;
S32: calculating the conditional probability parameters according to formula (6):

$$p_{train}(A_i \mid B) = \frac{p(A_i)\,p(B \mid A_i)}{\sum_{j} p(A_j)\,p(B \mid A_j)} \tag{6}$$

wherein $A_i$ denotes the i-th parent node of the Bayesian network model; B denotes a child node of $A_i$; $p_{train}(A_i \mid B)$ denotes the conditional probability parameter between $A_i$ and B; $p(A_i)$ denotes the marginal probability of $A_i$; $p(B \mid A_i)$ denotes the probability that event B occurs under condition $A_i$; $A_j$ denotes the j-th parent node; $p(A_j)$ denotes the marginal probability of $A_j$; and $p(B \mid A_j)$ denotes the probability that event B occurs under condition $A_j$;
S33: judging whether formula (6) has converged; if yes, continuing with the subsequent steps; otherwise returning to step S31.
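Formula (6) is Bayes' theorem with the denominator expanded over all parent states. A minimal sketch follows; the function name `bayes_posterior` is hypothetical and the probabilities in the usage are made-up values for illustration only.

```python
def bayes_posterior(priors, likelihoods):
    """Formula (6): p(A_i | B) = p(A_i) p(B | A_i) / sum_j p(A_j) p(B | A_j).

    priors:      dict state -> p(A_i)
    likelihoods: dict state -> p(B | A_i)
    """
    evidence = sum(priors[a] * likelihoods[a] for a in priors)
    if evidence == 0:
        raise ValueError("event B has zero probability under every parent state")
    return {a: priors[a] * likelihoods[a] / evidence for a in priors}

# Hypothetical numbers: B = "very frequent requests on one socket".
posterior = bayes_posterior(
    priors={"normal": 0.9, "ddos": 0.1},
    likelihoods={"normal": 0.05, "ddos": 0.8},
)
```

Even with a small prior for the attack state, a much larger likelihood of the observed event under that state can dominate the posterior.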
5. The bayesian network model-based network intrusion detection method for industrial internet according to claim 1, wherein the step of S4 further comprises the steps of:
S41: inputting a test set, wherein the test set comprises a characteristic attribute Y';
S42: calculating the posterior probability according to formula (7), and outputting the prediction result according to the posterior probability:

$$p(Y' \mid X_1, \ldots, X_n) = \frac{p(X_1, \ldots, X_n \mid Y')\,p(Y')}{p(X_1, \ldots, X_n)} \tag{7}$$

wherein $p(Y' \mid X_1, \ldots, X_n)$ denotes the probability that event Y' occurs under conditions $X_1, \ldots, X_n$; $p(X_1, \ldots, X_n \mid Y')$ denotes the joint probability that events $X_1, \ldots, X_n$ occur under condition Y'; p(Y') denotes the marginal probability of event Y'; and $p(X_1, \ldots, X_n)$ denotes the joint probability of events $X_1, \ldots, X_n$.
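One way to evaluate formula (7) on discrete data is sketched below. It assumes the label is a parent of every feature, that the joint likelihood factorises along the learned feature-to-feature edges, and that conditional probabilities are estimated by counting with Laplace smoothing. The class name, the smoothing constant `alpha` and the toy data in the usage are assumptions, not part of the patent.

```python
from collections import Counter

class DiscreteBayesNetClassifier:
    """Posterior p(Y | X_1..X_n) for a network where Y is a parent of every feature."""

    def __init__(self, feature_parents, alpha=1.0):
        # feature_parents: dict feature -> list of parent features (Y is implicit).
        self.feature_parents = feature_parents
        self.alpha = alpha  # Laplace smoothing constant (an assumption)

    def fit(self, rows, labels):
        self.label_counts = Counter(labels)
        self.cond_counts = Counter()    # (feature, value, y, parent values) -> count
        self.parent_counts = Counter()  # (feature, y, parent values) -> count
        self.values = {f: set() for f in self.feature_parents}
        for row, y in zip(rows, labels):
            for f, parents in self.feature_parents.items():
                pv = tuple(row[p] for p in parents)
                self.cond_counts[(f, row[f], y, pv)] += 1
                self.parent_counts[(f, y, pv)] += 1
                self.values[f].add(row[f])
        return self

    def posterior(self, row):
        total = sum(self.label_counts.values())
        scores = {}
        for y, count in self.label_counts.items():
            score = count / total  # prior p(Y)
            for f, parents in self.feature_parents.items():
                pv = tuple(row[p] for p in parents)
                num = self.cond_counts[(f, row[f], y, pv)] + self.alpha
                den = self.parent_counts[(f, y, pv)] + self.alpha * len(self.values[f])
                score *= num / den  # smoothed p(X_f | Y, parents)
            scores[y] = score
        evidence = sum(scores.values())  # p(X_1..X_n), the denominator of formula (7)
        return {y: s / evidence for y, s in scores.items()}

# Toy usage with hypothetical features "freq" and "size" (size depends on freq).
rows = [
    {"freq": "high", "size": "small"},
    {"freq": "high", "size": "small"},
    {"freq": "low", "size": "large"},
    {"freq": "low", "size": "small"},
]
labels = ["ddos", "ddos", "normal", "normal"]
model = DiscreteBayesNetClassifier({"freq": [], "size": ["freq"]}).fit(rows, labels)
result = model.posterior({"freq": "high", "size": "small"})
```

The predicted class is the label with the largest posterior; the per-feature factors also show which attribute drove the decision, which is the interpretability argument made in the description.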
6. The method for detecting network intrusion based on the bayesian network model on the industrial internet as claimed in claim 1, wherein the step of S4 is followed by the step of:
S5: verifying the prediction result.
7. The Bayesian network model-based network intrusion detection method for industrial Internet application according to claim 6, wherein the S5 further comprises the steps of:
S51: counting, according to the detection results of the model of formula (7), the total number TP of positive samples predicted as positive, the total number FP of negative samples predicted as positive, the total number FN of positive samples predicted as negative, and the total number TN of negative samples predicted as negative;
S52: calculating the precision according to formula (8):

$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{8}$$

calculating the recall according to formula (9):

$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{9}$$
calculating the disturbance rate according to formula (10):

$$\mathrm{disturbance} = \frac{FP}{FP + TN} \tag{10}$$
S53: evaluating the detection result according to the precision, the recall and the disturbance rate.
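The three evaluation quantities can be computed from the confusion counts as in the sketch below. It assumes binary labels with 1 for attack traffic, and it takes the disturbance rate to be FP / (FP + TN), which is an assumption because the original formula image is not available; the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return precision, recall and disturbance rate for binary labels (1 = attack)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    def ratio(numerator, denominator):
        # Report 0.0 when a rate is undefined (empty denominator).
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "disturbance": ratio(fp, fp + tn),  # assumed to be the false-positive rate
    }
```

A low disturbance rate means that little legitimate port traffic is flagged, which is why recall is reported at fixed disturbance budgets.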
CN202010519908.9A 2020-06-09 2020-06-09 Port network intrusion detection method based on Bayesian network Pending CN111800389A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010519908.9A CN111800389A (en) 2020-06-09 2020-06-09 Port network intrusion detection method based on Bayesian network

Publications (1)

Publication Number Publication Date
CN111800389A (en) 2020-10-20

Family

ID=72802908

Country Status (1)

Country Link
CN (1) CN111800389A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101136809A (en) * 2007-09-27 2008-03-05 南京大学 Conditional mutual information based network intrusion classification method of double-layer semi-idleness Bayesian
CN106124175A (en) * 2016-06-14 2016-11-16 电子科技大学 A kind of compressor valve method for diagnosing faults based on Bayesian network
CN109894495A (en) * 2019-01-11 2019-06-18 广东工业大学 A kind of extruder method for detecting abnormality and system based on energy consumption data and Bayesian network
CN109993538A (en) * 2019-02-28 2019-07-09 同济大学 Identity theft detection method based on probability graph model

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113407907A (en) * 2021-06-04 2021-09-17 电子科技大学 Hierarchical system structure function learning method fusing incomplete monitoring sequence
CN113407907B (en) * 2021-06-04 2022-04-12 电子科技大学 Hierarchical system structure function learning method fusing incomplete monitoring sequence
CN113807453A (en) * 2021-09-24 2021-12-17 沈阳理工大学 Abnormal behavior detection method based on weighted probability fusion parallel Bayesian network
CN113807453B (en) * 2021-09-24 2024-01-30 沈阳理工大学 Abnormal behavior detection method based on weighted probability fusion parallel Bayesian network
CN114237180A (en) * 2021-12-17 2022-03-25 内蒙古工业大学 Industrial control system attack detection method and device
CN114237180B (en) * 2021-12-17 2023-10-13 内蒙古工业大学 Industrial control system attack detection method and device
CN114531283A (en) * 2022-01-27 2022-05-24 西安电子科技大学 Method, system, storage medium and terminal for measuring robustness of intrusion detection model
CN114531283B (en) * 2022-01-27 2023-02-28 西安电子科技大学 Method, system, storage medium and terminal for measuring robustness of intrusion detection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2020-10-20)