CN114465784A - Honeypot identification method and device of industrial control system - Google Patents

Info

Publication number
CN114465784A
Authority
CN
China
Prior art keywords
honeypot
bayesian network
industrial control
model
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210071252.8A
Other languages
Chinese (zh)
Inventor
王钢
张立芳
姚旭
孙叶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University of Technology
Original Assignee
Inner Mongolia University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University of Technology
Priority to CN202210071252.8A
Publication of CN114465784A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441: Countermeasures against malicious traffic
    • H04L63/1491: Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00: Computing arrangements based on specific mathematical models
    • G06N7/01: Probabilistic graphical models, e.g. probabilistic networks

Abstract

The invention relates to the technical field of attack and defense of industrial control systems, and provides a honeypot identification method and a honeypot identification device for an industrial control system. The method comprises the following steps: acquiring honeypot equipment and industrial control equipment data; preprocessing the honeypot equipment and industrial control equipment data to obtain a data attribute column; building a Bayesian network structure model according to the data attribute column, and determining the correspondence between honeypot features and node states of the Bayesian network structure model; performing parameter learning on the Bayesian network structure model to obtain honeypot first probability values of the Bayesian network model in different node states; and inputting feature evidence, calculating with a Bayesian network inference algorithm, and obtaining the probability of identifying a honeypot according to the honeypot first probability value. Because Bayesian network parameter modeling suits the uncertainty inherent in honeypot identification, combining it with a Bayesian network inference algorithm greatly improves the accuracy of honeypot identification.

Description

Honeypot identification method and device of industrial control system
Technical Field
The invention relates to the technical field of attack and defense of industrial control systems, in particular to a honeypot identification method and device of an industrial control system.
Background
In the industrial internet era, more and more industrial control devices that were originally in isolated environments are exposed to the public internet and can be discovered by anyone, which brings great risks and challenges to the security of industrial control systems. Industrial control systems and their communication protocols were designed with almost no consideration of network and communication security, so many industrial control systems and their vulnerable devices, especially devices exposed on the public network, face increasingly serious attack threats.
In special periods, the intensity and severity of national-level network attacks increase, and critical infrastructure, represented by the power industry, becomes a key target of network attacks. Such attack threats may greatly affect daily life. Traditional defense relies on passive defense technologies such as intrusion detection and firewalls, but as new unknown threats have increased in recent years, passive defense has become inadequate against these risks. A honeypot is a decoy technology and a security resource whose purpose is to detect, monitor, and capture attack behavior and ultimately protect real hosts. It is an active defense technology introduced by defenders to change the asymmetry of the network attack and defense game: security researchers can monitor, observe, and study attackers and their behavior without the attackers' awareness, so that attackers can be discovered in a very short time and effectively tracked and traced. With the continuous development of honeypot technology, attackers have begun to research methods for identifying honeypots in order to attack real industrial control systems.
However, existing honeypot identification mainly targets a single feature, for example the data capture tool Sebek, a kernel-based data capture mechanism that is often used for building high-interaction honeypots. Traditional honeypot identification technology is therefore incomplete and uncertain, which makes analysis and identification inaccurate.
Disclosure of Invention
The problem solved by the invention is how to improve the accuracy of honeypot identification of the industrial control system.
In order to solve the above problem, in a first aspect, the present invention provides a honeypot identification method for an industrial control system, including the following steps:
acquiring data of honeypot equipment and industrial control equipment;
preprocessing the honeypot equipment and industrial control equipment data to obtain a data attribute column;
constructing a Bayesian network structure model according to the data attribute column, and determining the corresponding relation between honeypot characteristics and node states of the Bayesian network structure model; wherein the data attribute column comprises honeypot features and the Bayesian network structure model comprises node states of the Bayesian network structure model;
performing parameter learning on the Bayesian network structure model, completing training of the Bayesian network parameter model, and obtaining honeypot first probability values of the Bayesian network parameter model in different node states;
calculating by inputting the characteristic evidence, using a Bayesian network inference algorithm, and obtaining a probability of identifying the honeypot according to the honeypot first probability value.
Therefore, when honeypots in the industrial control system are identified, real honeypot equipment and industrial control equipment data are first collected, which provides a parameter basis for the later identification of honeypots. The honeypot equipment and industrial control equipment data are preprocessed to obtain a data attribute column. A Bayesian network structure model is built according to the data attribute column, so that the honeypot feature of each column in the data attribute column corresponds to a node state of the Bayesian network structure model, which generates the correspondence between the honeypot features and the node states of the Bayesian network structure model; in other words, the Bayesian network structure model can process various uncertain and incomplete honeypot equipment and industrial control equipment data so that they are consistent with the data attribute column used for honeypot identification. Parameter learning is performed on the Bayesian network structure model to complete training of the Bayesian network parameter model and obtain honeypot first probability values of the Bayesian network parameter model in different node states, which provides a reference for the probability of honeypot identification in later steps. Finally, feature evidence is input, a Bayesian network inference algorithm performs the calculation, and the probability of identifying the honeypot is obtained according to the honeypot first probability value. In other words, because a Bayesian network is suited to problems involving uncertainty, combining Bayesian network parameter modeling with a Bayesian network inference algorithm can greatly improve the accuracy of honeypot identification.
Optionally, the preprocessing the honeypot device and industrial control device data to obtain a data attribute column includes:
carrying out format conversion on the honeypot equipment and industrial control equipment data;
performing honeypot feature analysis and screening on the format-converted honeypot equipment and industrial control equipment data;
and splitting and splicing a plurality of honeypot characteristics in the screened honeypot equipment and industrial control equipment data to obtain the data attribute column.
Optionally, the building a bayesian network structure model according to the data attribute column includes:
and building the Bayesian network structure model by adopting a Bayesian tool Hugin according to the preprocessed data attribute column.
Optionally, the performing parameter learning on the bayesian network structure model, completing training of the bayesian network parameter model, and obtaining the first probability values of the honeypot of the bayesian network model in different node states includes:
performing parameter learning on the Bayesian network structure model by using the data attribute column, and completing the training of the Bayesian network parameter model;
and calculating to obtain probability values of the Bayesian network parameter model in different node states according to the corresponding relation between the honeypot characteristics and the node states of the Bayesian network structure model and the data attribute column, and taking the probability value with a honeypot state of 1 as a first probability value of the honeypot.
Optionally, the parameter learning of the bayesian network structure model using the data attribute column comprises:
and performing parameter learning on the Bayesian network structure model by using the data attribute column and adopting an EM algorithm.
Optionally, the performing parameter learning on the Bayesian network structure model, completing training of the Bayesian network parameter model, and obtaining the honeypot first probability values of the Bayesian network parameter model in different node states further includes: evaluating, by comparison with other machine learning algorithms, the performance of the model trained with the EM algorithm for Bayesian network parameter learning.
Optionally, the evaluating, by comparison with other machine learning algorithms, the performance of the model trained with the EM algorithm for Bayesian network parameter learning includes: selecting a plurality of different machine learning algorithms to train models on the same data attribute column, generating a plurality of training models; drawing a first ROC curve for each training model and a second ROC curve for the Bayesian network model; and evaluating the performance of the model trained with the EM algorithm for Bayesian network parameter learning according to the comparison of the AUC value of the first ROC curve with the AUC value of the second ROC curve.
Optionally, said entering the evidence of characteristics, calculating using a bayesian network inference algorithm, and obtaining a probability of identifying the honeypot based on the honeypot first probability value comprises:
adopting a Hugin tool, according to the corresponding relation between the honeypot characteristics and the node state of the Bayesian network structure model, inputting the characteristic evidence, and calculating by using a junction tree algorithm of Bayesian network inference, wherein the probability of the Bayesian network parameter model in a honeypot state of 1 is used as a second probability value;
and obtaining the probability of identifying the honeypots according to the comparison result of the second probability value and the first probability value of the honeypots.
Optionally, the obtaining the probability of identifying honeypots according to the comparison result of the second probability value and the honeypot first probability value comprises:
if the second probability value is greater than or equal to the honeypot first probability value, identifying the probability of honeypots as the second probability value;
and if the second probability value is smaller than the honeypot first probability value, determining that the target is not a honeypot.
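By way of non-limiting illustration, the comparison rule above can be sketched in Python. The function name `identify_honeypot` and the use of `None` to signal "not a honeypot" are assumptions made for this sketch only; they do not appear in the original disclosure.

```python
from typing import Optional


def identify_honeypot(second_prob: float, first_prob: float) -> Optional[float]:
    """Apply the comparison rule described in the text.

    second_prob: probability of honeypot state 1 obtained by inference
                 on the input feature evidence (hypothetical input).
    first_prob:  honeypot first probability value obtained in training.
    Returns the probability of identifying a honeypot, or None when the
    target is judged not to be a honeypot (None is a sketch convention).
    """
    if not (0.0 <= second_prob <= 1.0 and 0.0 <= first_prob <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    # At or above the reference value: report the second probability value.
    if second_prob >= first_prob:
        return second_prob
    # Below the reference value: judged not to be a honeypot.
    return None
```

In this sketch the "greater than or equal to" boundary follows the wording of the rule, so a second probability value exactly equal to the first probability value is still reported.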
In a second aspect, the present invention further provides a honeypot identification apparatus for an industrial control system, including:
the acquisition module is used for acquiring data of the honeypot equipment and the industrial control equipment;
the processing module is used for preprocessing the data of the honeypot equipment and the industrial control equipment to obtain a data attribute column;
the first determining module is used for building a Bayesian network structure model according to the data attribute column and determining the corresponding relation between the honeypot characteristics and the node state of the Bayesian network structure model; wherein the data attribute column comprises honeypot features and the Bayesian network structure model comprises node states of the Bayesian network structure model;
the training module is used for performing parameter learning on the Bayesian network structure model, completing the training of the Bayesian network parameter model, and obtaining honeypot first probability values of the Bayesian network parameter model in different node states;
and the second determining module is used for calculating by inputting the characteristic evidence and using a Bayesian network inference algorithm and obtaining the probability of identifying the honeypots according to the first probability value of the honeypots.
Therefore, since the honeypot identification device of the industrial control system is used for realizing the honeypot identification method of the industrial control system, at least all technical effects of the honeypot identification method of the industrial control system are achieved, and detailed description is omitted here.
Drawings
FIG. 1 is a schematic diagram illustrating steps of an identification method according to an embodiment of the present invention;
FIG. 2 is a logic diagram of an identification method in an embodiment of the invention;
FIG. 3 is a diagram illustrating a Bayesian network structure model according to an embodiment of the present invention;
FIG. 4 is a ROC curve drawn by an EM algorithm training model using Bayesian network parameter learning in an embodiment of the present invention;
FIG. 5 is a ROC curve plotted for other machine learning algorithm training models in an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order; it is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "disposed," "mounted," "coupled," and "connected" are to be construed broadly; for example, a connection may be a fixed connection, a removable connection, or an integral connection; it may be a mechanical connection; it may be a direct connection, an indirect connection through an intervening medium, or an internal communication between two elements. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific case.
In the description herein, references to the terms "an embodiment," "one embodiment," and "one implementation," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or implementation is included in at least one embodiment or example implementation of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or implementation. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or implementations.
Referring to fig. 1, an embodiment of the present invention provides a honeypot identification method for an industrial control system, including the following steps:
S1, acquiring data of the honeypot equipment and the industrial control equipment;
S2, preprocessing the honeypot equipment and industrial control equipment data to obtain a data attribute column;
S3, building a Bayesian network structure model according to the data attribute column, and determining the corresponding relation between honeypot characteristics and node states of the Bayesian network structure model; wherein the data attribute column comprises honeypot features and the Bayesian network structure model comprises node states of the Bayesian network structure model;
S4, performing parameter learning on the Bayesian network structure model, completing training of the Bayesian network parameter model, and obtaining honeypot first probability values of the Bayesian network parameter model in different node states;
S5, inputting feature evidence, calculating with a Bayesian network inference algorithm, and obtaining the probability of identifying the honeypot according to the honeypot first probability value.
It should be noted that the industrial control system includes the honeypot equipment and the industrial control equipment. In step S1, the honeypot equipment and industrial control equipment data are acquired, for example through a network search engine such as Shodan. Shodan is a well-known cyberspace search engine used for finding network devices, and its platform labels many known honeypot servers and a large number of industrial control devices, so the honeypot equipment and industrial control equipment data can be acquired from the Shodan platform as a data set. In step S2, the honeypot equipment and industrial control equipment data may be preprocessed on a terminal device of the industrial control system to obtain a data attribute column that meets the requirements. In step S3, a Bayesian network structure model is built according to the data attribute column, which takes advantage of the strong processing capability and self-learning update capability of the Bayesian network model; in other words, the Bayesian network model can process various kinds of uncertain information and fits the honeypot features in the data attribute column used for honeypot identification. The correspondence between the honeypot features and the node states of the Bayesian network structure model is thereby determined, which improves the accuracy of honeypot identification on the target device.
A Bayesian network, also called a belief network, consists of a directed acyclic graph and conditional probability tables. The structure and parameters of a Bayesian network can be learned from a large amount of training data, such as the data attribute column: a Bayesian network structure model can be built from the data attribute column, and a Bayesian network parameter model can be generated by learning the parameters of the Bayesian network structure model. Bayesian networks are also well suited to handling incomplete data sets.
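As an illustrative sketch only, the following Python fragment shows how a directed acyclic graph with conditional probability tables yields a joint probability and, by direct enumeration, the probability of the honeypot state given feature evidence. The two-feature structure (a honeypot node pointing to `port` and `isp`), the state labels, and every probability value are hypothetical and are not taken from the disclosure; enumeration is used here for brevity and is not the junction tree algorithm.

```python
# Hypothetical structure: honeypot -> port, honeypot -> isp.
# All numbers are invented for illustration, not measured data.
P_HONEYPOT = {1: 0.3, 0: 0.7}  # P(honeypot = h)
CPT = {
    # CPT[node][h][value] = P(node = value | honeypot = h)
    "port": {1: {"102": 0.8, "other": 0.2}, 0: {"102": 0.4, "other": 0.6}},
    "isp": {1: {"cloud": 0.9, "other": 0.1}, 0: {"cloud": 0.2, "other": 0.8}},
}


def joint(h, evidence):
    """P(honeypot = h, evidence) as a product of the network's factors."""
    p = P_HONEYPOT[h]
    for node, value in evidence.items():
        p *= CPT[node][h][value]
    return p


def posterior_honeypot(evidence):
    """P(honeypot = 1 | evidence), normalizing over both honeypot states."""
    numerator = joint(1, evidence)
    return numerator / (numerator + joint(0, evidence))


example = posterior_honeypot({"port": "102", "isp": "cloud"})
```

Features missing from the evidence are simply left out of the product, which is valid for this structure because each feature depends only on the honeypot node.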
In step S4, parameter learning is performed on the Bayesian network structure model to generate the Bayesian network parameter model; in other words, training of the Bayesian network parameter model is completed, and the honeypot first probability values of the Bayesian network parameter model in different node states are obtained through training, providing a data reference for honeypot identification in the subsequent step. In step S5, feature evidence is input, a Bayesian network inference algorithm calculates the probability that the honeypot node state of the Bayesian network parameter model is 1, the result is compared with the honeypot first probability value, and whether the target corresponding to the input feature evidence is a real honeypot is determined according to the comparison result.
In the invention, when honeypots in the industrial control system are identified, real honeypot equipment and industrial control equipment data are first collected, which provides a parameter basis for the later identification of honeypots. The data are preprocessed to obtain a data attribute column. A Bayesian network structure model is built according to the data attribute column, so that the honeypot feature of each column corresponds to a node state of the Bayesian network model, which generates the correspondence between the honeypot features and the node states of the Bayesian network structure model; in other words, the Bayesian network structure model can process various uncertain and incomplete honeypot equipment and industrial control equipment data so that they are consistent with the data attribute column used for honeypot identification. Parameter learning is performed on the Bayesian network structure model to complete training of the Bayesian network parameter model and obtain honeypot first probability values of the Bayesian network parameter model in different node states, which provides a reference for the probability of honeypot identification in later steps. Finally, feature evidence is input, a Bayesian network inference algorithm performs the calculation, and the probability of identifying the honeypot is obtained according to the honeypot first probability value. In other words, because a Bayesian network is suited to problems involving uncertainty, combining Bayesian network parameter modeling with a Bayesian network inference algorithm can greatly improve the accuracy of honeypot identification.
In an embodiment of the present invention, the preprocessing the honeypot device and industrial control device data to obtain the data attribute column includes:
S21, converting the format of the honeypot equipment and industrial control equipment data;
S22, performing honeypot feature analysis and screening on the format-converted honeypot equipment and industrial control equipment data;
S23, splitting and splicing the plurality of honeypot features in the screened honeypot equipment and industrial control equipment data to obtain the data attribute column.
It should be noted that, in step S21, the honeypot equipment and industrial control equipment data are converted in format; for example, the Shodan file conversion command is used to convert the data into CSV format, which facilitates the analysis and screening of honeypot features in subsequent steps. CSV is the comma-separated values file format: a CSV file stores tabular data (numbers and text) in plain-text form, meaning that the file is a sequence of characters and contains no data that must be interpreted as binary.
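For illustration, a format conversion of this kind can be sketched with the Python standard library alone. The sketch assumes the raw records arrive as one JSON object per line; the disclosure does not specify the raw format, and the helper name `json_lines_to_csv`, the field names, and the sample values are hypothetical.

```python
import csv
import io
import json


def json_lines_to_csv(json_lines, fields):
    """Convert newline-delimited JSON records to CSV text.

    Only the requested fields are kept; a field missing from a record
    is written as an empty cell. The input format is an assumption.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        writer.writerow({f: record.get(f, "") for f in fields})
    return out.getvalue()


# Invented sample records for demonstration only.
sample = [
    '{"isp": "ExampleNet", "asn": "AS64500", "port": 102}',
    "",
    '{"isp": "OtherNet", "port": 502, "unused": true}',
]
csv_text = json_lines_to_csv(sample, ["isp", "asn", "port"])
```

The resulting text has one header row followed by one row per record, which is the plain-text tabular form that the later analysis and screening steps operate on.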
In step S22, since the format-converted honeypot equipment and industrial control equipment data are CSV files in plain-text form, honeypot feature analysis and screening can be performed on them. Analysis refers to examining the honeypot equipment and industrial control equipment data and identifying the feature data that can distinguish honeypots. Screening refers to deleting useless data attribute columns that have no distinguishing features, so that only feature evidence capable of identifying honeypots is retained and later honeypot identification is more accurate; such data attribute columns can be deleted with the drop() method.
In step S23, after screening, the plurality of honeypot features in the honeypot equipment and industrial control equipment data are split and then spliced to generate a relatively regular data attribute column. For example, the split honeypot features can be spliced with the concat() method of Pandas. Pandas is a data analysis tool built on NumPy that provides the data structures, functions, and methods needed to process large data sets quickly and conveniently; since Pandas is prior art, it is not described further here. The split honeypot features can of course be spliced by other methods, which are not specifically limited here.
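The screening, splitting, and splicing described above might look as follows in Pandas. This is a minimal sketch under stated assumptions: the records, the composite `device` column, its `/` separator, and the constant `source` column are all invented for illustration, and treating a single-valued column as "no distinguishing features" is only one possible screening criterion.

```python
import pandas as pd

# Hypothetical records; every value is invented for illustration.
raw = pd.DataFrame(
    {
        "isp": ["ExampleNet", "ExampleNet", "OtherNet"],
        "asn": ["AS64500", "AS64500", "AS64501"],
        "port": [102, 102, 502],
        "device": ["plcA/cpu1", "plcA/cpu2", "plcB/cpu1"],  # composite field
        "source": ["scan", "scan", "scan"],  # constant, so not distinguishing
        "honeypot": [1, 1, 0],
    }
)

# Screening: drop columns whose values never vary across records.
constant_cols = [c for c in raw.columns if raw[c].nunique(dropna=False) <= 1]
screened = raw.drop(columns=constant_cols)

# Splitting: break the composite field into two separate feature columns.
parts = screened["device"].str.split("/", expand=True)
parts.columns = ["PLC_name", "Module_name"]

# Splicing: join the remaining columns and the split features side by side.
attribute_columns = pd.concat([screened.drop(columns=["device"]), parts], axis=1)
```

Concatenating with `axis=1` aligns rows by index, so the split features stay attached to the record they came from.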
In an embodiment of the present invention, the building a bayesian network structure model according to the data attribute column includes:
and building the Bayesian network structure model by adopting a Bayesian tool Hugin according to the preprocessed data attribute column.
It should be noted that the Bayesian network structure model is built from the preprocessed data attribute column by using the Bayesian tool Hugin, so that the data attribute column is represented in the Bayesian network structure model; in other words, the Bayesian network structure model reflects the current honeypot equipment and industrial control equipment data, so that honeypots can be identified more accurately in later steps. Hugin is a decision development tool based on Bayesian networks, mainly used for business intelligence, risk prediction, pricing prediction, risk analysis, insurance fraud detection, risk management, criminal behavior analysis, weather and climate analysis, and the like.
In an embodiment of the present invention, with reference to fig. 1 and fig. 2, the performing parameter learning on the bayesian network structure model, completing training of the bayesian network parameter model, and obtaining first probability values of honeypots of the bayesian network parameter model in different node states includes:
S401, performing parameter learning on the Bayesian network structure model by using the data attribute column, and completing training of the Bayesian network parameter model;
S402, calculating probability values of the Bayesian network parameter model in different node states according to the data attribute column and the corresponding relation between the honeypot characteristics and the node states of the Bayesian network structure model, and taking the probability value with the honeypot state of 1 as the honeypot first probability value.
It should be noted that, in step S401, parameter learning is performed on the Bayesian network structure model by using the data attribute column to generate the Bayesian network parameter model, so that all data attribute columns are represented in the Bayesian network parameter model; in other words, the feature evidence of each data attribute column corresponds to a node state of the Bayesian network parameter model. As shown in FIG. 3, honeypot is the honeypot feature, and the data attribute columns include isp, asn, port, S_id, PLC_name, and Module_name, where isp is the internet service provider, asn is the autonomous system number, port is a port such as a data port, S_id is the device serial number, PLC_name is the system name, and Module_name is the device name. The node states of the Bayesian network model take two values, 0 and 1, where 0 represents not a honeypot and 1 represents a honeypot.
In step S402, the input data is the preprocessed data attribute column and the output data is the probability values of the Bayesian network parameter model in different node states. In other words, these probability values are calculated from the input data attribute column together with the correspondence between the honeypot features and the node states of the Bayesian network structure model, and the probability value with the honeypot state of 1 is taken as the honeypot first probability value, which serves as a reference value for comparison when identifying honeypots in later steps.
In an embodiment of the present invention, the parameter learning of the bayesian network structure model using the data attribute column comprises: and performing parameter learning on the Bayesian network structure model by using the data attribute column and adopting an EM algorithm.
It should be noted that parameter learning is performed on the Bayesian network structure model by using the data attribute column and the EM algorithm to generate the Bayesian network parameter model; in other words, the Bayesian network structure model is trained on the data attribute column with the EM algorithm to complete the training of the Bayesian network parameter model, so that the data attribute column is represented more completely and accurately in the Bayesian network parameter model. After parameter learning, the probabilities of the node states of the Bayesian network parameter model have been updated. The EM algorithm, or expectation-maximization algorithm, is a basic algorithm for Bayesian networks and underlies many algorithms in the machine learning field. It finds a maximum of the likelihood iteratively, and each iteration is divided into two steps, the E step and the M step: the hidden data and the model distribution parameters are updated in each round of iteration until convergence, which yields the required model parameters. The EM algorithm is prior art and is not described further here.
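As a minimal sketch of the E step and M step, consider a hypothetical two-node network in which a binary honeypot node H points to one binary feature node F, and some records lack the H label. The function name `em_fit`, the starting values, and the data are assumptions for illustration; Hugin's own EM implementation is not reproduced here.

```python
def em_fit(records, iterations=50):
    """Estimate P(H=1) and P(F=1 | H) from records (h, f).

    h is 0, 1, or None (label missing); f is 0 or 1.
    The starting values below are arbitrary illustration choices.
    """
    prior = 0.5                 # current estimate of P(H = 1)
    p_f = {1: 0.6, 0: 0.4}      # current estimate of P(F = 1 | H = h)
    for _step in range(iterations):
        # E step: expected value of H for each record.
        weights = []
        for h, f in records:
            if h is not None:
                weights.append(float(h))  # observed label is kept as is
                continue
            like1 = prior * (p_f[1] if f else 1 - p_f[1])
            like0 = (1 - prior) * (p_f[0] if f else 1 - p_f[0])
            weights.append(like1 / (like1 + like0))
        # M step: re-estimate the parameters from expected counts.
        n1 = sum(weights)
        n0 = len(records) - n1
        prior = n1 / len(records)
        p_f[1] = sum(w * f for w, (_h, f) in zip(weights, records)) / n1
        p_f[0] = sum((1 - w) * f for w, (_h, f) in zip(weights, records)) / n0
    return prior, p_f


# Invented data: six labeled records plus two with the label missing.
labeled = [(1, 1), (1, 1), (1, 0), (0, 0), (0, 0), (0, 1)]
prior_full, p_f_full = em_fit(labeled)
prior_miss, p_f_miss = em_fit(labeled + [(None, 1), (None, 0)])
```

When every label is observed, the E step leaves the data unchanged and the procedure reduces to ordinary frequency counting; the missing-label records are where the iteration matters. The sketch omits guards for degenerate cases such as all expected counts falling in one state.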
In an embodiment of the present invention, with reference to fig. 1 and fig. 2, performing parameter learning on the Bayesian network structure model, completing training of the Bayesian network parameter model, and obtaining the honeypot first probability values of the Bayesian network parameter model in different node states further includes: evaluating, by comparison with machine learning algorithms, the performance of the model trained with the EM algorithm used for Bayesian network parameter learning.
It should be noted that evaluating the EM-trained model by comparison with other machine learning algorithms makes it possible to verify that the model trained with the EM algorithm in the present application performs better. Fig. 4 shows the ROC curve drawn after the model is trained with the EM algorithm used for Bayesian network parameter learning.
In an embodiment of the present invention, evaluating the performance of the model trained with the EM algorithm used for Bayesian network parameter learning by comparison with machine learning algorithms comprises:
S4111, training models with a plurality of different machine learning algorithms on the same data attribute column to generate a plurality of training models;
S4112, drawing a first ROC curve for each training model and a second ROC curve for the Bayesian network parameter model;
S4113, evaluating the performance of the model trained with the EM algorithm used for Bayesian network parameter learning according to a comparison of the AUC value of each first ROC curve with the AUC value of the second ROC curve.
In step S4111, a plurality of different machine learning algorithms are used to train models on the same data attribute column, generating a plurality of training models. Because all of the models are trained on the same data attribute column, they can be evaluated against the same criteria. The number of different machine learning algorithms is more than two and may be, for example, three or four. Fig. 5 shows models trained with four different machine learning algorithms; each trained model is referred to as a training model, and together they provide the comparison objects for the subsequent evaluation of the model trained with the EM algorithm used for Bayesian network parameter learning. In this embodiment, four machine learning algorithms are used: SVM, KNN, random forest, and Naive Bayes.
In step S4112, a first ROC curve can be drawn for each training model, giving a plurality of first ROC curves, and a second ROC curve can be drawn for the Bayesian network parameter model. The ROC curve is the receiver operating characteristic curve, also called the sensitivity curve: the points on the curve all reflect responses to the same signal stimulus (here, the data attribute column), and differ only in the decision criterion under which each result was obtained.
The first ROC curve of each training model can be drawn by calling the Matplotlib library from Python 3; for example, the first ROC curves can be drawn in one coordinate system as shown in fig. 5, where the horizontal axis is the false positive rate and the vertical axis is the true positive rate, also called the recall rate. In fig. 5, the first ROC curve of each training model is drawn with a different line style. Referring to fig. 4, the second ROC curve, corresponding to the model trained with the EM algorithm, can be displayed by the Hugin tool.
In fig. 5, the dash-dot line (- - ∙ - -) is the ROC curve of the random forest algorithm, with an AUC value of 0.9556; the solid line is the ROC curve of the SVM algorithm, with an AUC value of 0.9333; the dotted line (∙ ∙ ∙ ∙ ∙ ∙) is the ROC curve of the KNN algorithm, with an AUC value of 0.9333; and the dashed line (- -) is the ROC curve of the Naive Bayes algorithm, with an AUC value of 0.9556.
In step S4113, the AUC value of each first ROC curve in fig. 5 is compared with the AUC value of the second ROC curve. The AUC of the second ROC curve remains stable and close to 1, so the Bayesian network parameter model trained on the data attribute column with the EM algorithm has the best performance. The performance of each training model can be judged from the AUC value of its ROC curve, because model performance is positively correlated with the AUC value.
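The comparison in steps S4112 and S4113 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the Matplotlib and Hugin workflow of the embodiment: the helper names roc_points, auc, and beats_all_baselines are hypothetical, the AUC is computed with the trapezoidal rule, and the only figures taken from the description are the four baseline AUC values reported for fig. 5.

```python
from typing import Dict, List, Sequence, Tuple

# AUC values reported in the description for the four baseline models of fig. 5.
REPORTED_BASELINE_AUC: Dict[str, float] = {
    "random forest": 0.9556,
    "SVM": 0.9333,
    "KNN": 0.9333,
    "Naive Bayes": 0.9556,
}


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points for a threshold sweep.

    Assumes labels are 0 or 1 and that both classes are present.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample in a group of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def beats_all_baselines(model_auc: float, baselines: Dict[str, float]) -> bool:
    """True when the candidate model's AUC exceeds every baseline AUC."""
    return all(model_auc > value for value in baselines.values())
```

A model whose scores rank every positive sample above every negative sample yields an AUC of 1, which is the sense in which an AUC close to 1 indicates better performance.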
In an embodiment of the present invention, inputting the characteristic evidence, calculating using a Bayesian network inference algorithm, and obtaining the probability of identifying the honeypot according to the honeypot first probability value comprises:
S51, using the Hugin tool, inputting the collected characteristic evidence according to the correspondence between the honeypot features and the node states of the Bayesian network structure model, and calculating with the junction tree algorithm of Bayesian network inference, wherein the probability that the honeypot state of the Bayesian network parameter model is 1 is taken as a second probability value;
S52, obtaining the probability of identifying the honeypot according to a comparison of the second probability value with the honeypot first probability value.
It should be noted that, in step S51, the collected characteristic evidence is input according to the correspondence between the honeypot features and the node states of the Bayesian network structure model, and the calculation is performed in the Hugin tool with the junction tree algorithm of Bayesian network inference, so that the probability that the honeypot state of the Bayesian network parameter model is 1 is obtained and used as the second probability value. The junction tree algorithm is one of the Bayesian network inference algorithms. It is currently the fastest and most widely applied exact inference algorithm for Bayesian networks and is suitable for both singly connected and multiply connected networks, so computing the probability that the honeypot state is 1 with the junction tree algorithm is both fast and accurate. The honeypot node has two states, 0 and 1: state 0 indicates that the device is not a honeypot, and state 1 indicates that it is a honeypot.
In addition, the collected characteristic evidence is input into the Hugin tool, and the probability that the honeypot state of the Bayesian network model is 1 is calculated with a Bayesian network inference algorithm and used as the second probability value. Here, Evidence refers to the characteristic evidence; the correspondence between the collected characteristic evidence and the second probability value is shown in Table 1 below.
[Table 1 appears as an image in the original publication and is not reproduced here.]
TABLE 1
As can be seen from the data in Table 1, when the evidence that the state of s_id is 88111222 is input, the junction tree inference algorithm changes the probability that the state of the honeypot label is 1 from 73.48% to 75.34%, which indicates that a device exhibiting this feature state is predicted to be an industrial control honeypot with a probability of 75.34%. When the evidence that the state of s_id is 88111222 and the evidence that the state of PLC_name is Technodrome are input together, the probability that the state of the honeypot label is 1 changes from the original 73.48% to 77.57%. That is, when the two states occur simultaneously, the predicted probability of an industrial control honeypot increases further. The discussion above covers only the second probability values predicted for part of the characteristic evidence; for the second probability values corresponding to the other characteristic evidence, refer to Table 1, which is not described further here.
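How entering evidence shifts the honeypot probability can be sketched with a small exact-inference example. It does not use Hugin or a junction tree: it assumes a hypothetical network in which the honeypot node is the only parent of each feature node, so the posterior can be obtained by direct enumeration. The prior 0.7348 is the value reported in the description; the conditional probabilities in LIKELIHOOD are invented for illustration, so the outputs are not expected to reproduce the values of Table 1.

```python
from typing import Dict, Tuple

# Probability that the honeypot state is 1 before any evidence (from the description).
PRIOR_HONEYPOT = 0.7348

# Hypothetical values of P(feature state | honeypot state), NOT taken from the patent.
# Each tuple is (probability given not a honeypot, probability given a honeypot).
LIKELIHOOD: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("s_id", "88111222"): (0.30, 0.40),
    ("PLC_name", "Technodrome"): (0.20, 0.35),
}


def posterior_honeypot(evidence: Dict[str, str]) -> float:
    """Return P(honeypot = 1 | evidence) by enumerating the two honeypot states."""
    joint0 = 1.0 - PRIOR_HONEYPOT   # running product for honeypot state 0
    joint1 = PRIOR_HONEYPOT         # running product for honeypot state 1
    for feature, state in evidence.items():
        p0, p1 = LIKELIHOOD[(feature, state)]
        joint0 *= p0
        joint1 *= p1
    # Normalise so that the two states sum to 1.
    return joint1 / (joint0 + joint1)
```

For example, posterior_honeypot({"s_id": "88111222"}) gives the updated probability after one piece of evidence, and adding the PLC_name entry updates it again. With these assumed likelihoods each piece of evidence raises the probability, mirroring the direction of change described above.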
In step S52, the probability of identifying the honeypot is determined according to the comparison of the second probability value with the honeypot first probability value; in other words, it is determined whether the collected characteristic evidence indicates a honeypot and, if so, with what probability.
In an embodiment of the present invention, obtaining the probability of identifying the honeypot according to the comparison of the second probability value with the honeypot first probability value comprises:
S521, if the second probability value is greater than or equal to the honeypot first probability value, taking the second probability value as the probability of identifying the honeypot;
S522, if the second probability value is smaller than the honeypot first probability value, determining that the device is not a honeypot.
It should be noted that, in step S521, if the second probability value is greater than or equal to the honeypot first probability value, the second probability value is taken as the probability of identifying the honeypot; in other words, the device corresponding to the collected characteristic evidence is a real honeypot. In step S522, if the second probability value is smaller than the honeypot first probability value, the device is determined not to be a honeypot; in other words, the device corresponding to the collected characteristic evidence is not a real honeypot.
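The decision rule of steps S521 and S522 amounts to a single threshold comparison, which can be sketched as follows. The function name identify_honeypot and the use of None to signal "not a honeypot" are illustrative assumptions.

```python
from typing import Optional


def identify_honeypot(second_probability: float, first_probability: float) -> Optional[float]:
    """Apply the comparison of steps S521 and S522.

    Returns the second probability value when the device is identified as a
    honeypot, or None when the device is determined not to be a honeypot.
    """
    if second_probability >= first_probability:
        # S521: the second probability value becomes the identification probability.
        return second_probability
    # S522: below the honeypot first probability value, so not a honeypot.
    return None
```

With the values quoted in the description, identify_honeypot(0.7534, 0.7348) returns 0.7534, since the second probability value is not smaller than the first.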
Another embodiment of the present invention provides a honeypot identification apparatus for an industrial control system, including:
the acquisition module is used for acquiring data of the honeypot equipment and the industrial control equipment;
the processing module is used for preprocessing the data of the honeypot equipment and the industrial control equipment to obtain a data attribute column;
the first determining module is used for building a Bayesian network structure model according to the data attribute column and determining the corresponding relation between the honeypot characteristics and the node state of the Bayesian network structure model; wherein the data attribute column comprises honeypot features and the Bayesian network structure model comprises node states of the Bayesian network structure model;
the training module is used for performing parameter learning on the Bayesian network structure model, completing the training of the Bayesian network parameter model, and obtaining the honeypot first probability values of the Bayesian network parameter model in different node states;
and the second determining module is used for inputting the characteristic evidence, calculating using a Bayesian network inference algorithm, and obtaining the probability of identifying the honeypot according to the honeypot first probability value.
It should be noted that, since the honeypot identification apparatus of the industrial control system is used for implementing the honeypot identification method of the industrial control system, the honeypot identification apparatus of the industrial control system at least has all technical effects of the honeypot identification method of the industrial control system, and details are not repeated herein.
The embodiment of the invention provides a server, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the steps in the honeypot identification method of the industrial control system.
The processor is a control center of the server, and executes various functions of the server and processes data by running or executing software programs and/or modules stored in the memory and calling data stored in the memory, so as to perform overall monitoring on the honeypot device and the industrial control device in the industrial control system.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
Meanwhile, the embodiment of the application provides a computer-readable storage medium, wherein a plurality of instructions are stored in the computer-readable storage medium, and the instructions are suitable for being loaded by a processor to execute the steps in the honeypot identification method of the industrial control system.
It should be noted that the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the storage medium can execute the steps of any method provided in the embodiments of the present application, they can achieve the beneficial effects of any such method. For details, see the foregoing embodiments, which are not described again here.
Although the present disclosure has been described above, the scope of the present disclosure is not limited thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present disclosure, and these changes and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. A honeypot identification method of an industrial control system is characterized by comprising the following steps:
acquiring data of honeypot equipment and industrial control equipment;
preprocessing the honeypot equipment and industrial control equipment data to obtain a data attribute column;
constructing a Bayesian network structure model according to the data attribute column, and determining the corresponding relation between honeypot characteristics and node states of the Bayesian network structure model; wherein the data attribute column comprises honeypot features and the Bayesian network structure model comprises node states of the Bayesian network structure model;
performing parameter learning on the Bayesian network structure model, completing training of the Bayesian network parameter model, and obtaining honeypot first probability values of the Bayesian network parameter model in different node states;
inputting characteristic evidence, calculating using a Bayesian network inference algorithm, and obtaining a probability of identifying the honeypot according to the honeypot first probability value.
2. The honeypot identification method of industrial control system of claim 1, wherein the preprocessing the honeypot device and industrial control device data to obtain a data attribute column comprises:
carrying out format conversion on the honeypot equipment and industrial control equipment data;
carrying out honeypot characteristic analysis on the data of the honeypot equipment and the industrial control equipment after format conversion and screening;
and splitting and splicing a plurality of honeypot characteristics in the screened honeypot equipment and industrial control equipment data to obtain the data attribute column.
3. The honeypot identification method of the industrial control system according to claim 1, wherein the building of the bayesian network structure model according to the data attribute column comprises:
building the Bayesian network structure model by adopting a Bayesian tool Hugin according to the preprocessed data attribute column.
4. The honeypot identification method of the industrial control system as claimed in claim 1, wherein the parameter learning of the bayesian network structure model, the training of the bayesian network parameter model, and the obtaining of the honeypot first probability values of the bayesian network parameter model in different node states comprises:
performing parameter learning on the Bayesian network structure model by using the data attribute column, and completing the training of the Bayesian network parameter model;
and calculating to obtain probability values of the Bayesian network parameter model in different node states according to the corresponding relation between the honeypot characteristics and the node states of the Bayesian network structure model and the data attribute column, and taking the probability value with a honeypot state of 1 as a first probability value of the honeypot.
5. The honeypot identification method of industrial control system of claim 4, wherein the parameter learning the Bayesian network structure model using the column of data attributes comprises:
performing parameter learning on the Bayesian network structure model by using the data attribute column and adopting an EM algorithm.
6. The honeypot identification method of the industrial control system as recited in claim 5, wherein the parameter learning of the bayesian network structure model, the training of the bayesian network parameter model, and the obtaining of the first probability values of honeypots of the bayesian network parameter model in different node states further comprises:
evaluating, by comparison with machine learning algorithms, the performance of the model trained with the EM algorithm used for Bayesian network parameter learning.
7. The honeypot identification method of industrial control system as claimed in claim 6 wherein evaluating the performance of EM algorithm training models using Bayesian network parameter learning by way of machine learning algorithm comparison comprises:
training models with a plurality of different machine learning algorithms on the same data attribute column to generate a plurality of training models;
drawing a first ROC curve for each training model and a second ROC curve for the Bayesian network parameter model;
and evaluating the performance of the model trained with the EM algorithm used for Bayesian network parameter learning according to a comparison of the AUC value of each first ROC curve with the AUC value of the second ROC curve.
8. The honeypot identification method of the industrial control system according to any one of claims 1 to 7, wherein inputting the characteristic evidence, calculating using a Bayesian network inference algorithm, and obtaining the probability of identifying the honeypot according to the honeypot first probability value comprises:
using a Hugin tool, inputting the characteristic evidence according to the correspondence between the honeypot features and the node states of the Bayesian network structure model, and calculating with a junction tree algorithm of Bayesian network inference, wherein the probability that the honeypot state of the Bayesian network parameter model is 1 is taken as a second probability value;
and obtaining the probability of identifying the honeypots according to the comparison result of the second probability value and the first probability value of the honeypots.
9. The honeypot identification method of the industrial control system as claimed in claim 8, wherein the obtaining the probability of identifying honeypots according to the comparison result of the second probability value and the honeypot first probability value comprises:
if the second probability value is greater than or equal to the honeypot first probability value, identifying the probability of honeypots as the second probability value;
and if the second probability value is smaller than the honeypot first probability value, determining that the device is not a honeypot.
10. A honeypot identification apparatus of an industrial control system, comprising:
the acquisition module is used for acquiring data of the honeypot equipment and the industrial control equipment;
the processing module is used for preprocessing the data of the honeypot equipment and the industrial control equipment to obtain a data attribute column;
the first determining module is used for building a Bayesian network structure model according to the data attribute column and determining the corresponding relation between the honeypot characteristics and the node state of the Bayesian network structure model; wherein the data attribute column comprises honeypot features and the Bayesian network structure model comprises node states of the Bayesian network structure model;
the training module is used for performing parameter learning on the Bayesian network structure model, completing the training of the Bayesian network parameter model, and obtaining the honeypot first probability values of the Bayesian network parameter model in different node states;
and the second determining module is used for inputting the characteristic evidence, calculating using a Bayesian network inference algorithm, and obtaining the probability of identifying the honeypot according to the honeypot first probability value.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210071252.8A CN114465784A (en) 2022-01-21 2022-01-21 Honeypot identification method and device of industrial control system

Publications (1)

Publication Number Publication Date
CN114465784A true CN114465784A (en) 2022-05-10

Family

ID=81410320

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210071252.8A Pending CN114465784A (en) 2022-01-21 2022-01-21 Honeypot identification method and device of industrial control system

Country Status (1)

Country Link
CN (1) CN114465784A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150135315A1 (en) * 2013-11-11 2015-05-14 National University of Computer and Emerging Sciences System and method for botnet detection
KR102259732B1 (en) * 2019-11-28 2021-06-02 광주과학기술원 A honeypot deployment method on a network
CN113536678A (en) * 2021-07-19 2021-10-22 中国人民解放军国防科技大学 XSS risk analysis method and device based on Bayesian network and STRIDE model
CN113783881A (en) * 2021-09-15 2021-12-10 浙江工业大学 Network honeypot deployment method facing penetration attack

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MOHAMED HAMMAD;WAEL EL-MEDANY;YASSER ISMAIL: "Intrusion Detection System using Feature Selection With Clustering and Classification Machine Learning Algorithms on the UNSW-NB15 dataset" *
孙叶; 王钢; 魏东: "贝叶斯网络在智能电网研究中的应用" *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination