CN111126440A - Integrated industrial control honeypot identification system and method based on deep learning - Google Patents

Integrated industrial control honeypot identification system and method based on deep learning

Info

Publication number
CN111126440A
Authority
CN
China
Prior art keywords
identification
industrial control
honeypot
data
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911166903.6A
Other languages
Chinese (zh)
Other versions
CN111126440B (en
Inventor
孙彦斌
田志宏
崔翔
姜誉
苏申
鲁辉
谭庆丰
李默涵
李玉莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou University
Original Assignee
Guangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou University
Priority to CN201911166903.6A
Publication of CN111126440A
Application granted
Publication of CN111126440B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/1491 Countermeasures against malicious traffic using deception as countermeasure, e.g. honeypots, honeynets, decoys or entrapment
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Abstract

The invention discloses an integrated industrial control honeypot identification system based on deep learning, which comprises a characteristic data acquisition module, a model training module and an online characteristic identification module. The characteristic data acquisition module acquires original data of different types of industrial control equipment according to honeypot identification requirements, and extracts strong characteristic data and general characteristic data from the original data. The model training module identifies the honeypots and the industrial control equipment based on the strong features, and inputs corresponding general features and identification results of the honeypots and the industrial control equipment as training data to the deep learning model for training so as to construct a honeypot identification model. The online characteristic identification module identifies first characteristic data of different types of industrial control equipment on line and outputs a first identification result. The honeypot identification model also carries out incremental training by taking the identification result corrected after the online identification error as training data. Through the incremental training and the fusion of various types of characteristics, the identification of various industrial control honeypots is supported, and the accuracy of the identification of the industrial control honeypots is effectively improved.

Description

Integrated industrial control honeypot identification system and method based on deep learning
Technical Field
The invention relates to the technical field of image identification and industrial control security, in particular to an integrated industrial control honeypot identification system and method based on deep learning.
Background
As industrial control defense and attack technologies advance in turn, industrial control security protection is gradually shifting from passive defense to active defense. As an active defense technology, industrial control honeypots have received attention in recent years and are being deployed in large numbers. An industrial control honeypot is essentially a deception technique: by simulating industrial control equipment and scenarios that have vulnerabilities and weak points, it induces an attacker to attack the honeypot, so that the attacker's behavior and tools can be monitored and analyzed. In particular, as industrial control systems become increasingly open, a large number of industrial control devices have been connected to external networks, and attacks on industrial control networks have become frequent, such as the Stuxnet virus in 2010. To cope with external security threats, industrial control honeypots have become particularly important.
Industrial control honeypots are divided into three categories according to their interaction characteristics.
Low-interaction industrial control honeypot: only some services and ports are open and can be probed by attackers, but the degree of interaction with the attacker is low, and such honeypots are easy to identify.
Medium-interaction industrial control honeypot: lies between the low-interaction and high-interaction modes, resembles real industrial control equipment more closely than a low-interaction honeypot, and can simulate more services, such as an operating system.
High-interaction industrial control honeypot: deploys a real operating system and a real service environment, and can interact with an attacker like a real industrial control system.
Although an industrial control honeypot is camouflaged, it may still be identified by an attacker, causing the honeypot to fail. Moreover, a honeypot can serve as a springboard: after identifying it, an attacker can use it as a springboard to attack a real system and cause unnecessary loss. To improve the realism of industrial control honeypots and strengthen their resistance to identification, research on industrial control honeypot identification is necessary. At the same time, the presence of industrial control honeypots affects, to some extent, the accuracy of network detection systems for industrial control equipment, so honeypots need to be accurately distinguished from real equipment. Therefore, whether from the viewpoint of resisting attacks or of improving detection accuracy, industrial control honeypot identification is a key topic in industrial control system security research.
Existing industrial control honeypot identification methods, such as honeypot identification based on data packet fragmentation and identification based on machine learning, mainly have four defects:
(1) Few and overly simple feature types
Existing industrial control honeypot identification methods generally select four types of features: basic IP address location information, TCP/IP operating system fingerprints, deep interaction features of industrial control protocols, and debugging and running features of configuration programs. As industrial control honeypot technology develops, honeypots come ever closer to real equipment or real systems. Four features are too few to identify a complex honeypot system, and because the features are simple, an industrial control honeypot can easily simulate them. It is therefore difficult to distinguish honeypots from real devices.
(2) Single type of identification target
Many honeypot identification technologies do not integrate identification methods for low-interaction, medium-interaction and high-interaction honeypots, nor do they integrate identification of Windows-based and Linux-based industrial control honeypots; most identification methods are only used for identifying certain types of honeypots.
(3) Lack of training data
Existing machine-learning-based industrial control honeypot identification methods do not explain how to acquire training data, and a public industrial control honeypot data set is currently lacking. Moreover, because different identification methods require different identification features, a single industrial control honeypot data set is difficult to adapt to different identification methods.
(4) Lack of expansion and adaptation capabilities
Existing industrial control honeypot identification methods are fixed and are difficult to adjust dynamically to different requirements. Even in the relatively flexible machine-learning-based identification methods, the identification features are selected manually according to expert knowledge, on the premise that those features identify industrial control honeypots well, so some hidden features are difficult to find and use. In addition, the features selected by a machine learning method and the resulting model are both fixed: when a new identification feature is found, the model cannot incorporate it automatically and a new model must be retrained.
The main reasons for the above disadvantages are twofold: (1) Industrial control scenarios are complex and diverse, so there are many types of industrial control honeypots, while existing identification methods mainly target one or a few types of honeypots and a general identification method is lacking. (2) As industrial control honeypot technology develops, the concealment methods and techniques of industrial control honeypots keep improving and the effective identification features change dynamically, which places dynamic requirements on industrial control honeypot identification.
Disclosure of Invention
The embodiment of the invention aims to provide an integrated industrial control honeypot identification system based on deep learning, which supports the identification of various industrial control honeypots by fusing a deep learning model and various types of characteristics together and effectively improves the accuracy of the identification of the industrial control honeypots.
In order to achieve the above object, an embodiment of the present invention provides an integrated industrial control honeypot identification system based on deep learning, including: the system comprises a characteristic data acquisition module, a model training module and an online characteristic identification module;
the characteristic data acquisition module is used for acquiring original data of different types of industrial control equipment according to honeypot identification requirements, and performing characteristic extraction from the acquired original data to obtain strong characteristic data capable of accurately identifying honeypots and general characteristic data for training and learning;
the model training module is used for identifying the honeypots and the industrial control equipment according to the strong feature data, inputting the identified general features and identification results corresponding to the honeypots and the industrial control equipment as a training set to a deep learning model for training so as to construct a trained honeypot identification model;
the online feature identification module is used for performing feature extraction on the original data of different types of industrial control equipment on line, extracting first feature data, inputting the first feature data into the honeypot identification model, and obtaining a first identification result output by the honeypot identification model.
Further, the characteristic data acquisition module comprises an IP address acquisition unit, a detection message construction unit, a detection result generation unit, and a characteristic extraction unit;
the IP address acquisition unit is used for reading the IP address information of a plurality of industrial control devices from the industrial control device IP address library to be scanned;
the detection message construction unit is used for constructing different types of detection messages according to the honeypot identification requirements and sending the different types of detection messages to the corresponding networking industrial control equipment;
the detection result generation unit is used for writing a response message returned by the networking industrial control equipment as a detection result back to the IP address library and taking the detection result as original data;
and the feature extraction unit is used for extracting features from the original data and extracting strong feature data capable of accurately identifying honeypots and general feature data for training and learning.
Further, the strong characteristic data comprises strong-characteristic industrial control honeypots and strong-characteristic normal industrial control devices;
the strong-characteristic industrial control honeypot is a device whose ISP is a cloud platform, a college or university, or a research institute, whose operating system fingerprint is Windows, and whose number of open ports is greater than PortThreshold and violates the port opening rules;
the strong-characteristic normal industrial control device is a device whose ISP is neither a cloud platform nor a college, university or research institute, whose number of open ports is smaller than PortThreshold and does not violate the port opening rules, and whose operating system fingerprint is an embedded OS.
Furthermore, the integrated industrial control honey pot identification system based on deep learning also comprises a model updating module;
and the model updating module is used for correcting the error result when an error is identified on line, inputting the corrected identification result into the honeypot identification model in an off-line state as a training set for training, and applying the trained honeypot identification model to on-line identification again after the training is finished.
Further, the raw data of the different types of industrial control equipment comprise defense characteristic data, intrinsic characteristic data and network attribute characteristic data.
As a preferred embodiment of the invention, the invention also provides an integrated industrial control honeypot identification method based on deep learning, which comprises the following steps:
acquiring original data of different types of industrial control equipment according to honeypot identification requirements, and performing feature extraction from the acquired original data to obtain strong feature data capable of accurately identifying honeypots and general feature data for training and learning;
identifying honeypots and industrial control equipment according to the strong feature data, inputting the identified general features and identification results corresponding to the honeypots and the industrial control equipment as a training set to a deep learning model for training to construct a trained honeypot identification model;
feature extraction is carried out on line from original data of different types of industrial control equipment, first feature data are extracted, the first feature data are input into the honeypot identification model, and a first identification result output by the honeypot identification model is obtained.
Further, according to the honeypot identification requirement, original data of different types of industrial control equipment are obtained, feature extraction is performed on the obtained original data, and strong feature data capable of accurately identifying honeypots and general feature data used for training and learning are obtained, and the method specifically comprises the following steps:
reading IP address information of a plurality of industrial control devices from an industrial control device IP address library to be scanned;
constructing different types of detection messages according to the honeypot identification requirements, and sending the different types of detection messages to corresponding networked industrial control equipment;
taking a response message returned by the networking industrial control equipment as a detection result, writing the response message back to an IP address library, and taking the detection result as original data;
and extracting features from the original data, and extracting strong feature data capable of accurately identifying honeypots and general feature data for training and learning.
Further, the strong characteristic data comprises strong-characteristic industrial control honeypots and strong-characteristic normal industrial control devices;
the strong-characteristic industrial control honeypot is a device whose ISP is a cloud platform, a college or university, or a research institute, whose operating system fingerprint is Windows, and whose number of open ports is greater than PortThreshold and violates the port opening rules;
the strong-characteristic normal industrial control device is a device whose ISP is neither a cloud platform nor a college, university or research institute, whose number of open ports is smaller than PortThreshold and does not violate the port opening rules, and whose operating system fingerprint is an embedded OS.
Further, the integrated industrial control honey pot identification method based on deep learning further comprises the following steps:
when an online identification error occurs, correcting the erroneous result, inputting the corrected identification result as a training set into the offline honeypot identification model for training, and reapplying the trained honeypot identification model to online identification after the training is finished.
Further, the raw data of the different types of industrial control equipment comprise defense characteristic data, intrinsic characteristic data and network attribute characteristic data.
Compared with the prior art, the method has the following beneficial effects:
the integrated industrial control honeypot identification system based on deep learning provided by the embodiment of the invention comprises a characteristic data acquisition module, a model training module and an online characteristic identification module. The characteristic data acquisition module acquires original data of different types of industrial control equipment according to honeypot identification requirements, and performs feature extraction on the acquired original data to obtain strong characteristic data capable of accurately identifying honeypots and general characteristic data for training and learning. The model training module identifies honeypots and industrial control equipment according to the strong characteristic data, and inputs the general characteristics and identification results corresponding to the identified honeypots and industrial control equipment as a training set into a deep learning model for training, so as to construct a trained honeypot identification model. The online characteristic identification module performs feature extraction online on the original data of different types of industrial control equipment to extract first characteristic data, and inputs the first characteristic data into the honeypot identification model to obtain a first identification result output by the honeypot identification model. By fusing the deep learning model with multiple types of features, the system supports identification of multiple kinds of industrial control honeypots and effectively improves the accuracy of industrial control honeypot identification.
Drawings
FIG. 1 is a schematic structural diagram of an embodiment of an integrated industrial-control honeypot identification system based on deep learning provided by the invention;
FIG. 2 is a working schematic diagram of an embodiment of the integrated industrial-control honeypot identification system based on deep learning provided by the invention;
FIG. 3 is a flowchart illustrating the iterative updating of one embodiment of the deep learning-based integrated industrial control honeypot identification system provided by the present invention;
FIG. 4 is a flow chart of one-time industrial control honey pot feature acquisition and online identification of an embodiment of the integrated industrial control honey pot identification system based on deep learning provided by the invention;
FIG. 5 is a schematic flow chart of an embodiment of the integrated industrial control honeypot identification method based on deep learning provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, an embodiment of the present invention provides an integrated industrial control honeypot identification system based on deep learning, including: the system comprises a characteristic data acquisition module, a model training module and an online characteristic identification module;
the characteristic data acquisition module is used for acquiring original data of different types of industrial control equipment according to honeypot identification requirements, and performing characteristic extraction from the acquired original data to obtain strong characteristic data capable of accurately identifying honeypots and general characteristic data used for training and learning.
In the embodiment of the present invention, the feature data obtaining module includes an IP address obtaining unit, a detection packet constructing unit, a detection result generating unit, and a feature extracting unit;
the IP address acquisition unit is used for reading IP address information of a plurality of industrial control devices from an industrial control device IP address library to be scanned; the detection message construction unit is used for constructing different types of detection messages according to the honeypot identification requirements and sending the different types of detection messages to the corresponding networking industrial control equipment; the detection result generation unit is used for writing a response message returned by the networking industrial control equipment as a detection result back to the IP address library and taking the detection result as original data; and the feature extraction unit is used for extracting features from the original data and extracting strong feature data capable of accurately identifying honeypots and general feature data for training and learning.
Referring to fig. 2, fig. 2 is a working schematic diagram of an embodiment of the integrated industrial control honeypot identification system based on deep learning provided by the present invention. Specifically, an industrial control scanner reads IP address information from the IP address library to be scanned, constructs different types of detection messages according to the honeypot identification requirements, and sends them to the target networked industrial control system or device. After the detected target returns the corresponding response message, the detection result is written back to the IP address library. The response message is used as original data, from which strong feature data capable of accurately identifying honeypots and general feature data used for training and learning are extracted.
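The scanning loop described above can be sketched as follows. This is only an illustrative outline, not the patent's implementation: `AddressRecord`, `PROBE_BUILDERS`, `scan` and `fake_send` are hypothetical names, the probe payloads are placeholders rather than real protocol frames, and the transport is a stub so that no network traffic is generated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class AddressRecord:
    """One entry of the IP address library to be scanned (hypothetical layout)."""
    ip: str
    # probe name -> raw response message, written back after scanning
    results: Dict[str, bytes] = field(default_factory=dict)


# Hypothetical probe builders, one per honeypot identification requirement.
# The payloads are placeholders, not real industrial control protocol frames.
PROBE_BUILDERS: Dict[str, Callable[[str], bytes]] = {
    "port_scan": lambda ip: b"PROBE port_scan " + ip.encode(),
    "os_fingerprint": lambda ip: b"PROBE os_fingerprint " + ip.encode(),
}


def scan(library: List[AddressRecord],
         requirements: List[str],
         send: Callable[[str, bytes], bytes]) -> List[AddressRecord]:
    """Build each required probe, send it, and write the response back."""
    for record in library:
        for name in requirements:
            probe = PROBE_BUILDERS[name](record.ip)
            record.results[name] = send(record.ip, probe)
    return library


def fake_send(ip: str, probe: bytes) -> bytes:
    """Stub transport (assumption): echoes the probe instead of using the network."""
    return b"RESPONSE to " + probe


library = scan(
    [AddressRecord("192.0.2.10"), AddressRecord("192.0.2.11")],
    ["port_scan", "os_fingerprint"],
    fake_send,
)
```

After the call, each record carries the raw responses from which strong and general features would later be extracted.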
According to the preferred embodiment of the invention, the raw data of the different types of industrial control equipment comprises defense characteristic data, intrinsic characteristic data and network attribute characteristic data.
The feature data can be used as the input of the deep learning model for industrial control honeypot identification. It is first necessary to determine the various features obtained by scanning the industrial control equipment and to construct them into inputs acceptable to the deep learning model. Besides traditional industrial control honeypot identification features (such as ISP information, port features, operating system fingerprints and protocol interaction features), several further kinds of industrial control honeypot features can be combined:
a. Defensive features. An industrial control honeypot must expose some kind of vulnerability to induce attackers to attack, otherwise the honeypot would be meaningless, so honeypots generally have weak defenses. An industrial control honeypot can be identified by detecting whether the target system has security vulnerabilities specific to industrial control honeypots.
b. Intrinsic features. To carry out its monitoring and analysis functions, an industrial control honeypot is generally equipped with some auxiliary software. If certain features of such auxiliary software are present on the target system, the likelihood that the target system is a honeypot is extremely high.
c. Network attributes. Network attributes of an industrial control honeypot when it receives a request, such as response time and number of connections, can be used as identification features.
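One way to turn the feature families above into a model input is a fixed-order numeric vector. The sketch below is an assumption for illustration: the feature names in `FEATURE_ORDER` and the helper `to_feature_vector` are hypothetical and are not taken from the patent, which does not specify an encoding.

```python
from typing import Dict, List

# Hypothetical feature names, grouped by the families described in the text.
FEATURE_ORDER: List[str] = [
    "isp_is_cloud_or_academic",  # traditional: ISP information
    "open_port_count",           # traditional: port features
    "os_is_windows",             # traditional: operating system fingerprint
    "has_known_vulnerability",   # a. defensive feature
    "has_monitoring_software",   # b. intrinsic feature
    "response_time_ms",          # c. network attribute
    "connection_count",          # c. network attribute
]


def to_feature_vector(raw: Dict[str, object]) -> List[float]:
    """Encode extracted features as floats in a fixed order; missing values become 0."""
    return [float(raw.get(name, 0)) for name in FEATURE_ORDER]


sample = {
    "isp_is_cloud_or_academic": True,
    "open_port_count": 12,
    "os_is_windows": True,
    "has_known_vulnerability": True,
    "response_time_ms": 35.5,
}
vector = to_feature_vector(sample)
```

A fixed order keeps training and online identification consistent; in practice the numeric features would probably also need scaling, which this sketch omits.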
It should be noted that the strong feature data includes strong-feature industrial control honeypots and strong-feature normal industrial control devices.
Specifically, because no public industrial control honeypot data set is currently available, the invention provides two methods for labeling training data on its own: an industrial control honeypot labeling method based on strong features, and feedback-based correction of online identification data.
(1) Industrial control honeypot labeling method based on strong features. Because of the way they are deployed, some industrial control honeypots have strong features that differ from real devices, such as an ISP that is a virtual cloud platform, too many open ports, or a Windows operating system, and these industrial control honeypots can be accurately identified based on such features. However, not all honeypots have these strong features, so the features can only serve as a basis for labeling and cannot identify all industrial control honeypots.
Specifically, a device whose ISP is a cloud platform, a college or university, or a research institute, whose operating system fingerprint is Windows, and whose number of open ports is greater than PortThreshold and violates the port opening rules (the dedicated ports of different equipment manufacturers are open at the same time) is labeled as an industrial control honeypot.
A device whose ISP is neither a cloud platform nor a college, university or research institute, whose number of open ports is smaller than PortThreshold and does not violate the port opening rules, and whose operating system fingerprint is an embedded OS is labeled as normal industrial control equipment.
Therefore, by labeling devices with the strong-feature-based industrial control honeypot labeling method, strong feature data is obtained, and industrial control honeypots can then be accurately identified according to the strong feature data.
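The two labeling rules can be written as a small function. The following is a sketch under stated assumptions: the conditions in each rule are read as a conjunction, the value of `PORT_THRESHOLD` is invented for illustration (the text gives no number for PortThreshold), and the `Device` fields are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

PORT_THRESHOLD = 10  # assumed value; the source does not state PortThreshold


@dataclass
class Device:
    isp_type: str             # "cloud", "academic" (college or institute) or "other"
    os_fingerprint: str       # e.g. "windows" or "embedded"
    open_ports: int           # number of open ports found by the scan
    violates_port_rule: bool  # e.g. dedicated ports of different vendors open together


def label_by_strong_features(d: Device) -> Optional[str]:
    """Return "honeypot", "normal", or None when no strong-feature rule applies."""
    cloud_or_academic = d.isp_type in {"cloud", "academic"}
    if (cloud_or_academic and d.os_fingerprint == "windows"
            and d.open_ports > PORT_THRESHOLD and d.violates_port_rule):
        return "honeypot"
    if (not cloud_or_academic and d.os_fingerprint == "embedded"
            and d.open_ports < PORT_THRESHOLD and not d.violates_port_rule):
        return "normal"
    return None  # left unlabeled; such devices are handled by the trained model


labels = [
    label_by_strong_features(Device("cloud", "windows", 25, True)),
    label_by_strong_features(Device("other", "embedded", 2, False)),
    label_by_strong_features(Device("other", "windows", 3, False)),
]
```

Devices that match neither rule stay unlabeled, which reflects the statement that strong features are only a basis for labeling and do not cover all honeypots.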
(2) Feedback-based correction of online identification data. Data labeled on the basis of strong features may still contain labeling errors or cover honeypot types incompletely, which leads to wrong online identification results. To cope with this situation, the identification result can be corrected by means of target feedback or manual feedback.
Specifically, the precondition of target feedback is that it is already known whether the target to be identified is a honeypot or a real device. A large number of industrial control honeypots or real devices can therefore be deployed in the network by the operator, scanned and identified, and the accuracy of identification judged against the known targets. Manual feedback judges the correctness of the identification result through manual penetration or through information fed back by trusted equipment or honeypot owners.
For the devices or honeypots with identification errors, the characteristic data and the corrected identification result can be used as training data and added into a training data set.
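The correction step above amounts to comparing each online result with the feedback and appending the misidentified samples, with their corrected labels, to the training set. A minimal sketch follows; `Prediction` and `apply_feedback` are hypothetical names, and the feedback is assumed to arrive as a list of true labels aligned with the predictions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    features: List[float]  # feature data used for online identification
    predicted: str         # "honeypot" or "normal"


def apply_feedback(predictions: List[Prediction],
                   ground_truth: List[str],
                   training_set: List[Tuple[List[float], str]]) -> int:
    """Append every misidentified sample with its corrected label; return the count."""
    added = 0
    for pred, truth in zip(predictions, ground_truth):
        if pred.predicted != truth:
            training_set.append((pred.features, truth))
            added += 1
    return added


training_set: List[Tuple[List[float], str]] = []
online_results = [
    Prediction([1.0, 0.0, 0.8], "normal"),    # wrong according to the feedback
    Prediction([0.0, 0.0, 0.1], "normal"),    # confirmed by the feedback
]
feedback = ["honeypot", "normal"]             # from target or manual feedback
num_added = apply_feedback(online_results, feedback, training_set)
```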
In another embodiment of the present invention, the model training module is configured to identify the honeypot and the industrial control device according to the strong feature data, and input the identified general features and the identification results corresponding to the honeypot and the industrial control device as a training set to the deep learning model for training, so as to construct the trained honeypot identification model.
The online feature identification module is used for performing feature extraction on the original data of different types of industrial control equipment on line, extracting first feature data, inputting the first feature data into the honeypot identification model, and obtaining a first identification result output by the honeypot identification model.
As another preferred embodiment of the present invention, the integrated industrial control honey pot recognition system based on deep learning further comprises a model updating module;
and the model updating module is used for correcting the error result when an error is identified on line, inputting the corrected identification result into the honeypot identification model in an off-line state as a training set for training, and applying the trained honeypot identification model to on-line identification again after the training is finished.
By combining deep learning model training and online industrial control honeypot identification, various identification characteristics can be fused, and accurate identification of various industrial control honeypots is supported.
Continuing to refer to fig. 2, specifically, after strong feature data capable of accurately identifying honeypots and general feature data used for training and learning are taken out, easily identified honeypots and industrial control equipment are identified based on the strong feature data, the identified general feature data and identification results corresponding to the honeypots and the industrial control equipment are taken as training data and added into a training data set, offline training of deep learning models is performed, and the trained models are periodically copied and used for online industrial control honeypot identification; and during online identification, extracting available feature data of the deep learning model from the original data, inputting the feature data into the trained model, outputting an identification result, correcting the identification result when the identification result is wrong, adding the corrected result into a training data set as training data, and iteratively updating the deep learning model.
Referring to fig. 3, which is an iterative update flowchart of an embodiment of the integrated industrial control honeypot identification system based on deep learning provided by the present invention, training data set information is first obtained to determine whether the data set is the initial data set. If it is, deep learning model training is carried out directly and the trained model is copied for online honeypot identification. Otherwise the data set is an incrementally updated data set, and it is judged whether the amount of newly added data reaches a threshold. If it does, the deep learning model is incrementally trained, and the model accuracy determines whether the model is copied for online identification or a new iteration is started; if it does not, the training data set is checked again.
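The decision logic of the iterative update flow can be sketched as a single function. This is an illustrative sketch under stated assumptions: the function name `update_step`, the action strings, the `train` callback, and the `min_accuracy` parameter are hypothetical, and the text does not specify the threshold or accuracy criterion.

```python
from typing import Callable


def update_step(is_initial: bool, n_new: int, threshold: int,
                train: Callable[[bool], float], min_accuracy: float) -> str:
    """Return the next action of one pass through the update flow.

    train(incremental) is a caller-supplied callback that trains the model
    (incrementally if the flag is True) and returns its accuracy.
    """
    if is_initial:
        # Initial data set: train directly and copy the model online.
        train(False)
        return "deploy"
    if n_new < threshold:
        # Not enough newly added data yet: check the data set again later.
        return "wait"
    accuracy = train(True)
    # The accuracy decides between going online and starting a new iteration.
    return "deploy" if accuracy >= min_accuracy else "retrain"
```

Passing the training routine as a callback keeps the control flow independent of any particular deep learning framework.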
Around deep learning model training, a data labeling method based on strong features and a data set labeling method based on correcting online identification data through feedback are provided. The former accurately labels industrial control honeypots and industrial control devices based on strong features, and the latter corrects misidentified data through target feedback or manual feedback. In this way, sufficient training data can be obtained and corrected training data is continuously supplied to the deep learning model.
Around the integrated architecture, a deep learning model training method based on iterative updating is provided, which supports continuous updating of the deep learning model and effectively improves the accuracy of model identification.
In a preferred embodiment of the present invention, referring to fig. 4, which is a flowchart of a single round of industrial control honeypot feature acquisition and online identification in an embodiment of the integrated industrial control honeypot identification system based on deep learning provided by the present invention, for each online device detected as alive, a corresponding detection message template is first constructed for the features to be acquired. Multiple detection messages are then constructed from the template, such as abnormal messages for probing vulnerabilities, detection messages for auxiliary software, and detection messages for network attributes. After the scanner receives the response messages, the parts containing valid features are extracted and the features are combined into an input acceptable to the deep learning model, for example by converting them into binary form and representing them as a picture. The feature data is then input into the trained deep learning model to obtain an identification result for the networked device. Finally, whether the identification result is correct is verified based on user/expert feedback or experiments: if it is correct, the result is stored; if it is incorrect, the result is corrected based on the feedback information, and the feature data and the corrected result are used as training data for updating the deep learning model.
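One possible way to turn several response messages into a picture-like model input is sketched below. The text only says the features are converted into binary form and represented as a picture; the byte-per-pixel mapping, the zero padding, the fixed image size, and the name `responses_to_image` are assumptions made for illustration.

```python
from typing import List


def responses_to_image(responses: List[bytes], width: int = 16,
                       height: int = 16) -> List[List[int]]:
    """Pack response payloads into a height x width grayscale grid.

    Payloads are concatenated in order, truncated or zero-padded to
    width * height bytes, and each byte becomes one pixel (0..255).
    """
    size = width * height
    flat = b"".join(responses)[:size].ljust(size, b"\x00")
    return [list(flat[row * width:(row + 1) * width]) for row in range(height)]
```

A fixed output size gives every device the same input shape regardless of how many probes were answered, which is what an image-based model would typically require.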
In summary, in the integrated industrial control honeypot identification system based on deep learning provided by the embodiment of the present invention, the feature data acquisition module acquires original data of different types of industrial control devices according to honeypot identification requirements and performs feature extraction on the acquired original data to obtain strong feature data capable of accurately identifying honeypots and general feature data for training and learning. The model training module identifies honeypots and industrial control devices according to the strong feature data, and inputs the general features and identification results corresponding to the identified honeypots and industrial control devices into the deep learning model as a training set to construct a trained honeypot identification model. The online feature identification module performs feature extraction online on the original data of different types of industrial control devices, extracts first feature data, and inputs the first feature data into the honeypot identification model to obtain a first identification result output by the model. Multiple types of features are thereby fused, identification of multiple kinds of industrial control honeypots is supported, the model is continuously updated to meet the dynamically changing requirements of honeypot identification, and the accuracy of industrial control honeypot identification is effectively improved.
In addition, the integrated industrial control honeypot identification system based on deep learning provided by the embodiment of the invention and honeypot technology can promote and improve each other: on one hand, the system helps discover online industrial control honeypots and improves the accuracy of network device detection; on the other hand, it promotes industrial control honeypot research and improves the realism of industrial control honeypots.
Compared with the prior art, the embodiment provided by the invention has the following beneficial effects:
(1) through the integrated industrial control honeypot identification framework based on deep learning, deep learning model training and online industrial control honeypot identification are combined, various identification characteristics can be fused, and accurate identification of various industrial control honeypots is supported.
(2) Around the integrated framework, the deep learning model training method based on iterative updating supports continuous updating of the deep learning model and effectively improves the accuracy of model identification.
(3) Around deep learning model training, a data labeling method based on strong features and a data set labeling method based on correcting online identification data through feedback are provided. The former accurately labels industrial control honeypots and industrial control devices based on strong features, and the latter corrects misidentified data through target feedback or manual feedback. Sufficient training data is thereby obtained, and corrected training data is continuously supplied to the deep learning model.
(4) Around the integrated framework, feature data acquisition and industrial control honeypot identification based on online scanning are provided: comprehensive feature data is acquired in batches through online scanning, and honeypot identification is performed with the trained deep learning model, meeting the requirements for feature diversity and online identification.
It should be noted that the above-described system embodiments are merely illustrative, where the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiment of the system provided by the present invention, the connection relationship between the modules indicates that there is a communication connection between them, and may be specifically implemented as one or more communication buses or signal lines. One of ordinary skill in the art can understand and implement it without inventive effort.
As a preferred embodiment of the present invention, the present invention further provides an integrated industrial control honeypot identification method based on deep learning. Referring to fig. 5, which is a flowchart of an embodiment of the integrated industrial control honeypot identification method based on deep learning, the method includes steps S1 to S3.
S1, acquiring original data of different types of industrial control equipment according to the honeypot identification requirements, and performing feature extraction from the acquired original data to obtain strong feature data capable of accurately identifying honeypots and general feature data for training and learning.
As a preferred embodiment of the present invention, step S1 specifically includes: reading IP address information of a plurality of industrial control devices from an industrial control device IP address library to be scanned; constructing different types of detection messages according to the honeypot identification requirements, and sending the different types of detection messages to corresponding networked industrial control equipment; taking a response message returned by the networking industrial control equipment as a detection result, writing the response message back to an IP address library, and taking the detection result as original data; and extracting features from the original data, and extracting strong feature data capable of accurately identifying honeypots and general feature data for training and learning.
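The scan loop of step S1 (read addresses, send probes, write responses back as original data) can be sketched as below. This is a hedged outline only: the dictionary-based address library, the `raw` field, the probe names, and the injected `send_probe` callable are hypothetical, and no real network transport or industrial protocol is implemented.

```python
from typing import Callable, Dict

# Hypothetical transport: takes (ip, probe_payload) and returns the response.
ProbeSender = Callable[[str, bytes], bytes]


def scan_address_library(address_library: Dict[str, dict],
                         probes: Dict[str, bytes],
                         send_probe: ProbeSender) -> None:
    """Send every probe type to every address and write responses back.

    The responses are stored per probe name under the 'raw' key of each
    address record, where later feature extraction can read them.
    """
    for ip, record in address_library.items():
        record["raw"] = {name: send_probe(ip, payload)
                         for name, payload in probes.items()}
```

Injecting `send_probe` keeps the sketch testable offline; a real scanner would also need timeouts, rate limiting, and handling of unresponsive hosts.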
The strong feature data covers strong-feature industrial control honeypots and strong-feature normal industrial control devices;
a strong-feature industrial control honeypot is a device whose ISP is a cloud platform, a university, or a research institute, whose operating system fingerprint is Windows, and whose number of open ports is greater than the port threshold and violates the port opening rule; a strong-feature normal industrial control device is a device whose ISP is neither a cloud platform nor a university or research institute, whose number of open ports is smaller than the port threshold and does not violate the port opening rule, and whose operating system fingerprint is an embedded OS (operating system).
It should be noted that the raw data of the different types of industrial control devices includes defense characteristic data, intrinsic characteristic data, and network attribute characteristic data.
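The strong-feature labeling rule described above can be sketched as a small classifier. Several points are assumptions: the conditions are read as conjunctive, the ISP categories and label strings are hypothetical, and devices matching neither rule are left unlabeled so that they are handled by the trained model instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeviceProfile:
    isp_type: str             # hypothetical categories: "cloud", "academic", "other"
    os_fingerprint: str       # e.g. "windows" or "embedded"
    open_ports: int           # number of open ports observed
    violates_port_rule: bool  # result of a port opening rule check (not shown)


def strong_label(p: DeviceProfile, port_threshold: int) -> Optional[str]:
    """Return 'honeypot', 'device', or None when no strong rule matches."""
    suspicious_isp = p.isp_type in ("cloud", "academic")
    if (suspicious_isp and p.os_fingerprint == "windows"
            and p.open_ports > port_threshold and p.violates_port_rule):
        return "honeypot"
    if (not suspicious_isp and p.os_fingerprint == "embedded"
            and p.open_ports < port_threshold and not p.violates_port_rule):
        return "device"
    return None
```

Samples that receive a label this way can be paired with their general feature data to form the initial training set used in step S2.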
And S2, recognizing the honeypots and the industrial control equipment according to the strong feature data, inputting the recognized general features and recognition results corresponding to the honeypots and the industrial control equipment as a training set to a deep learning model for training, and constructing a trained honeypot recognition model.
S3, performing feature extraction on the original data of different types of industrial control equipment on line, extracting first feature data, and inputting the first feature data into the honeypot identification model to obtain a first identification result output by the honeypot identification model.
It should be noted that the honeypot identification method provided by the present invention further includes step S4;
and S4, when an error is identified on line, correcting the error result, inputting the corrected identification result into the honeypot identification model in an off-line state as a training set for training, and applying the trained honeypot identification model to the on-line identification again after the training is finished.
Compared with the prior art, the embodiment provided by the invention has the following difference:
the traditional industrial control honeypot identification method only aims at a certain type of honeypots, and a small number of characteristic values are used for identification, so that the method is lack of expandability and adaptability; the integrated industrial control honeypot identification method based on deep learning provided by the invention integrates deep learning model training and online industrial control honeypot identification, adopts an iterative updating deep learning model training method and characteristic data acquisition and industrial control honeypot identification based on online scanning, utilizes various identification characteristics, supports identification of various industrial control honeypots, has the expandability and adaptability, and effectively improves the accuracy of industrial control honeypot identification.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. An integrated industrial control honeypot identification system based on deep learning, characterized by comprising: a characteristic data acquisition module, a model training module and an online characteristic identification module;
the characteristic data acquisition module is used for acquiring original data of different types of industrial control equipment according to honeypot identification requirements, and performing characteristic extraction from the acquired original data to obtain strong characteristic data capable of accurately identifying honeypots and general characteristic data for training and learning;
the model training module is used for identifying the honeypots and the industrial control equipment according to the strong feature data, inputting the identified general features and identification results corresponding to the honeypots and the industrial control equipment as a training set to a deep learning model for training so as to construct a trained honeypot identification model;
the online feature identification module is used for performing feature extraction on the original data of different types of industrial control equipment on line, extracting first feature data, inputting the first feature data into the honeypot identification model, and obtaining a first identification result output by the honeypot identification model.
2. The integrated industrial-control honeypot identification system based on deep learning of claim 1, wherein the characteristic data acquisition module comprises an IP address acquisition unit, a detection message construction unit, a detection result generation unit and a characteristic extraction unit;
the IP address acquisition unit is used for reading IP address information of a plurality of industrial control devices from an industrial control device IP address library to be scanned;
the detection message construction unit is used for constructing different types of detection messages according to the honeypot identification requirements and sending the different types of detection messages to the corresponding networking industrial control equipment;
the detection result generation unit is used for writing a response message returned by the networking industrial control equipment as a detection result back to the IP address library and taking the detection result as original data;
and the feature extraction unit is used for extracting features from the original data and extracting strong feature data capable of accurately identifying honeypots and general feature data for training and learning.
3. The integrated industrial-control honeypot identification system based on deep learning of claim 2 wherein the strong feature data comprises strong-feature industrial-control honeypots and strong-feature normal industrial-control devices;
the strong-characteristic industrial control honeypot is a device whose ISP is a cloud platform, a university, or a research institute, whose operating system fingerprint is Windows, and whose number of open ports is greater than the port threshold and violates the port opening rule;
the strong-characteristic normal industrial control equipment is equipment whose ISP (Internet service provider) is neither a cloud platform nor a university or research institute, whose number of open ports is smaller than the port threshold and does not violate the port opening rule, and whose operating system fingerprint is an embedded OS (operating system).
4. The integrated industrial-control honeypot identification system based on deep learning of claim 1 further comprising a model update module;
and the model updating module is used for correcting the error result when an error is identified on line, inputting the corrected identification result into the honeypot identification model in an off-line state as a training set for training, and applying the trained honeypot identification model to on-line identification again after the training is finished.
5. The integrated industrial-control honeypot identification system based on deep learning of claim 1, wherein the raw data of the different types of industrial-control devices comprises defense feature data, intrinsic feature data, and network attribute feature data.
6. An integrated industrial control honeypot identification method based on deep learning is characterized by comprising the following steps:
acquiring original data of different types of industrial control equipment according to honeypot identification requirements, and performing feature extraction from the acquired original data to obtain strong feature data capable of accurately identifying honeypots and general feature data for training and learning;
identifying honeypots and industrial control equipment according to the strong feature data, inputting the identified general features and identification results corresponding to the honeypots and the industrial control equipment as a training set to a deep learning model for training to construct a trained honeypot identification model;
feature extraction is carried out on line from original data of different types of industrial control equipment, first feature data are extracted, the first feature data are input into the honeypot identification model, and a first identification result output by the honeypot identification model is obtained.
7. The integrated industrial-control honeypot identification method based on deep learning of claim 6, wherein the original data of different types of industrial-control equipment are obtained according to honeypot identification requirements, and feature extraction is performed from the obtained original data to obtain strong feature data capable of accurately identifying honeypots and general feature data for training and learning, specifically:
reading IP address information of a plurality of industrial control devices from an industrial control device IP address library to be scanned;
constructing different types of detection messages according to the honeypot identification requirements, and sending the different types of detection messages to corresponding networked industrial control equipment;
taking a response message returned by the networking industrial control equipment as a detection result, writing the response message back to an IP address library, and taking the detection result as original data;
and extracting features from the original data, and extracting strong feature data capable of accurately identifying honeypots and general feature data for training and learning.
8. The integrated industrial-control honeypot identification method based on deep learning of claim 7, wherein the strong feature data comprises strong-feature industrial-control honeypots and strong-feature normal industrial-control devices;
the strong-characteristic industrial control honeypot is a device whose ISP is a cloud platform, a university, or a research institute, whose operating system fingerprint is Windows, and whose number of open ports is greater than the port threshold and violates the port opening rule;
the strong-characteristic normal industrial control equipment is equipment whose ISP (Internet service provider) is neither a cloud platform nor a university or research institute, whose number of open ports is smaller than the port threshold and does not violate the port opening rule, and whose operating system fingerprint is an embedded OS (operating system).
9. The integrated industrial-control honeypot identification method based on deep learning of claim 6, further comprising:
and when an error is identified on line, correcting the error result, inputting the corrected identification result as a training set into the honeypot identification model in an off-line state for training, and applying the trained honeypot identification model to on-line identification again after the training is finished.
10. The integrated industrial-control honeypot identification method based on deep learning of claim 6, wherein the raw data of the different types of industrial-control devices comprises defense characteristic data, intrinsic characteristic data and network attribute characteristic data.
CN201911166903.6A 2019-11-25 2019-11-25 Integrated industrial control honeypot identification system and method based on deep learning Active CN111126440B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911166903.6A CN111126440B (en) 2019-11-25 2019-11-25 Integrated industrial control honeypot identification system and method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911166903.6A CN111126440B (en) 2019-11-25 2019-11-25 Integrated industrial control honeypot identification system and method based on deep learning

Publications (2)

Publication Number Publication Date
CN111126440A CN111126440A (en) 2020-05-08
CN111126440B CN111126440B (en) 2023-12-22

Family

ID=70496538

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911166903.6A Active CN111126440B (en) 2019-11-25 2019-11-25 Integrated industrial control honeypot identification system and method based on deep learning

Country Status (1)

Country Link
CN (1) CN111126440B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639680A (en) * 2020-05-09 2020-09-08 西北工业大学 Identity recognition method based on expert feedback mechanism
CN111756742A (en) * 2020-06-24 2020-10-09 广州锦行网络科技有限公司 Honeypot deception defense system and deception defense method thereof
CN112235241A (en) * 2020-09-08 2021-01-15 广州大学 Industrial control honeypot feature extraction method, system and medium based on fuzzy test
CN113472819A (en) * 2021-09-03 2021-10-01 国际关系学院 Honeypot detection and identification method and device based on fingerprint characteristics

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108600193A (en) * 2018-04-03 2018-09-28 北京威努特技术有限公司 Industrial control honeypot identification method based on machine learning
CN108768934A (en) * 2018-04-11 2018-11-06 北京立思辰新技术有限公司 Rogue program issues detection method, device and medium
CN109067778A (en) * 2018-09-18 2018-12-21 东北大学 Industrial control scanner fingerprint identification method based on honeynet data

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111639680A (en) * 2020-05-09 2020-09-08 西北工业大学 Identity recognition method based on expert feedback mechanism
CN111639680B (en) * 2020-05-09 2022-08-09 西北工业大学 Identity recognition method based on expert feedback mechanism
CN111756742A (en) * 2020-06-24 2020-10-09 广州锦行网络科技有限公司 Honeypot deception defense system and deception defense method thereof
CN111756742B (en) * 2020-06-24 2021-07-13 广州锦行网络科技有限公司 Honeypot deception defense system and deception defense method thereof
CN112235241A (en) * 2020-09-08 2021-01-15 广州大学 Industrial control honeypot feature extraction method, system and medium based on fuzzy test
CN112235241B (en) * 2020-09-08 2023-02-24 广州大学 Industrial control honeypot feature extraction method, system and medium based on fuzzy test
CN113472819A (en) * 2021-09-03 2021-10-01 国际关系学院 Honeypot detection and identification method and device based on fingerprint characteristics
CN113472819B (en) * 2021-09-03 2021-11-30 国际关系学院 Honeypot detection and identification method and device based on fingerprint characteristics

Also Published As

Publication number Publication date
CN111126440B (en) 2023-12-22

Similar Documents

Publication Publication Date Title
CN111126440A (en) Integrated industrial control honeypot identification system and method based on deep learning
CN107360145B (en) Multi-node honeypot system and data analysis method thereof
CN110875920B (en) Network threat analysis method and device, electronic equipment and storage medium
CN107659543B (en) Protection method for APT (android packet) attack of cloud platform
Gascon et al. Pulsar: Stateful black-box fuzzing of proprietary network protocols
CN108200030A (en) Detection method, system, device and the computer readable storage medium of malicious traffic stream
CN112383538B (en) Hybrid high-interaction industrial honeypot system and method
CN107667505A (en) System for monitoring and managing data center
CN109600362A (en) Zombie host recognition methods, identification equipment and medium based on identification model
CN111488587A (en) Automatic penetration test system based on AI
CN111049784B (en) Network attack detection method, device, equipment and storage medium
CN104852916A (en) Social engineering-based webpage verification code recognition method and system
Al-Daweri et al. An adaptive method and a new dataset, UKM-IDS20, for the network intrusion detection system
CN110096013A (en) A kind of intrusion detection method and device of industrial control system
CN114422271B (en) Data processing method, device, equipment and readable storage medium
CN109995751B (en) Internet access equipment marking method and device, storage medium and computer equipment
KR20190028880A (en) Method and appratus for generating machine learning data for botnet detection system
CN104852921A (en) Test system and method for protecting open port from attacking for network equipment
Dehlaghi-Ghadim et al. Anomaly detection dataset for industrial control systems
US11909754B2 (en) Security assessment system
CN108566380B (en) Proxy internet surfing behavior identification and detection method
CN111401067B (en) Honeypot simulation data generation method and device
CN114817928A (en) Network space data fusion analysis method and system, electronic device and storage medium
Antunes et al. Automatically complementing protocol specifications from network traces
Deptula Automation of cyber penetration testing using the detect, identify, predict, react intelligence automation model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant