CN110365636B - Method and device for judging attack data source of industrial control honeypot

Publication number
CN110365636B
Authority
CN
China
Prior art keywords
attack
address
source
attack source
unknown
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910436006.6A
Other languages
Chinese (zh)
Other versions
CN110365636A (en
Inventor
孙利民
牛梦瑶
吕世超
游建舟
李红
石志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Application filed by Institute of Information Engineering of CAS
Priority to CN201910436006.6A
Publication of CN110365636A
Application granted
Publication of CN110365636B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H04L2463/00 Additional details relating to network architectures or network communication protocols for network security covered by H04L63/00
    • H04L2463/146 Tracing the source of attacks

Abstract

The embodiment of the invention provides a method and a device for judging an industrial control honeypot attack data source. The method comprises the following steps: extracting the original features of the IP address of an unknown attack source based on the data captured by the industrial control honeypot; performing dimensionality reduction, normalization and reconstruction processing on the original features to obtain the IP features of the IP address of the unknown attack source; and calculating the distance between the IP features and each training sample in a pre-constructed training data set by using a KNN classification algorithm, selecting the three training samples with the closest distances as the nearest samples, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples. By extracting the IP features corresponding to the IP address of the unknown attack source and applying the KNN classification algorithm to those features, the embodiment of the invention can effectively judge the attack source to which the IP address belongs.

Description

Method and device for judging attack data source of industrial control honeypot
Technical Field
The invention relates to the technical field of industrial control safety, in particular to a method and a device for judging an industrial control honeypot attack data source.
Background
Industrial control systems, which are important components of national key infrastructure, have become a primary target of national-level network defense and face increasingly serious network security threats. An industrial control honeypot can actively trap attackers and support in-depth analysis of attack sources and of the characteristics of attack methods. It not only improves the ability of the operation and maintenance personnel of an industrial control system to discover, analyze and handle security threats, but also guides the relevant managers in making effective security decisions before a security event occurs.
At present, for lack of a suitable data processing method, it is difficult to extract and analyze attack sources from the massive original attack logs and traffic data packets captured by industrial control honeypots. The attack source is the organization or individual that initiates the attack, and it is the most important part of the threat environment. Attackers often use different IP addresses or victim hosts to hide their identity, which makes the attack source harder to identify. To identify the attack source of industrial control honeypot data effectively, a method for determining that attack source is needed.
Disclosure of Invention
The embodiment of the invention provides a method and a device for judging the source of industrial control honeypot attack data, which overcome the above problems or at least partially solve them.
In a first aspect, an embodiment of the present invention provides a method for determining an attack data source of an industrial control honeypot, including:
for an IP address of an unknown attack source, extracting original characteristics of the IP address of the unknown attack source based on log information and flow data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source;
carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the unknown attack source to obtain the IP characteristics of the IP address of the unknown attack source;
calculating the distance between the IP characteristics of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by utilizing a KNN classification algorithm, selecting three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples;
wherein each training sample in the pre-constructed training dataset is an IP feature of an IP address of a known attack source.
In a second aspect, an embodiment of the present invention provides an apparatus for determining an attack data source of an industrial control honeypot, including:
the characteristic extraction module is used for extracting the original characteristics of the IP address of an unknown attack source based on log information and flow data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source;
the characteristic processing module is used for carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristic of the IP address of the unknown attack source to obtain the IP characteristic of the IP address of the unknown attack source;
the class judgment module is used for calculating the distance between the IP characteristics of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by utilizing a KNN classification algorithm, selecting three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples;
wherein each training sample in the pre-constructed training dataset is an IP feature of an IP address of a known attack source.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for determining the source of industrial control honeypot attack data according to the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for determining the source of industrial control honeypot attack data as provided in the first aspect.
According to the method and the device for judging the source of the industrial control honeypot attack data, provided by the embodiment of the invention, the attack source of the IP address of the unknown attack source can be effectively judged by extracting the IP characteristics corresponding to the IP address of the unknown attack source and adopting a KNN classification algorithm according to the IP characteristics.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a method for determining sources of industrial control honeypot attack data according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating the steps of extracting the original features of the IP address of an unknown attack source based on log information and traffic data packets, which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source, for the IP address of the unknown attack source according to the present invention;
FIG. 3 is a schematic structural diagram of a device for determining an attack data source of an industrial control honeypot according to an embodiment of the present invention;
FIG. 4 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic flow chart of the method for determining an attack data source of an industrial control honeypot provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:
step 100, for an IP address of an unknown attack source, extracting original characteristics of the IP address of the unknown attack source based on log information and flow data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source;
The embodiment of the invention provides a method for distinguishing attack sources based on the trapping data collected by industrial control honeypots, and can accurately identify the organization from which an IP address originates.
Specifically, for an IP address of an unknown attack source, log information and a flow data packet related to the IP address of the unknown attack source are searched from data captured by an industrial control honeypot, and the original characteristics of the IP address of the unknown attack source are extracted according to the log information and the flow data packet.
Wherein the original characteristics of the IP address comprise: the IP address, whether the Modbus protocol, the S7comm protocol and the IEC104 protocol are included, a Modbus attack mode, an S7comm attack mode, an IEC104 attack mode, the total amount of data packets from the IP, the frequency of attacking different protocols, the total amount of attack areas, attack time intervals and attack protocol time intervals.
It will be appreciated that the original characteristics of an IP address have the following data structure:
Figure BDA0002070525500000041
Figure BDA0002070525500000051
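The figure holding the data structure is not reproduced in this text, so the following is only a minimal sketch of how the listed original features could be grouped. The class name `RawIPFeatures`, all field names and the `to_vector` helper are hypothetical; only the feature list and the vector lengths (3, 4, 3, 4, 3, 3) come from the description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RawIPFeatures:
    """Hypothetical container for the original features of one IP address."""
    ip: str
    # Flags for [Modbus, S7comm, IEC104]: 1 if the IP attacked that protocol.
    protocols: List[int] = field(default_factory=lambda: [0, 0, 0])
    # Share of packets per attack mode (4, 3 and 4 classes respectively).
    modbus_pattern: List[float] = field(default_factory=lambda: [0.0] * 4)
    s7comm_pattern: List[float] = field(default_factory=lambda: [0.0] * 3)
    iec104_pattern: List[float] = field(default_factory=lambda: [0.0] * 4)
    # Total number of data packets whose source is this IP.
    packet_total: int = 0
    # Share of Modbus, S7comm and IEC104 packets in the total.
    protocol_frequency: List[float] = field(default_factory=lambda: [0.0] * 3)
    # Number of distinct honeypot IP addresses that were targeted.
    attack_area_total: int = 0
    # Days between the first and last packet, overall and per protocol.
    attack_interval_days: int = 0
    protocol_interval_days: List[int] = field(default_factory=lambda: [0, 0, 0])

    def to_vector(self) -> List[float]:
        """Flatten every numeric feature (the IP itself is kept aside)."""
        return [float(v) for v in (
            *self.protocols, *self.modbus_pattern, *self.s7comm_pattern,
            *self.iec104_pattern, self.packet_total, *self.protocol_frequency,
            self.attack_area_total, self.attack_interval_days,
            *self.protocol_interval_days)]
```

Under these assumptions the flattened vector has 23 numeric entries, which is the input that the later dimensionality reduction would shrink.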
Step 101, performing dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the unknown attack source to obtain the IP characteristics of the IP address of the unknown attack source;
In order to reduce memory occupation and accelerate classification learning, the extracted original features of the IP address of the unknown attack source are subjected to dimensionality reduction, normalization and reconstruction processing.
The original features of the IP address of the unknown attack source can be reduced in dimensionality by an existing dimensionality reduction method to obtain the reduced-dimension features. The reduced-dimension features are then normalized, that is, linearly transformed and mapped into the value range 0 to 1. Finally, the normalized features are reconstructed, and the reconstructed features are taken as the IP features corresponding to the IP address of the unknown attack source.
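The description does not name the exact linear transformation used for normalization. One common choice that maps each feature into the range 0 to 1 is min-max scaling, shown here as an assumption rather than as the patent's stated formula:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is one reduced-dimension feature value and $x_{\min}$ and $x_{\max}$ are the smallest and largest values of that feature over the data set.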
Step 102, calculating the distance between the IP characteristics of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by using a KNN classification algorithm, selecting three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples;
wherein each training sample in the pre-constructed training dataset is an IP feature of an IP address of a known attack source.
Specifically, the attack source corresponding to the IP address of the unknown attack source is determined by using the KNN classification algorithm in combination with the IP characteristics corresponding to the IP address of the known attack source, so that the attack source corresponding to the IP address of the unknown attack source can be effectively identified.
First, the distance between the IP characteristics of the IP address of the unknown attack source and each training sample in the pre-constructed training data set is calculated, the three training samples with the shortest distance are selected as the nearest samples of the IP address of the unknown attack source, and the attack source corresponding to the IP address of the unknown attack source is obtained according to the main attack source corresponding to the nearest samples.
The main attack source corresponding to the nearest samples is the class that occurs most frequently among the three nearest samples. This class is taken as the class of the IP address of the unknown attack source, that is, as the attack source corresponding to that IP address.
In KNN, the distance between training data and test data is calculated as an indicator of dissimilarity between the data, where the distance is generally the Euclidean distance or the Manhattan distance.
The calculation formula of the Euclidean distance is as follows:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
The formula for calculating the Manhattan distance is as follows:
$$d(x, y) = \sum_{i=1}^{n} \lvert x_i - y_i \rvert$$
where $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$ are two n-dimensional feature vectors.
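The nearest-neighbor decision described above can be sketched in a few lines. This is an illustrative sketch, not the patented implementation: the function names, the sample feature vectors and their labels are made up, and the description does not say how ties among the three neighbors are broken.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Manhattan distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(query, training_set, k=3, distance=euclidean):
    """Return the most frequent label among the k nearest training samples.

    training_set is a list of (feature_vector, label) pairs, where each
    label is the known attack source of that training IP address.
    """
    nearest = sorted(training_set, key=lambda s: distance(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up four-dimensional IP features with known attack sources.
training = [
    ([0.10, 0.20, 0.10, 0.00], "shodan"),
    ([0.15, 0.25, 0.10, 0.05], "shodan"),
    ([0.90, 0.80, 0.70, 0.90], "plcscan"),
    ([0.85, 0.90, 0.75, 0.80], "plcscan"),
    ([0.50, 0.50, 0.50, 0.50], "umich"),
]
unknown = [0.12, 0.22, 0.10, 0.02]
predicted = knn_classify(unknown, training)
```

Passing `distance=manhattan` switches the metric without changing the voting step.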
According to the method for judging the source of the industrial control honeypot attack data, provided by the embodiment of the invention, the attack source of the IP address of the unknown attack source can be effectively judged by extracting the IP characteristics corresponding to the IP address of the unknown attack source and adopting a KNN classification algorithm according to the IP characteristics.
Based on the content of the foregoing embodiment, as shown in fig. 2, the step of extracting, for an IP address of an unknown attack source, an original feature of the IP address of the unknown attack source based on log information and a traffic data packet, which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source, specifically includes:
step 200, judging whether the IP address attacks a Modbus protocol, an S7comm protocol or an IEC104 protocol, and if so, setting the value of the index corresponding to the protocol to be 1;
Specifically, the feature "whether the Modbus protocol, the S7comm protocol and the IEC104 protocol are included" is initialized to [0,0,0]. If the IP address attacks the Modbus protocol, the S7comm protocol or the IEC104 protocol, the value at the index corresponding to that protocol is set to 1. For example, if the IP address attacks only the Modbus protocol, and not the S7comm protocol or the IEC104 protocol, the feature is set to [1,0,0].
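As a small illustration of step 200, the flag vector can be built as follows. The protocol name strings and the function name `protocol_flags` are assumptions for this sketch; the index order Modbus, S7comm, IEC104 follows the description.

```python
# Index order follows the description: Modbus, S7comm, IEC104.
PROTOCOLS = ("modbus", "s7comm", "iec104")


def protocol_flags(attacked_protocols):
    """Return [0/1, 0/1, 0/1] marking which protocols the IP attacked."""
    flags = [0, 0, 0]
    for name in attacked_protocols:
        if name in PROTOCOLS:
            flags[PROTOCOLS.index(name)] = 1
    return flags
```

An IP address seen only in Modbus traffic yields `[1, 0, 0]`, matching the example in the text.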
Step 201, analyzing the data packet related to the IP address, and respectively matching a Modbus attack mode, an S7comm attack mode and an IEC104 attack mode of the data packet according to the function code and related fields obtained by the analysis;
specifically, a data packet related to the IP address is analyzed, a Modbus attack mode of the data packet is matched according to the function code and the related field obtained through analysis, an S7comm attack mode of the data packet is matched according to the function code and the related field obtained through analysis, and an IEC104 attack mode of the data packet is matched according to the function code and the related field obtained through analysis.
The Modbus attack mode is initialized to [0.0,0.0,0.0,0.0]. Each data packet is matched against the Modbus attack modes according to its function code and related fields, and whenever a mode is matched, the count of that mode is increased by one. Finally, each count is divided by the total number of data packets corresponding to the Modbus attack and the result is placed in the corresponding list position.
In the embodiment of the invention, the attack modes of Modbus are divided into four classes: if function code 90 is used and all or part of the substation information is scanned, the packet is classified into the first class; if function codes 17 and 43 are used to scan substation 0 and substation 255, it is classified into the second class; if function code 03 reads the holding register or function code 04 reads the input register, it is classified into the third class; all other cases fall into the fourth class.
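The counting-then-dividing procedure for the Modbus attack mode might look like the sketch below. The packet representation (a dict with `function_code` and `unit_id` keys) is hypothetical, and the class tests are a simplification of the four classes: the check that "all or part of the substation information is scanned" is reduced to the function code alone.

```python
def classify_modbus(packet):
    """Map one parsed Modbus packet to a class index 0..3 (simplified)."""
    code = packet["function_code"]
    unit = packet.get("unit_id")
    if code == 90:
        return 0  # first class: function code 90
    if code in (17, 43) and unit in (0, 255):
        return 1  # second class: codes 17/43 against substation 0 or 255
    if code in (3, 4):
        return 2  # third class: read holding or input registers
    return 3      # fourth class: everything else


def modbus_pattern(packets):
    """Return the share of Modbus packets falling into each class."""
    counts = [0, 0, 0, 0]
    for packet in packets:
        counts[classify_modbus(packet)] += 1
    total = len(packets)
    if total == 0:
        return [0.0, 0.0, 0.0, 0.0]
    return [count / total for count in counts]
```

For four packets with function codes 90, 17 (unit 0), 3 and 4, the resulting list is `[0.25, 0.25, 0.5, 0.0]`.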
The S7comm attack mode is initialized to [0.0,0.0,0.0]. Each data packet is matched against the S7comm attack modes according to its function code and related fields, and whenever a mode is matched, the count of that mode is increased by one. Finally, each count is divided by the total number of data packets corresponding to the S7comm attack and the result is placed in the corresponding list position.
The attack modes of S7comm are divided into three classes: if only the ISO_TP connection and the S7 communication are established with the honeypot, the packet is classified into the first class; if honeypot information is requested using the 0x00 function code, it is classified into the second class; all other cases fall into the third class.
The IEC104 attack mode is initialized to [0.0,0.0,0.0,0.0]. Each data packet is matched against the IEC104 attack modes according to its function code and related fields, and whenever a mode is matched, the count of that mode is increased by one. Finally, each count is divided by the total number of data packets corresponding to the IEC104 attack and the result is placed in the corresponding list position.
The IEC104 attack modes are divided into four classes: if the packet is a test connection, a start-of-transmission activation or a general interrogation, that is, it matches the 5c2353540a field or the 474554202f20485454502f312e310d0a557365722d4167656e743a204d6f7a696c6c612f35 field, it is classified into the first class; if the packet matches the 0d0a0d0a field, it is classified into the second class; if it matches the 680443000000 field, the 680407000000 field or the 680e0000000064010600ffff00000000 field, it is classified into the third class; if it indicates the connection status, that is, it matches NEW_CONNECTION, CONNECTION_TERMINATED or CONNECTION_LOST, it is classified into the fourth class.
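The field matching for IEC104 can be sketched as substring tests on a hex-encoded payload. The hex fields and connection-status names are taken from the text; the record format (one string per record, either a hex payload or a status name), the precedence when several fields match and the handling of unmatched records are assumptions of this sketch.

```python
# Hex fields per class, in the order given in the description.
IEC104_SIGNATURES = (
    ("5c2353540a",
     "474554202f20485454502f312e310d0a557365722d4167656e743a204d6f7a696c6c612f35"),
    ("0d0a0d0a",),
    ("680443000000", "680407000000", "680e0000000064010600ffff00000000"),
)
CONNECTION_EVENTS = ("NEW_CONNECTION", "CONNECTION_TERMINATED", "CONNECTION_LOST")


def classify_iec104(record):
    """Return the class index 0..3 of one record, or None if nothing matches."""
    text = record.strip()
    if text in CONNECTION_EVENTS:
        return 3  # fourth class: connection status
    payload = text.lower()
    for index, signatures in enumerate(IEC104_SIGNATURES):
        if any(signature in payload for signature in signatures):
            return index
    return None  # assumption: unmatched records are not counted in any class


def iec104_pattern(records):
    """Return the share of IEC104 records falling into each class."""
    counts = [0, 0, 0, 0]
    for record in records:
        index = classify_iec104(record)
        if index is not None:
            counts[index] += 1
    total = len(records)
    if total == 0:
        return [0.0, 0.0, 0.0, 0.0]
    return [count / total for count in counts]
```

Because the classes are tested in order, a payload containing fields of several classes is assigned to the earliest matching class; the description does not specify this precedence.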
Step 202, the total amount of data packets of which the source IP is the IP address is calculated, that is, the total amount of data packets from the IP is calculated.
Step 203, calculating the ratio of the data packets subjected to Modbus attack, S7comm attack and IEC104 attack in the total data packets respectively;
Specifically, the frequency of attacking different protocols is initialized to [0.0,0.0,0.0], and each entry holds the proportion of the Modbus, S7comm or IEC104 attack data packets, respectively, in the total number of data packets.
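Steps 202 and 203 amount to a count and three ratios. In this sketch each packet from the IP is represented only by a protocol name string, which is an assumption; packets of other protocols still count toward the total.

```python
def protocol_frequency(packet_protocols):
    """Return (total, [Modbus, S7comm, IEC104 shares]) for one source IP.

    packet_protocols holds one protocol name per data packet from the IP.
    """
    total = len(packet_protocols)
    if total == 0:
        return 0, [0.0, 0.0, 0.0]
    shares = [packet_protocols.count(name) / total
              for name in ("modbus", "s7comm", "iec104")]
    return total, shares
```

For the packets `["modbus", "modbus", "s7comm", "other"]` this gives a total of 4 and the shares `[0.5, 0.25, 0.0]`.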
Step 204, acquiring the total number of different honeypot IP addresses among the destination IPs, and taking this number as the total number of attack areas;
step 205, calculating an attack time difference value between the first attack data packet and the last attack data packet;
The attack time interval is the time difference between the first related data packet and the last related data packet. If the two fall within one day, the time interval is set to 1; otherwise, it is set to the number of days of the time difference.
Step 206, calculating the time difference between the first and the last relevant data packet among the data packets of the Modbus attack, calculating the time difference between the first and the last relevant data packet among the data packets of the S7comm attack, and calculating the time difference between the first and the last relevant data packet among the data packets of the IEC104 attack.
Specifically, the attack protocol time interval is initialized to [0,0,0]. The time difference between the first and the last relevant data packet among the data packets of the Modbus attack is calculated, and if the two fall within one day, the time interval is set to 1. The time intervals of S7comm and IEC104 are calculated in the same way.
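The overall and per-protocol time intervals of steps 205 and 206 can be sketched with the standard `datetime` module. The packet representation as `(protocol, timestamp)` pairs is hypothetical, and counting whole days via `timedelta.days` is an assumption about how "the specific days of the time difference" is meant.

```python
from datetime import datetime


def interval_days(timestamps):
    """Days between the first and last packet; 1 if within a single day.

    Returns 0 when there are no packets, which keeps the initial value.
    """
    if not timestamps:
        return 0
    span = max(timestamps) - min(timestamps)
    return max(1, span.days)


def protocol_intervals(packets):
    """Per-protocol intervals in the order Modbus, S7comm, IEC104."""
    result = []
    for name in ("modbus", "s7comm", "iec104"):
        times = [ts for proto, ts in packets if proto == name]
        result.append(interval_days(times))
    return result


# Made-up capture times for one source IP.
packets = [
    ("modbus", datetime(2019, 5, 1, 8, 0)),
    ("modbus", datetime(2019, 5, 1, 20, 0)),
    ("s7comm", datetime(2019, 5, 1, 0, 0)),
    ("s7comm", datetime(2019, 5, 4, 12, 0)),
]
overall = interval_days([ts for _, ts in packets])
per_protocol = protocol_intervals(packets)
```

With these sample times the Modbus packets fall within one day, the S7comm packets span three whole days, and no IEC104 packet is present.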
It should be noted that, the steps 200 to 206 have no chronological sequence relationship, that is, the embodiment of the present invention does not limit the chronological relationship between the steps 200 to 206.
In an embodiment, the performing dimensionality reduction, normalization, and reconstruction processing on the original feature of the IP address of the unknown attack source to obtain the IP feature of the IP address of the unknown attack source specifically includes:
carrying out dimensionality reduction on the original characteristics of the IP address of the unknown attack source through a Principal Component Analysis (PCA) algorithm to obtain four-dimensional data, and carrying out normalization processing on the four-dimensional data to obtain processed characteristics;
reconstructing the processed characteristics by using the IP address of the unknown attack source to obtain the IP characteristics corresponding to the IP address of the unknown attack source.
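The sketch below covers only the normalization and reconstruction stages and assumes the PCA step has already produced four-dimensional rows. Min-max scaling per column and the reading of "reconstruction" as re-attaching each IP address to its processed feature vector are both interpretations, not details stated in the text; the function names and sample values are made up.

```python
def min_max_normalize(rows):
    """Scale every column of a list of equal-length rows into [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


def reconstruct(ips, rows):
    """Re-attach each IP address to its processed feature vector."""
    return {ip: row for ip, row in zip(ips, rows)}


# Made-up output of a PCA step that reduced three IPs to four dimensions.
ips = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
reduced = [
    [-1.0, 2.0, 0.0, 5.0],
    [1.0, 4.0, 0.0, 5.0],
    [0.0, 3.0, 0.0, 5.0],
]
ip_features = reconstruct(ips, min_max_normalize(reduced))
```

Constant columns are mapped to 0.0 here to avoid division by zero; that choice is also an assumption.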
Based on the content of the foregoing embodiments, before extracting, for an IP address of an unknown attack source, original features of the IP address of the unknown attack source based on log information and traffic data packets, which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source, the method further includes:
extracting the original characteristics of the IP address of the known attack source based on the massive original attack logs and flow data packets captured by the industrial control honeypot;
carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the known attack source to obtain the characteristics of the IP address of the known attack source;
and taking the characteristics of the IP address of each known attack source as a training sample, generating a training data set, and establishing a class label for each training sample according to the attack source to which each training sample belongs.
Specifically, before identifying an attack source corresponding to an IP address of an unknown attack source, a training sample set with a class label needs to be constructed.
Extracting the original characteristics of the IP address of the known attack source by adopting the same method as the embodiment, and performing dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the known attack source by adopting the same method as the embodiment to obtain the characteristics of the IP address of the known attack source.
And taking the characteristics of the IP address of each known attack source as a training sample, constructing a training data set, and establishing a class label for each training sample according to the attack source to which each training sample belongs.
The category label includes: shodan, umich, Nagravision SA, reverse, plcscan, amazonaws, Alibaba, adsl, neu, linode, Unicom, Telecom, DataService, 360, and others.
FIG. 3 is a schematic structural diagram of the apparatus for determining an attack data source of an industrial control honeypot provided in an embodiment of the present invention. As shown in FIG. 3, the apparatus includes: a feature extraction module 301, a feature processing module 302, and a category determination module 303, wherein,
the feature extraction module 301 is configured to, for an IP address of an unknown attack source, extract an original feature of the IP address of the unknown attack source based on log information and a traffic data packet, which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source;
The embodiment of the invention provides a method for distinguishing attack sources based on the trapping data collected by industrial control honeypots, and can accurately identify the organization from which an IP address originates.
Specifically, for an IP address of an unknown attack source, the feature extraction module 301 searches log information and a traffic data packet related to the IP address of the unknown attack source from data captured by the industrial control honeypot, and extracts an original feature of the IP address of the unknown attack source according to the log information and the traffic data packet.
Wherein the original characteristics of the IP address comprise: the IP address, whether the Modbus protocol, the S7comm protocol and the IEC104 protocol are included, a Modbus attack mode, an S7comm attack mode, an IEC104 attack mode, the total amount of data packets from the IP, the frequency of attacking different protocols, the total amount of attack areas, attack time intervals and attack protocol time intervals.
It will be appreciated that the original characteristics of an IP address have the following data structure:
Figure BDA0002070525500000101
the feature extraction module 301 is specifically configured to:
judging whether the IP address attacks a Modbus protocol, an S7comm protocol or an IEC104 protocol, and if so, setting the value of the index corresponding to the protocol to be 1;
analyzing the data packet related to the IP address, and respectively matching a Modbus attack mode, an S7comm attack mode and an IEC104 attack mode of the data packet according to the function code obtained by analysis and the related field;
calculating the total amount of data packets of which the source IP is the IP address;
calculating the ratio of the data packets subjected to Modbus attack, S7comm attack and IEC104 attack in the total data packets respectively;
acquiring the total number of different honeypot IP addresses in a target IP;
calculating an attack time difference value between the first attack data packet and the last attack data packet;
and calculating the time difference between the first and the last relevant data packet among the data packets of the Modbus attack, calculating the time difference between the first and the last relevant data packet among the data packets of the S7comm attack, and calculating the time difference between the first and the last relevant data packet among the data packets of the IEC104 attack.
A feature processing module 302, configured to perform dimension reduction, normalization, and reconstruction processing on the original feature of the IP address of the unknown attack source to obtain an IP feature of the IP address of the unknown attack source;
in order to reduce the memory usage and speed up the classification learning, the feature processing module 302 performs dimension reduction, normalization and reconstruction processing on the extracted original features of the IP address of the unknown attack source.
The feature processing module 302 reduces the dimensionality of the original features of the IP address of the unknown attack source by an existing dimensionality reduction method to obtain the reduced-dimension features. The reduced-dimension features are then normalized, that is, linearly transformed and mapped into the value range 0 to 1. Finally, the normalized features are reconstructed, and the reconstructed features are taken as the IP features corresponding to the IP address of the unknown attack source.
The class determination module 303 is configured to calculate a distance between an IP feature of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by using a KNN classification algorithm, select three training samples with the closest distance as nearest samples of the IP address of the unknown attack source, and obtain an attack source corresponding to the IP address of the unknown attack source according to a main attack source corresponding to the nearest samples;
wherein each training sample in the pre-constructed training dataset is an IP feature of an IP address of a known attack source.
Specifically, the category determination module 303 determines the attack source corresponding to the IP address of the unknown attack source by using the KNN classification algorithm in combination with the IP features corresponding to the IP addresses of the known attack sources, and can effectively identify the attack source corresponding to the IP address of the unknown attack source.
First, the category determination module 303 calculates the distance between the IP feature of the IP address of the unknown attack source and each training sample in the pre-constructed training data set, selects the three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtains the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples.
The main attack source corresponding to the nearest samples is the class that occurs most frequently among the three nearest samples. This class is taken as the class of the IP address of the unknown attack source, which determines the attack source corresponding to that IP address.
In KNN, the distance between the training data and the test data is calculated as an indicator of the dissimilarity between the data, where the distance is generally the Euclidean distance or the Manhattan distance.
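The Euclidean distance is calculated as follows:
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$
The Manhattan distance is calculated as follows:
$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$
where $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ denote two $n$-dimensional feature vectors.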
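The three-nearest-neighbor vote with either distance can be sketched in Python as below. This is a minimal illustration, not the patented implementation: the function names, the `(feature_vector, label)` layout of the training samples and the labels used in the usage example are all hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def euclidean(x: Vector, y: Vector) -> float:
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan(x: Vector, y: Vector) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def classify_knn(query: Vector,
                 training: Sequence[Tuple[Vector, str]],
                 k: int = 3,
                 distance: Callable[[Vector, Vector], float] = euclidean) -> str:
    """Return the most frequent label among the k closest training samples."""
    nearest = sorted(training, key=lambda s: distance(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    # On a tie, the label met first wins, i.e. the label of the closest sample.
    return votes.most_common(1)[0][0]
```

A possible call, with made-up two-dimensional features and labels:

```python
training = [([0.0, 0.0], "source_a"), ([0.1, 0.1], "source_a"),
            ([1.0, 1.0], "source_b"), ([0.9, 1.0], "source_b")]
label = classify_knn([0.05, 0.05], training, k=3, distance=manhattan)
```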
According to the device for judging the source of industrial control honeypot attack data provided by the embodiment of the invention, the attack source of an IP address of an unknown attack source can be effectively determined by extracting the IP features corresponding to that IP address and applying a KNN classification algorithm to those features.
Fig. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 4, the electronic device may include: a processor 410, a communication interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may call a computer program stored in the memory 430 and executable on the processor 410 to execute the method for judging the source of industrial control honeypot attack data provided by the above method embodiments, for example including: for an IP address of an unknown attack source, extracting original characteristics of the IP address of the unknown attack source based on log information and flow data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source; carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the unknown attack source to obtain the IP characteristics of the IP address of the unknown attack source; calculating the distance between the IP characteristics of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by utilizing a KNN classification algorithm, selecting three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples; wherein each training sample in the pre-constructed training dataset is an IP feature of an IP address of a known attack source.
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for determining an attack data source of an industrial honeypot, which is provided in the foregoing method embodiments, and includes: for an IP address of an unknown attack source, extracting original characteristics of the IP address of the unknown attack source based on log information and flow data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source; carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the unknown attack source to obtain the IP characteristics of the IP address of the unknown attack source; calculating the distance between the IP characteristics of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by utilizing a KNN classification algorithm, selecting three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples; wherein each training sample in the pre-constructed training dataset is an IP feature of an IP address of a known attack source.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (6)

1. A method for judging the source of industrial control honeypot attack data is characterized by comprising the following steps:
for an IP address of an unknown attack source, extracting original characteristics of the IP address of the unknown attack source based on log information and flow data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source;
carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the unknown attack source to obtain the IP characteristics of the IP address of the unknown attack source;
calculating the distance between the IP characteristics of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by utilizing a KNN classification algorithm, selecting three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples;
wherein each training sample in the pre-constructed training data set is an IP characteristic of an IP address of a known attack source;
wherein the original characteristics of the IP address comprise: the IP address, whether the Modbus protocol, the S7comm protocol and the IEC104 protocol are included, a Modbus attack mode, an S7comm attack mode, an IEC104 attack mode, the total amount of data packets from the IP, the frequency of attacking different protocols, the total amount of attack areas, attack time intervals and attack protocol time intervals;
the step of extracting the original characteristics of the IP address of the unknown attack source based on log information and traffic data packets, which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source, specifically comprises the following steps:
judging whether the IP address attacks a Modbus protocol, an S7comm protocol or an IEC104 protocol, and if so, setting the value of the index corresponding to the protocol to be 1;
analyzing the data packet related to the IP address, and respectively matching a Modbus attack mode, an S7comm attack mode and an IEC104 attack mode of the data packet according to the function code obtained by analysis and the related field;
calculating the total number of data packets whose source IP is the IP address;
calculating, respectively, the proportions of Modbus attack, S7comm attack and IEC104 attack data packets in the total data packets;
acquiring the total number of distinct honeypot IP addresses among the destination IPs;
calculating the attack time difference between the first attack data packet and the last attack data packet;
and calculating, for the data packets of the Modbus attack, of the S7comm attack and of the IEC104 attack respectively, the time difference between the first related data packet and the last related data packet.
2. The method according to claim 1, wherein the original feature of the IP address of the unknown attack source is subjected to dimensionality reduction, normalization, and reconstruction processing to obtain the IP feature of the IP address of the unknown attack source, and specifically includes:
carrying out dimensionality reduction on the original characteristics of the IP address of the unknown attack source through a Principal Component Analysis (PCA) algorithm to obtain four-dimensional data, and carrying out normalization processing to obtain processed characteristics;
and reconstructing the processed characteristics by using the IP address of the unknown attack source to obtain the IP characteristics corresponding to the IP address of the unknown attack source.
3. The method according to claim 1, wherein before extracting the original characteristics of the IP address of the unknown attack source based on log information and traffic data packets captured by the industrial honeypot and related to the IP address of the unknown attack source, for the IP address of the unknown attack source, the method further comprises:
extracting the original characteristics of the IP address of the known attack source based on the massive original attack logs and flow data packets captured by the industrial control honeypot;
carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the known attack source to obtain the characteristics of the IP address of the known attack source;
and taking the characteristics of the IP address of each known attack source as a training sample, generating a training data set, and establishing a class label for each training sample according to the attack source to which each training sample belongs.
4. A device for judging the source of industrial control honeypot attack data, characterized by comprising:
the characteristic extraction module is used for extracting the original characteristics of the IP address of an unknown attack source based on log information and flow data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source;
the characteristic processing module is used for carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristic of the IP address of the unknown attack source to obtain the IP characteristic of the IP address of the unknown attack source;
the class judgment module is used for calculating the distance between the IP characteristics of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by utilizing a KNN classification algorithm, selecting three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples;
wherein each training sample in the pre-constructed training data set is an IP characteristic of an IP address of a known attack source;
wherein, the original characteristics of the IP address comprise the following information: the IP address, whether the Modbus protocol, the S7comm protocol and the IEC104 protocol are included, a Modbus attack mode, an S7comm attack mode, an IEC104 attack mode, the total amount of data packets from the IP, the frequency of attacking different protocols, the total amount of attack areas, attack time intervals and attack protocol time intervals;
wherein the feature extraction module is specifically configured to:
judging whether the IP address attacks a Modbus protocol, an S7comm protocol or an IEC104 protocol, and if so, setting the value of the index corresponding to the protocol to be 1;
analyzing the data packet related to the IP address, and respectively matching a Modbus attack mode, an S7comm attack mode and an IEC104 attack mode of the data packet according to the function code obtained by analysis and the related field;
calculating the total number of data packets whose source IP is the IP address;
calculating, respectively, the proportions of Modbus attack, S7comm attack and IEC104 attack data packets in the total data packets;
acquiring the total number of distinct honeypot IP addresses among the destination IPs;
calculating the attack time difference between the first attack data packet and the last attack data packet;
and calculating, for the data packets of the Modbus attack, of the S7comm attack and of the IEC104 attack respectively, the time difference between the first related data packet and the last related data packet.
5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method for determining the source of industrial honeypot attack data according to any one of claims 1 to 3 when executing the program.
6. A non-transitory computer readable storage medium, having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the method for determining the source of industrial honeypot attack data according to any one of claims 1 to 3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910436006.6A CN110365636B (en) 2019-05-23 2019-05-23 Method and device for judging attack data source of industrial control honeypot

Publications (2)

Publication Number Publication Date
CN110365636A CN110365636A (en) 2019-10-22
CN110365636B (en) 2020-09-11

Family

ID=68215285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910436006.6A Active CN110365636B (en) 2019-05-23 2019-05-23 Method and device for judging attack data source of industrial control honeypot

Country Status (1)

Country Link
CN (1) CN110365636B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111212053B (en) * 2019-12-27 2022-03-11 太原理工大学 Industrial control honeypot-oriented homologous attack analysis method
CN112367315B (en) * 2020-11-03 2021-09-28 浙江大学 Endogenous safe WAF honeypot deployment method
CN112804374B (en) * 2021-01-06 2023-11-03 光通天下网络科技股份有限公司 Threat IP identification method, threat IP identification device, threat IP identification equipment and threat IP identification medium
CN114301629A (en) * 2021-11-26 2022-04-08 北京六方云信息技术有限公司 IP detection method, device, terminal equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104158800A (en) * 2014-07-21 2014-11-19 南京邮电大学 Detection method of DDoS (Distributed Denial of Service) attack for software defined network (SDN)
CN108629183A (en) * 2018-05-14 2018-10-09 南开大学 Multi-model malicious code detecting method based on Credibility probability section

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102739647A (en) * 2012-05-23 2012-10-17 国家计算机网络与信息安全管理中心 High-interaction honeypot based network security system and implementation method thereof
CN103281177B (en) * 2013-04-10 2016-09-14 广东电网公司信息中心 Detection method and system to Internet information system malicious attack
CN103618744B (en) * 2013-12-10 2017-01-11 华东理工大学 Intrusion detection method based on fast k-nearest neighbor (KNN) algorithm
KR20190050521A (en) * 2017-11-03 2019-05-13 주식회사 윈스 Apparatus and method for detecting anomalous signs using profiling-based machine learning
CN109120627B (en) * 2018-08-29 2021-07-13 重庆邮电大学 6LoWPAN network intrusion detection method based on improved KNN
CN109274677B (en) * 2018-10-11 2021-04-27 四川长虹电器股份有限公司 IP classification method and system based on machine learning
CN109711547A (en) * 2018-12-24 2019-05-03 武汉邦拓信息科技有限公司 A kind of pollution sources disorder data recognition method based on deep learning algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant