CN110365636B - Method and device for judging attack data source of industrial control honeypot

Info

Publication number: CN110365636B
Application number: CN201910436006.6A
Authority: CN
Inventors: 孙利民; 牛梦瑶; 吕世超; 游建舟; 李红; 石志强
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2019-05-23
Filing date: 2019-05-23
Publication date: 2020-09-11
Anticipated expiration: 2039-05-23
Also published as: CN110365636A

Abstract

An embodiment of the invention provides a method and a device for determining the source of attack data captured by an industrial control honeypot. The method comprises: extracting original features of an IP address of an unknown attack source from data captured by the industrial control honeypot; performing dimensionality reduction, normalization and reconstruction on the original features to obtain IP features of the IP address of the unknown attack source; and, using a KNN classification algorithm, calculating the distance between the IP features and each training sample in a pre-constructed training data set, selecting the three training samples with the smallest distances as nearest-neighbor samples, and determining the attack source corresponding to the IP address from the main attack source among the nearest-neighbor samples. By extracting the IP features of the IP address of an unknown attack source and classifying them with the KNN algorithm, the embodiment can effectively determine the attack source to which that IP address belongs.

Description

Method and device for judging attack data source of industrial control honeypot

Technical Field

The invention relates to the technical field of industrial control security, and in particular to a method and a device for determining the source of industrial control honeypot attack data.

Background

Industrial control systems are important components of national critical infrastructure. They have become a key focus of national-level network attack and defense and face increasingly serious network security threats. An industrial control honeypot can actively trap attackers and support in-depth analysis of attack sources and attack techniques. It not only improves the ability of operation and maintenance personnel to discover, analyze and handle security threats to an industrial control system, but also helps the relevant managers make effective security decisions before a security event occurs.

At present, because a suitable data processing method is lacking, it is difficult to extract and analyze attack sources from the massive raw attack logs and traffic data packets captured by industrial control honeypots. The attack source is the organization or individual that initiates an attack, and it is the most important part of the threat environment. Attackers often use different IP addresses or compromised hosts to hide their identity, which makes the attack source hard to identify. To identify the attack source of industrial control honeypot data effectively, a method for determining the attack source of such data is needed.

Disclosure of Invention

The embodiment of the invention provides a method and a device for determining the source of industrial control honeypot attack data, which overcome the above problems or at least partially solve them.

In a first aspect, an embodiment of the present invention provides a method for determining an attack data source of an industrial control honeypot, including:

for an IP address of an unknown attack source, extracting original characteristics of the IP address of the unknown attack source based on log information and flow data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source;

carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the unknown attack source to obtain the IP characteristics of the IP address of the unknown attack source;

calculating the distance between the IP characteristics of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by utilizing a KNN classification algorithm, selecting three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples;

wherein each training sample in the pre-constructed training dataset is an IP feature of an IP address of a known attack source.

In a second aspect, an embodiment of the present invention provides an apparatus for determining an attack data source of an industrial control honeypot, including:

the characteristic extraction module is used for extracting the original characteristics of the IP address of an unknown attack source based on log information and flow data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source;

the characteristic processing module is used for carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristic of the IP address of the unknown attack source to obtain the IP characteristic of the IP address of the unknown attack source;

the class judgment module is used for calculating the distance between the IP characteristics of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by utilizing a KNN classification algorithm, selecting three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples;

In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method for determining the source of industrial control honeypot attack data according to the first aspect when executing the program.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the method for determining the source of industrial honeypot attack data as provided in the first aspect.

According to the method and the device for judging the source of the industrial control honeypot attack data, provided by the embodiment of the invention, the attack source of the IP address of the unknown attack source can be effectively judged by extracting the IP characteristics corresponding to the IP address of the unknown attack source and adopting a KNN classification algorithm according to the IP characteristics.

Drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of a method for determining sources of industrial control honeypot attack data according to an embodiment of the present invention;

FIG. 2 is a schematic flow chart of the step of extracting, for an IP address of an unknown attack source, the original features of that IP address based on log information and traffic data packets which are captured by an industrial control honeypot and are related to that IP address, according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a device for determining the source of industrial control honeypot attack data according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in FIG. 1, the method for determining the source of industrial control honeypot attack data provided by an embodiment of the present invention includes:

Step 100, for an IP address of an unknown attack source, extracting original features of the IP address of the unknown attack source based on log information and traffic data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source;

The embodiment of the invention provides an attack source determination method based on the trapping data collected by industrial control honeypots, which can accurately identify the organization behind an IP address.

Specifically, for an IP address of an unknown attack source, the log information and traffic data packets related to that IP address are retrieved from the data captured by the industrial control honeypot, and the original features of the IP address are extracted from them.

The original features of an IP address comprise: the IP address; whether the Modbus protocol, the S7comm protocol and the IEC104 protocol are attacked; the Modbus attack pattern; the S7comm attack pattern; the IEC104 attack pattern; the total number of data packets from the IP; the frequency of attacking the different protocols; the total attack area; the attack time interval; and the per-protocol attack time intervals.

It will be appreciated that the original characteristics of an IP address have the following data structure:
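The structure itself is not reproduced in this text. A minimal Python sketch, assuming the list lengths and initial values given in steps 200 to 206 of this description and using hypothetical field names, might look like this:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RawIPFeatures:
    """Original features of one attacking IP (field names are hypothetical)."""
    ip: str
    # [Modbus, S7comm, IEC104]; 1 if the IP attacked that protocol.
    protocol_flags: List[int] = field(default_factory=lambda: [0, 0, 0])
    # Share of packets falling into each attack pattern class.
    modbus_pattern: List[float] = field(default_factory=lambda: [0.0] * 4)
    s7comm_pattern: List[float] = field(default_factory=lambda: [0.0] * 3)
    iec104_pattern: List[float] = field(default_factory=lambda: [0.0] * 4)
    total_packets: int = 0
    # Ratio of Modbus / S7comm / IEC104 attack packets to all packets.
    protocol_frequency: List[float] = field(default_factory=lambda: [0.0] * 3)
    # Number of distinct honeypot IPs that were targeted.
    attack_area_total: int = 0
    attack_interval_days: int = 0
    protocol_interval_days: List[int] = field(default_factory=lambda: [0, 0, 0])

    def as_vector(self) -> List[float]:
        """Flatten every numeric field into one list (the IP itself is excluded)."""
        parts = [
            self.protocol_flags, self.modbus_pattern, self.s7comm_pattern,
            self.iec104_pattern, [self.total_packets], self.protocol_frequency,
            [self.attack_area_total], [self.attack_interval_days],
            self.protocol_interval_days,
        ]
        return [float(v) for part in parts for v in part]


features = RawIPFeatures(ip="192.0.2.1")
print(len(features.as_vector()))
```

Keeping the IP address outside the numeric vector is a design choice of this sketch; it lets the address be re-attached after dimensionality reduction and normalization.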

Step 101, performing dimensionality reduction, normalization and reconstruction on the original features of the IP address of the unknown attack source to obtain the IP features of the IP address of the unknown attack source;

To reduce memory usage and speed up classification learning, the extracted original features of the IP address of the unknown attack source are subjected to dimensionality reduction, normalization and reconstruction.

The original features of the IP address of the unknown attack source can be reduced in dimension with an existing dimensionality reduction method to obtain reduced features. The reduced features are then normalized, that is, linearly transformed so that they are mapped into the value range 0 to 1. Finally, the normalized features are reconstructed, and the reconstructed features are taken as the IP features corresponding to the IP address of the unknown attack source.

Step 102, calculating the distance between the IP features of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by using a KNN classification algorithm, selecting the three training samples with the smallest distances as the nearest-neighbor samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest-neighbor samples;

Specifically, the attack source corresponding to the IP address of the unknown attack source is determined by applying the KNN classification algorithm against the IP features of IP addresses of known attack sources, so that the attack source can be effectively identified.

First, the distance between the IP features of the IP address of the unknown attack source and each training sample in the pre-constructed training data set is calculated, and the three training samples with the smallest distances are selected as the nearest-neighbor samples of the IP address of the unknown attack source. The attack source corresponding to the IP address is then obtained from the main attack source corresponding to the nearest-neighbor samples.

The main attack source corresponding to the nearest-neighbor samples is the class that occurs most frequently among the three nearest-neighbor samples. This class is taken as the class of the IP address of the unknown attack source, which determines the attack source corresponding to that IP address.

In KNN, the distance between training data and test data is calculated as a measure of dissimilarity between data; the Euclidean distance or the Manhattan distance is generally used.

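The Euclidean distance is calculated as follows:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

The Manhattan distance is calculated as follows:

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

where $x$ and $y$ are two $n$-dimensional feature vectors.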
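The nearest-neighbor vote of step 102 can be sketched in a few lines of Python. This is an illustrative sketch rather than the patented implementation: the function names and the toy two-dimensional samples are assumptions, and the source does not say how ties among the three neighbors are broken.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (IP feature vector, attack source label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_classify(
    query: Sequence[float],
    training: List[Sample],
    k: int = 3,
    distance: Callable[[Sequence[float], Sequence[float]], float] = euclidean,
) -> str:
    """Return the most frequent label among the k training samples closest to query."""
    nearest = sorted(training, key=lambda sample: distance(query, sample[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set; real samples would be the processed IP features.
training_set = [
    ([0.0, 0.0], "shodan"),
    ([0.1, 0.0], "shodan"),
    ([1.0, 1.0], "umich"),
    ([0.9, 1.0], "umich"),
    ([0.2, 0.1], "umich"),
]
print(knn_classify([0.05, 0.0], training_set))
print(knn_classify([1.0, 0.9], training_set, distance=manhattan))
```

With k fixed at three, as in the description, a label needs two of the three votes to win outright; a three-way split has to be resolved by some rule the implementer chooses.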

According to the method for determining the source of industrial control honeypot attack data provided by the embodiment of the invention, the IP features corresponding to the IP address of the unknown attack source are extracted and classified with a KNN classification algorithm, so that the attack source of that IP address can be effectively determined.

Based on the content of the foregoing embodiment, as shown in fig. 2, the step of extracting, for an IP address of an unknown attack source, an original feature of the IP address of the unknown attack source based on log information and a traffic data packet, which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source, specifically includes:

Step 200, determining whether the IP address attacks the Modbus protocol, the S7comm protocol or the IEC104 protocol, and if so, setting the value at the index corresponding to that protocol to 1;

Specifically, the feature indicating whether the Modbus protocol, the S7comm protocol and the IEC104 protocol are attacked is initialized to [0,0,0]; if the IP address attacks the Modbus protocol, the S7comm protocol or the IEC104 protocol, the value at the index of that protocol is set to 1. For example, if the IP address attacks only the Modbus protocol and neither the S7comm protocol nor the IEC104 protocol, the feature is set to [1,0,0].
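As a small illustration of step 200, the flag list can be built from the set of protocols an IP was seen attacking. The protocol keys and the function name below are assumptions made for this sketch.

```python
from typing import Iterable, List

# Index order follows the description: Modbus, S7comm, IEC104.
PROTOCOLS = ("modbus", "s7comm", "iec104")


def protocol_flags(attacked: Iterable[str]) -> List[int]:
    """Return [1/0, 1/0, 1/0] for Modbus, S7comm and IEC104."""
    seen = {name.lower() for name in attacked}
    return [1 if protocol in seen else 0 for protocol in PROTOCOLS]


print(protocol_flags(["Modbus"]))
```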

Step 201, analyzing the data packet related to the IP address, and respectively matching a Modbus attack mode, an S7comm attack mode and an IEC104 attack mode of the data packet according to the function code and related fields obtained by the analysis;

Specifically, the data packets related to the IP address are parsed, and the function codes and related fields obtained by parsing are matched against the Modbus attack patterns, the S7comm attack patterns and the IEC104 attack patterns respectively.

The Modbus attack pattern is initialized to [0.0,0.0,0.0,0.0]. Each data packet is matched against the Modbus attack patterns according to its function code and related fields; when a pattern is matched, the count of that pattern is incremented by one. Finally, each count is divided by the total number of data packets counted for the Modbus attack patterns, and the result is placed in the corresponding list position.

In the embodiment of the invention, the Modbus attack patterns are divided into four classes: if function code 90 is used and all or part of the substation information is scanned, the packet is classified into the first class; if function codes 17 and 43 are used to scan substation 0 and substation 255, it is classified into the second class; if function code 03 is used to read holding registers or function code 04 is used to read input registers, it is classified into the third class; all other cases fall into the fourth class.
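The counting and division described for the Modbus attack pattern can be sketched as follows. The sketch assumes each parsed packet has been reduced to a hypothetical (function code, substation ID) pair, and it simplifies the first class to "function code 90" without checking how many substations were scanned.

```python
from typing import Iterable, List, Tuple


def modbus_pattern_class(function_code: int, substation_id: int) -> int:
    """Return the 0-based index of the Modbus attack pattern class."""
    if function_code == 90:
        return 0  # first class (scan-extent check omitted in this sketch)
    if function_code in (17, 43) and substation_id in (0, 255):
        return 1  # second class
    if function_code in (3, 4):
        return 2  # third class: read holding / input registers
    return 3      # fourth class: everything else


def modbus_pattern_vector(packets: Iterable[Tuple[int, int]]) -> List[float]:
    """Count packets per class, then divide by the number of Modbus packets."""
    counts = [0.0, 0.0, 0.0, 0.0]
    total = 0
    for function_code, substation_id in packets:
        counts[modbus_pattern_class(function_code, substation_id)] += 1
        total += 1
    if total == 0:
        return counts
    return [count / total for count in counts]


print(modbus_pattern_vector([(90, 1), (17, 0), (3, 1), (3, 2)]))
```

The same count-then-divide shape applies to the S7comm and IEC104 pattern lists; only the matching rules differ.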

The S7comm attack pattern is initialized to [0.0,0.0,0.0]. Each data packet is matched against the S7comm attack patterns according to its function code and related fields; when a pattern is matched, the count of that pattern is incremented by one. Finally, each count is divided by the total number of data packets counted for the S7comm attack patterns, and the result is placed in the corresponding list position.

The S7comm attack patterns are divided into three classes: if only an ISO_TP connection and S7 communication are established with the honeypot, the packet is classified into the first class; if honeypot information is requested with function code 0x00, it is classified into the second class; all other cases fall into the third class.

The IEC104 attack pattern is initialized to [0.0,0.0,0.0,0.0]. Each data packet is matched against the IEC104 attack patterns according to its function code and related fields; when a pattern is matched, the count of that pattern is incremented by one. Finally, each count is divided by the total number of data packets counted for the IEC104 attack patterns, and the result is placed in the corresponding list position.

The IEC104 attack patterns are divided into four classes: if the packet is a test connection, a transmission-start activation or a general interrogation (total call), that is, it matches the 5c2353540a field or the 474554202f20485454502f312e310d0a557365722d4167656e743a204d6f7a696c6c612f35 field, it is classified into the first class; if the packet matches the 0d0a0d0a field, it is classified into the second class; if it matches the 680443000000 field, the 680407000000 field or the 680e0000000064010600ffff00000000 field, it is classified into the third class; if it indicates a connection status, that is, it matches NEW_CONNECTION, CONNECTION_TERMINATED or CONNECTION_LOST, it is classified into the fourth class.
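The IEC104 matching can be illustrated with a substring search over hex-encoded payloads. This is a sketch under several assumptions: each honeypot record is modeled as a hypothetical (payload hex string, event name) pair, classes are tested in the order listed, records matching no class are left uncounted, and the divisor is the number of all IEC104 records.

```python
from typing import Iterable, List, Optional, Tuple

# Hex fields for the first three classes, copied from the description.
IEC104_FIELDS = [
    ["5c2353540a",
     "474554202f20485454502f312e310d0a557365722d4167656e743a204d6f7a696c6c612f35"],
    ["0d0a0d0a"],
    ["680443000000", "680407000000", "680e0000000064010600ffff00000000"],
]
CONNECTION_EVENTS = ("NEW_CONNECTION", "CONNECTION_TERMINATED", "CONNECTION_LOST")


def iec104_pattern_class(payload_hex: str, event: str = "") -> Optional[int]:
    """Return the 0-based class index, or None if nothing matches."""
    payload = payload_hex.lower()
    for index, fields in enumerate(IEC104_FIELDS):
        if any(hex_field in payload for hex_field in fields):
            return index
    if event in CONNECTION_EVENTS:
        return 3
    return None


def iec104_pattern_vector(records: Iterable[Tuple[str, str]]) -> List[float]:
    """Count records per class, then divide by the number of IEC104 records."""
    counts = [0.0, 0.0, 0.0, 0.0]
    total = 0
    for payload_hex, event in records:
        total += 1
        index = iec104_pattern_class(payload_hex, event)
        if index is not None:
            counts[index] += 1
    return [count / total for count in counts] if total else counts


records = [("680407000000", ""), ("", "NEW_CONNECTION"),
           ("680443000000", ""), ("0d0a0d0a", "")]
print(iec104_pattern_vector(records))
```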

Step 202, calculating the total number of data packets whose source IP is the IP address, that is, the total number of data packets from the IP.

Step 203, calculating the proportions of Modbus attack packets, S7comm attack packets and IEC104 attack packets in the total data packets respectively;

Specifically, the frequency of attacking the different protocols is initialized to [0.0,0.0,0.0]; its entries are the ratios of the numbers of Modbus, S7comm and IEC104 attack packets to the total number of data packets.
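Steps 202 and 203 amount to one division per protocol. A minimal sketch, with a hypothetical function name and the assumption that an IP with no packets keeps the initial value:

```python
from typing import List


def protocol_frequency(modbus: int, s7comm: int, iec104: int, total: int) -> List[float]:
    """Ratio of each protocol's attack packets to all packets from the IP."""
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [modbus / total, s7comm / total, iec104 / total]


print(protocol_frequency(4, 2, 2, 8))
```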

Step 204, obtaining the number of distinct honeypot IP addresses among the destination IPs, and taking this number as the total attack area;

Step 205, calculating the attack time difference between the first attack data packet and the last attack data packet;

The attack time interval is the time difference between the first related data packet and the last related data packet; if the packets all fall within one day, the interval is set to 1, otherwise it is set to the number of days spanned by the time difference.

Step 206, calculating the time difference between the first related data packet and the last related data packet among the Modbus attack packets, among the S7comm attack packets, and among the IEC104 attack packets respectively.

Specifically, the per-protocol attack time interval is initialized to [0,0,0]. The time difference between the first and last related data packets among the Modbus attack packets is calculated; if the packets all fall within one day, the interval is set to 1. The time intervals for S7comm and IEC104 are calculated in the same way.
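The interval rule can be sketched with the standard `datetime` module. The sketch assumes that "within one day" means a span shorter than 24 hours, that longer spans are rounded down to whole days, and that an IP with no packets gets 0; none of these details is fixed by the description.

```python
from datetime import datetime
from typing import Sequence


def attack_interval_days(timestamps: Sequence[datetime]) -> int:
    """Days between the first and last packet, with a minimum of 1."""
    if not timestamps:
        return 0
    span = max(timestamps) - min(timestamps)
    return max(1, span.days)


same_day = [datetime(2019, 5, 1, 8, 0), datetime(2019, 5, 1, 20, 0)]
several_days = [datetime(2019, 5, 1, 0, 0), datetime(2019, 5, 4, 2, 0)]
print(attack_interval_days(same_day), attack_interval_days(several_days))
```

Calling the same function on the packets of a single protocol gives the corresponding entry of the per-protocol interval list.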

It should be noted that, the steps 200 to 206 have no chronological sequence relationship, that is, the embodiment of the present invention does not limit the chronological relationship between the steps 200 to 206.

In an embodiment, performing dimensionality reduction, normalization and reconstruction on the original features of the IP address of the unknown attack source to obtain the IP features of the IP address of the unknown attack source specifically includes:

reducing the dimension of the original features of the IP address of the unknown attack source with a Principal Component Analysis (PCA) algorithm to obtain four-dimensional data, and normalizing the four-dimensional data to obtain processed features;

reconstructing the processed features by using the IP address of the unknown attack source to obtain the IP features corresponding to the IP address of the unknown attack source.
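The normalization and reconstruction that follow PCA can be sketched in plain Python. Several points are assumptions here: the PCA step is taken as already done (the input rows are four-dimensional), min-max scaling is used as the linear mapping into the range 0 to 1, a constant column is mapped to 0.0, and "reconstruction" is read as re-attaching each IP address to its processed vector.

```python
from typing import Dict, List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Linearly map every column into [0, 1]; constant columns become 0.0."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(value - low) / (high - low) if high > low else 0.0
         for value, low, high in zip(row, lows, highs)]
        for row in rows
    ]


def reconstruct(ips: Sequence[str], rows: Sequence[List[float]]) -> Dict[str, List[float]]:
    """Pair each IP address with its processed feature vector."""
    return dict(zip(ips, rows))


# Hypothetical four-dimensional PCA output for three IP addresses.
reduced = [[0.0, 1.0, 2.0, 3.0], [5.0, 1.0, 4.0, 3.0], [10.0, 1.0, 6.0, 3.0]]
ip_features = reconstruct(["192.0.2.1", "192.0.2.2", "192.0.2.3"],
                          min_max_normalize(reduced))
print(ip_features["192.0.2.2"])
```

In practice the column minima and maxima would be taken from the training data and reused for each unknown IP, so that training samples and queries share one scale.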

Based on the content of the foregoing embodiments, before extracting, for an IP address of an unknown attack source, original features of the IP address of the unknown attack source based on log information and traffic data packets, which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source, the method further includes:

extracting the original characteristics of the IP address of the known attack source based on the massive original attack logs and flow data packets captured by the industrial control honeypot;

carrying out dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the known attack source to obtain the characteristics of the IP address of the known attack source;

and taking the characteristics of the IP address of each known attack source as a training sample, generating a training data set, and establishing a class label for each training sample according to the attack source to which each training sample belongs.

Specifically, before identifying an attack source corresponding to an IP address of an unknown attack source, a training sample set with a class label needs to be constructed.

The original features of the IP addresses of known attack sources are extracted by the same method as in the above embodiment, and are then subjected to dimensionality reduction, normalization and reconstruction by the same method as in the above embodiment to obtain the features of the IP addresses of the known attack sources.

The features of the IP address of each known attack source are taken as one training sample to construct the training data set, and a class label is established for each training sample according to the attack source to which it belongs.

The class labels include: shodan, umich, Nagravision SA, reverse, plcscan, amazonaws, Alibaba, adsl, neu, linode, Unicom, Telecom, DataService, 360, and others.
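Building the labeled training set can be sketched as a simple join between processed features and known attack sources. The dictionary inputs and the function name are hypothetical, and mapping any unlisted source to "others" is an assumption of this sketch.

```python
from typing import Dict, List, Tuple

# Class labels as listed in the description.
CLASS_LABELS = [
    "shodan", "umich", "Nagravision SA", "reverse", "plcscan", "amazonaws",
    "Alibaba", "adsl", "neu", "linode", "Unicom", "Telecom", "DataService",
    "360", "others",
]


def build_training_set(
    features_by_ip: Dict[str, List[float]],
    source_by_ip: Dict[str, str],
) -> List[Tuple[List[float], str]]:
    """Return (feature vector, class label) pairs for IPs of known attack sources."""
    training = []
    for ip, features in features_by_ip.items():
        label = source_by_ip.get(ip, "others")
        if label not in CLASS_LABELS:
            label = "others"  # assumption: unlisted sources share one class
        training.append((features, label))
    return training


samples = build_training_set(
    {"192.0.2.1": [0.1, 0.1, 0.1, 0.1], "192.0.2.2": [0.2, 0.2, 0.2, 0.2]},
    {"192.0.2.1": "shodan", "192.0.2.2": "unlisted-org"},
)
print(samples)
```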

As shown in FIG. 3, the device for determining the source of industrial control honeypot attack data provided by an embodiment of the present invention includes: a feature extraction module 301, a feature processing module 302, and a class determination module 303, wherein,

the feature extraction module 301 is configured to, for an IP address of an unknown attack source, extract an original feature of the IP address of the unknown attack source based on log information and a traffic data packet, which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source;

Specifically, for an IP address of an unknown attack source, the feature extraction module 301 searches log information and a traffic data packet related to the IP address of the unknown attack source from data captured by the industrial control honeypot, and extracts an original feature of the IP address of the unknown attack source according to the log information and the traffic data packet.

The feature extraction module 301 is specifically configured to:

judging whether the IP address attacks a Modbus protocol, an S7comm protocol or an IEC104 protocol, and if so, setting the value of the index corresponding to the protocol to be 1;

analyzing the data packet related to the IP address, and respectively matching a Modbus attack mode, an S7comm attack mode and an IEC104 attack mode of the data packet according to the function code obtained by analysis and the related field;

calculating the total amount of data packets of which the source IP is the IP address;

calculating the ratio of the data packets subjected to Modbus attack, S7comm attack and IEC104 attack in the total data packets respectively;

obtaining the number of distinct honeypot IP addresses among the destination IPs;

calculating an attack time difference value between the first attack data packet and the last attack data packet;

and calculating the time difference value between the first relevant data packet time and the last relevant data packet in the data packets subjected to the Modbus attack, calculating the time difference value between the first relevant data packet time and the last relevant data packet in the data packets subjected to the S7comm attack, and calculating the time difference value between the first relevant data packet time and the last relevant data packet in the data packets subjected to the IEC104 attack.

A feature processing module 302, configured to perform dimension reduction, normalization, and reconstruction processing on the original feature of the IP address of the unknown attack source to obtain an IP feature of the IP address of the unknown attack source;

To reduce memory usage and speed up classification learning, the feature processing module 302 performs dimensionality reduction, normalization and reconstruction on the extracted original features of the IP address of the unknown attack source.

The feature processing module 302 reduces the dimension of the original features of the IP address of the unknown attack source with an existing dimensionality reduction method to obtain reduced features. The reduced features are then normalized, that is, linearly transformed so that they are mapped into the value range 0 to 1. Finally, the normalized features are reconstructed, and the reconstructed features are taken as the IP features corresponding to the IP address of the unknown attack source.

The class determination module 303 is configured to calculate a distance between an IP feature of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by using a KNN classification algorithm, select three training samples with the closest distance as nearest samples of the IP address of the unknown attack source, and obtain an attack source corresponding to the IP address of the unknown attack source according to a main attack source corresponding to the nearest samples;

Specifically, the category determination module 303 determines the attack source corresponding to the IP address of the unknown attack source by using the KNN classification algorithm in combination with the IP features corresponding to the IP addresses of the known attack sources, and can effectively identify the attack source corresponding to the IP address of the unknown attack source.

Firstly, the category determination module 303 calculates the distance between the IP feature of the IP address of the unknown attack source and each training sample in the pre-constructed training data set, selects three training samples with the closest distance as the nearest samples of the IP address of the unknown attack source, and obtains the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples.

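The Euclidean distance is calculated as follows:

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

The Manhattan distance is calculated as follows:

$$d(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

where $x$ and $y$ are two $n$-dimensional feature vectors.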

According to the device for determining the source of industrial control honeypot attack data provided by the embodiment of the invention, the IP features corresponding to the IP address of the unknown attack source are extracted and classified with a KNN classification algorithm, so that the attack source of that IP address can be effectively determined.

FIG. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in FIG. 4, the electronic device may include: a processor 410, a communication interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communication interface 420 and the memory 430 communicate with each other via the communication bus 440. The processor 410 may call a computer program stored in the memory 430 and executable on the processor 410 to perform the method for determining the source of industrial control honeypot attack data provided by the above method embodiments, for example including: for an IP address of an unknown attack source, extracting original features of the IP address of the unknown attack source based on log information and traffic data packets which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source; performing dimensionality reduction, normalization and reconstruction on the original features of the IP address of the unknown attack source to obtain the IP features of the IP address of the unknown attack source; calculating the distance between the IP features of the IP address of the unknown attack source and each training sample in a pre-constructed training data set by using a KNN classification algorithm, selecting the three training samples with the smallest distances as the nearest-neighbor samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest-neighbor samples; wherein each training sample in the pre-constructed training data set is an IP feature of an IP address of a known attack source.

In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as independent products, stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

An embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the method for judging the source of industrial control honeypot attack data provided in the foregoing method embodiments, for example including: for an IP address of an unknown attack source, extracting the original features of the IP address based on log information and traffic data packets that are captured by an industrial control honeypot and are related to the IP address; performing dimensionality reduction, normalization and reconstruction processing on the original features to obtain the IP features of the IP address of the unknown attack source; and calculating, by using a KNN classification algorithm, the distance between the IP features and each training sample in a pre-constructed training data set, selecting the three training samples with the closest distances as the nearest samples of the IP address of the unknown attack source, and obtaining the attack source corresponding to the IP address of the unknown attack source according to the main attack source corresponding to the nearest samples; wherein each training sample in the pre-constructed training data set is an IP feature of an IP address of a known attack source.
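The preprocessing step (dimensionality reduction, normalization and reconstruction) could look roughly like the sketch below. Several points are assumptions not stated in this passage: reduction is shown as simple column selection, normalization as min-max scaling to [0, 1], and reconstruction as re-attaching each processed vector to its IP address. The function names, column indices and addresses are hypothetical.

```python
def min_max_normalize(rows):
    """Scale each column of rows (equal-length numeric lists) to [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        # A constant column carries no information and is mapped to 0.0.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def build_ip_features(raw, keep):
    """Turn raw per-IP feature lists into normalized IP features.

    raw maps an IP address to its raw numeric features; keep lists the
    column indices retained after dimensionality reduction (assumed here
    to be plain column selection).
    """
    ips = list(raw)
    reduced = [[raw[ip][i] for i in keep] for ip in ips]
    normalized = min_max_normalize(reduced)
    # Reconstruction: key each processed vector by its IP address again.
    return dict(zip(ips, normalized))


# Hypothetical raw features: [packet total, constant flag, mean interval].
raw = {
    "192.0.2.1": [10, 1, 300],
    "192.0.2.2": [20, 1, 100],
    "192.0.2.3": [30, 1, 200],
}
features = build_ip_features(raw, keep=[0, 2])
print(features)
```

Scaling every column to the same range keeps a large-valued feature such as a packet total from dominating the distance computed in the KNN step.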

The above-described apparatus embodiments are merely illustrative. The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; that is, they may be located in one place or distributed across a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not cause the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for judging the source of industrial control honeypot attack data is characterized by comprising the following steps:

wherein each training sample in the pre-constructed training data set is an IP characteristic of an IP address of a known attack source;

wherein the original characteristics of the IP address comprise: the IP address, whether the Modbus protocol, the S7comm protocol and the IEC104 protocol are included, a Modbus attack mode, an S7comm attack mode, an IEC104 attack mode, the total amount of data packets from the IP, the frequency of attacking different protocols, the total amount of attack areas, attack time intervals and attack protocol time intervals;

the step of extracting the original characteristics of the IP address of the unknown attack source based on log information and traffic data packets, which are captured by an industrial control honeypot and are related to the IP address of the unknown attack source, specifically comprises the following steps:

acquiring the total number of different honeypot IP addresses in a target IP;

2. The method according to claim 1, wherein performing dimensionality reduction, normalization and reconstruction processing on the original characteristics of the IP address of the unknown attack source to obtain the IP characteristics of the IP address of the unknown attack source specifically comprises:

and reconstructing the processed characteristics by using the IP address of the unknown attack source to obtain the IP characteristics corresponding to the IP address of the unknown attack source.

3. The method according to claim 1, wherein, before the step of extracting, for the IP address of the unknown attack source, the original characteristics of the IP address based on log information and traffic data packets that are captured by the industrial control honeypot and are related to the IP address, the method further comprises:

4. A device for judging the source of industrial control honeypot attack data, characterized by comprising:

wherein, the original characteristics of the IP address comprise the following information: the IP address, whether the Modbus protocol, the S7comm protocol and the IEC104 protocol are included, a Modbus attack mode, an S7comm attack mode, an IEC104 attack mode, the total amount of data packets from the IP, the frequency of attacking different protocols, the total amount of attack areas, attack time intervals and attack protocol time intervals;

wherein the feature extraction module is specifically configured to:

acquiring the total number of different honeypot IP addresses in a target IP;

5. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method for judging the source of industrial control honeypot attack data according to any one of claims 1 to 3.

6. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method for judging the source of industrial control honeypot attack data according to any one of claims 1 to 3.