CN115952455A - Sample data generation method, model training method and equipment fingerprint classification method - Google Patents

Info

Publication number
CN115952455A
Authority
CN
China
Prior art keywords
message
current
equipment
intelligent substation
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211363399.0A
Other languages
Chinese (zh)
Inventor
周亮
蔺子卿
马子玉
朱亚运
张晓娟
李俊娥
王海翔
李昭晗
曹靖怡
姜琳
刘林彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Wuhan University WHU
China Electric Power Research Institute Co Ltd CEPRI
State Grid Shanghai Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Wuhan University WHU
China Electric Power Research Institute Co Ltd CEPRI
State Grid Shanghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, Wuhan University WHU, China Electric Power Research Institute Co Ltd CEPRI, State Grid Shanghai Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202211363399.0A priority Critical patent/CN115952455A/en
Publication of CN115952455A publication Critical patent/CN115952455A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04: INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S: SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00: Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A sample data generation method, a model training method, a device fingerprint classification method, and corresponding apparatuses are provided. The sample data generation method comprises: capturing intelligent substation messages; parsing each message and, using the parsed message fields together with the intelligent substation structure information stored in the SCD file, obtaining a device type label and updating a device label table; and obtaining a device type label value and message feature fields based on the updated device label table, and packaging them into a PTD data set that serves as sample data for the fingerprint classification model of intelligent substation measurement and control devices. Training the model with sample data generated by this method can improve the model's feature extraction and generalization, and the trained model can be used to identify and classify intelligent substation measurement and control devices.

Description

Sample data generation method, model training method and equipment fingerprint classification method
Technical Field
The invention relates to the technical field of smart grid information security, and in particular to a sample data generation method, a model training method, a device fingerprint classification method, and corresponding apparatuses.
Background
As smart grid informatization advances, the network security risks facing power industrial control systems are becoming increasingly prominent. The power industrial control system is an important component of the smart grid, is the physical foundation for providing services, and is a main target of network attacks. In recent years, network attacks have frequently caused power equipment or grid faults, so power industrial control systems need stronger security measures. The substation automation system, an important component of the power industrial control system, automatically monitors, measures, controls, and coordinates a substation through a communication network and intelligent terminals, exchanges information with other systems, and plays a vital role in the power system.
A device fingerprint is a device characteristic or unique device identifier that can be used to uniquely identify a device. Device fingerprints are now widely used on PCs and mobile terminals as one means of identifying and authenticating terminal devices. Existing acquisition approaches fall into two groups: active device fingerprint acquisition, which sends probe frames to obtain device information, and passive device fingerprint acquisition, which identifies devices by capturing network traffic and extracting features. Active acquisition, however, must send probe frames to each device; it is complex to deploy in a power industrial control system and places demands on bandwidth, which can easily consume excessive network resources of power industrial control devices. In addition, although many scanning tools offer active fingerprint acquisition, such as ZoomEye, which combines device fingerprints with Web application fingerprints, and Nmap, which relies on TCP/IP protocol stack fingerprints, these recognition engines and tools are not suited to the power industrial control field. Passive acquisition generates fingerprint features from traffic data and requires analysis of traffic feature fields. Current research focuses mainly on identifying wireless devices, and work on power industrial control network devices connected over wired networks is lacking. A passive device fingerprint acquisition method for intelligent substation measurement and control devices is therefore needed.
Disclosure of Invention
In view of this, the invention provides a sample data generation method, a model training method, a device fingerprint classification method, and corresponding apparatuses. It aims to solve the problems of prior-art device fingerprint acquisition, namely complex deployment, high bandwidth requirements, and a tendency to consume excessive network resources of power industrial control devices, and to fill the gap in passive device fingerprint acquisition technology.
In a first aspect, an embodiment of the present invention provides a sample data generation method, where the method includes: capturing an intelligent substation message; parsing the intelligent substation message and, using the parsed message fields together with the intelligent substation structure information stored in the SCD file, obtaining a device type label and updating a device label table; and obtaining a device type label value and message feature fields based on the updated device label table, and packaging the message feature fields and the device type label value into a PTD data set to serve as sample data for the fingerprint classification model of intelligent substation measurement and control devices.
Further, before parsing the intelligent substation message, the method includes: checking the content of the intelligent substation message.
Further, parsing the intelligent substation message and, using the parsed message fields together with the intelligent substation structure information stored in the SCD file, obtaining the device type label and updating the device label table includes: parsing the intelligent substation message to obtain a MAC address and an APPID number; searching the SCD file for the APPID number obtained by the current parsing to obtain the device type information corresponding to the current message; and deciding whether to update the current device label table according to whether the current device label table already contains the device type information corresponding to the current message and the MAC address obtained by the current parsing.
Further, the method further comprises: initializing a database table, and creating a device tag table for storing tag information and device information.
Further, deciding whether to update the current device label table according to whether it already contains the device type information corresponding to the current message and the MAC address obtained by the current parsing includes: if the current device label table contains both the device type information corresponding to the current message and the MAC address obtained by the current parsing, the table does not need to be updated; if the table contains the device type information corresponding to the current message but not the MAC address obtained by the current parsing, the MAC address is stored in the MAC address byte string field of the table so as to update it; if the table is not empty and does not contain the device type information corresponding to the current message, a new row is added, the MAC address obtained by the current parsing is added to the MAC address byte string field of the new row, the device type character string field of the new row is set to the device type information corresponding to the current message, and the device type label is set to the current maximum device type label plus 1; if the table is empty, a new row is added, the MAC address obtained by the current parsing is added to the MAC address byte string field of the new row, the device type character string field of the new row is set to the device type information corresponding to the current message, and the device type label is set to 0.
Further, obtaining the device type label value and the message feature fields based on the updated device label table includes: creating a new PTD file and filling in its header based on the updated device label table; and looking up the parsed MAC address in the updated device label table to obtain the device type label value, and extracting the message feature fields from the updated device label table.
In a second aspect, an embodiment of the present invention further provides a model training method, where the method includes: dividing the sample data obtained by the sample data generation method of the above embodiments into a training set and a test set according to a preset ratio; training an initial neural network model with the training set to generate a fingerprint classification model for intelligent substation measurement and control devices; and testing the performance of the currently trained model with the test set: if the detection accuracy in the performance test reaches the target accuracy, the currently trained model is saved; otherwise, the model parameters are adjusted and the process returns to the model training step.
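The split-train-test-adjust loop described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `split_samples` and `train_until_target`, the shuffling step, the `seed` and `max_rounds` parameters, and the callback interface are all hypothetical and do not come from the patent, which specifies only a preset ratio and a target accuracy.

```python
import random


def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and split them into (training set, test set).

    train_ratio stands in for the "preset ratio"; the value 0.8 is an
    assumption chosen for illustration only.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def train_until_target(train_fn, eval_fn, adjust_fn, params, target_acc,
                       max_rounds=10):
    """Repeat train -> test -> adjust until the accuracy reaches target_acc.

    train_fn(params) returns a model, eval_fn(model) returns its accuracy on
    the test set, and adjust_fn(params) returns new parameters. max_rounds is
    a hypothetical guard against looping forever.
    """
    acc = 0.0
    for _ in range(max_rounds):
        model = train_fn(params)
        acc = eval_fn(model)
        if acc >= target_acc:
            return model, acc      # save this model
        params = adjust_fn(params)  # otherwise adjust and train again
    return None, acc
```

In practice `train_fn` and `eval_fn` would wrap the actual neural network training and evaluation; here they are left as callbacks so the control flow can be read on its own.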
Further, the initial neural network model is built in advance based on a TensorFlow framework.
Further, the initial neural network model includes: an input layer, a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a fully connected layer, and an output layer.
Further, the first convolution layer and the second convolution layer both adopt the following formula to calculate a convolution result:
$$y = f\Big(\sum_{i=1}^{m}\sum_{j=1}^{m} \omega_{i,j}\,x_{i,j} + b\Big)$$
where x_{i,j} is an element of the convolution region M of the input X, ω is an element of the convolution kernel, m is the convolution kernel size, b is the bias, and f(·) is the ReLU activation function;
and the first convolution layer and the second convolution layer use the following formulas to calculate the zero-padding size:
O=ceil(I/step);
PT=(O-1)×step+m-I;
PL=PU=floor(PT/2);
PR=PB=PT-PL;
where I is the input image size, O is the output image size, m is the convolution kernel size, step is the stride, PT is the total padding size, PL, PR, PU, and PB are the padding sizes on the left, right, top, and bottom of the image respectively, ceil(·) is the round-up function, and floor(·) is the round-down function.
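A small worked sketch may help make the padding arithmetic concrete. The code below assumes the output size is O = ceil(I/step) and that the left/top padding is floor(PT/2), which matches the ceil and floor functions defined above and the "same" padding convention used by TensorFlow; it also assumes the convolution is the usual weighted sum over an m×m region plus a bias, passed through ReLU. The function names, the square-input restriction, and the clamp of PT to zero are illustrative assumptions, not taken from the patent.

```python
import math


def same_padding(i, m, step):
    """Return (O, PT, P_before, P_after) for one axis of the image."""
    o = math.ceil(i / step)               # output size
    pt = max((o - 1) * step + m - i, 0)   # total padding; the clamp is an added guard
    p_before = pt // 2                    # PL (or PU): floor(PT / 2)
    p_after = pt - p_before               # PR (or PB): PT - PL
    return o, pt, p_before, p_after


def conv2d_relu(x, w, b, step=1):
    """Zero-pad a square input x, convolve with a square kernel w, apply ReLU."""
    i, m = len(x), len(w)
    o, _, before, after = same_padding(i, m, step)
    size = i + before + after
    padded = [[0.0] * size for _ in range(size)]
    for r in range(i):
        for c in range(i):
            padded[r + before][c + before] = x[r][c]
    out = []
    for r in range(o):
        row = []
        for c in range(o):
            s = b
            for a in range(m):
                for k in range(m):
                    s += w[a][k] * padded[r * step + a][c * step + k]
            row.append(max(0.0, s))       # ReLU activation
        out.append(row)
    return out
```

For example, a 28-pixel axis with a kernel of size 5 and stride 1 gives O = 28 and PT = 4, split as 2 on each side, so the output keeps the input size.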
Further, the output layer outputs the classification result through a Softmax function.
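As a brief illustration of the output layer, the sketch below applies Softmax to a vector of raw scores and picks the most probable device type label. The function names are hypothetical, and subtracting the maximum score before exponentiation is a common numerical-stability step that the patent does not mention.

```python
import math


def softmax(logits):
    """Convert raw output-layer scores into probabilities that sum to 1."""
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the index (device type label) with the highest probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)
```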
In a third aspect, an embodiment of the present invention further provides a device fingerprint classification method, where the method includes: capturing traffic data; processing and packaging the traffic data, then inputting it into the fingerprint classification model for intelligent substation measurement and control devices, trained by the model training method of the above embodiments, for identification and classification; and outputting the identification and classification result.
In a fourth aspect, an embodiment of the present invention further provides a sample data generation apparatus, where the apparatus includes: a message capturing unit, configured to capture an intelligent substation message; a first processing unit, configured to parse the intelligent substation message and, using the parsed message fields together with the intelligent substation structure information stored in the SCD file, obtain a device type label and update a device label table; and a second processing unit, configured to obtain a device type label value and message feature fields based on the updated device label table, and to package the message feature fields and the device type label value into a PTD (packet transport description) data set to serve as sample data for the fingerprint classification model of intelligent substation measurement and control devices.
Further, before the intelligent substation message is parsed, the content of the intelligent substation message is checked.
Further, the first processing unit is further configured to: parse the intelligent substation message to obtain a MAC address and an APPID number; search the SCD file for the APPID number obtained by the current parsing to obtain the device type information corresponding to the current message; and decide whether to update the current device label table according to whether the current device label table already contains the device type information corresponding to the current message and the MAC address obtained by the current parsing.
Further, the apparatus further includes a device tag table creating unit, configured to initialize the database table, and create a device tag table for storing tag information and device information.
Further, deciding whether to update the current device label table according to whether it already contains the device type information corresponding to the current message and the MAC address obtained by the current parsing includes: if the current device label table contains both the device type information corresponding to the current message and the MAC address obtained by the current parsing, the table does not need to be updated; if the table contains the device type information corresponding to the current message but not the MAC address obtained by the current parsing, the MAC address is stored in the MAC address byte string field of the table so as to update it; if the table is not empty and does not contain the device type information corresponding to the current message, a new row is added, the MAC address obtained by the current parsing is added to the MAC address byte string field of the new row, the device type character string field of the new row is set to the device type information corresponding to the current message, and the device type label is set to the current maximum device type label plus 1; if the table is empty, a new row is added, the MAC address obtained by the current parsing is added to the MAC address byte string field of the new row, the device type character string field of the new row is set to the device type information corresponding to the current message, and the device type label is set to 0.
Further, obtaining the device type label value and the message feature fields based on the updated device label table includes: creating a new PTD file and filling in its header based on the updated device label table; and looking up the parsed MAC address in the updated device label table to obtain the device type label value, and extracting the message feature fields from the updated device label table.
In a fifth aspect, an embodiment of the present invention further provides a model training apparatus, where the apparatus includes: a data dividing unit, configured to divide the sample data obtained by the sample data generation method of the above embodiments into a training set and a test set according to a preset ratio; a model training unit, configured to train an initial neural network model with the training set to generate a fingerprint classification model for intelligent substation measurement and control devices; and a model testing unit, configured to test the performance of the currently trained model with the test set: if the detection accuracy in the performance test reaches the target accuracy, the currently trained model is saved; otherwise, the model parameters are adjusted and the process returns to the model training step.
Further, the initial neural network model is built in advance based on a TensorFlow framework.
Further, the initial neural network model includes: an input layer, a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a fully connected layer, and an output layer.
Further, the first convolution layer and the second convolution layer both adopt the following formula to calculate a convolution result:
$$y = f\Big(\sum_{i=1}^{m}\sum_{j=1}^{m} \omega_{i,j}\,x_{i,j} + b\Big)$$
where x_{i,j} is an element of the convolution region M of the input X, ω is an element of the convolution kernel, m is the convolution kernel size, b is the bias, and f(·) is the ReLU activation function;
and the first convolution layer and the second convolution layer use the following formulas to calculate the zero-padding size:
O=ceil(I/step);
PT=(O-1)×step+m-I;
PL=PU=floor(PT/2);
PR=PB=PT-PL;
where I is the input image size, O is the output image size, m is the convolution kernel size, step is the stride, PT is the total padding size, PL, PR, PU, and PB are the padding sizes on the left, right, top, and bottom of the image respectively, ceil(·) is the round-up function, and floor(·) is the round-down function.
Further, the output layer outputs the classification result through a Softmax function.
In a sixth aspect, an embodiment of the present invention further provides a device fingerprint classification apparatus, where the apparatus includes: a traffic capturing unit, configured to capture traffic data; an identification and classification unit, configured to process and package the traffic data and input it into the fingerprint classification model for intelligent substation measurement and control devices, trained by the model training method of the above embodiments, for identification and classification; and a result output unit, configured to output the identification and classification result.
In a seventh aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the method for generating sample data or the method for training a model provided in the foregoing embodiments or the method for classifying a device fingerprint provided in the foregoing embodiments is implemented.
In an eighth aspect, an embodiment of the present invention further provides an electronic device, including: a processor; a memory for storing the processor-executable instructions; the processor is configured to read the executable instruction from the memory, and execute the instruction to implement the sample data generation method provided in each of the above embodiments, or implement the model training method provided in each of the above embodiments, or implement the device fingerprint classification method provided in each of the above embodiments.
According to the sample data generation method and apparatus provided by the embodiments of the invention, the original intelligent substation messages are parsed, the relevant information is looked up, matched, extracted, and integrated, and the result is re-encapsulated to generate sample data for the fingerprint classification model of intelligent substation measurement and control devices. This provides reliable sample data for subsequent model training and addresses the lack of samples for model training. Training the model with sample data generated by this method can improve the model's feature extraction and generalization, and the trained model can be used to identify and classify intelligent substation measurement and control devices.
According to the model training method, the device fingerprint classification method, and the corresponding apparatuses provided by the embodiments of the invention, an initial neural network model is constructed and then trained and tested on the training set and the test set respectively, yielding the final fingerprint classification model for intelligent substation measurement and control devices. This addresses the problems of prior-art device fingerprint acquisition, namely complex deployment, high bandwidth requirements, and a tendency to consume excessive network resources of power industrial control devices, and fills the gap in passive device fingerprint acquisition technology. The fingerprint classification model provided by the embodiments of the invention can identify and classify intelligent substation measurement and control devices, which facilitates identity authentication, secure access, asset management, and similar work, and improves the security of the power industrial control system.
Drawings
FIG. 1 shows an exemplary flowchart of a sample data generation method according to an embodiment of the present invention;
FIG. 2 shows a process diagram of an intelligent substation traffic capture method according to an embodiment of the present invention;
FIG. 3 shows a schematic structure of a PTD data set according to an embodiment of the present invention;
FIG. 4 shows an exemplary flowchart of a model training method according to an embodiment of the present invention;
FIG. 5 shows a schematic architecture of a neural network model for fingerprint classification of intelligent substation measurement and control devices according to an embodiment of the present invention;
FIG. 6a compares the accuracy and F1 score of the CNN model for fingerprint classification of intelligent substation measurement and control devices according to an embodiment of the present invention with those of the commonly used RNN and LSTM models under the GOOSE protocol, and FIG. 6b shows the same comparison under the SV protocol;
FIG. 7 shows an exemplary flowchart of a device fingerprint classification method according to an embodiment of the present invention;
FIG. 8 shows a schematic structure of a sample data generation apparatus according to an embodiment of the present invention;
FIG. 9 shows a schematic structure of a model training apparatus according to an embodiment of the present invention;
FIG. 10 shows a schematic structure of a device fingerprint classification apparatus according to an embodiment of the present invention.
Detailed Description
Example embodiments of the present invention will now be described with reference to the accompanying drawings. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; these embodiments are provided so that the disclosure is thorough and complete and fully conveys the scope of the invention to those skilled in the art. The terminology used in the exemplary embodiments illustrated in the accompanying drawings is not intended to limit the invention. In the drawings, the same unit or element is denoted by the same reference numeral.
Unless otherwise defined, terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Further, it will be understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense.
Fig. 1 illustrates an exemplary flowchart of a sample data generation method according to an embodiment of the present invention.
As shown in fig. 1, the method includes:
Step S101: capture the intelligent substation message.
FIG. 2 is a process diagram of an intelligent substation traffic capture method according to an embodiment of the present invention. As shown in FIG. 2, the monitoring computer is connected to the process layer switch through a photoelectric converter. The monitoring computer can then capture traffic on the process layer GOOSE network or SV network, obtaining GOOSE protocol or SV protocol traffic in the intelligent substation, and the captured traffic data is stored as a Pcapng file for subsequent traffic preprocessing.
Step S102: parse the intelligent substation message and, using the parsed message fields together with the intelligent substation structure information stored in the SCD file, obtain a device type label and update the device label table.
Further, before parsing the intelligent substation message, the method includes:
checking the content of the intelligent substation message.
Specifically, the content of each GOOSE protocol message or SV protocol message is checked for correctness, malformed messages are deleted, and the messages that pass the check are stored as a Pcap file.
Further, step S102 includes:
parsing the intelligent substation message to obtain a MAC address and an APPID number;
searching the SCD file for the APPID number obtained by the current parsing to obtain the device type information corresponding to the current message;
and deciding whether to update the current device label table according to whether the current device label table already contains the device type information corresponding to the current message and the MAC address obtained by the current parsing.
Parsing a GOOSE or SV protocol message yields the contents of specified fields in the message. The fields of interest in this embodiment are the source MAC address field and the APPID field, so parsing an intelligent substation message yields its source MAC address and its APPID number. The SCD file is opened with an SCD visualization tool, and the APPID field serves as the intermediary for obtaining the device type information corresponding to the message: searching the tool for the APPID number obtained by the current parsing yields the device topology diagram corresponding to the current message, from which the device type information corresponding to the current message is obtained.
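The field extraction described above can be sketched as follows. This is a minimal illustration, assuming standard Ethernet framing with an optional IEEE 802.1Q VLAN tag and the EtherType values assigned to GOOSE (0x88B8) and Sampled Values (0x88BA), where the APPID and Length fields follow the EtherType. The function name `parse_frame` and the choice to return `None` for malformed frames are hypothetical; the patent does not specify how parsing is implemented.

```python
import struct

ETHERTYPE_VLAN = 0x8100   # IEEE 802.1Q tag
ETHERTYPE_GOOSE = 0x88B8
ETHERTYPE_SV = 0x88BA


def parse_frame(frame: bytes):
    """Return (protocol, source MAC, APPID), or None for a malformed or
    non-GOOSE/SV frame."""
    if len(frame) < 14:
        return None
    src_mac = frame[6:12]                      # bytes 0-5 are the destination MAC
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    offset = 14
    if ethertype == ETHERTYPE_VLAN:            # skip the 4-byte VLAN tag
        if len(frame) < 18:
            return None
        (ethertype,) = struct.unpack_from("!H", frame, 16)
        offset = 18
    if ethertype == ETHERTYPE_GOOSE:
        protocol = "GOOSE"
    elif ethertype == ETHERTYPE_SV:
        protocol = "SV"
    else:
        return None
    if len(frame) < offset + 8:                # APPID, Length, Reserved1, Reserved2
        return None
    appid, length = struct.unpack_from("!HH", frame, offset)
    # Length counts the 8 header bytes starting at APPID plus the APDU.
    if length < 8 or offset + length > len(frame):
        return None
    return protocol, src_mac, appid
```

Returning `None` for frames that fail the basic length checks is one simple way to drop malformed messages before the APPID lookup in the SCD file.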
Further, the method further comprises:
initializing a database table, and creating a device tag table for storing tag information and device information.
The device label table holds each device's MAC address, its device type information, and the label data matched to the device. Its fields are a device MAC address byte string, a device type character string, and an unsigned integer label corresponding to the device type, with the device type label serving as the primary key of the table. Because devices under the GOOSE protocol and devices under the SV protocol are fingerprinted separately in this embodiment, device label tables with the same structure are created separately for the GOOSE protocol and the SV protocol.
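One possible shape for such a table, sketched with Python's built-in `sqlite3` module, is shown below. The database engine, the table names `goose_device_tag` and `sv_device_tag`, and the column names are all assumptions made for illustration; the patent states only the three fields and that the device type label is the primary key.

```python
import sqlite3


def create_tag_tables(conn):
    """Create one device label table per protocol, both with the same structure."""
    for protocol in ("goose", "sv"):
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {protocol}_device_tag ("
            " type_label INTEGER PRIMARY KEY CHECK (type_label >= 0),"  # unsigned label
            " device_type TEXT NOT NULL,"                               # device type string
            " mac_bytes BLOB NOT NULL)"                                 # MAC address byte string
        )
    conn.commit()
```

Storing the MAC addresses as a single byte string per device type follows the field description above; a separate table with one row per MAC address would be an alternative design.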
Further, after the intelligent substation message is parsed to obtain the MAC address and the APPID number, the method includes:
searching for the MAC address in the MAC address byte string field of the current device label table and, if the MAC address exists, moving on to the next message and returning to the message parsing step.
Further, judging whether to update the current device tag table according to whether the device type information corresponding to the current message exists in the current device tag table and the MAC address obtained by current analysis, includes:
if the equipment type information corresponding to the current message and the MAC address obtained by current analysis exist in the current equipment label table, the current equipment label table does not need to be updated;
if the current equipment label table has equipment type information corresponding to the current message but does not have the MAC address obtained by current analysis, storing the MAC address obtained by current analysis into an MAC address byte string field in the current equipment label table so as to update the current equipment label table;
if the current equipment label table does not have the equipment type information corresponding to the current message and is not empty, adding a new row in the current equipment label table, adding an MAC address obtained by current analysis in an MAC address byte string field of the new row, setting an equipment type character string field of the new row as the equipment type information corresponding to the current message and setting an equipment type label as the maximum value of the current equipment type label plus 1;
if the current equipment label table is empty, a new row is added in the current equipment label table, the MAC address obtained by current analysis is added in the MAC address byte string field of the new row, the equipment type character string field of the new row is set as the equipment type information corresponding to the current message, and the equipment type label is set as 0.
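The four cases above can be sketched as a single function. This is an illustrative in-memory version only: the table is modelled as a list of dictionaries, and the key names `label`, `device_type` and `macs` are hypothetical.

```python
def update_label_table(table, mac, device_type):
    """Apply the four update cases; return True if the table was modified."""
    for row in table:
        if row["device_type"] == device_type:
            if mac in row["macs"]:
                return False            # type and MAC both present: nothing to do
            row["macs"].append(mac)     # type present, MAC missing: store the MAC
            return True
    if table:
        # Type missing, table not empty: next label is the current maximum plus 1.
        label = max(row["label"] for row in table) + 1
    else:
        label = 0                       # empty table: first label is 0
    table.append({"label": label, "device_type": device_type, "macs": [mac]})
    return True
```

Because labels start at 0 and grow by 1, the number of distinct labels is always the maximum label plus 1.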
Step S103: and obtaining an equipment type label value and a message characteristic field based on the updated equipment label table, and packaging the message characteristic field and the equipment type label value into a PTD data set to be used as sample data of the fingerprint classification model of the intelligent substation measurement and control equipment.
Further, obtaining a device type tag value and a message feature field based on the updated device tag table includes:
newly building a PTD file, and completing the filling of the head of the newly built PTD file based on the updated equipment tag table;
and obtaining the equipment type label value by using the MAC address obtained by analysis in the updated equipment label table, and extracting the message characteristic field from the updated equipment label table.
Fig. 3 shows a schematic structural diagram of a PTD data set according to an embodiment of the invention. As shown in FIG. 3, the PTD data set is encapsulated in a format in which the message data is placed in the data portion of the PTD file and the auxiliary information is placed in the header of the PTD file. The auxiliary information comprises the number of label types, the number of message pictures, the length of the message pictures and the width of the message pictures. The number of label types equals the maximum value of the device type matching labels in the device label table plus 1, the number of message pictures is the number of messages in the current file, and the length and the width of the message pictures are both set to 28.
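A header of this kind might be packed as follows. The description lists the four header values but not their byte layout, so the layout used here (four big-endian unsigned 32-bit integers) and the name `build_ptd_header` are assumptions for illustration only.

```python
import struct

IMAGE_SIDE = 28  # length and width of a message picture, as stated in the text


def build_ptd_header(max_label: int, message_count: int) -> bytes:
    """Pack label-type count, picture count, picture length and picture width."""
    label_type_count = max_label + 1  # labels start at 0
    # Assumed layout: four big-endian unsigned 32-bit integers.
    return struct.pack("!4I", label_type_count, message_count, IMAGE_SIDE, IMAGE_SIDE)
```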
For the GOOSE protocol, the message feature field comprises the destination MAC address, the source MAC address, the APPID field, and the GoCBRef, DataSet, GoID and AllData fields in the APDU field; for the SV protocol, the message feature field comprises the destination MAC address, the source MAC address, the APPID field, and the ASDU string in the APDU field. Searching the updated device label table for the source MAC address of the message yields the device label value corresponding to that MAC address. The message feature field data is unified to a size of 28 × 28 bytes: if the data is shorter, it is padded with 0x00 bytes; otherwise it is truncated. The message feature field and the device label value are then encapsulated into the PTD data set.
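The size unification step can be sketched in a few lines of Python. The function name `to_fixed_length` is hypothetical; the sketch assumes the feature fields have already been concatenated into one byte string and that padding is appended at the end.

```python
IMAGE_BYTES = 28 * 28  # 784 bytes per message picture


def to_fixed_length(feature_bytes: bytes) -> bytes:
    """Truncate to 784 bytes, or right-pad with 0x00 bytes up to 784 bytes."""
    return feature_bytes[:IMAGE_BYTES].ljust(IMAGE_BYTES, b"\x00")
```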
According to the embodiment, the original intelligent substation messages are parsed, matched against the looked-up information, and their information is extracted, integrated and re-encapsulated to generate sample data for the intelligent substation measurement and control equipment fingerprint classification model. This provides reliable sample data for subsequent model training and solves the problem of missing samples in model training. Training the model with sample data generated by the method provided by the embodiment of the invention can improve the feature extraction effect and the generalization of the model, so that the model can be used to identify and classify intelligent substation measurement and control equipment.
FIG. 4 illustrates an exemplary flow diagram of a model training method according to an embodiment of the present invention.
As shown in fig. 4, the method includes:
step S401: according to a preset proportion, sample data obtained by adopting the method provided by each embodiment is divided into a training set and a test set.
In total, four files can be generated: a PTD training data set and a PTD test data set for each of the GOOSE protocol and the SV protocol.
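A split by a preset proportion could look like the sketch below. The 0.8 default ratio, the shuffling before the cut and the fixed seed are assumptions; the description only says that a preset proportion is used.

```python
import random


def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples reproducibly and cut them into (training, test) lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Running this once on the GOOSE samples and once on the SV samples gives the four data sets mentioned above.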
Step S402: and training the initial neural network model by using a training set to generate a fingerprint classification model of the intelligent substation measurement and control equipment.
Because each byte value corresponds one-to-one with the gray value of one pixel of a grayscale image, the embodiment of the invention can be regarded as classifying grayscale images of message data, where each grayscale image has a size of 28 × 28. A neural network extracts the features of the message grayscale images and classifies them, which amounts to classifying the intelligent substation message data and thereby completes the fingerprint identification of the intelligent substation measurement and control equipment.
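The byte-to-pixel correspondence can be illustrated with a short sketch that reshapes a 784-byte record into a 28 × 28 grid of gray values in the range 0 to 255. The row-major ordering and the name `bytes_to_gray_image` are assumptions.

```python
def bytes_to_gray_image(data: bytes, side: int = 28):
    """Reshape side*side bytes into a row-major grid; each byte is one gray value."""
    if len(data) != side * side:
        raise ValueError("expected exactly side * side bytes")
    return [list(data[row * side:(row + 1) * side]) for row in range(side)]
```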
The PTD training data sets corresponding to the GOOSE protocol and the SV protocol are each input into an initial neural network model for training, generating one intelligent substation measurement and control equipment fingerprint classification model for each of the two protocols.
Further, the initial neural network model is built in advance based on a TensorFlow framework.
Further, the initial neural network model includes: an input layer, a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a fully connected layer and an output layer.
Further, the first convolution layer and the second convolution layer both adopt the following formula to calculate the convolution result:
$$y = f\left(\sum_{i=1}^{m}\sum_{j=1}^{m} \omega_{i,j}\, x_{i,j} + b\right)$$
wherein $x_{i,j}$ is an element in the convolution region M of the input X, $\omega_{i,j}$ is an element of the convolution kernel, m is the size of the convolution kernel, b is the offset, and f(·) is the ReLU activation function;
and the first convolution layer and the second convolution layer are both calculated by adopting the following formula to obtain the zero-expansion size:
$$O = \mathrm{ceil}\left(\frac{I}{\mathrm{step}}\right);$$
$$PT = (O-1)\times \mathrm{step} + m - I;$$
$$PL = PU = \mathrm{floor}\left(\frac{PT}{2}\right);$$
$$PR = PB = PT - PL;$$
wherein, I is the input image size, O is the output image size, m is the convolution kernel size, step is the step size, PL is the image left side extension size, PR is the image right side extension size, PU is the image top extension size, PB is the image bottom extension size, ceil (·) is an upward rounding function, floor (·) is a downward rounding function.
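The zero-extension sizes can be computed with a small helper. This sketch assumes that the output size is O = ceil(I / step) and that the left and top extensions are floor(PT / 2), which is the convention TensorFlow uses for "SAME" padding; the function name `zero_extension` and the returned dictionary keys are hypothetical.

```python
import math


def zero_extension(I: int, m: int, step: int) -> dict:
    """Return output size, total extension and per-side extensions for one dimension."""
    O = math.ceil(I / step)           # assumed output size
    PT = (O - 1) * step + m - I       # total extension along one dimension
    PL = math.floor(PT / 2)           # left extension (equals top extension PU)
    PR = PT - PL                      # right extension (equals bottom extension PB)
    return {"O": O, "PT": PT, "PL": PL, "PR": PR, "extended": I + PT}
```

For a 28-wide input with a 5-wide kernel and a step of 1 (the step value is an assumption), this gives 2 rows or columns of zeros on each side, so the extended input is 32 wide.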
Further, the output layer outputs the classification result through a Softmax function.
Table 1. Parameter configuration of the neural network model for fingerprint classification of intelligent substation measurement and control equipment (table image not reproduced)
Fig. 5 and Table 1 show the architecture diagram and parameter configuration of the neural network model for fingerprint classification of intelligent substation measurement and control equipment according to an embodiment of the present invention. As shown in Fig. 5 and Table 1, the model comprises seven layers, as follows:
first layer (input layer): the input data is a 28 x 28 message gray scale image and a device type label value matched with the message.
Second layer (first convolution layer): the first convolution layer extracts the features of the message grayscale image; the convolution kernel size is set to 5 × 5, and the convolution result is calculated using the following formula:
$$y = f\left(\sum_{i=1}^{m}\sum_{j=1}^{m} \omega_{i,j}\, x_{i,j} + b\right)$$
where $x_{i,j}$ represents an element in the convolution region M of the input X, $\omega_{i,j}$ represents an element of the convolution kernel, m represents the size of the convolution kernel, b represents the offset, and f(·) represents the ReLU activation function, given by:
$$f(x) = \max(0, x)$$
in order to ensure the stability of the input and output dimensions of the convolutional layer, the zero expansion operation is performed on the input, and the zero expansion size can be calculated by the following formula:
$$O = \mathrm{ceil}\left(\frac{I}{\mathrm{step}}\right);$$
$$PT = (O-1)\times \mathrm{step} + m - I;$$
$$PL = PU = \mathrm{floor}\left(\frac{PT}{2}\right);$$
$$PR = PB = PT - PL;$$
wherein I is the input image size, O is the output image size, m is the convolution kernel size, step is the step size, PL, PR, PU and PB are the left, right, top and bottom extension sizes of the image respectively, ceil(·) represents the round-up function, and floor(·) represents the round-down function. It follows that, in the actual training of the model of the present invention, the input of the second-layer convolution layer is expanded to 32 × 32 and the input of the fourth-layer convolution layer is expanded to 18 × 18.
Third layer (first pooling layer): the first pooling layer reduces the network complexity, compresses the network size and reduces the number of training parameters. Max pooling is used, retaining the maximum value in each pooling region, and the pooling window size is set to 2 × 2;
fourth layer (second convolution layer): similar to the second layer, feature extraction is performed again on the input data;
fifth layer (second pooling layer): similarly to the third layer, the input data is down sampled again, reducing the training parameters.
Sixth layer (full connection layer): compressing the characteristic information into a one-dimensional vector form through a full connection layer.
Seventh layer (output layer): the output layer is fully connected to the sixth layer, and the device fingerprint classification result is finally output through a Softmax function, whose formula is as follows:
$$\mathrm{Softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{N} e^{z_k}}$$
which gives the probability that the input is classified into category j, where $z_j$ is the j-th value of the output layer and N represents the number of device types to be classified in the invention; the category with the maximum value in the output result vector is the classification result.
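The output step can be illustrated in plain Python. This is a sketch of the standard Softmax function followed by an arg-max, not the embodiment's TensorFlow code; subtracting the maximum score before exponentiating is a common numerical-stability measure added here, and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Convert N output-layer scores into N probabilities that sum to 1."""
    shift = max(scores)                          # subtracting the max avoids overflow
    exps = [math.exp(score - shift) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def predict_label(scores):
    """Return the index (device type label) with the highest probability."""
    probabilities = softmax(scores)
    return max(range(len(probabilities)), key=probabilities.__getitem__)
```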
Step S403: and (3) carrying out performance test on the intelligent substation measurement and control equipment fingerprint classification model obtained by current training by using a test set: and if the detection accuracy of the performance test reaches the target accuracy, storing the fingerprint classification model of the intelligent substation measurement and control equipment obtained by current training, otherwise, adjusting the model parameters, and returning to the step of model training.
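The train, test and adjust cycle of step S403 can be sketched as a control loop. The three callables `train_once`, `evaluate` and `adjust` are hypothetical stand-ins for the actual training, testing and parameter-adjustment code, and the `max_rounds` limit is an added safeguard that the description does not mention.

```python
def train_until_target(train_once, evaluate, adjust, target_accuracy, max_rounds=10):
    """Retrain until the test accuracy reaches the target; return (model, accuracy)."""
    accuracy = None
    for _ in range(max_rounds):
        model = train_once()              # train on the training set
        accuracy = evaluate(model)        # performance test on the test set
        if accuracy >= target_accuracy:
            return model, accuracy        # caller stores this model
        adjust()                          # adjust model parameters, then retrain
    return None, accuracy                 # target not reached within max_rounds
```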
Fig. 6a shows a data comparison graph of accuracy and F1 score of a fingerprint classification CNN model of an intelligent substation measurement and control device and commonly used RNN models and LSTM models under the GOOSE protocol according to an embodiment of the present invention. Fig. 6b shows a data comparison diagram of accuracy and F1 score of a fingerprint classification CNN model of the intelligent substation measurement and control device and commonly used RNN models and LSTM models under the SV protocol according to an embodiment of the present invention. Wherein, accuracy represents the Accuracy, namely the proportion of the sample with correct classification result in the total sample; f1 represents the F1 score, i.e., the harmonic mean of the precision and recall. As shown in fig. 6a and 6b, for GOOSE protocol devices and SV protocol devices, compared with the RNN model and the LSTM model which are also commonly used in the field of traffic classification, the fingerprint classification CNN model of the intelligent substation measurement and control device based on the traffic characteristics according to the embodiment of the present invention obtains higher classification accuracy and F1 score, that is, obtains better classification effect.
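The two metrics can be computed as in the sketch below. The description defines F1 as the harmonic mean of precision and recall but does not say how it is aggregated over several device types, so the macro average used here is an assumption, as are the function names.

```python
def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == cls and p == cls for t, p in pairs)
    fp = sum(t != cls and p == cls for t, p in pairs)
    fn = sum(t == cls and p != cls for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed aggregation)."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)
```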
According to the embodiment of the invention, an initial neural network model is constructed and then trained and tested with the training set and the test set respectively to obtain the final intelligent substation measurement and control equipment fingerprint classification model. This solves the problems of prior-art device fingerprint acquisition methods, such as complex deployment, high bandwidth requirements and a tendency to consume excessive network resources of power industrial control equipment, and fills the gap in passive device fingerprint acquisition technology. The fingerprint classification model provided by the embodiment of the invention can identify and classify intelligent substation measurement and control equipment, which facilitates further work such as identity authentication, secure access and asset management and improves the security of the power industrial control system.
Fig. 7 illustrates an exemplary flowchart of a device fingerprint classification method according to an embodiment of the present invention.
As shown in fig. 7, the method includes:
step S701: capturing flow data;
step S702: after the flow data is processed and encapsulated, inputting it into the intelligent substation measurement and control equipment fingerprint classification model trained by the method provided by the foregoing embodiments for identification and classification;
step S703: and outputting a recognition classification result.
The flow data captured in the experimental environment is used as the sample to be detected. The flow data is processed and encapsulated by a Python program, the message data is input into the corresponding classification model for detection, and the label output by the model is looked up in the device type and matching label table to obtain the device type information corresponding to the label.
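Steps S701 to S703 can be strung together as in the following sketch. The callables `extract_features` and `predict_label` are hypothetical placeholders for the feature extraction and the trained model, `label_to_type` stands in for the device type and matching label table, and the "unknown" fallback is an assumption.

```python
IMAGE_BYTES = 28 * 28


def identify_device(frame, extract_features, predict_label, label_to_type):
    """Process one captured frame and return the device type predicted for it."""
    features = extract_features(frame)
    # Unify to 784 bytes: truncate, or pad with 0x00 bytes.
    image_bytes = features[:IMAGE_BYTES].ljust(IMAGE_BYTES, b"\x00")
    label = predict_label(image_bytes)           # model output: device type label
    return label_to_type.get(label, "unknown")   # look the label up in the table
```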
According to the embodiment, the captured flow data is input into the intelligent substation measurement and control equipment fingerprint classification model for identification and classification. This solves the problems of prior-art device fingerprint acquisition methods, such as complex deployment, high bandwidth requirements and a tendency to consume excessive network resources of power industrial control equipment, and fills the gap in passive device fingerprint acquisition technology. Intelligent substation measurement and control equipment can thus be identified and classified, which facilitates further work such as identity authentication, secure access and asset management and improves the security of the power industrial control system.
Fig. 8 is a schematic structural diagram illustrating a sample data generation apparatus according to an embodiment of the present invention.
As shown in fig. 8, the apparatus includes:
and the message capturing unit 801 is used for capturing the intelligent substation message.
Fig. 2 is a process diagram illustrating an intelligent substation flow capture method according to an embodiment of the present invention. As shown in fig. 2, the monitoring computer is connected to the process layer switch by using the photoelectric converter, and then the capturing of the flow of the GOOSE network or the SV network in the process layer can be completed by the monitoring computer, so as to obtain the flow of the GOOSE protocol or the SV protocol in the intelligent substation, and the captured flow data is stored as a Pcapng file for subsequent flow preprocessing.
The first processing unit 802 is configured to parse the intelligent substation message, and use the parsed message field in combination with the intelligent substation structure information stored in the SCD file to obtain an equipment type tag and update an equipment tag table.
Further, before analyzing the intelligent substation message, the method includes:
and checking the content of the intelligent substation message.
And checking the content accuracy of the GOOSE protocol message or the SV protocol message, deleting the malformed message, and storing the checked GOOSE protocol message or SV protocol message as a Pcap file.
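One possible form of this check-and-save step is sketched below. The description does not say which checks define a malformed message, so the two checks used here (a GOOSE or SV EtherType, and a Length field that fits inside the frame, on the assumption that Length counts the bytes from the APPID field onward) are assumptions. The output follows the classic pcap layout with link type 1 (Ethernet); the function names are hypothetical.

```python
import struct

# Classic pcap global header: magic, version 2.4, zone, sigfigs, snaplen, linktype 1.
PCAP_GLOBAL_HEADER = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)


def is_well_formed(frame: bytes) -> bool:
    """Assumed checks: GOOSE/SV EtherType and a plausible Length field."""
    if len(frame) < 22:
        return False
    offset = 12
    ethertype = int.from_bytes(frame[offset:offset + 2], "big")
    if ethertype == 0x8100:                       # skip one VLAN tag
        offset += 4
        ethertype = int.from_bytes(frame[offset:offset + 2], "big")
    if ethertype not in (0x88B8, 0x88BA):
        return False
    header = frame[offset + 2:offset + 10]        # APPID, Length, two reserved fields
    if len(header) < 8:
        return False
    length = int.from_bytes(header[2:4], "big")
    return 8 <= length <= len(frame) - (offset + 2)


def filter_and_write(stream, timed_frames) -> int:
    """Write well-formed (timestamp, frame) pairs to a pcap stream; return the count."""
    stream.write(PCAP_GLOBAL_HEADER)
    kept = 0
    for timestamp, frame in timed_frames:
        if not is_well_formed(frame):
            continue                              # drop the malformed message
        seconds = int(timestamp)
        microseconds = int((timestamp - seconds) * 1_000_000)
        stream.write(struct.pack("<IIII", seconds, microseconds, len(frame), len(frame)))
        stream.write(frame)
        kept += 1
    return kept
```

The `stream` argument can be any binary file object, for example one returned by `open(path, "wb")`.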
Further, the first processing unit 802 is further configured to:
analyzing the intelligent substation message to obtain an MAC address and an APPID sequence number;
searching the APPID serial number obtained by current analysis in the SCD file to obtain the equipment type information corresponding to the current message;
and judging whether to update the current equipment label table according to whether the equipment type information corresponding to the current message exists in the current equipment label table and the MAC address obtained by current analysis.
Parsing a GOOSE protocol message or an SV protocol message yields the content of specified fields in the message. The fields of interest in the embodiment of the present invention are the source MAC address field and the APPID field, so parsing the intelligent substation message yields the source MAC address and the APPID number of the message. The SCD file is opened with an SCD visualization tool, and the APPID field serves as the intermediary for obtaining the device type information corresponding to the message: searching the tool for the APPID number obtained by the current parsing locates the device topology graph corresponding to the current message, from which the device type information corresponding to the current message is obtained.
Further, the apparatus further comprises:
and the device tag table creating unit is used for initializing the database table and creating a device tag table for storing tag information and device information.
The device label table holds the device MAC address, the device type information and the label data matched to the device. The created table comprises three fields: a device MAC address byte string, a device type character string and an unsigned integer label corresponding to the device type; the device type label serves as the primary key of the table. In the embodiment of the invention, because devices under the GOOSE protocol and devices under the SV protocol are fingerprinted separately, device label tables with the same structure are created for the GOOSE protocol and the SV protocol respectively.
Further, after the message of the intelligent substation is analyzed to obtain the MAC address and the APPID number, the method includes:
and searching the MAC address in the MAC address byte string field of the current equipment label table, if the MAC address exists, moving to the next message, and returning to the message analysis step.
Further, judging whether to update the current device tag table according to whether the device type information corresponding to the current message and the MAC address obtained by the current analysis exist in the current device tag table, includes:
if the equipment type information corresponding to the current message and the MAC address obtained by current analysis exist in the current equipment label table, the current equipment label table does not need to be updated;
if the current equipment label table has equipment type information corresponding to the current message but does not have the MAC address obtained by current analysis, storing the MAC address obtained by current analysis into an MAC address byte string field in the current equipment label table so as to update the current equipment label table;
if the current equipment label table does not have the equipment type information corresponding to the current message and is not empty, adding a new row in the current equipment label table, adding an MAC address obtained by current analysis in an MAC address byte string field of the new row, setting an equipment type character string field of the new row as the equipment type information corresponding to the current message and setting an equipment type label as the maximum value of the current equipment type label plus 1;
if the current equipment label table is empty, adding a new row in the current equipment label table, adding a MAC address obtained by current analysis in the MAC address byte string field of the new row, setting the equipment type character string field of the new row as the equipment type information corresponding to the current message and setting the equipment type label as 0.
The second processing unit 803 is configured to obtain an equipment type tag value and a message feature field based on the updated equipment tag table, and encapsulate the message feature field and the equipment type tag value into a PTD data set to be used as sample data of the fingerprint classification model of the measurement and control equipment of the intelligent substation.
Further, obtaining a device type tag value and a message characteristic field based on the updated device tag table includes:
newly building a PTD file, and completing filling in the head of the newly built PTD file based on the updated equipment tag table;
and obtaining the equipment type label value by using the MAC address obtained by analysis in the updated equipment label table, and extracting the message characteristic field from the updated equipment label table.
Fig. 3 shows a schematic structural diagram of a PTD data set according to an embodiment of the invention. As shown in FIG. 3, the PTD data set is encapsulated in a format in which the message data is placed in the data portion of the PTD file and the auxiliary information is placed in the header of the PTD file. The auxiliary information comprises the number of label types, the number of message pictures, the length of the message pictures and the width of the message pictures. The number of label types equals the maximum value of the device type matching labels in the device label table plus 1, the number of message pictures is the number of messages in the current file, and the length and the width of the message pictures are both set to 28.
For the GOOSE protocol, the message feature field comprises the destination MAC address, the source MAC address, the APPID field, and the GoCBRef, DataSet, GoID and AllData fields in the APDU field; for the SV protocol, the message feature field comprises the destination MAC address, the source MAC address, the APPID field, and the ASDU string in the APDU field. Searching the updated device label table for the source MAC address of the message yields the device label value corresponding to that MAC address. The message feature field data is unified to a size of 28 × 28 bytes: if the data is shorter, it is padded with 0x00 bytes; otherwise it is truncated. The message feature field and the device label value are then encapsulated into the PTD data set.
According to the embodiment, the original intelligent substation messages are parsed, matched against the looked-up information, and their information is extracted, integrated and re-encapsulated to generate sample data for the intelligent substation measurement and control equipment fingerprint classification model. This provides reliable sample data for subsequent model training and solves the problem of missing samples in model training. Training the model with sample data generated by the method provided by the embodiment of the invention can improve the feature extraction effect and the generalization of the model, so that the model can be used to identify and classify intelligent substation measurement and control equipment.
Fig. 9 is a schematic structural diagram of a model training apparatus according to an embodiment of the present invention.
As shown in fig. 9, the apparatus includes:
the data dividing unit 901 is configured to divide the sample data obtained by using the methods provided in the foregoing embodiments into a training set and a test set according to a preset ratio.
In total, four files can be generated: a PTD training data set and a PTD test data set for each of the GOOSE protocol and the SV protocol.
And the model training unit 902 is configured to train the initial neural network model by using a training set, and generate a fingerprint classification model of the intelligent substation measurement and control equipment.
Because each byte value corresponds one-to-one with the gray value of one pixel of a grayscale image, the embodiment of the invention can be regarded as classifying grayscale images of message data, where each grayscale image has a size of 28 × 28. A neural network extracts the features of the message grayscale images and classifies them, which amounts to classifying the intelligent substation message data and thereby completes the fingerprint identification of the intelligent substation measurement and control equipment.
The PTD training data sets corresponding to the GOOSE protocol and the SV protocol are each input into an initial neural network model for training, generating one intelligent substation measurement and control equipment fingerprint classification model for each of the two protocols.
Further, the initial neural network model is built in advance based on a TensorFlow framework.
Further, the initial neural network model includes: an input layer, a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a fully connected layer and an output layer.
Further, the first convolution layer and the second convolution layer both adopt the following formula to calculate the convolution result:
$$y = f\left(\sum_{i=1}^{m}\sum_{j=1}^{m} \omega_{i,j}\, x_{i,j} + b\right)$$
wherein $x_{i,j}$ is an element in the convolution region M of the input X, $\omega_{i,j}$ is an element of the convolution kernel, m is the size of the convolution kernel, b is the offset, and f(·) is the ReLU activation function;
and the first convolution layer and the second convolution layer are both calculated by adopting the following formula to obtain the zero-expansion size:
$$O = \mathrm{ceil}\left(\frac{I}{\mathrm{step}}\right);$$
$$PT = (O-1)\times \mathrm{step} + m - I;$$
$$PL = PU = \mathrm{floor}\left(\frac{PT}{2}\right);$$
$$PR = PB = PT - PL;$$
wherein, I is the input image size, O is the output image size, m is the convolution kernel size, step is the step size, PL is the image left side extension size, PR is the image right side extension size, PU is the image top extension size, PB is the image bottom extension size, ceil (·) is an upward rounding function, floor (·) is a downward rounding function.
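The convolution computation for a single output element can be written out in plain Python. This is a sketch only: it assumes the result is the ReLU of the weighted sum of one m × m region plus the offset, as the symbol definitions above suggest, and the function names are hypothetical.

```python
def relu(value: float) -> float:
    """ReLU activation: negative inputs become 0, others pass through."""
    return max(0.0, value)


def convolve_region(region, kernel, offset):
    """Compute one output element from an m x m input region and an m x m kernel."""
    m = len(kernel)
    weighted_sum = sum(
        kernel[i][j] * region[i][j] for i in range(m) for j in range(m)
    )
    return relu(weighted_sum + offset)
```

Sliding the region across the zero-extended input and repeating this computation yields the full convolution output.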
Further, the output layer outputs the classification result through a Softmax function.
Table 2. Parameter configuration of the neural network model for fingerprint classification of intelligent substation measurement and control equipment (table image not reproduced)
Fig. 5 and Table 2 show the architecture diagram and parameter configuration of the neural network model for fingerprint classification of intelligent substation measurement and control equipment according to an embodiment of the present invention. As shown in Fig. 5 and Table 2, the model comprises seven layers, as follows:
first layer (input layer): the input data is a 28 x 28 message gray scale image and a device type label value matched with the message.
Second layer (first convolution layer): the first convolution layer extracts the features of the message grayscale image; the convolution kernel size is set to 5 × 5, and the convolution result is calculated using the following formula:
$$y = f\left(\sum_{i=1}^{m}\sum_{j=1}^{m} \omega_{i,j}\, x_{i,j} + b\right)$$
where $x_{i,j}$ represents an element in the convolution region M of the input X, $\omega_{i,j}$ represents an element of the convolution kernel, m represents the size of the convolution kernel, b represents the offset, and f(·) represents the ReLU activation function, given by:
$$f(x) = \max(0, x)$$
in order to ensure the stability of the input and output dimensions of the convolutional layer, the zero-extension operation is performed on the input, and the zero-extension size can be calculated by the following formula:
$$O = \mathrm{ceil}\left(\frac{I}{\mathrm{step}}\right);$$
$$PT = (O-1)\times \mathrm{step} + m - I;$$
$$PL = PU = \mathrm{floor}\left(\frac{PT}{2}\right);$$
$$PR = PB = PT - PL;$$
wherein I is the input image size, O is the output image size, m is the convolution kernel size, step is the step size, PL, PR, PU and PB are the left, right, top and bottom extension sizes of the image respectively, ceil(·) represents the round-up function, and floor(·) represents the round-down function. It follows that, in the actual training of the model of the present invention, the input of the second-layer convolution layer is expanded to 32 × 32 and the input of the fourth-layer convolution layer is expanded to 18 × 18.
Third layer (first pooling layer): the first pooling layer reduces the network complexity, compresses the network size and reduces the number of training parameters. Max pooling is used, retaining the maximum value in each pooling region, and the pooling window size is set to 2 × 2;
fourth layer (second convolution layer): similar to the second layer, feature extraction is performed again on the input data;
fifth layer (second pooling layer): similarly to the third layer, the input data is down sampled again, reducing the training parameters.
Sixth layer (fully connected layer): compressing the characteristic information into a one-dimensional vector form through a full connection layer.
Seventh layer (output layer): the output layer is fully connected to the sixth layer, and the device fingerprint classification result is finally output through a Softmax function, whose formula is as follows:
$$\mathrm{Softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{N} e^{z_k}}$$
which gives the probability that the input is classified into category j, where $z_j$ is the j-th value of the output layer and N represents the number of device types to be classified in the invention; the category with the maximum value in the output result vector is the classification result.
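The max pooling used by the third and fifth layers described above can be sketched in plain Python. The 2 × 2 window comes from the description; moving the window in steps of 2, so that windows do not overlap, is an assumption, and the function name is hypothetical.

```python
def max_pool_2x2(image):
    """Keep the maximum of each non-overlapping 2 x 2 window of a 2-D list."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r][c], image[r][c + 1], image[r + 1][c], image[r + 1][c + 1])
            for c in range(0, cols - 1, 2)
        ]
        for r in range(0, rows - 1, 2)
    ]
```

Under this assumption a 28 × 28 feature map is reduced to 14 × 14, which halves each dimension before the next convolution layer.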
The model testing unit 903 is used for performing performance testing on the intelligent substation measurement and control equipment fingerprint classification model obtained by current training by using a test set: and if the detection accuracy of the performance test reaches the target accuracy, storing the fingerprint classification model of the intelligent substation measurement and control equipment obtained by current training, otherwise, adjusting the model parameters, and returning to the step of model training.
Fig. 6a compares the accuracy and F1 score of the CNN fingerprint classification model for intelligent substation measurement and control equipment according to an embodiment of the present invention with those of the commonly used RNN and LSTM models under the GOOSE protocol. Fig. 6b shows the same comparison under the SV protocol. Here, Accuracy is the proportion of correctly classified samples among all samples, and F1 is the F1 score, i.e., the harmonic mean of precision and recall. As shown in Figs. 6a and 6b, for both GOOSE protocol devices and SV protocol devices, the traffic-feature-based CNN fingerprint classification model according to the embodiment of the present invention achieves higher classification accuracy and a higher F1 score, i.e., a better classification effect, than the RNN and LSTM models that are also commonly used in the traffic classification field.
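For reference, the two metrics can be computed as in the sketch below. The source does not say how the F1 score is aggregated across device types, so macro averaging (the unweighted mean of the per-class F1 scores) is an assumption here, and the label lists are made-up illustrative data rather than results from the experiments.

```python
def accuracy(y_true, y_pred):
    """Proportion of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (harmonic mean of precision and recall)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append(f1)
    return sum(scores) / len(scores)


# Made-up labels for three device types.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(round(acc, 3), round(f1, 3))
```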
In this embodiment, an initial neural network model is constructed and then trained and tested with the training set and the test set, respectively, to obtain the final intelligent substation measurement and control equipment fingerprint classification model. This solves the problems of prior-art device fingerprint acquisition methods, such as complex deployment, high bandwidth requirements, and a tendency to consume excessive network resources of power industrial control equipment, and fills the gap in passive device fingerprint acquisition technology. The fingerprint classification model provided by the embodiment of the present invention can identify and classify intelligent substation measurement and control equipment, which facilitates subsequent work such as identity authentication, secure access and asset management, and improves the security of the power industrial control system.
Fig. 10 is a schematic structural diagram of a device fingerprint classification apparatus according to an embodiment of the present invention.
As shown in fig. 10, the apparatus includes:
a traffic capturing unit 1001 configured to capture traffic data;
an identification and classification unit 1002 configured to process and encapsulate the traffic data and input it into the intelligent substation measurement and control equipment fingerprint classification model trained according to the method provided in the above embodiments, for identification and classification;
a result output unit 1003 configured to output the identification and classification result.
Traffic data captured in the experimental environment is used as the sample to be detected. The traffic data is processed and encapsulated by a Python program, the resulting message data is input into the corresponding classification model for detection, and the label given by the model is looked up in the device type and label matching table to obtain the device type information corresponding to that label.
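The final lookup step can be pictured as a simple table query. In the sketch below the table contents, the device type names and the helper name `device_type_for` are all hypothetical; the source only states that the label given by the model is looked up in a device type and label matching table.

```python
# Hypothetical device type and label matching table: label value -> device type.
LABEL_TABLE = {
    0: "merging unit",
    1: "intelligent terminal",
    2: "protection device",
}


def device_type_for(label, table=LABEL_TABLE):
    """Return the device type for a predicted label, or None if the label is unknown."""
    return table.get(label)


known = device_type_for(1)
unknown = device_type_for(7)
print(known, unknown)
```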
In this embodiment, the captured traffic data is input into the intelligent substation measurement and control equipment fingerprint classification model for identification and classification. This solves the problems of prior-art device fingerprint acquisition methods, such as complex deployment, high bandwidth requirements, and a tendency to consume excessive network resources of power industrial control equipment, and fills the gap in passive device fingerprint acquisition technology. Intelligent substation measurement and control equipment can thereby be identified and classified, which facilitates subsequent work such as identity authentication, secure access and asset management, and improves the security of the power industrial control system.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the method for generating sample data provided in the foregoing embodiments is implemented, or the method for training a model provided in the foregoing embodiments is implemented, or the method for classifying a device fingerprint provided in the foregoing embodiments is implemented.
An embodiment of the present invention further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; the processor is configured to read the executable instruction from the memory, and execute the instruction to implement the sample data generation method provided in each of the above embodiments, or to implement the model training method provided in each of the above embodiments, or to implement the device fingerprint classification method provided in each of the above embodiments.
The invention has been described with reference to a few embodiments. However, as is known to those skilled in the art, embodiments other than those disclosed above are equally possible within the scope of the invention as defined by the appended patent claims.
Generally, all terms used in the claims are to be interpreted according to their ordinary meaning in the technical field, unless explicitly defined otherwise herein. All references to "a/an/the [ device, component, etc ]" are to be interpreted openly as referring to at least one instance of said device, component, etc., unless explicitly stated otherwise. The steps of any method disclosed herein do not have to be performed in the exact order disclosed, unless explicitly stated.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting the same, and although the present invention is described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (19)

1. A sample data generating method, characterized in that the method comprises:
capturing an intelligent substation message;
analyzing the intelligent substation message, and acquiring a device type label and updating a device label table by using the analyzed message field and combining with the intelligent substation structure information stored in the SCD file;
and obtaining an equipment type label value and a message characteristic field based on the updated equipment label table, and packaging the message characteristic field and the equipment type label value into a PTD data set to be used as sample data of the fingerprint classification model of the intelligent substation measuring and controlling equipment.
2. The method of claim 1, further comprising, before parsing the intelligent substation message:
and checking the content of the intelligent substation message.
3. The method of claim 1, wherein parsing the intelligent substation message and using the parsed message field in combination with intelligent substation structure information stored in an SCD file to obtain a device type tag and update a device tag table comprises:
analyzing the intelligent substation message to obtain an MAC address and an APPID sequence number;
searching the APPID serial number obtained by current analysis in the SCD file to obtain the equipment type information corresponding to the current message;
and judging whether to update the current equipment label table according to whether the equipment type information corresponding to the current message exists in the current equipment label table and the MAC address obtained by current analysis.
4. The method of claim 1, further comprising:
initializing a database table, and creating a device tag table for storing tag information and device information.
5. The method of claim 3, wherein determining whether to update the current device tag table according to whether the device type information corresponding to the current message exists in the current device tag table and the MAC address obtained by current parsing comprises:
if the device type information corresponding to the current message and the MAC address obtained by current analysis exist in the current device label table, the current device label table does not need to be updated;
if the device type information corresponding to the current message exists in the current device label table but the MAC address obtained by current analysis does not exist, the MAC address obtained by current analysis is stored in an MAC address byte string field in the current device label table so as to update the current device label table;
if the current equipment label table does not have the equipment type information corresponding to the current message and is not empty, adding a new row in the current equipment label table, adding an MAC address obtained by current analysis in an MAC address byte string field of the new row, setting an equipment type character string field of the new row as the equipment type information corresponding to the current message and setting an equipment type label as the maximum value of the current equipment type label +1;
if the current equipment label table is empty, adding a new row in the current equipment label table, adding a currently analyzed MAC address in an MAC address byte string field of the new row, setting an equipment type character string field of the new row as equipment type information corresponding to the current message and setting an equipment type label as 0.
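The four update cases above can be sketched in code as follows. This is an illustrative reading rather than the claimed implementation: the in-memory list of dictionaries, the field names, the example device type strings and the example MAC addresses are all assumptions, and a MAC address is treated as "present" when it is stored in the row of the matching device type.

```python
def update_tag_table(table, device_type, mac):
    """Apply the four update cases to an in-memory device tag table.

    table: list of rows, each {"device_type": str, "macs": set, "label": int}.
    Returns True if the table was changed, False otherwise.
    """
    for row in table:
        if row["device_type"] == device_type:
            if mac in row["macs"]:
                return False           # type and MAC both present: no update needed
            row["macs"].add(mac)       # type present, MAC new: store the MAC
            return True
    # Device type not present: label 0 for an empty table, else current maximum + 1.
    label = max((row["label"] for row in table), default=-1) + 1
    table.append({"device_type": device_type, "macs": {mac}, "label": label})
    return True


# Hypothetical device types and MAC addresses.
table = []
update_tag_table(table, "merging unit", "01:0c:cd:04:00:01")
update_tag_table(table, "merging unit", "01:0c:cd:04:00:02")
update_tag_table(table, "intelligent terminal", "01:0c:cd:01:00:01")
changed = update_tag_table(table, "merging unit", "01:0c:cd:04:00:01")
print(len(table), changed)
```

Using `default=-1` lets the empty-table case and the non-empty case share one expression, since an empty table then yields label 0.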
6. The method of claim 1, wherein obtaining a device type tag value and a message feature field based on the updated device tag table comprises:
newly building a PTD file, and completing the filling of the head of the newly built PTD file based on the updated equipment tag table;
and acquiring a device type label value by using the MAC address obtained by analysis in the updated device label table, and extracting a message characteristic field from the updated device label table.
7. A method of model training, the method comprising:
dividing sample data obtained by the method of any one of claims 1 to 6 into a training set and a test set according to a preset proportion;
training the initial neural network model by using the training set to generate an intelligent substation measurement and control equipment fingerprint classification model;
and performing performance test on the intelligent substation measurement and control equipment fingerprint classification model obtained by current training by using the test set: and if the detection accuracy of the performance test reaches the target accuracy, storing the fingerprint classification model of the intelligent substation measurement and control equipment obtained by current training, otherwise, adjusting the model parameters, and returning to the step of model training.
8. The method according to claim 7, wherein the initial neural network model is built in advance based on a TensorFlow framework.
9. The method of claim 7, wherein the initial neural network model comprises: an input layer, a first convolutional layer, a first pooling layer, a second convolutional layer, a second pooling layer, a fully connected layer and an output layer.
10. The method of claim 9, wherein the first convolutional layer and the second convolutional layer each calculate a convolution result using the following formula:
Figure FDA0003922842280000031
wherein x_{i,j} is an element of the input X within the convolution region M, ω is an element of the convolution kernel, m is the size of the convolution kernel, b is the bias, and f(·) is the ReLU activation function;
and the first convolution layer and the second convolution layer adopt the following formula to calculate the zero-expansion size:
Figure FDA0003922842280000032
PT=(O-1)×step+m-I;
Figure FDA0003922842280000033
PR=PB=PT-PL;
wherein I is the size of an input image, O is the size of an output image, m is the size of a convolution kernel, step is the step size, PL is the left-side expansion size of the image, PR is the right-side expansion size of the image, PU is the top expansion size of the image, PB is the bottom expansion size of the image, ceil(·) is a rounding-up function, and floor(·) is a rounding-down function.
11. The method of claim 9, wherein the output layer outputs the classification result via a Softmax function.
12. A method for device fingerprint classification, the method comprising:
capturing flow data;
after the flow data are processed and packaged, inputting a fingerprint classification model of the intelligent substation measurement and control equipment obtained by training according to the method of any one of claims 7 to 11 for identification and classification;
and outputting a recognition classification result.
13. An apparatus for generating sample data, the apparatus comprising:
the message capturing unit is used for capturing the intelligent substation message;
the first processing unit is used for analyzing the intelligent substation message, and acquiring a device type label and updating a device label table by using the analyzed message field and combining the intelligent substation structure information stored in the SCD file;
and the second processing unit is used for obtaining an equipment type label value and a message characteristic field based on the updated equipment label table, and packaging the message characteristic field and the equipment type label value into a PTD data set to be used as sample data of the fingerprint classification model of the intelligent substation measurement and control equipment.
14. The apparatus of claim 13, wherein the first processing unit is further configured to:
analyzing the intelligent substation message to obtain an MAC address and an APPID sequence number;
searching the APPID serial number obtained by current analysis in the SCD file to obtain the equipment type information corresponding to the current message;
and judging whether to update the current equipment label table according to whether the equipment type information corresponding to the current message exists in the current equipment label table and the MAC address obtained by current analysis.
15. The apparatus of claim 13, wherein obtaining a device type tag value and a packet feature field based on the updated device tag table comprises:
newly building a PTD file, and completing the filling of the head of the newly built PTD file based on the updated equipment tag table;
and acquiring a device type label value by using the MAC address obtained by analysis in the updated device label table, and extracting a message characteristic field from the updated device label table.
16. A model training apparatus, the apparatus comprising:
a data dividing unit, for dividing the sample data obtained by any method of claims 1-6 into a training set and a test set according to a preset proportion;
the model training unit is used for training an initial neural network model by using the training set to generate a fingerprint classification model of the intelligent substation measurement and control equipment;
the model testing unit is used for performing performance testing on the intelligent substation measurement and control equipment fingerprint classification model obtained by current training by using the test set: and if the detection accuracy of the performance test reaches the target accuracy, storing the fingerprint classification model of the intelligent substation measurement and control equipment obtained by current training, otherwise, adjusting the model parameters, and returning to the step of model training.
17. An apparatus for classifying device fingerprints, the apparatus comprising:
the flow grabbing unit is used for grabbing flow data;
the identification and classification unit is used for processing and packaging the flow data, and inputting the flow data into the intelligent substation measurement and control equipment fingerprint classification model obtained by training according to the method of any one of claims 7-11 for identification and classification;
and the result output unit is used for outputting the identification and classification result.
18. A computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the sample data generation method of any of claims 1-6 or implements the model training method of any of claims 7-11 or implements the device fingerprint classification method of claim 12.
19. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
the processor is configured to read the executable instructions from the memory and execute the instructions to implement the sample data generation method according to any one of claims 1 to 6, or to implement the model training method according to any one of claims 7 to 11, or to implement the device fingerprint classification method according to claim 12.
CN202211363399.0A 2022-11-02 2022-11-02 Sample data generation method, model training method and equipment fingerprint classification method Pending CN115952455A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211363399.0A CN115952455A (en) 2022-11-02 2022-11-02 Sample data generation method, model training method and equipment fingerprint classification method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211363399.0A CN115952455A (en) 2022-11-02 2022-11-02 Sample data generation method, model training method and equipment fingerprint classification method

Publications (1)

Publication Number Publication Date
CN115952455A true CN115952455A (en) 2023-04-11

Family

ID=87285152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211363399.0A Pending CN115952455A (en) 2022-11-02 2022-11-02 Sample data generation method, model training method and equipment fingerprint classification method

Country Status (1)

Country Link
CN (1) CN115952455A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116471127A (en) * 2023-06-20 2023-07-21 中国电力科学研究院有限公司 Electric power Internet of things terminal access authentication method based on passive device fingerprint
CN116894011A (en) * 2023-07-17 2023-10-17 上海螣龙科技有限公司 Multi-dimensional intelligent fingerprint library and multi-dimensional intelligent fingerprint library design and query method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination