CN113935426A - Method and device for detecting abnormal data traffic of power internet of things - Google Patents

Publication number
CN113935426A
CN113935426A CN202111233244.0A CN202111233244A CN113935426A CN 113935426 A CN113935426 A CN 113935426A CN 202111233244 A CN202111233244 A CN 202111233244A CN 113935426 A CN113935426 A CN 113935426A
Authority
CN
China
Prior art keywords
data
detection model
things
anomaly detection
power internet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111233244.0A
Other languages
Chinese (zh)
Inventor
谢可
赵峰
刘彩
李温静
张毅琦
陈智鹏
张楠
杨志鹏
郭文静
柯华强
王金发
陈婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Information and Telecommunication Co Ltd
Original Assignee
State Grid Information and Telecommunication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Information and Telecommunication Co Ltd
Priority to CN202111233244.0A
Publication of CN113935426A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/02 Knowledge representation; Symbolic representation
    • G06N5/022 Knowledge engineering; Knowledge acquisition

Abstract

The embodiment of the invention discloses a method and a device for detecting abnormal data traffic of the power Internet of Things. The method comprises: acquiring power Internet of Things data to be identified; and inputting the data to be identified into a pre-trained first anomaly detection model to obtain an anomaly result for the data. The first anomaly detection model is obtained by training a first anomaly detection model to be trained on power Internet of Things data labeled with time and anomaly results. Because a time dimension is added when the first anomaly detection model is trained, the accuracy of anomaly detection is improved, abnormal traffic is detected automatically, and detection efficiency is improved.

Description

Method and device for detecting abnormal data traffic of power internet of things
Technical Field
The invention relates to the field of data processing, in particular to a method and a device for detecting abnormal data traffic of an electric power internet of things.
Background
With the continuous development of advanced technologies such as artificial intelligence, 5G communication, and big data analysis, and the deep integration of "Internet+" technology with the smart grid, the power Internet of Things has become a research hotspot in the field of electrical engineering.
The architecture of the power Internet of Things can be divided into an application layer, a platform layer, a network layer, and a perception layer. The perception layer is composed of a plurality of agent modules that obtain massive heterogeneous data from basic equipment and the environment and extract key information through edge computing. The extracted information is transmitted through the network layer to the platform layer for unified analysis and management, and the application layer provides the final service. The network layer is an important channel for transmitting the massive heterogeneous data of terminals to the cloud platform. Attacks launched on a terminal may gradually penetrate to the cloud through the network layer and cause serious damage to the power Internet of Things. To guarantee data security, a model for performing anomaly detection on power Internet of Things data is therefore urgently needed.
Disclosure of Invention
In view of this, the embodiment of the invention discloses a method and a device for detecting abnormal data traffic of the power Internet of Things, which achieve the purpose of performing anomaly detection on power Internet of Things data.
The embodiment of the invention discloses a method for detecting abnormal data traffic of the power Internet of Things, which comprises the following steps:
acquiring power Internet of Things data to be identified;
inputting the power Internet of Things data to be identified into a pre-trained first anomaly detection model to obtain an anomaly result of the data to be identified; the first anomaly detection model is obtained by training a first anomaly detection model to be trained on power Internet of Things data labeled with time and anomaly results.
Optionally, the training process of the first anomaly detection model includes:
acquiring a training sample set containing power Internet of things data, and labeling the training sample set to obtain a first data set; the first data set comprises power internet of things data with an abnormal result label;
dividing the data in the first data set into different first network flows according to preset data flow characteristics and timestamps to obtain a second data set;
performing dimensionality reduction on the data in the second data set through a preset automatic encoder, and extracting the characteristics of each network flow from the second data set;
inputting the characteristics and the label information of each network flow into a second anomaly detection model to be trained, and training the second anomaly detection model to be trained, wherein the label information at least comprises: an abnormal result; the automatic encoder and the second abnormality detection model constitute a first abnormality detection model.
Optionally, the labeling the training sample set includes:
constructing a rule set based on a preset expert knowledge base;
constructing a labeling function, wherein a first independent variable of the labeling function is the power Internet of Things data, a second independent variable of the labeling function is the rule set, and the target variable is the labeling result;
and inputting the data in the training sample set into the labeling function as the first independent variable to determine the result of the target variable.
Optionally, the dividing the data in the first data set into different first network streams according to preset data stream characteristics and timestamps to obtain a second data set includes:
separating data in the first data set into a first type of data stream or a second type of data stream according to preset first data stream characteristics; the first data stream characteristics include at least: a source IP address, a destination IP address, a source port number, and a destination port number;
adding corresponding protocol numbers in the data packets of the first class data stream and the second class data stream respectively;
dividing data packets in the second target data stream according to the time stamp of the first data packet in the first type of data stream and the second type of data stream and a preset communication rule to obtain a first network stream; the data packets in the first-class data stream and the second-class data stream are arranged according to the sequence of the timestamps, and the preset communication rule is determined based on the timestamps and the communication time of the data packets.
Optionally, the method further includes:
dividing the training sample set into a plurality of data sets;
dividing a plurality of data sets into a plurality of groups of data sets; each group of data sets comprises a test set and a training set;
training a first anomaly detection model to be trained by adopting each group of training sets respectively to obtain a plurality of first anomaly detection models;
respectively testing the trained first anomaly detection models through the test set to obtain a plurality of test results;
calculating a generalization error of the model based on each trained first anomaly detection model;
determining an optimal first anomaly detection model from the plurality of trained first anomaly detection models based on the generalization error of each trained first anomaly detection model.
The embodiment of the invention discloses a device for detecting abnormal data traffic of the power Internet of Things, which comprises:
an acquisition unit, configured to acquire power Internet of Things data to be identified;
an identification unit, configured to input the power Internet of Things data to be identified into a pre-trained first anomaly detection model to obtain an anomaly result of the data to be identified; the first anomaly detection model is obtained by training a first anomaly detection model to be trained on power Internet of Things data labeled with time and anomaly results.
Optionally, the device further includes:
a labeling unit, configured to acquire a training sample set containing power Internet of Things data and label the training sample set to obtain a first data set; the first data set comprises power Internet of Things data with an abnormal result label;
the integration unit is used for dividing the data in the first data set into different first network flows according to preset data flow characteristics and timestamps to obtain a second data set;
the coding unit is used for performing dimension reduction processing on the data in the second data set through a preset automatic coder and extracting the characteristics of each network stream from the second data set;
a first training unit, configured to input features and label information of each network flow into a second anomaly detection model to be trained, and train the second anomaly detection model to be trained, where the label information at least includes: an abnormal result; the automatic encoder and the second abnormality detection model constitute a first abnormality detection model.
Optionally, the tagging unit includes:
the first construction subunit is used for constructing a rule set based on a preset expert knowledge base;
the second construction subunit is used for constructing a labeling function, wherein a first independent variable of the labeling function is the power Internet of Things data, a second independent variable is the rule set, and the target variable is the labeling result;
and the determining subunit is used for inputting the data in the training sample set into the labeling function as the first independent variable and determining the result of the target variable.
Optionally, the integration unit includes:
the first dividing unit is used for dividing the data in the first data set into a first type data stream or a second type data stream according to preset first data stream characteristics; the first data stream characteristics include at least: a source IP address, a destination IP address, a source port number, and a destination port number;
an adding subunit, configured to add corresponding protocol numbers to the data packets of the first-class data stream and the second-class data stream, respectively;
the second dividing subunit is used for dividing the data packets in the second target data stream according to the time stamp of the first data packet in the first-class data stream and the second-class data stream and a preset communication rule to obtain a first network stream; the data packets in the first-class data stream and the second-class data stream are arranged according to the sequence of the timestamps, and the preset communication rule is determined based on the timestamps and the communication time of the data packets.
The invention discloses an electronic device, comprising:
a memory and a processor;
the memory is used for storing a program, and the processor performs the above method for detecting abnormal data traffic of the power Internet of Things when executing the program in the memory.
The embodiment of the invention discloses a method and a device for detecting abnormal data traffic of the power Internet of Things. The method comprises: acquiring power Internet of Things data to be identified; and inputting the data to be identified into a pre-trained first anomaly detection model to obtain an anomaly result for the data. The first anomaly detection model is obtained by training a first anomaly detection model to be trained on power Internet of Things data labeled with time and anomaly results. Because a time dimension is added when the first anomaly detection model is trained, the accuracy of anomaly detection is improved, abnormal traffic is detected automatically, and detection efficiency is improved.
In addition, the first anomaly detection model consists of an automatic encoder and a second anomaly detection model, the second anomaly detection model being a BiLSTM network model. The automatic encoder extracts deep features and reduces their dimensionality, which improves the detection efficiency and detection accuracy of the second anomaly detection model.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 shows a schematic flow diagram of a method for detecting abnormal data traffic of the power Internet of Things according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of a training process of a first anomaly detection model according to an embodiment of the present invention;
Fig. 3 shows a schematic structural diagram of a device for detecting abnormal data traffic of the power Internet of Things according to an embodiment of the present invention;
Fig. 4 shows a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a schematic flow chart of a method for detecting data traffic anomaly of an electric power internet of things according to an embodiment of the present invention is shown, where in the embodiment, the method includes:
S101: acquiring power Internet of Things data to be identified;
In this embodiment, the power Internet of Things data to be identified may be raw data or data that has been preprocessed. The preprocessing may include, for example, normalizing the raw power Internet of Things data.
In one embodiment, the data of the power internet of things to be identified may be data obtained from a network layer of the power internet of things.
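The normalization mentioned for S101 is not specified further in the text. As a minimal sketch, assuming min-max scaling per feature column (one common choice, not necessarily the one intended here) and hypothetical record fields, the preprocessing could look like this:

```python
def min_max_normalize(rows):
    """Scale each column of a list of numeric rows into the range [0, 1]."""
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, lo, hi in zip(row, lows, highs):
            # A constant column carries no information; map it to 0.0.
            scaled.append(0.0 if hi == lo else (value - lo) / (hi - lo))
        normalized.append(scaled)
    return normalized


# Hypothetical records: (packet count, byte count, duration in seconds).
records = [(10, 1500, 0.5), (20, 3000, 1.5), (30, 4500, 2.5)]
normalized_records = min_max_normalize(records)
```

Scaling every feature to the same range keeps large-valued fields such as byte counts from dominating small-valued ones when the data is later fed to a neural network.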
S102: inputting the power Internet of Things data to be identified into a pre-trained first anomaly detection model to obtain an anomaly result of the data to be identified; the first anomaly detection model is obtained by training a first anomaly detection model to be trained on power Internet of Things data labeled with time and anomaly results.
In this embodiment, the first anomaly detection model has the capability of identifying whether the power internet of things data is anomalous data, wherein the first anomaly detection model is obtained by training through training samples, the training samples are power internet of things data in different time periods, and the power internet of things data in the training samples are power internet of things data marked with time and anomaly results.
The first anomaly detection model can be any one convolutional neural network model, or a combination of multiple convolutional neural network models, or any one machine learning model, or a combination of a machine learning model and a convolutional neural network model.
In one embodiment, the first anomaly detection model may include an automatic encoder (autoencoder) and a second anomaly detection model, wherein the automatic encoder performs dimension reduction on the data and extracts deep features from it, and the second anomaly detection model identifies the anomaly result. For example, the automatic encoder has 128 coding units and a compression factor of 8, uses the Sigmoid function as its activation function, and uses cross entropy as the loss function of the model. The second anomaly detection model includes a BiLSTM network model, where a BiLSTM (Bidirectional Long Short-Term Memory) network combines a forward LSTM (Long Short-Term Memory) network and a backward LSTM.
In this embodiment, referring to fig. 2, a schematic diagram of a training process of a first anomaly detection model provided in an embodiment of the present invention is shown, including:
S201: acquiring a training sample set containing power Internet of Things data, and labeling the training sample set to obtain a first data set; the first data set comprises power Internet of Things data with an abnormal result label;
In this embodiment, the power Internet of Things data can be acquired in various ways. For example, network traffic can be acquired from the network layer of the power Internet of Things in the form of NetFlow V5 data packets (NetFlow V5 is a data flow format). The NetFlow V5 data packets may include the various data transmitted over the power communication network, including massive data on power system operation, user energy consumption, market transactions, and the external environment, as well as massive data acquired from end sensing nodes such as smart meters, electric vehicles, and power transformation devices.
In this embodiment, the training samples may be labeled in various ways, optionally, the training samples may be labeled through manual experience, or the training samples may be labeled by using an expert knowledge base.
In one embodiment, the training samples are labeled with an expert knowledge base, for example, the training samples may be labeled with an expert knowledge base for normal traffic and an expert knowledge base for abnormal traffic of different types.
Specifically, the method comprises the following steps:
constructing a rule set based on a preset expert knowledge base;
constructing a labeling function, wherein a first independent variable of the labeling function is the power Internet of Things data, a second independent variable of the labeling function is the rule set, and the target variable is the labeling result;
and inputting the data in the training sample set into the labeling function as the first independent variable to determine the result of the target variable.
For example, the labeling function is constructed as shown in formula 1):
1) $y = \mathrm{label}(x, r)$;
where the independent variable $x$ is the set of input data packets, the independent variable $r$ is the rule set established according to expert knowledge, and the target variable $y$ is the classification labeling result of the data packets.
In this embodiment, the marking result of each data packet in the training sample set is written into the training sample set.
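The labeling function y = label(x, r) can be illustrated with a small sketch. The rule names, packet fields, and thresholds below are hypothetical placeholders; real rules would come from the expert knowledge base, whose contents the text does not describe.

```python
def label(packets, rules):
    """Return one label per packet: the first matching rule's name, else 'normal'."""
    results = []
    for packet in packets:
        assigned = "normal"
        for name, predicate in rules:
            if predicate(packet):
                assigned = name
                break
        results.append(assigned)
    return results


# Hypothetical rule set r, written as (label, predicate) pairs.
rules = [
    ("port_scan", lambda p: p["dst_port_count"] > 100),
    ("flood", lambda p: p["packets_per_second"] > 10000),
]

# Hypothetical input x: one dictionary of summary fields per packet record.
packets = [
    {"dst_port_count": 3, "packets_per_second": 40},
    {"dst_port_count": 250, "packets_per_second": 40},
    {"dst_port_count": 2, "packets_per_second": 50000},
]

y = label(packets, rules)
```

Keeping the rules as data rather than hard-coding them means the same function can be reused when the expert knowledge base is extended with new abnormal traffic types.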
S202: dividing the data in the first data set into different first network flows according to preset data flow characteristics and timestamps to obtain a second data set;
In this embodiment, the data stream features are features that can distinguish data streams of different protocols. Optionally, the data stream features may include a source IP address, a destination IP address, a source port number, and a destination port number, and may also include a protocol number.
Optionally, the process of dividing the data in the first data set into different network streams according to the preset data stream characteristics and the time stamps includes:
separating data in the first data set into a first type of data stream or a second type of data stream according to preset first data stream characteristics; the first data stream characteristics include at least: a source IP address, a destination IP address, a source port number, and a destination port number;
adding corresponding protocol numbers in the data packets of the first class data stream and the second class data stream respectively;
merging the data packets in the second target data stream according to the time stamp of the first data packet in the first type of data stream and the second type of data stream and a preset communication rule to obtain a first network stream; and the data packets in the first class data stream and the second class data stream are arranged according to the sequence of the time stamps.
In this embodiment, packets that share the same four-tuple of source IP address, destination IP address, source port number, and destination port number belong either to a TCP data stream or to a UDP data stream. However, packets with the same four-tuple do not necessarily belong to the same network flow, because their timestamps differ; therefore, each type of data stream may be further divided based on the timestamps of its packets. The first type of data stream and the second type of data stream are a TCP data stream and a UDP data stream, respectively.
In one case, packets are separated into TCP or UDP streams according to the four features <source IP address srcaddr, destination IP address dstaddr, source port number srcport, destination port number dstport>, and each stream is saved as a pcap file; each file is named by combining the four features with the timestamp of the first packet of the stream, and the files are saved in sequence.
In this embodiment, the preset protocol number is the protocol number of TCP (Transmission Control Protocol) or UDP (User Datagram Protocol).
In this embodiment, the preset communication rule is determined based on the timestamp. For example, the timestamp $t_0$ of the first data packet is taken as the start time of the data stream; if the two parties exchange no data within a preset time period, or one party actively disconnects, the stream ends, and the packets up to that point are combined to obtain the first network stream.
In addition, the network stream may be digitized and normalized.
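The flow-splitting procedure of S202 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the packet field names and the 60-second idle timeout are hypothetical, the protocol numbers 6 (TCP) and 17 (UDP) are the standard IANA values, and the "one party actively disconnects" condition is left out for brevity.

```python
from collections import defaultdict


def split_flows(packets, idle_timeout=60.0):
    """Group packets by four-tuple plus protocol number, then split on idle gaps."""
    groups = defaultdict(list)
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        groups[key].append(pkt)

    flows = []
    for key, pkts in groups.items():
        pkts.sort(key=lambda p: p["ts"])  # arrange packets in timestamp order
        current = [pkts[0]]
        for pkt in pkts[1:]:
            if pkt["ts"] - current[-1]["ts"] > idle_timeout:
                # No communication within the preset period: close the flow.
                flows.append((key, current[0]["ts"], current))
                current = [pkt]
            else:
                current.append(pkt)
        flows.append((key, current[0]["ts"], current))
    return flows


TCP, UDP = 6, 17  # IANA protocol numbers
packets = [
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 5000, "dport": 502, "proto": TCP, "ts": 0.0},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 5000, "dport": 502, "proto": TCP, "ts": 10.0},
    {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 5000, "dport": 502, "proto": TCP, "ts": 200.0},
    {"src": "10.0.0.3", "dst": "10.0.0.2", "sport": 6000, "dport": 161, "proto": UDP, "ts": 5.0},
]
flows = split_flows(packets)
```

Each returned flow carries its key and the timestamp of its first packet, which mirrors the file-naming scheme described above (four features plus first-packet timestamp).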
S203: performing dimensionality reduction on the data in the second data set through a preset automatic encoder, and extracting the characteristics of each network flow from the second data set;
In this embodiment, the automatic encoder is able to extract deep features from the network streams of the second data set and to reduce the dimensionality of those features.
Optionally, S203 includes:
coding the data in the second data set through a preset automatic coder, and extracting hidden features;
performing dimension-increasing processing on the hidden features to enable dimension-increasing results to be consistent with dimensions of data in the second data set;
and reconstructing the second data set according to the hidden characteristics to obtain a second network flow.
In this embodiment, the preset automatic encoder includes 128 coding units with a compression factor of 8, and uses the Sigmoid function as the activation function.
First, the second data set is re-encoded by the automatic encoder, which mines the deeper hidden information in the data and learns to extract deeper data features $h$. The encoding process is shown in formula 2):
2) $h = \sigma_e(W_1 Z + b_1)$
where $\sigma_e$ is the activation function, $Z$ denotes the encoder input, and $W_1$ and $b_1$ are the encoding weights and biases.
Second, the encoded data is decoded. Specifically, the extracted hidden features are raised in dimension so that the dimension of the result is consistent with the dimension of the data in the second data set.
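The encoding step of formula 2) and the dimension-raising decoding step can be sketched in plain Python. This is a toy illustration: the sizes (16 inputs compressed to 2 hidden units, keeping the compression factor of 8) are chosen for readability rather than the 128 coding units of the text, the weights are random rather than trained, and the decoder parameters `w2` and `b2` are assumptions, since the text only gives the encoder formula.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, vector):
    """Compute W·v + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def encode(z, w1, b1):
    """Formula 2): h = sigmoid(W1·Z + b1)."""
    return [sigmoid(a) for a in affine(w1, b1, z)]


def decode(h, w2, b2):
    """Assumed mirror of the encoder: raise h back to the input dimension."""
    return [sigmoid(a) for a in affine(w2, b2, h)]


rng = random.Random(0)
input_dim, hidden_dim = 16, 2  # toy sizes with compression factor 8

w1 = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
b1 = [0.0] * hidden_dim
w2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(input_dim)]
b2 = [0.0] * input_dim

z = [rng.random() for _ in range(input_dim)]  # one normalized flow record
h = encode(z, w1, b1)                         # hidden features, lower dimension
z_reconstructed = decode(h, w2, b2)           # same dimension as the input
```

In training, the weights would be adjusted so that `z_reconstructed` approximates `z` (the text names cross entropy as the loss); after training, only `h` is passed on to the second anomaly detection model.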
S204: inputting the characteristics and the label information of each network flow into a second anomaly detection model to be trained, and training the second anomaly detection model to be trained, wherein the label information at least comprises: an abnormal result; the automatic encoder and the second abnormality detection model constitute a first abnormality detection model;
In this embodiment, the second anomaly detection model may be any convolutional neural network model or any machine learning model. Optionally, the second anomaly detection model may be a BiLSTM network model. Before the BiLSTM network model is trained, its structural parameters need to be preset; the process of setting the structural parameters includes:
D1: Using Keras as the deep learning framework, a six-layer BiLSTM network prediction model Pnn1 is built. The BiLSTM network combines a forward LSTM and a backward LSTM: the forward LSTM network learns the forward sequence features of the communication traffic, and the backward LSTM network extracts its reverse sequence features.
D2: When traffic information passes through the LSTM, it first passes through the forget gate, which applies a nonlinearity to the information. With the Sigmoid function as the activation function $\sigma$, $W_f$ as the weight matrix, and $b_f$ as the bias, the forget gate structure is shown in formula 3):
3) $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$;
After the forget gate, the information flows through the input gate, through which the LSTM adds new information. With $W_i$ as the weight matrix and $b_i$ as the bias, the input gate structure is shown in formula 4):
4) $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$;
The LSTM then processes the information through the output gate to determine what is output. With $W_o$ as the weight matrix and $b_o$ as the bias, the output gate structure is shown in formula 5):
5) $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$;
These three gate structures effectively address the long-term dependence problem of RNNs;
D3: The data dimension after feature dimension reduction by the automatic encoder is 128, so the BiLSTM network has 128 input layer nodes, 200 neurons in the hidden layer, and 2 output layer nodes (abnormal traffic and normal traffic, respectively);
D4: The number of classification categories, the word vector dimension, and the hidden layer vector dimension in the forward and backward LSTM of Pnn1 are set, wherein the input dimension is set to [16, 8], the number of classification categories is 2, the number of network iterations is 50, and the initial weights and biases are set to 0.
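The gate formulas 3) to 5) share one shape: a Sigmoid applied to a weighted combination of the previous hidden state and the current input. The sketch below evaluates that shape in plain Python with toy dimensions (3 hidden units, 2 inputs); the vectors and weight values are illustrative assumptions, and the cell-state update that a full LSTM also needs is not shown because the text does not give it.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gate(weights, bias, h_prev, x_t):
    """Evaluate sigmoid(W · [h_prev, x_t] + b) for one gate."""
    concat = list(h_prev) + list(x_t)  # the concatenation [h_{t-1}, x_t]
    return [sigmoid(sum(w * c for w, c in zip(row, concat)) + b)
            for row, b in zip(weights, bias)]


hidden, inputs = 3, 2
h_prev = [0.1, -0.2, 0.3]  # previous hidden state h_{t-1}
x_t = [1.0, 0.5]           # current input x_t

# With all weights and biases at 0 (the initial values named in D4),
# every gate evaluates to sigmoid(0) = 0.5 regardless of the input.
zero_w = [[0.0] * (hidden + inputs) for _ in range(hidden)]
zero_b = [0.0] * hidden
f_t = gate(zero_w, zero_b, h_prev, x_t)  # forget gate, formula 3)
i_t = gate(zero_w, zero_b, h_prev, x_t)  # input gate, formula 4)
o_t = gate(zero_w, zero_b, h_prev, x_t)  # output gate, formula 5)

# With non-zero weights the gate responds to the input.
ones_w = [[1.0] * (hidden + inputs) for _ in range(hidden)]
f_t_ones = gate(ones_w, zero_b, h_prev, x_t)
```

Because each gate outputs values strictly between 0 and 1, it acts as a soft switch on how much information is forgotten, added, or emitted at each time step.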
Further, in order to improve the robustness of the second anomaly detection model, when the second anomaly detection model is trained, a plurality of training sets are used to train the second anomaly detection model respectively, wherein the plurality of training sets can be obtained by splitting the training sample set. Moreover, the test set may also be obtained from the training sample set, for example, the training sample set may be divided into 9 training sets and one test set.
Further, the method also comprises the following steps:
dividing the training sample set into a plurality of data sets;
dividing a plurality of data sets into a plurality of groups of data sets; each group of data sets comprises a test set and a training set;
training a first anomaly detection model to be trained by adopting each group of training sets respectively to obtain a plurality of first anomaly detection models;
respectively testing the trained first anomaly detection models through the test set to obtain a plurality of test results;
calculating a generalization error of the model based on each trained first anomaly detection model;
determining an optimal first anomaly detection model from the plurality of trained first anomaly detection models based on the generalization error of each trained first anomaly detection model.
The process of training the first anomaly detection model through each training set is as shown in S201-S204, and is not described in detail in this embodiment.
In this embodiment, when the generalization error is calculated, the trained first anomaly detection model may be tested multiple times by using different test sets to obtain multiple generalization errors, and the average generalization error is calculated by using the multiple generalization errors.
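The splitting, training, testing and selection steps above can be sketched as a k-fold procedure. This is a minimal illustration under stated assumptions: the generalization error is estimated as the misclassification rate on the held-out test set, each group holds out exactly one data set, and a toy threshold classifier stands in for the first anomaly detection model. All function names and the sample format are hypothetical.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]          # (toy feature, label); label 1 = abnormal
Model = Callable[[float], int]


def split_into_folds(samples: Sequence[Sample], k: int, seed: int = 0) -> List[List[Sample]]:
    """Shuffle the training sample set and divide it into k data sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def make_groups(folds: List[List[Sample]]) -> List[Tuple[List[Sample], List[Sample]]]:
    """Each group uses one data set as the test set and the rest as the training set."""
    groups = []
    for i, test in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        groups.append((train, test))
    return groups


def train_threshold_model(train: List[Sample]) -> Model:
    """Toy stand-in model: threshold midway between the two class means."""
    normal = [x for x, y in train if y == 0]
    abnormal = [x for x, y in train if y == 1]
    threshold = (mean(normal) + mean(abnormal)) / 2
    return lambda x: int(x > threshold)


def error_rate(model: Model, test: List[Sample]) -> float:
    """Estimated generalization error: fraction of misclassified test samples."""
    return sum(model(x) != y for x, y in test) / len(test)


def select_best(samples: Sequence[Sample], k: int = 10) -> Tuple[Model, float, float]:
    """Train one model per group and return (best model, its error, average error)."""
    groups = make_groups(split_into_folds(samples, k))
    models = [train_threshold_model(train) for train, _ in groups]
    errors = [error_rate(m, test) for m, (_, test) in zip(models, groups)]
    best = min(range(k), key=errors.__getitem__)
    return models[best], errors[best], mean(errors)


# Toy data: 50 normal samples near 0 and 50 abnormal samples near 1.
data = [(i / 100, 0) for i in range(50)] + [(1 + i / 100, 1) for i in range(50)]
best_model, best_error, average_error = select_best(data, k=10)
```

With k = 10 each group trains on nine data sets and tests on the remaining one, which matches the 9-to-1 split mentioned above.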
The embodiment of the invention discloses a method for detecting abnormal data traffic of an electric power Internet of things, which comprises the following steps: acquiring electric power Internet of things data to be identified; inputting the to-be-identified electric power Internet of things data into a pre-trained first anomaly detection model to obtain an anomaly result of the to-be-identified electric power Internet of things data; the first anomaly detection model is obtained by training a first anomaly detection model to be trained with electric power Internet of things data labeled with time and anomaly results. Because the time dimension is added when the first anomaly detection model is trained, the accuracy of anomaly detection is improved, abnormal traffic is detected automatically, and the detection efficiency is improved.
In addition, the first anomaly detection model is composed of an encoder and a second anomaly detection model, and the second anomaly detection model is a BiLSTM network model. The encoder extracts deep features and reduces their dimensionality, which improves the detection efficiency and the detection precision of the second anomaly detection model.
Referring to Fig. 3, a schematic structural diagram of a device for detecting abnormal data traffic of an electric power Internet of things according to an embodiment of the present invention is shown. In this embodiment, the device includes:
the obtaining unit 301 is configured to obtain electric power Internet of things data to be identified;
the identification unit 302 is configured to input the to-be-identified electric power Internet of things data into a pre-trained first anomaly detection model, so as to obtain an anomaly result of the to-be-identified electric power Internet of things data; the first anomaly detection model is obtained by training a first anomaly detection model to be trained with electric power Internet of things data labeled with time and anomaly results.
Optionally, the device further includes:
the labeling unit is used for acquiring a training sample set containing electric power Internet of things data and labeling the training sample set to obtain a first data set; the first data set comprises electric power Internet of things data with an abnormal result label;
the integration unit is used for dividing the data in the first data set into different first network flows according to preset data flow characteristics and timestamps to obtain a second data set;
the coding unit is used for performing dimension reduction processing on the data in the second data set through a preset automatic encoder and extracting the features of each network flow from the second data set;
a first training unit, configured to input features and label information of each network flow into a second anomaly detection model to be trained, and train the second anomaly detection model to be trained, where the label information at least includes: an abnormal result; the automatic encoder and the second abnormality detection model constitute a first abnormality detection model.
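The statement that the automatic encoder and the second anomaly detection model together constitute the first anomaly detection model can be pictured as a simple composition. The sketch below only illustrates that structure: the encoder and classifier are toy stand-ins (pairwise averaging and a mean threshold), not the automatic encoder or BiLSTM of this embodiment, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Features = List[float]


@dataclass
class FirstAnomalyModel:
    """First model = encoder (dimension reduction) followed by a second model (classifier)."""
    encoder: Callable[[Sequence[float]], Features]
    classifier: Callable[[Features], int]

    def predict(self, flow_vector: Sequence[float]) -> int:
        # Reduce the dimension first, then classify the reduced features.
        return self.classifier(self.encoder(flow_vector))


def toy_encoder(vec: Sequence[float]) -> Features:
    """Stand-in encoder: average adjacent pairs, halving the dimension."""
    return [(vec[i] + vec[i + 1]) / 2 for i in range(0, len(vec) - 1, 2)]


def toy_classifier(features: Features) -> int:
    """Stand-in classifier: 1 (abnormal) when the mean feature exceeds 0.5."""
    return int(sum(features) / len(features) > 0.5)


model = FirstAnomalyModel(encoder=toy_encoder, classifier=toy_classifier)
print(model.predict([0.1, 0.2, 0.3, 0.4]))
```

Keeping the two stages behind one `predict` call means the identification unit only ever sees the combined model.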
Optionally, the labeling unit includes:
the first construction subunit is used for constructing a rule set based on a preset expert knowledge base;
the second construction subunit is used for constructing a labeling function, wherein a first independent variable of the labeling function is electric power Internet of things data, a second independent variable is the rule set, and a target variable is a labeling result;
and the determining subunit is used for inputting the data in the training sample set into the labeling function as the first independent variable and determining the result of the target variable.
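A labeling function of this kind, taking a data record and a rule set and returning a labeling result, might look like the sketch below. The record fields, the two rules and the port whitelist are purely illustrative assumptions standing in for an expert knowledge base; the embodiment does not specify them.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Rule = Callable[[Record], bool]   # returns True when the record looks abnormal


def build_rule_set() -> List[Rule]:
    """Hypothetical rules standing in for a preset expert knowledge base."""
    return [
        # Assumed rate limit for a field device.
        lambda r: r["packets_per_second"] > 1000,
        # Assumed whitelist of industrial protocol ports (illustrative only).
        lambda r: r["dst_port"] not in (102, 502, 2404),
    ]


def label(record: Record, rules: List[Rule]) -> int:
    """Labeling function: first argument is the data, second is the rule set.

    Returns 1 (abnormal) if any rule fires, otherwise 0 (normal).
    """
    return int(any(rule(record) for rule in rules))


rules = build_rule_set()
training_samples = [
    {"packets_per_second": 50, "dst_port": 502},
    {"packets_per_second": 5000, "dst_port": 502},
]
labels = [label(sample, rules) for sample in training_samples]
print(labels)
```

Passing the rule set as an argument, rather than hard-coding it, lets the same function be reused when the knowledge base changes.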
Optionally, the integration unit includes:
the first dividing subunit is used for dividing the data in the first data set into a first-class data stream or a second-class data stream according to preset first data stream characteristics; the first data stream characteristics include at least: a source IP address, a destination IP address, a source port number, and a destination port number;
an adding subunit, configured to add corresponding protocol numbers to the data packets of the first-class data stream and the second-class data stream, respectively;
the second dividing subunit is used for dividing the data packets in the second target data stream according to the time stamp of the first data packet in the first-class data stream and the second-class data stream and a preset communication rule to obtain a first network stream; the data packets in the first-class data stream and the second-class data stream are arranged according to the sequence of the timestamps, and the preset communication rule is determined based on the timestamps and the communication time of the data packets.
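The division into network flows can be sketched as grouping packets by address, port and protocol number, ordering each group by timestamp, and cutting a new flow whenever a packet falls outside a time window measured from the first packet of the current flow. The fixed-length window is an assumed stand-in for the preset communication rule, the interpretation of the two data stream classes as TCP and UDP is also an assumption, and the `Packet` type is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int      # IP protocol number, e.g. 6 = TCP, 17 = UDP
    timestamp: float   # seconds


FlowKey = Tuple[str, str, int, int, int]


def split_into_flows(packets: List[Packet], window: float) -> List[List[Packet]]:
    """Group packets by flow key, then split each group on a time window."""
    by_key: Dict[FlowKey, List[Packet]] = defaultdict(list)
    for p in packets:
        by_key[(p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)].append(p)

    flows: List[List[Packet]] = []
    for group in by_key.values():
        group.sort(key=lambda p: p.timestamp)       # arrange by timestamp
        current = [group[0]]
        for p in group[1:]:
            # Compare against the FIRST packet of the current flow.
            if p.timestamp - current[0].timestamp > window:
                flows.append(current)
                current = [p]
            else:
                current.append(p)
        flows.append(current)
    return flows


a = Packet("10.0.0.1", "10.0.0.2", 1000, 502, 6, 0.0)
b = Packet("10.0.0.1", "10.0.0.2", 1000, 502, 6, 1.0)
c = Packet("10.0.0.1", "10.0.0.2", 1000, 502, 6, 10.0)
d = Packet("10.0.0.1", "10.0.0.2", 1000, 502, 17, 0.5)
flows = split_into_flows([c, a, d, b], window=5.0)
```

In this toy input the three TCP packets yield two flows because the third packet arrives more than 5 seconds after the first, and the UDP packet forms its own flow.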
Optionally, the device further includes:
a third dividing unit configured to divide the training sample set into a plurality of data sets;
a fourth dividing unit configured to divide the plurality of data sets into a plurality of groups of data sets; each group of data sets comprises a test set and a training set;
the second training unit is used for training the first anomaly detection models to be trained by adopting each group of training sets respectively to obtain a plurality of first anomaly detection models;
the test unit is used for testing the trained first anomaly detection models respectively through the test set to obtain a plurality of test results;
a calculation unit for calculating a generalization error of the model based on each trained first anomaly detection model;
a determining unit configured to determine an optimal first abnormality detection model from among the trained plurality of first abnormality detection models based on a generalization error of each of the trained first abnormality detection models.
The device of this embodiment acquires electric power Internet of things data to be identified, and inputs the to-be-identified electric power Internet of things data into a pre-trained first anomaly detection model to obtain an anomaly result of the to-be-identified electric power Internet of things data; the first anomaly detection model is obtained by training a first anomaly detection model to be trained with electric power Internet of things data labeled with time and anomaly results. Because the time dimension is added when the first anomaly detection model is trained, the accuracy of anomaly detection is improved, abnormal traffic is detected automatically, and the detection efficiency is improved.
In addition, the first anomaly detection model is composed of an encoder and a second anomaly detection model, and the second anomaly detection model is a BiLSTM network model. The encoder extracts deep features and reduces their dimensionality, which improves the detection efficiency and the detection precision of the second anomaly detection model.
Referring to Fig. 4, a schematic structural diagram of an electronic device according to an embodiment of the present invention is shown. In this embodiment, the electronic device includes:
a memory 401 and a processor 402;
the memory 401 is configured to store a program, and the processor 402, when executing the program in the memory 401, performs the following method for detecting abnormal data traffic of the electric power Internet of things:
acquiring electric power Internet of things data to be identified;
inputting the to-be-identified electric power Internet of things data into a pre-trained first anomaly detection model to obtain an anomaly result of the to-be-identified electric power Internet of things data; the first anomaly detection model is obtained by training a first anomaly detection model to be trained with electric power Internet of things data labeled with time and anomaly results.
Optionally, the training process of the first anomaly detection model includes:
acquiring a training sample set containing power Internet of things data, and labeling the training sample set to obtain a first data set; the first data set comprises power internet of things data with an abnormal result label;
dividing the data in the first data set into different first network flows according to preset data flow characteristics and timestamps to obtain a second data set;
performing dimensionality reduction on the data in the second data set through a preset automatic encoder, and extracting the characteristics of each network flow from the second data set;
inputting the characteristics and the label information of each network flow into a second anomaly detection model to be trained, and training the second anomaly detection model to be trained, wherein the label information at least comprises: an abnormal result; the automatic encoder and the second abnormality detection model constitute a first abnormality detection model.
Optionally, the labeling the training sample set includes:
constructing a rule set based on a preset expert knowledge base;
constructing a labeling function, wherein a first independent variable of the labeling function is electric power Internet of things data, a second independent variable of the labeling function is the rule set, and a target variable is a labeling result;
and inputting the data in the training sample set into the labeling function as the first independent variable to determine the result of the target variable.
Optionally, the dividing the data in the first data set into different first network streams according to preset data stream characteristics and timestamps to obtain a second data set includes:
separating data in the first data set into a first type of data stream or a second type of data stream according to preset first data stream characteristics; the first data stream characteristics include at least: a source IP address, a destination IP address, a source port number, and a destination port number;
adding corresponding protocol numbers in the data packets of the first class data stream and the second class data stream respectively;
dividing data packets in the second target data stream according to the time stamp of the first data packet in the first type of data stream and the second type of data stream and a preset communication rule to obtain a first network stream; the data packets in the first-class data stream and the second-class data stream are arranged according to the sequence of the timestamps, and the preset communication rule is determined based on the timestamps and the communication time of the data packets.
Optionally, the method further includes:
dividing the training sample set into a plurality of data sets;
dividing a plurality of data sets into a plurality of groups of data sets; each group of data sets comprises a test set and a training set;
training a first anomaly detection model to be trained by adopting each group of training sets respectively to obtain a plurality of first anomaly detection models;
respectively testing the trained first anomaly detection models through the test set to obtain a plurality of test results;
calculating a generalization error of the model based on each trained first anomaly detection model;
determining an optimal first anomaly detection model from the plurality of trained first anomaly detection models based on the generalization error of each trained first anomaly detection model.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for detecting data traffic abnormity of an electric power Internet of things is characterized by comprising the following steps:
acquiring data of the power Internet of things to be identified;
inputting the to-be-identified electric power Internet of things data into a pre-trained first anomaly detection model to obtain an anomaly result of the to-be-identified electric power Internet of things data; the first anomaly detection model is obtained by training a first anomaly detection model to be trained with electric power Internet of things data labeled with time and anomaly results.
2. The method of claim 1, wherein the training process of the first anomaly detection model comprises:
acquiring a training sample set containing power Internet of things data, and labeling the training sample set to obtain a first data set; the first data set comprises power internet of things data with an abnormal result label;
dividing the data in the first data set into different first network flows according to preset data flow characteristics and timestamps to obtain a second data set;
performing dimensionality reduction on the data in the second data set through a preset automatic encoder, and extracting the characteristics of each network flow from the second data set;
inputting the characteristics and the label information of each network flow into a second anomaly detection model to be trained, and training the second anomaly detection model to be trained, wherein the label information at least comprises: an abnormal result; the automatic encoder and the second abnormality detection model constitute a first abnormality detection model.
3. The method of claim 2, wherein the labeling the training sample set comprises:
constructing a rule set based on a preset expert knowledge base;
constructing a labeling function, wherein a first independent variable of the labeling function is electric power Internet of things data, a second independent variable of the labeling function is the rule set, and a target variable is a labeling result;
and inputting the data in the training sample set into the labeling function as the first independent variable to determine the result of the target variable.
4. The method of claim 2, wherein the dividing the data in the first data set into different first network streams according to the preset data stream characteristics and the time stamps to obtain a second data set comprises:
separating data in the first data set into a first type of data stream or a second type of data stream according to preset first data stream characteristics; the first data stream characteristics include at least: a source IP address, a destination IP address, a source port number, and a destination port number;
adding corresponding protocol numbers in the data packets of the first class data stream and the second class data stream respectively;
dividing data packets in the second target data stream according to the time stamp of the first data packet in the first type of data stream and the second type of data stream and a preset communication rule to obtain a first network stream; the data packets in the first-class data stream and the second-class data stream are arranged according to the sequence of the timestamps, and the preset communication rule is determined based on the timestamps and the communication time of the data packets.
5. The method of claim 2, further comprising:
dividing the training sample set into a plurality of data sets;
dividing a plurality of data sets into a plurality of groups of data sets; each group of data sets comprises a test set and a training set;
training a first anomaly detection model to be trained by adopting each group of training sets respectively to obtain a plurality of first anomaly detection models;
respectively testing the trained first anomaly detection models through the test set to obtain a plurality of test results;
calculating a generalization error of the model based on each trained first anomaly detection model;
determining an optimal first anomaly detection model from the plurality of trained first anomaly detection models based on the generalization error of each trained first anomaly detection model.
6. A device for detecting abnormal data traffic of an electric power Internet of things, characterized by comprising:
the acquisition unit is used for acquiring electric power Internet of things data to be identified;
the identification unit is used for inputting the to-be-identified electric power Internet of things data into a pre-trained first anomaly detection model to obtain an anomaly result of the to-be-identified electric power Internet of things data; the first anomaly detection model is obtained by training a first anomaly detection model to be trained with electric power Internet of things data labeled with time and anomaly results.
7. The apparatus of claim 6, further comprising:
the labeling unit is used for acquiring a training sample set containing electric power Internet of things data and labeling the training sample set to obtain a first data set; the first data set comprises electric power Internet of things data with an abnormal result label;
the integration unit is used for dividing the data in the first data set into different first network flows according to preset data flow characteristics and timestamps to obtain a second data set;
the coding unit is used for performing dimension reduction processing on the data in the second data set through a preset automatic encoder and extracting the features of each network flow from the second data set;
a first training unit, configured to input features and label information of each network flow into a second anomaly detection model to be trained, and train the second anomaly detection model to be trained, where the label information at least includes: an abnormal result; the automatic encoder and the second abnormality detection model constitute a first abnormality detection model.
8. The apparatus of claim 7, wherein the labeling unit comprises:
the first construction subunit is used for constructing a rule set based on a preset expert knowledge base;
the second construction subunit is used for constructing a labeling function, wherein a first independent variable of the labeling function is electric power Internet of things data, a second independent variable is the rule set, and a target variable is a labeling result;
and the determining subunit is used for inputting the data in the training sample set into the labeling function as the first independent variable and determining the result of the target variable.
9. The apparatus of claim 7, wherein the integration unit comprises:
the first dividing subunit is used for dividing the data in the first data set into a first-class data stream or a second-class data stream according to preset first data stream characteristics; the first data stream characteristics include at least: a source IP address, a destination IP address, a source port number, and a destination port number;
an adding subunit, configured to add corresponding protocol numbers to the data packets of the first-class data stream and the second-class data stream, respectively;
the second dividing subunit is used for dividing the data packets in the second target data stream according to the time stamp of the first data packet in the first-class data stream and the second-class data stream and a preset communication rule to obtain a first network stream; the data packets in the first-class data stream and the second-class data stream are arranged according to the sequence of the timestamps, and the preset communication rule is determined based on the timestamps and the communication time of the data packets.
10. An electronic device, comprising:
a memory and a processor;
the memory is used for storing programs, and the processor executes the method for detecting the data traffic abnormality of the power internet of things according to any one of claims 1 to 5 when executing the programs in the memory.
CN202111233244.0A 2021-10-22 2021-10-22 Method and device for detecting abnormal data traffic of power internet of things Pending CN113935426A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111233244.0A CN113935426A (en) 2021-10-22 2021-10-22 Method and device for detecting abnormal data traffic of power internet of things

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111233244.0A CN113935426A (en) 2021-10-22 2021-10-22 Method and device for detecting abnormal data traffic of power internet of things

Publications (1)

Publication Number Publication Date
CN113935426A (en) 2022-01-14

Family

ID=79283741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111233244.0A Pending CN113935426A (en) 2021-10-22 2021-10-22 Method and device for detecting abnormal data traffic of power internet of things

Country Status (1)

Country Link
CN (1) CN113935426A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114584522A (en) * 2022-01-21 2022-06-03 中国人民解放军国防科技大学 Identification method, system, medium and terminal of Internet of things equipment
CN114584522B (en) * 2022-01-21 2024-02-06 中国人民解放军国防科技大学 Identification method, system, medium and terminal of Internet of things equipment
CN116055413A (en) * 2023-03-07 2023-05-02 云南省交通规划设计研究院有限公司 Tunnel network anomaly identification method based on cloud edge cooperation
CN116055413B (en) * 2023-03-07 2023-08-15 云南省交通规划设计研究院有限公司 Tunnel network anomaly identification method based on cloud edge cooperation
CN116738049A (en) * 2023-06-13 2023-09-12 湖北华中电力科技开发有限责任公司 Power consumption monitoring system, method, device and storage medium based on big data technology
CN116743636A (en) * 2023-08-14 2023-09-12 中国电信股份有限公司 Abnormal data detection method and device, electronic equipment and computer readable medium
CN116743636B (en) * 2023-08-14 2023-10-31 中国电信股份有限公司 Abnormal data detection method and device, electronic equipment and computer readable medium
CN117834389B (en) * 2024-03-04 2024-05-03 中国西安卫星测控中心 Fault analysis method based on abnormal communication service characteristic element matrix

Similar Documents

Publication Publication Date Title
CN113935426A (en) Method and device for detecting abnormal data traffic of power internet of things
CN108763319B (en) Social robot detection method and system fusing user behaviors and text information
CN111144470B (en) Unknown network flow identification method and system based on deep self-encoder
CN111191767B (en) Vectorization-based malicious traffic attack type judging method
CN113395276B (en) Network intrusion detection method based on self-encoder energy detection
CN111245848B (en) Industrial control intrusion detection method for hierarchical dependency modeling
CN111431819A (en) Network traffic classification method and device based on serialized protocol flow characteristics
CN111181923A (en) Flow detection method and device, electronic equipment and storage medium
CN109698798B (en) Application identification method and device, server and storage medium
CN112165484B (en) Network encryption traffic identification method and device based on deep learning and side channel analysis
CN104298782A (en) Method for analyzing active access behaviors of internet users
CN101447995B (en) Method for identifying P2P data stream, device and system thereof
CN114444096B (en) Network data storage encryption detection system based on data analysis
CN116662184B (en) Industrial control protocol fuzzy test case screening method and system based on Bert
CN115883424B (en) Method and system for predicting flow data between high-speed backbone networks
CN114205151A (en) HTTP/2 page access flow identification method based on multi-feature fusion learning
CN113343235B (en) Application layer malicious effective load detection method, system, device and medium based on Transformer
CN115587007A (en) Robertta-based weblog security detection method and system
CN108055149A (en) End-to-end Traffic Anomaly feature extracting method in a kind of Time and Frequency Synchronization application
CN113542271A (en) Network background flow generation method based on generation of confrontation network GAN
Tang et al. Relational reasoning-based approach for network protocol reverse engineering
CN111917715B (en) Equipment identification method based on 802.11ac MAC layer fingerprint
CN117675351A (en) Abnormal flow detection method and system based on BERT model
CN117041360A (en) Network flow independent coding method based on self-supervised learning
CN117014198A (en) Game platform network security detection method and system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination