CN115348063A - DNN and K-means-based power system network flow identification method - Google Patents

DNN and K-means-based power system network flow identification method

Info

Publication number
CN115348063A
Authority
CN
China
Prior art keywords
data
dnn
power system
network
sample
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210882066.2A
Other languages
Chinese (zh)
Inventor
刘建戈
张鹏宇
季一木
李茂�
姜蒙娜
王伟业
刘尚东
高山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Dingyan Power Technology Co ltd
Nanjing University of Posts and Telecommunications
HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Nanjing Dingyan Power Technology Co ltd
Nanjing University of Posts and Telecommunications
HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to the technical field of artificial intelligence for network security and discloses a DNN and K-means based power system network flow identification method. The original data are screened, and the data items that provide the most information for classification are selected to form data samples. The sample data are preprocessed by an integration operation and a normalization operation. A DNN network model is trained iteratively on the preprocessed training set and performs a preliminary classification of the power system network traffic, yielding a classification confidence and positive and negative example results. The samples that the DNN network model judges to be suspected servers are then classified using the K-means algorithm. Compared with the prior art, the method achieves higher accuracy in classifying power network data and can meet the requirements of power system network traffic classification in a real environment.

Description

DNN and K-means-based power system network flow identification method
Technical Field
The invention relates to the technical field of network security artificial intelligence, in particular to a DNN and K-means-based power system network flow identification method.
Background
With the development and application of power system network technology, the scale of the power system network keeps growing, its complexity has increased markedly, and the network security risk of the power system has risen accordingly. Now that threats to the power network have become routine, accurate detection, analysis and early-warning capabilities are gradually becoming the key to a new generation of big-data security capability.
Attackers basically attack the power system network in the form of network services, and the attacking endpoint has traffic characteristics similar to those of the power system network's servers. Identifying and classifying power system network traffic is therefore a key step in network security. Traditional network traffic identification methods usually depend on extensive manual querying and verification, or require rules to be defined by hand, and the cost is not negligible in today's big-data scenarios with millions of traffic records. Traditional methods also have long query intervals, poor real-time performance and low accuracy, so they can only support passive defense strategies, provide insufficient early warning, and struggle to cope with new network security threats to the power system.
Disclosure of Invention
Purpose of the invention: in view of the problems in the prior art, the invention provides a DNN and K-means based power system network flow identification method that classifies power system network traffic effectively, discovers abnormal traffic and illegal service endpoints as early as possible, and maintains the security of the power system network.
The technical scheme is as follows: the invention provides a DNN and K-means based power system network flow identification method, which comprises the following steps:
step 1: screening original data, selecting data items which can provide more information for classification to form data samples, wherein the data samples comprise server IP addresses, ports, protocols, client IP addresses, byte numbers, bit rates, packet numbers, session total numbers and time;
step 2: performing integration operation and normalization operation on the sample data to perform preprocessing;
step 3: iteratively training a DNN network model, using the preprocessed training set to train the DNN network model, and performing preliminary classification on the network traffic of the power system to obtain a classification confidence and positive and negative example results;
step 4: classifying, by using a K-means algorithm, the samples that are judged to be suspected servers after processing by the DNN network model.
Further, in step 1, some IPs of the power system network are known to be power system servers, and some IPs are known not to be power system servers; the sample data corresponding to these IPs are extracted and used as the positive examples and negative examples of the training set, respectively.
Further, the integrating operation and the normalizing operation in step 2 specifically include:
using the quadruple (server IP, port, protocol, time) as an index, integrating the sample data of the same server IP, the same port, the same protocol and the same time period into one piece of data, and then performing max-min normalization on each sample to map the sample values to the [0,1] interval.
Further, the DNN network model structure in step 3 is: the input layer receives an n-dimensional data sample; the data features pass through the three fully connected layers of the hidden layer, which use the nonlinear activation function ReLU, to the output layer; the output layer outputs a logit value, which a Sigmoid activation function converts into a predicted confidence value. Real-time test data are input into the network to obtain the confidence of the predicted result, and the probability value output by the DNN network model is then judged to obtain positive and negative examples.
Further, the number of the neurons of the three fully-connected layers of the hidden layer is 512, 256 and 128.
Further, the preliminary classification of the network traffic of the power system in the step 3 to obtain the classification confidence and the positive and negative case results includes the following specific operations:
step 3.1: acquiring data, namely acquiring a group of power system network traffic training data $\{x^{(n)}, y^{(n)} \mid 1 \le n \le N\}$, wherein x is a network traffic sample comprising statistical information such as the total packet count, packets per second, total byte count and bytes per second of the IP of a given endpoint; y is the sample label, a manually labeled value indicating whether the training sample is a server; N is the total number of training samples;
step 3.2: for a DNN network output function f, inputting data into the network to obtain a classification result:
$$\hat{y} = f(x; \theta)$$
wherein $\hat{y}$ is the confidence of the classification result, and θ is the network parameter of the DNN network model;
step 3.3: iteratively training the DNN network model, using a binary cross entropy function as the training loss function:
$$L(\theta, x, y) = -\frac{1}{N} \sum_{n=1}^{N} \left[ y^{(n)} \log \hat{y}^{(n)} + \left(1 - y^{(n)}\right) \log\left(1 - \hat{y}^{(n)}\right) \right]$$
step 3.4: optimizing and updating the network parameter θ, with stochastic gradient descent selected as the optimizer.
Further, the specific steps of classifying by using the K-means algorithm in the step 4 are as follows:
step 4.1: determining a threshold and dividing the classification results, wherein a result higher than the threshold is considered a server and a result lower than the threshold is not; finding all original traffic samples classified as servers and adding their respective classification confidences to them to form new sample data $\{\tilde{x}^{(m)} \mid 1 \le m \le M\}$, wherein M is the total number of new samples;
step 4.2: determining the number of clusters K, thereby determining K cluster centers $\{c^{(k)} \mid 1 \le k \le K\}$, and randomly initializing the cluster centers; calculating the Euclidean distance from each sample to each cluster center, comparing these distances in turn, and assigning each sample to the cluster of its nearest cluster center to obtain K clusters $\{s^{(k)} \mid 1 \le k \le K\}$;
step 4.3: after the clusters are obtained, the K-means algorithm updates the position of each cluster center from its cluster, the new cluster center being the mean of the samples in the cluster in each dimension.
Further, the specific method for determining the value K in step 4.2 is to calculate the sum of squared residuals (SSE) from the samples in each cluster to the cluster center, take K as 1, 2, 3, … in turn, plot the average SSE as the dependent variable against K as the independent variable, and find the inflection point where the slope of the curve changes from a rapid drop to a gentle drop; the K value at this point is the optimal K value.
Advantageous effects:
Compared with manual methods, the method provided by the invention offers better real-time performance, stronger data processing capacity and lower cost. Compared with traditional port-based and traffic-based data analysis methods, it has higher accuracy; it also adds a machine learning clustering algorithm after determining whether an endpoint is a server, so that multi-class classification is performed automatically, which makes subsequent analysis and verification of the results by staff more convenient.
Drawings
FIG. 1 is a diagram of a system model architecture;
FIG. 2 is a schematic diagram of a deep neural network architecture;
FIG. 3 is a deep neural network connection diagram.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The invention discloses a DNN and K-means based power system network flow identification method, which comprises the following steps:
Step 1: acquire real-time data from the power system network traffic database and screen out the required data items.
Step 2: integrate and normalize the data, preprocessing it into a form suitable for classification.
Data samples in the power network traffic database contain many data items, and some of them, such as TCP synchronization packets and the average client ACK delay, do not help in classifying network traffic. The invention therefore first screens the original data and selects the data items that provide the most information for classification; the items finally selected as sample data include server IP address, port, protocol, client IP address, byte count, bit rate, packet count, total session count and time. Among all the traffic data, some power system network IPs are known to be power system servers and some IPs are known not to be servers; the sample data corresponding to these IPs are extracted and used as the positive examples and negative examples of the training set, respectively.
Although known server IPs make up a very small proportion of all endpoint IPs in the power network, the data samples belonging to known server IPs account for the vast majority of all data samples. This indicates that most sessions in the network are initiated by a very small number of servers. As a result, positive examples far outnumber negative examples, which strongly affects the training effect. The invention therefore performs an integration operation on the sample data. Specifically, using the quadruple (server IP, port, protocol, time) as an index, it integrates the sample data of the same server IP, the same port, the same protocol and the same time period into one piece of data. This effectively reduces the number of positive examples, so that positive and negative examples are more balanced after integration and a better experimental result is obtained.
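The integration operation can be illustrated with a minimal sketch. The field names (`server_ip`, `port`, `protocol`, `hour`, `bytes`, `packets`, `sessions`) are hypothetical, and summing the counters within a group is an assumption; the text only states that records sharing the quadruple are merged into one piece of data.

```python
from collections import defaultdict

def integrate(records):
    """Merge records that share (server_ip, port, protocol, hour) into one row."""
    merged = defaultdict(lambda: {"bytes": 0, "packets": 0, "sessions": 0})
    for r in records:
        key = (r["server_ip"], r["port"], r["protocol"], r["hour"])
        for field in ("bytes", "packets", "sessions"):
            merged[key][field] += r[field]  # assumption: counters are summed
    return dict(merged)

# Hypothetical records, for illustration only
records = [
    {"server_ip": "10.0.0.1", "port": 80, "protocol": "TCP", "hour": 9,
     "bytes": 1200, "packets": 10, "sessions": 1},
    {"server_ip": "10.0.0.1", "port": 80, "protocol": "TCP", "hour": 9,
     "bytes": 800, "packets": 6, "sessions": 1},
    {"server_ip": "10.0.0.2", "port": 22, "protocol": "TCP", "hour": 9,
     "bytes": 300, "packets": 4, "sessions": 1},
]
merged = integrate(records)
```

Here the two records for the same server, port, protocol and hour collapse into a single row, which is how a heavily used server ends up contributing fewer positive examples.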
Finally, the magnitudes of the data items in a sample differ greatly: for example, the total byte count can reach the order of $10^6$, while packets per second is only on the order of $10^1$. Training directly on such data may cause exploding gradients in the neural network and degrade model performance. The invention therefore applies max-min normalization to each sample, mapping the sample values to the [0, 1] interval so that the model converges quickly and stably during training. For a sample x, the normalization is given by formula 1:
$$x' = \frac{x - x_{min}}{x_{max} - x_{min}} \tag{1}$$
where $x_{min}$ denotes the minimum value over all samples, $x_{max}$ denotes the maximum value over all samples, and $x'$ denotes the normalized sample.
Step 3: train a binary-classification DNN model on the data to distinguish whether the IP of a given endpoint is a server IP. After the data to be classified are processed by the DNN, a binary prediction is output that represents the confidence that the data belong to a server in the network. The results are screened by confidence: a result whose confidence is greater than a given threshold is a positive example, namely server data, and a result whose confidence is not greater than the threshold is a negative example, namely non-server data.
The deep neural network model mentioned in step 3 is constructed as follows.
First, data are acquired: a group of power system network traffic training data $\{x^{(n)}, y^{(n)} \mid 1 \le n \le N\}$ is obtained, where x is a network traffic sample comprising statistical information such as the total packet count, packets per second, total byte count and bytes per second of the IP of a given endpoint; y is the sample label, a manually labeled value indicating whether the training sample is a server; N is the total number of training samples. For a DNN network output function f, the data are input into the network to obtain a classification result, as shown in formula 2:
$$\hat{y} = f(x; \theta) \tag{2}$$
where $\hat{y}$ is the confidence of the classification result and θ is the network parameter of the DNN. The DNN network structure is shown schematically in FIG. 2. The input layer packs the network traffic data into batches and passes them to the hidden-layer neural network, which contains several groups of neurons. The hidden layer passes the extracted features to the output layer, and the output layer outputs logits, which a Sigmoid function converts into a probability.
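The forward pass of such a network can be sketched in plain Python. The hidden sizes 512, 256 and 128 with ReLU and the Sigmoid output follow the structure described in this document; the input width of 8 features, the random He-style initialization and all function names are assumptions for illustration, not the trained model.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)  # assumption: He-style initialization
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_model(n_features, rng):
    sizes = [n_features, 512, 256, 128, 1]  # three hidden layers, one logit
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def predict(model, x):
    """Return the confidence y_hat = f(x; theta) for one normalized sample."""
    h = x
    for weights, biases in model[:-1]:
        h = relu(dense(h, weights, biases))
    logit = dense(h, *model[-1])[0]
    return sigmoid(logit)

rng = random.Random(0)
model = build_model(8, rng)             # 8 input features is an assumption
confidence = predict(model, [0.5] * 8)  # untrained, so the value is arbitrary
```

In practice a deep learning framework would replace these hand-written layers; the sketch only makes the layer sizes and the logit-to-confidence step concrete.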
To make $\hat{y}$ as accurate as possible, the DNN must be trained iteratively. The invention uses the binary cross entropy function as the training loss function, as shown in formula 3:
$$L(\theta, x, y) = -\frac{1}{N} \sum_{n=1}^{N} \left[ y^{(n)} \log \hat{y}^{(n)} + \left(1 - y^{(n)}\right) \log\left(1 - \hat{y}^{(n)}\right) \right] \tag{3}$$
where N is the total number of samples. The loss function is used to train the DNN network model iteratively. Training is an optimization process that aims to minimize the loss so that the predicted values are as close to the labels as possible; that is, the objective function J can be expressed as:
$$J = \min_{\theta} L(\theta, x, y) \tag{4}$$
In this process the network parameter θ is optimized and updated. Stochastic gradient descent (SGD) is selected as the optimizer, and the update can be expressed as:
$$\theta_{t+1} = \theta_t - \lambda_t g_t \tag{5}$$
where t denotes the iteration, $\lambda_t$ is the optimization weight, commonly referred to as the learning rate, and $g_t$ is a stochastic gradient whose expectation is the gradient of the objective, i.e. it satisfies
$$\mathbb{E}[g_t] = \nabla_{\theta} L(\theta_t, x, y)$$
By iterating this algorithm a certain number of times, the DNN can fully learn the characteristics of the power network traffic data, so that it can classify accurately when faced with real-time power network traffic data.
When the DNN model was trained as described above, the number of epochs used was 300, the batch size was 128, and the learning rate was 0.02.
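The training loop of formulas 3 to 5 can be sketched with the hyperparameters stated above (300 epochs, batch size 128, learning rate 0.02). To stay short, the sketch trains a single sigmoid unit rather than the full DNN, and the toy data and all function names are assumptions; the loss and the update rule are the same.

```python
import math
import random

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def bce(y_true, y_pred, eps=1e-12):
    """Binary cross entropy averaged over the samples (formula 3)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def train(xs, ys, epochs=300, batch_size=128, lr=0.02, seed=0):
    rng = random.Random(seed)
    w, b = [0.0] * len(xs[0]), 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            grad_w, grad_b = [0.0] * len(w), 0.0
            for i in batch:
                p = sigmoid(sum(wj * xj for wj, xj in zip(w, xs[i])) + b)
                err = p - ys[i]  # derivative of BCE w.r.t. the logit
                for j, xj in enumerate(xs[i]):
                    grad_w[j] += err * xj / len(batch)
                grad_b += err / len(batch)
            # SGD step (formula 5): theta <- theta - lr * g
            w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
            b -= lr * grad_b
    return w, b

# Toy data: label is 1 when the first normalized feature exceeds 0.5
data_rng = random.Random(1)
xs = [[data_rng.random(), data_rng.random()] for _ in range(256)]
ys = [1 if x[0] > 0.5 else 0 for x in xs]
initial_loss = bce(ys, [0.5] * len(ys))
w, b = train(xs, ys)
preds = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in xs]
final_loss = bce(ys, preds)
```

Each minibatch gradient is an unbiased estimate of the full-data gradient, which is the property required of the stochastic gradient in formula 5.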
Step 4: add the confidence as an extra item to the positive-example data samples and perform K-means cluster analysis to obtain a multi-class result representing the specific type of server.
Training the DNN with deep learning yields a classification result indicating whether a traffic sample belongs to a server. To explore further which type of server the traffic sample belongs to, the invention applies the K-means clustering algorithm to the traffic samples classified as servers. K-means is an unsupervised machine learning clustering method, and the procedure is as follows. First, a threshold is determined and the classification results are divided: a value higher than the threshold is considered a server, and a value lower than the threshold is not. Then all original traffic samples classified as servers are found, and their respective classification confidences are added to them to form new sample data $\{\tilde{x}^{(m)} \mid 1 \le m \le M\}$, where M is the total number of new samples.
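Forming the new samples can be sketched as follows. The function name and the sample values are hypothetical; the default threshold of 0.5 matches the value δ = 0.5 used in the experiment reported later in this document.

```python
def select_suspected_servers(samples, confidences, threshold=0.5):
    """Keep samples above the threshold and append the confidence as a feature."""
    return [
        list(x) + [c]
        for x, c in zip(samples, confidences)
        if c > threshold
    ]

samples = [[0.2, 0.9], [0.7, 0.1], [0.4, 0.4]]
confidences = [0.97, 0.12, 0.61]
new_samples = select_suspected_servers(samples, confidences)
```

Each retained sample gains one extra dimension, so the clustering that follows sees both the traffic statistics and how confident the DNN was.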
Next, the number of clusters K is determined, which fixes K cluster centers $\{c^{(k)} \mid 1 \le k \le K\}$, and the cluster centers are randomly initialized. The Euclidean distance from each sample to each cluster center is then calculated, as shown in formula 6:
$$d(\tilde{x}_i, c_j) = \sqrt{\sum_{d=1}^{D} \left(\tilde{x}_{i,d} - c_{j,d}\right)^2} \tag{6}$$
where $\tilde{x}_i$ represents a new sample, $c_j$ is a cluster center, and D is the number of dimensions of a sample.
The distances from each sample to the cluster centers are compared in turn, and each sample is assigned to the cluster of its nearest cluster center, giving K clusters $\{s^{(k)} \mid 1 \le k \le K\}$.
After the clusters are obtained, the K-means algorithm updates the position of each cluster center from its cluster; the new cluster center is the mean of the samples in the cluster in each dimension, namely:
$$c'_k = \frac{1}{|s_k|} \sum_{\tilde{x} \in s_k} \tilde{x} \tag{7}$$
where $c'_k$ is the updated k-th cluster center. The algorithm is repeated until convergence, after which the samples belonging to servers are divided into K classes.
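The assignment and update steps can be sketched as a small K-means loop. Initializing the centers from randomly chosen samples, stopping when assignments no longer change, and keeping the old center for an empty cluster are assumptions; the sample values are illustrative.

```python
import math
import random

def kmeans(samples, k, iterations=100, seed=0):
    rng = random.Random(seed)
    # Random initialization: pick k distinct samples as starting centers
    centers = [list(s) for s in rng.sample(samples, k)]
    assignment = None
    for _ in range(iterations):
        # Assignment step: nearest center by Euclidean distance (formula 6)
        new_assignment = [
            min(range(k), key=lambda j: math.dist(s, centers[j]))
            for s in samples
        ]
        if new_assignment == assignment:
            break  # converged: no sample changed cluster
        assignment = new_assignment
        # Update step: per-dimension mean of each cluster (formula 7)
        for j in range(k):
            members = [s for s, a in zip(samples, assignment) if a == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

samples = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.2],
           [0.9, 1.0], [1.0, 0.8], [0.95, 0.95]]
centers, labels = kmeans(samples, k=2)
```

Because the result depends on the random initialization, K-means is commonly run several times with different seeds and the run with the lowest SSE is kept.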
The optimal K value of the K-means clustering algorithm can be determined experimentally, and the most common method is the elbow method. First, the sum of squared residuals (SSE) from the samples in each cluster to the cluster center is calculated; it is a commonly used index of the clustering quality of the samples, as shown in formula 8:
$$SSE = \sum_{i=1}^{K} \sum_{p \in s_i} \lVert p - c_i \rVert^2 \tag{8}$$
where p is a sample in cluster $s_i$ and $c_i$ is the corresponding cluster center. A smaller SSE indicates better clustering quality for the samples in the clusters.
As the number of clusters K increases, the samples are partitioned more finely and the SSE of each cluster becomes correspondingly smaller. When K is smaller than the optimal number of clusters, increasing K greatly improves the clustering quality of each cluster, so the SSE drops sharply; once K reaches the optimal number of clusters, the gain in clustering quality from increasing K diminishes rapidly, so the decrease in SSE levels off. To determine the optimal K value, K is taken as 1, 2, 3, … in turn, the average SSE is plotted as the dependent variable against K as the independent variable, and the inflection point where the slope of the curve changes from a rapid drop to a gentle drop is found; the K value at this point is the optimal K value.
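The elbow search can be sketched as follows. The text finds the inflection point from a plot; picking the K with the largest second difference of the SSE curve is one simple heuristic that stands in for that visual judgment, and it is an assumption here. The SSE values in the example are hypothetical.

```python
def sse(samples, centers, labels):
    """Sum of squared distances from each sample to its cluster center (formula 8)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(s, centers[label]))
        for s, label in zip(samples, labels)
    )

def elbow(sse_by_k):
    """Pick the K with the sharpest bend: largest second difference of the curve."""
    ks = sorted(sse_by_k)
    best_k, best_bend = None, float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        # Drop before K minus drop after K; large when the curve flattens at K
        bend = (sse_by_k[prev] - sse_by_k[k]) - (sse_by_k[k] - sse_by_k[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k

# Hypothetical SSE values for K = 1..6, for illustration only
curve = {1: 120.0, 2: 60.0, 3: 20.0, 4: 15.0, 5: 12.0, 6: 10.0}
best_k = elbow(curve)
```

For this curve the drop is steep up to K = 3 and gentle afterwards, so the heuristic selects K = 3.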
For the DNN classification model, the commonly used indicators are precision and recall, defined as follows:
$$\mathrm{precision} = \frac{tp}{tp + fp}$$
$$\mathrm{recall} = \frac{tp}{tp + fn}$$
where tp is the number of true positives, namely the number of samples that are actually positive and are predicted to be positive; fp is the number of false positives, namely the number of samples that are actually negative but are predicted to be positive; fn is the number of false negatives, namely the number of samples that are actually positive but are predicted to be negative. Precision and recall measure the performance of the model along two dimensions. In an experiment, different thresholds δ are usually taken as sample points to draw a precision-recall (P-R) curve, and the quality of the model is judged by the height of the curve's balance point.
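A short sketch of how the points of a P-R curve can be computed at several thresholds δ. The labels and confidences are hypothetical, and returning 0.0 when a denominator is zero is an assumption.

```python
def precision_recall(y_true, confidences, threshold):
    """Precision and recall when samples above the threshold are predicted positive."""
    predicted = [c > threshold for c in confidences]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0, 1]
confidences = [0.95, 0.80, 0.40, 0.70, 0.10, 0.60]
# One (threshold, precision, recall) point per threshold
pr_curve = [(t, *precision_recall(y_true, confidences, t)) for t in (0.3, 0.5, 0.75)]
```

In this toy data precision equals recall (both 0.75) at δ = 0.5, which is the kind of balance point the text refers to.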
To evaluate the K-means clustering algorithm, the silhouette coefficient (Silhouette Coefficient) can be used as an index in addition to the sum of squared residuals SSE described above. For a sample x, the silhouette coefficient is:
$$SC(x) = \frac{b(x) - a(x)}{\max\{a(x), b(x)\}}$$
where a(x) is the intra-cluster dissimilarity, the sum of the distances from sample x to the other samples in its cluster; b(x) is the inter-cluster dissimilarity, the sum of the distances from sample x to the samples in other clusters. SC takes values in [-1, 1], and the closer it is to 1, the better the clustering performance.
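A sketch of the silhouette coefficient follows. Note that it uses the widely used textbook definition, in which a(x) is the mean distance to the other samples in the same cluster and b(x) is the smallest mean distance to the samples of any other cluster; this differs from the sums described above and is an assumption made so that the value stays within [-1, 1]. At least two clusters are required.

```python
import math

def silhouette(samples, labels):
    """Mean silhouette coefficient over all samples (mean-distance definition)."""
    clusters = {}
    for s, label in zip(samples, labels):
        clusters.setdefault(label, []).append(s)
    scores = []
    for s, label in zip(samples, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # The sum includes the sample itself (distance 0), hence len(own) - 1
        a = sum(math.dist(s, o) for o in own) / (len(own) - 1)
        b = min(
            sum(math.dist(s, o) for o in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)

samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good = silhouette(samples, [0, 0, 1, 1])  # tight, well-separated clusters
bad = silhouette(samples, [0, 1, 0, 1])   # clusters that mix distant points
```

The well-separated labeling scores close to 1, while the mixed labeling scores below 0.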
The present invention uses network data acquired in the actual power system network to validate the proposed method. The method selects the power network flow data of 25 days in total, and the time stamps of the data are divided according to hours. Firstly, a known intranet server IP and an IP which is known not to belong to a server are screened, 2257429 pieces of original data are selected in total, and after the data are subjected to operations such as marking, integration and normalization, 162952 pieces of training data samples are obtained to form a training set. Then, preprocessing operations except for marking are carried out on the rest 2784335 unknown data to obtain 6988459 test data samples, and a test set is formed.
The method builds a model from the training set, uses the model to predict the test set, and divides the results with the threshold δ = 0.5, obtaining 591 suspected IPs. Because the test set is unlabeled data, the results were verified manually: some of these IPs belong to real service servers, and some are server-like devices, such as monitoring devices, IAD devices and soft switch devices. The specific experimental results are shown in Table 1:
TABLE 1 Test set classification results
[Table 1 is an image in the original publication and is not reproduced in this text.]
In Table 1, the proportion is the share of a given type of device among all suspected IPs, and other devices are devices not internal to the grid. Most suspected IPs are unreported devices, accounting for 70.38%; service servers account for 25.92%, with a high average confidence of 97.59%, which indicates that the model can discover suspected servers fairly accurately.
Cluster analysis of the data shows that the service servers also fall into different classes. Among the suspected IPs found, 66.67% of the service servers were clustered into one class, and the remaining 4 classes also contain different service servers. For server-like devices, the cluster analysis successfully distinguished the various devices; the specific cluster analysis results are shown in Table 2.
TABLE 2 Test set cluster analysis results
[Table 2 is an image in the original publication and is not reproduced in this text.]
The proportion in Table 2 is the share of devices clustered into a class relative to the total number of devices of that type. For server-like devices, most devices of each type are clustered into the same class, which indicates that the cluster analysis distinguishes device types to some extent.
The above embodiments are merely illustrative of the technical concepts and features of the present invention, and the purpose of the embodiments is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (8)

1. A DNN and K-means based power system network flow identification method is characterized by comprising the following steps:
step 1: screening original data, and selecting data items which can provide more information for classification to form data samples, wherein the data samples comprise server IP addresses, ports, protocols, client IP addresses, byte numbers, bit rates, packet numbers, session total numbers and time;
step 2: performing integration operation and normalization operation on the sample data to perform preprocessing;
step 3: iteratively training a DNN network model, using the preprocessed training set to train the DNN network model, and performing preliminary classification on the network traffic of the power system to obtain a classification confidence and positive and negative example results;
step 4: classifying, by using a K-means algorithm, the samples that are judged to be suspected servers after processing by the DNN network model.
2. The DNN and K-means based power system network flow identification method according to claim 1, wherein in step 1, some IPs of the power system network are known to be power system servers and some IPs are known not to be power system servers, and the sample data corresponding to these IPs are extracted and used as the positive examples and negative examples of the training set, respectively.
3. The DNN and K-means based power system network traffic identification method according to claim 1, wherein the integrating operation and the normalizing operation in the step 2 specifically comprise:
using the quadruple (server IP, port, protocol, time) as an index, integrating the sample data of the same server IP, the same port, the same protocol and the same time period into one piece of data, and then performing max-min normalization on each sample to map the sample values to the [0,1] interval.
4. The DNN and K-means based power system network traffic identification method according to claim 1, wherein the DNN network model structure in step 3 is: the input layer receives an n-dimensional data sample; the data features pass through the three fully connected layers of the hidden layer, which use the nonlinear activation function ReLU, to the output layer; the output layer outputs a logit value, which a Sigmoid activation function converts into a predicted confidence value; real-time test data are input into the network to obtain the confidence of the predicted result, and the probability value output by the DNN network model is then judged to obtain positive and negative examples.
5. The DNN and K-means based power system network traffic identification method of claim 4, wherein the numbers of neurons in the three fully connected layers of the hidden layer are 512, 256 and 128, respectively.
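The structure described in claims 4 and 5 can be illustrated with a forward pass in NumPy. This is only a sketch under stated assumptions: the weight initialization, the input dimension of 8, the decision threshold of 0.5, and the names `init_params` and `forward` are not given in the claims.

```python
import numpy as np


def init_params(n_features, hidden=(512, 256, 128), seed=0):
    """Create (weight, bias) pairs for three hidden layers and one output unit.

    Assumption: He-style random initialization; the claims do not specify one.
    """
    rng = np.random.default_rng(seed)
    sizes = [n_features, *hidden, 1]
    return [
        (rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out)),
         np.zeros(fan_out))
        for fan_in, fan_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """Return the predicted confidence for each row of x."""
    h = x
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)          # fully connected layer + ReLU
    w, b = params[-1]
    logits = h @ w + b                          # output layer produces a logit
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))  # Sigmoid turns it into a confidence


params = init_params(n_features=8)               # n = 8 is an arbitrary example
x = np.random.default_rng(1).random((4, 8))      # four samples already scaled to [0, 1]
confidence = forward(params, x)
is_server = confidence >= 0.5                    # assumed decision threshold
```

With untrained weights the confidences carry no meaning; the sketch only shows how the data flows through the layers.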
6. The DNN and K-means based power system network traffic identification method according to claim 4 or 5, wherein the preliminary classification of the power system network traffic in the step 3 to obtain the classification confidence and the positive and negative case results comprises the following specific operations:
step 3.1: acquiring data: a set of power system network traffic training data $\{(x^{(n)}, y^{(n)}) \mid 1 \le n \le N\}$ is acquired, wherein x is a network traffic sample comprising statistical information such as the total packet count, packets per second, total byte count and bytes per second of the IP at one end; y is the sample label, a manually labeled value indicating whether the training sample is a server; N is the total number of training samples;
step 3.2: for the DNN network output function f, inputting the data into the network to obtain the classification result:
$$\hat{y}^{(n)} = f(x^{(n)}; \theta)$$
wherein $\hat{y}^{(n)}$ is the confidence of the classification result, and $\theta$ denotes the network parameters of the DNN network model;
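step 3.3: iteratively training the DNN network model, using the binary cross-entropy function as the training loss function:
$$L(\theta) = -\frac{1}{N}\sum_{n=1}^{N}\left[ y^{(n)} \log f(x^{(n)}; \theta) + \left(1 - y^{(n)}\right) \log\left(1 - f(x^{(n)}; \theta)\right) \right]$$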
step 3.4: optimizing and updating the network parameters $\theta$, with stochastic gradient descent selected as the optimizer.
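Steps 3.3 and 3.4 can be illustrated with a small training loop. To stay short, this sketch swaps the three-hidden-layer DNN for a single sigmoid unit, so it shows only the binary cross-entropy loss and the stochastic gradient descent update, not the full network. The synthetic data, the learning rate, the number of epochs, and all function names are assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy; predictions are clipped to avoid log(0)."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))


def sgd_epoch(theta, x, y, lr, rng):
    """One pass of stochastic gradient descent, one sample per update."""
    for i in rng.permutation(len(x)):
        y_hat = sigmoid(x[i] @ theta)
        # For a sigmoid output with cross-entropy loss, the per-sample
        # gradient with respect to theta is (y_hat - y) * x.
        theta = theta - lr * (y_hat - y[i]) * x[i]
    return theta


rng = np.random.default_rng(0)
# Synthetic stand-in for traffic features, with a constant bias column appended.
x = np.hstack([rng.random((200, 3)), np.ones((200, 1))])
y = (x[:, 0] + x[:, 1] > 1.0).astype(float)      # synthetic labels

theta = np.zeros(4)
loss_before = bce_loss(y, sigmoid(x @ theta))
for _ in range(50):
    theta = sgd_epoch(theta, x, y, lr=0.5, rng=rng)
loss_after = bce_loss(y, sigmoid(x @ theta))
```

In a full implementation the same loss and update rule would be applied to every layer's parameters through backpropagation.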
7. The DNN and K-means-based power system network traffic identification method of claim 1, wherein the K-means algorithm used for classification in the step 4 comprises the following specific steps:
step 4.1: determining a threshold and dividing the classification results by it: a sample whose classification result is higher than the threshold is considered a server, and one whose result is lower is not; finding all original traffic samples classified as servers and adding their respective classification confidences to them to form new sample data $\{(x^{(m)}, f(x^{(m)}; \theta)) \mid 1 \le m \le M\}$, wherein M is the total number of new samples;
step 4.2: determining the number K of clusters, thereby determining K cluster centers $\{c^{(k)} \mid 1 \le k \le K\}$, and randomly initializing the cluster centers; calculating the Euclidean distance from each sample to each cluster center, comparing the distances in turn, and then assigning each sample to the cluster whose center is closest to it, to obtain K clusters $\{s^{(k)} \mid 1 \le k \le K\}$;
step 4.3: after the clusters are obtained, the K-means algorithm updates the positions of the cluster centers from the clusters, each new cluster center being the mean, in each dimension, of the samples in its cluster.
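Steps 4.1 to 4.3 can be sketched as below. Several details are assumptions rather than part of the claim: the threshold of 0.5, drawing the random initial centers from the samples themselves, repeating the assignment and update steps until the centers stop moving, keeping an empty cluster's previous center, and the names `build_suspect_samples` and `kmeans`.

```python
import numpy as np


def build_suspect_samples(x, confidence, threshold):
    """Step 4.1: keep samples above the threshold and append their confidence."""
    mask = confidence > threshold
    return np.hstack([x[mask], confidence[mask, None]])


def kmeans(samples, k, n_iter=100, seed=0):
    """Steps 4.2 and 4.3: assign samples to the nearest center, then update centers."""
    rng = np.random.default_rng(seed)
    # Assumption: random initialization picks k distinct samples as centers.
    centers = samples[rng.choice(len(samples), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Euclidean distance from every sample to every center.
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # New center = per-dimension mean of the cluster; an empty cluster keeps its center.
        updated = np.array([
            samples[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels, centers


# Hypothetical normalized traffic features and DNN confidences.
x = np.array([[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9], [0.5, 0.5]])
confidence = np.array([0.9, 0.8, 0.95, 0.7, 0.2])

suspects = build_suspect_samples(x, confidence, threshold=0.5)
labels, centers = kmeans(suspects, k=2)
```

Because the initialization is random, K-means can settle in a local optimum, so the grouping may differ between seeds.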
8. The DNN and K-means-based power system network traffic identification method of claim 7, wherein the specific method for determining the K value of the clusters in the step 4.2 is: calculating the sum of squared errors (SSE) from the samples in each cluster to the cluster center, taking K values of 1, 2, 3, … in sequence, then plotting the average SSE as the dependent variable against the K value as the independent variable, and finding the inflection point at which the slope of the curve changes from a rapid decrease to a gentle decrease; the K value at this point is the optimal K value.
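The K selection of claim 8 can be sketched as follows. Two things here are assumptions and not part of the claim. First, the inflection point is located numerically as the K with the largest second difference of the average SSE curve, whereas the claim describes reading it off a plot. Second, to keep the example reproducible, the random initialization of step 4.2 is replaced by a deterministic farthest-first initialization. The toy data and all function names are hypothetical.

```python
import numpy as np


def farthest_first_init(samples, k):
    """Deterministic stand-in for random initialization (an assumption)."""
    centers = [samples[0]]
    while len(centers) < k:
        gaps = np.linalg.norm(
            samples[:, None, :] - np.array(centers)[None, :, :], axis=2)
        # Pick the sample farthest from its nearest already chosen center.
        centers.append(samples[gaps.min(axis=1).argmax()])
    return np.array(centers, dtype=float)


def kmeans(samples, k, n_iter=100):
    centers = farthest_first_init(samples, k)
    for _ in range(n_iter):
        dists = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            samples[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels, centers


def average_sse(samples, labels, centers):
    """Sum of squared distances to the assigned center, divided by sample count."""
    return float(np.sum((samples - centers[labels]) ** 2) / len(samples))


def elbow_k(ks, scores):
    """Return the K where the curve bends most sharply (largest second difference)."""
    bend = np.diff(scores, n=2)
    return ks[int(np.argmax(bend)) + 1]


# Toy data with three tight groups.
samples = np.array([
    [0, 0], [0, 1], [1, 0],
    [10, 10], [10, 11], [11, 10],
    [20, 0], [20, 1], [21, 0],
], dtype=float)

ks = [1, 2, 3, 4, 5]
avg_sse = [average_sse(samples, *kmeans(samples, k)) for k in ks]
best_k = elbow_k(ks, avg_sse)
```

On this toy data the average SSE falls steeply up to K = 3 and flattens afterwards, which is the shape the claim describes.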
CN202210882066.2A 2022-05-07 2022-07-25 DNN and K-means-based power system network flow identification method Pending CN115348063A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2022104962630 2022-05-07
CN202210496263 2022-05-07

Publications (1)

Publication Number Publication Date
CN115348063A true CN115348063A (en) 2022-11-15

Family

ID=83949752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210882066.2A Pending CN115348063A (en) 2022-05-07 2022-07-25 DNN and K-means-based power system network flow identification method

Country Status (1)

Country Link
CN (1) CN115348063A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209563A (en) * 2019-12-27 2020-05-29 北京邮电大学 Network intrusion detection method and system
CN111404911A (en) * 2020-03-11 2020-07-10 国网新疆电力有限公司电力科学研究院 Network attack detection method and device and electronic equipment
CN111565156A (en) * 2020-04-27 2020-08-21 南京烽火星空通信发展有限公司 Method for identifying and classifying network traffic
CN111832647A (en) * 2020-07-10 2020-10-27 上海交通大学 Abnormal flow detection system and method
CN113343587A (en) * 2021-07-01 2021-09-03 国网湖南省电力有限公司 Flow abnormity detection method for electric power industrial control network
CN114372536A (en) * 2022-01-13 2022-04-19 中国人民解放军国防科技大学 Unknown network flow data identification method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN104601565B (en) A kind of network invasion monitoring sorting technique of intelligent optimization rule
CN110309302B (en) Unbalanced text classification method and system combining SVM and semi-supervised clustering
CN111211994B (en) Network traffic classification method based on SOM and K-means fusion algorithm
CN111786951B (en) Traffic data feature extraction method, malicious traffic identification method and network system
CN107579846B (en) Cloud computing fault data detection method and system
CN111507385B (en) Extensible network attack behavior classification method
CN112488226B (en) Terminal abnormal behavior identification method based on machine learning algorithm
CN111343171B (en) Intrusion detection method based on mixed feature selection of support vector machine
CN115811440B (en) Real-time flow detection method based on network situation awareness
CN113269647A (en) Graph-based transaction abnormity associated user detection method
CN116805051A (en) Double convolution dynamic domain adaptive equipment fault diagnosis method based on attention mechanism
CN115801374A (en) Network intrusion data classification method and device, electronic equipment and storage medium
CN117478390A (en) Network intrusion detection method based on improved density peak clustering algorithm
CN114124437B (en) Encrypted flow identification method based on prototype convolutional network
CN115348063A (en) DNN and K-means-based power system network flow identification method
CN116541792A (en) Method for carrying out group partner identification based on graph neural network node classification
CN113609480B (en) Multipath learning intrusion detection method based on large-scale network flow
CN115879030A (en) Network attack classification method and system for power distribution network
CN108540474B (en) Computer network defense decision-making system
CN113010673A (en) Vulnerability automatic classification method based on entropy optimization support vector machine
Lu et al. An Alert Aggregation Algorithm Based on K-means and Genetic Algorithm
Luo et al. Network attack classification and recognition using hmm and improved evidence theory
CN115208703B (en) Industrial control equipment intrusion detection method and system of fragment parallelization mechanism
CN116192765B (en) Attention mechanism-based early identification method for flow of Internet of things equipment
CN115580472B (en) Industrial control network attack flow classification method based on heuristic clustering algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination