CN115348063A - DNN and K-means-based power system network flow identification method - Google Patents
DNN and K-means-based power system network flow identification method
- Publication number
- CN115348063A (application number CN202210882066.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- dnn
- power system
- network
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Hardware Design (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention relates to the technical field of artificial intelligence for network security and discloses a DNN and K-means based power system network flow identification method. The method screens the original data and selects the data items that provide the most information for classification to form data samples; preprocesses the sample data with an integration operation and a normalization operation; iteratively trains a DNN network model on the preprocessed training set and uses it to perform a preliminary classification of power system network traffic, obtaining a classification confidence and positive and negative example results; and uses a K-means algorithm to cluster the samples that the DNN network model judges to be suspected servers. Compared with the prior art, the method achieves higher accuracy when classifying power network data and can meet the requirements of power system network traffic classification in a real environment.
Description
Technical Field
The invention relates to the technical field of network security artificial intelligence, in particular to a DNN and K-means-based power system network flow identification method.
Background
With the development and application of power system network technology, the scale of power system networks keeps growing and their complexity has risen markedly, so the security risk to these networks increases accordingly. Now that threats to power networks have become routine, accurate detection, analysis, and early-warning capabilities are gradually becoming the key to security capability in the new generation of big-data environments.
Attackers generally attack the power system network in the form of network services, so the attacking endpoint has traffic characteristics similar to those of the servers in the power system network. Identifying and classifying power system network traffic is therefore a key step in network security. Traditional network traffic identification methods usually depend on a large amount of manual inquiry and verification, or require rules to be defined by hand, and the cost is not negligible in today's big-data scenarios with millions of traffic records. Traditional methods also have long query intervals, poor real-time performance, and low accuracy, so they can only support passive defense strategies, offer insufficient early warning, and struggle to cope with new security threats to the power system network.
Disclosure of Invention
Purpose of the invention: in view of the problems in the prior art, the invention provides a DNN and K-means based power system network flow identification method that classifies power system network traffic well, discovers abnormal traffic and illegal service endpoints as early as possible, and maintains the security of the power system network.
The technical scheme is as follows: the invention provides a DNN and K-means based power system network flow identification method, which comprises the following steps:
step 1: screening original data, selecting data items which can provide more information for classification to form data samples, wherein the data samples comprise server IP addresses, ports, protocols, client IP addresses, byte numbers, bit rates, packet numbers, session total numbers and time;
step 2: performing integration operation and normalization operation on the sample data to perform preprocessing;
step 3: iteratively training a DNN network model, using a preprocessed training set to train the DNN network model, and performing preliminary classification on the network traffic of the power system to obtain a classification confidence and positive and negative example results;
step 4: using a K-means algorithm to classify the samples that are judged to be suspected servers after processing by the DNN network model.
Further, in the step 1, some IPs of the power system network are known as power system servers, and some IPs that are determined not to be power system servers are also known, and sample data corresponding to the IPs are extracted to be respectively used as positive examples and negative examples of the training set.
Further, the integrating operation and the normalizing operation in step 2 specifically include:
using the four-tuple (server IP, port, protocol, time) as an index, integrating the sample data that share the same server IP, port, protocol and time period into one piece of data, and then performing max-min normalization on each sample to map the sample values to the [0,1] interval.
Further, the DNN network model structure in step 3 is: the input layer receives an n-dimensional data sample; the data features pass through the three fully connected layers of the hidden layer, each of which uses the nonlinear activation function ReLU, to the output layer; the output layer outputs a logit value, which a Sigmoid activation function converts into a predicted confidence value. Real-time test data is input into the network to obtain the confidence of the predicted result, and the probability value output by the DNN network model is then judged to obtain positive and negative examples.
Further, the number of the neurons of the three fully-connected layers of the hidden layer is 512, 256 and 128.
Further, the preliminary classification of the network traffic of the power system in the step 3 to obtain the classification confidence and the positive and negative case results includes the following specific operations:
step 3.1: acquiring data, namely acquiring a set of power system network traffic training data $\{x^{(n)}, y^{(n)} \mid 1 \le n \le N\}$, wherein x is a network traffic sample containing statistical information such as the total packet number, packets per second, total byte number and bytes per second of the IP of a given endpoint; y is the sample label, a manually labeled value indicating whether the training sample is a server; N is the total number of training samples;
step 3.2: for a DNN network output function f, inputting the data into the network to obtain a classification result:
$$\hat{y}^{(n)} = f(x^{(n)}; \theta)$$
wherein $\hat{y}^{(n)}$ is the confidence of the classification result, and θ is the network parameters of the DNN network model;
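step 3.3: iteratively training the DNN network model, using the binary cross entropy function as the training loss function:
$$L(\theta, x, y) = -\frac{1}{N}\sum_{n=1}^{N}\left[y^{(n)}\log\hat{y}^{(n)} + \left(1 - y^{(n)}\right)\log\left(1 - \hat{y}^{(n)}\right)\right]$$
wherein $\hat{y}^{(n)}$ is the confidence the network outputs for sample $x^{(n)}$;
step 3.4: optimizing and updating the network parameter θ, with stochastic gradient descent selected as the optimizer.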
Further, the specific steps of classifying by using the K-means algorithm in the step 4 are as follows:
step 4.1: determining a threshold and dividing the classification results, a result above the threshold being considered a server and a result below the threshold not; finding all original traffic samples classified as servers and appending their respective classification confidences to form new sample data $\{\tilde{x}^{(m)} \mid 1 \le m \le M\}$, where M is the total number of new samples;
step 4.2: determining the number of clusters K, thereby defining K cluster centers $\{c_k \mid 1 \le k \le K\}$, and randomly initializing the cluster centers; calculating the Euclidean distance from each sample to each cluster center, comparing these distances in turn, and assigning each sample to the cluster of its nearest cluster center to obtain K clusters $\{s_k \mid 1 \le k \le K\}$;
step 4.3: after the class clusters are obtained, the position of a clustering center is updated through the class clusters by a K-means algorithm, and the new clustering center is the mean value of each sample in the class clusters on each dimension.
Further, the K value of the clusters in step 4.2 is determined as follows: the sum of squared errors (SSE) from the samples in each cluster to the cluster center is calculated with K taken as 1, 2, 3, … in turn; a graph is then plotted with K as the independent variable and the average SSE as the dependent variable; the inflection point where the slope of the curve changes from a rapid drop to a gentle drop is found, and the K value at that point is the optimal K value.
Advantageous effects:
Compared with manual methods, the method provided by the invention offers better real-time performance, stronger data processing capacity, and lower cost. Compared with traditional port- and traffic-based data analysis methods, it achieves higher accuracy. In addition, after judging whether an endpoint is a server, it applies a machine learning clustering algorithm to perform multi-class classification automatically, which makes subsequent analysis and verification of the results by staff more convenient.
Drawings
FIG. 1 is a diagram of a system model architecture;
FIG. 2 is a schematic diagram of a deep neural network architecture;
FIG. 3 is a deep neural network connection diagram.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
The invention discloses a DNN and K-means based power system network flow identification method, which comprises the following steps:
Step 1: Real-time data is acquired from the power system network traffic database, and the required data items are screened out.
Step 2: the data is integrated and normalized and preprocessed into a form suitable for classification.
The data samples in the power network traffic database contain many data items, and some of them, such as TCP synchronization packets and the average ACK delay of the client, do not help in classifying network traffic. The invention therefore first screens the original data and selects the data items that provide the most information for classification; the items finally selected as sample data are the server IP address, port, protocol, client IP address, byte number, bit rate, packet number, total session number, and time. Among all the traffic data, some power system network IPs are known to be power system servers and some IPs are known not to be servers; the sample data corresponding to these IPs are extracted as the positive examples and negative examples of the training set, respectively.
Although the known server IPs make up a very small proportion of all endpoint IPs in the power network, the data samples belonging to known server IPs make up the vast majority of all data samples. This indicates that most sessions in the network are initiated by a very small number of servers, so positive examples far outnumber negative examples, which strongly affects training. The invention therefore performs an integration operation on the sample data to solve this problem. Specifically, using the four-tuple (server IP, port, protocol, time) as an index, it integrates the sample data that share the same server IP, port, protocol, and time period into one piece of data. This effectively reduces the number of positive examples, so that positive and negative examples are more balanced after integration and a better experimental result is obtained.
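The integration operation described above can be sketched as a group-by over the four-tuple. This is a minimal illustration, not the patent's implementation: the field names (`server_ip`, `hour`, `bytes`, and so on) are hypothetical, and summing the numeric fields within a group is an assumption, since the text only states that matching records are merged into one piece of data.

```python
from collections import defaultdict

def integrate(records):
    """Merge records that share (server_ip, port, protocol, hour) into one sample.

    Summing the numeric fields is an assumption made for illustration.
    """
    merged = defaultdict(lambda: {"bytes": 0, "packets": 0, "sessions": 0})
    for r in records:
        key = (r["server_ip"], r["port"], r["protocol"], r["hour"])
        for field in ("bytes", "packets", "sessions"):
            merged[key][field] += r[field]
    return dict(merged)

# Hypothetical records: two sessions of one server in the same hour, plus one other.
records = [
    {"server_ip": "10.0.0.1", "port": 80, "protocol": "TCP", "hour": 9,
     "bytes": 1200, "packets": 10, "sessions": 1},
    {"server_ip": "10.0.0.1", "port": 80, "protocol": "TCP", "hour": 9,
     "bytes": 800, "packets": 6, "sessions": 1},
    {"server_ip": "10.0.0.2", "port": 53, "protocol": "UDP", "hour": 9,
     "bytes": 90, "packets": 1, "sessions": 1},
]
samples = integrate(records)
```

Here the two records of the busy server collapse into a single sample, which is how the operation reduces the surplus of positive examples.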
Finally, the data items in a sample differ too much in magnitude: for example, the total byte number can reach the order of $10^6$, while the number of packets per second is only on the order of $10^1$. Training directly on such data may cause gradient explosion in the neural network and degrade model performance. The invention therefore performs max-min normalization on each sample, mapping the sample values to the [0,1] interval so that the model converges quickly and stably during training. For a sample x, the normalization formula is shown as formula 1:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{1}$$
wherein $x_{\min}$ denotes the minimum value of all samples, $x_{\max}$ denotes the maximum value of all samples, and $x'$ denotes the normalized sample.
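As a minimal sketch of the max-min normalization, the function below scales each feature column to [0, 1]. Applying the minimum and maximum per feature column is an assumption (the text speaks only of the minimum and maximum over all samples), as is mapping a constant column to 0.0.

```python
def min_max_normalize(samples):
    """Scale each feature column of `samples` (a list of equal-length rows) to [0, 1]."""
    columns = list(zip(*samples))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in samples:
        normalized.append([
            # A constant column has no range; map it to 0.0 (assumption).
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        ])
    return normalized

# Hypothetical rows: [total bytes, packets per second]
raw = [[1_000_000, 10], [3_000_000, 30], [2_000_000, 20]]
scaled = min_max_normalize(raw)
```

After scaling, both features span the same range even though the raw byte counts are five orders of magnitude larger than the packet rates.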
Step 3: A binary-classification DNN model is trained on the data to distinguish whether the IP of a given endpoint is a server IP. The data to be classified is processed by the DNN, which outputs a binary prediction result representing the confidence that the data belongs to a server in the network. The results are then screened by confidence: a result whose confidence is greater than a certain threshold is a positive example, namely server data, and a result whose confidence is below the threshold is a negative example, namely non-server data.
For the deep neural network model construction mentioned in step 3:
First, data is acquired: a set of power system network traffic training data $\{x^{(n)}, y^{(n)} \mid 1 \le n \le N\}$ is obtained, wherein x is a network traffic sample containing statistical information such as the total packet number, packets per second, total byte number, and bytes per second of the IP of a given endpoint; y is the sample label, a manually labeled value indicating whether the training sample is a server; and N is the total number of training samples. Then, for a DNN network output function f, the data is input into the network to obtain a classification result, as shown in formula 2:
$$\hat{y}^{(n)} = f(x^{(n)}; \theta) \tag{2}$$
wherein $\hat{y}^{(n)}$ is the confidence of the classification result and θ is the network parameters of the DNN. The DNN network structure is shown schematically in FIG. 2. The input layer packs the network traffic data into batches and passes them to the hidden-layer neural network. The hidden layer contains several groups of neurons. The hidden layer passes the extracted features to the output layer, and the output layer outputs logits, which a Sigmoid function converts into a probability.
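The forward pass of such a network can be sketched in plain Python as follows. The hidden widths 512, 256 and 128 and the ReLU and Sigmoid activations come from the text; the He-style random initialisation, the 8-dimensional input, and all function names are assumptions for illustration, and the weights here are untrained.

```python
import math
import random

def init_layer(n_in, n_out, rng):
    scale = math.sqrt(2.0 / n_in)  # He-style initialisation (assumption)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def dense(x, layer):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def sigmoid(z):
    # Numerically stable form for large negative or positive logits.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def build_network(n_features, hidden=(512, 256, 128), seed=0):
    rng = random.Random(seed)
    sizes = [n_features, *hidden, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def predict_confidence(network, x):
    h = x
    for layer in network[:-1]:          # three fully connected hidden layers with ReLU
        h = relu(dense(h, layer))
    logit = dense(h, network[-1])[0]    # output layer produces a single logit
    return sigmoid(logit)               # Sigmoid turns the logit into a confidence

network = build_network(n_features=8)
confidence = predict_confidence(network, [0.1, 0.5, 0.0, 1.0, 0.3, 0.7, 0.2, 0.9])
```

In practice a framework with automatic differentiation would replace this hand-written pass; the sketch only shows how a normalized sample flows through the layers to a single confidence value.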
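To make the predicted confidence $\hat{y}^{(n)} = f(x^{(n)}; \theta)$ as accurate as possible, the DNN needs to be trained iteratively. The invention uses the binary cross entropy function as the training loss function, as shown in formula 3:
$$L(\theta, x, y) = -\frac{1}{N}\sum_{n=1}^{N}\left[y^{(n)}\log\hat{y}^{(n)} + \left(1 - y^{(n)}\right)\log\left(1 - \hat{y}^{(n)}\right)\right] \tag{3}$$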
wherein N is the total amount of samples. The loss function is used for iteratively training the DNN network model, which is an optimization process aimed at minimizing the value of the loss function so that the training result value is as close to the label as possible, i.e. the objective function J can be expressed as:
$$J = \min_{\theta} L(\theta, x, y) \tag{4}$$
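The binary cross entropy loss can be computed directly from labels and confidences, as in the sketch below. The clipping constant `eps` is an assumption added to avoid taking the logarithm of zero; it is not part of the source.

```python
import math

def binary_cross_entropy(labels, confidences, eps=1e-12):
    """Mean binary cross entropy between 0/1 labels and predicted confidences."""
    total = 0.0
    for y, p in zip(labels, confidences):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log() stays finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(labels)

uninformed = binary_cross_entropy([1, 0], [0.5, 0.5])   # equals ln 2
good = binary_cross_entropy([1, 0], [0.9, 0.1])
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
```

Confident correct predictions give a loss near zero, while confident wrong predictions are penalised heavily, which is what drives the confidences toward the labels during training.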
In this process, the network parameter θ is optimized and updated. Stochastic gradient descent (SGD) is selected as the optimizer, and the update can be expressed as:
$$\theta_{t+1} = \theta_t - \lambda_t g_t \tag{5}$$
where t denotes the iteration index, $\lambda_t$ is the optimization weight, commonly called the learning rate, and $g_t$ is a stochastic gradient whose expectation equals the gradient of the objective, i.e. it satisfies $\mathbb{E}[g_t] = \nabla_\theta L(\theta_t, x, y)$. After a certain number of training iterations with this algorithm, the DNN can fully learn the characteristics of the power network traffic data, so that it classifies accurately when faced with real-time power network traffic data.
When the DNN model was trained as described above, the number of epochs used was 300, the batch size was 128, and the learning rate was 0.02.
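To illustrate the SGD update rule with the stated learning rate of 0.02 and 300 epochs, the sketch below trains a one-feature logistic model rather than the full DNN. That simplification, the batch size of 2, and the toy data are assumptions made to keep the example short; the update step itself is the same parameter-minus-learning-rate-times-gradient rule.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_loss(theta, xs, ys):
    """Binary cross entropy of the logistic model theta = (w, b) on the data."""
    w, b = theta
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def sgd_train(xs, ys, epochs=300, batch_size=2, learning_rate=0.02, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)                      # a fresh random mini-batch order per epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradient of the loss on the mini-batch: (p - y) times the input.
            errors = [sigmoid(w * xs[i] + b) - ys[i] for i in batch]
            grad_w = sum(e * xs[i] for e, i in zip(errors, batch)) / len(batch)
            grad_b = sum(errors) / len(batch)
            w -= learning_rate * grad_w           # theta_{t+1} = theta_t - lambda * g_t
            b -= learning_rate * grad_b
    return w, b

xs = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]   # toy normalized feature
ys = [0, 0, 0, 1, 1, 1]               # toy labels
initial_loss = mean_loss((0.0, 0.0), xs, ys)
theta = sgd_train(xs, ys)
final_loss = mean_loss(theta, xs, ys)
```

Each mini-batch gradient is a noisy estimate of the full gradient, so individual steps may not decrease the loss, but over many epochs the loss falls below its starting value.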
Step 4: A confidence item is added to the positive data samples, and K-means cluster analysis is carried out to obtain a multi-class result representing the specific type of server.
By training the DNN with a deep learning method, a classification result indicating whether a given traffic sample belongs to a server can be obtained. To further explore which type of server the traffic sample belongs to, the invention applies the K-means clustering algorithm to the traffic samples classified as servers. K-means is an unsupervised machine learning clustering method, and the procedure is as follows. First, a threshold is determined and the classification results are divided: a result above the threshold is considered a server, and a result below it is not. Then all original traffic samples classified as servers are found, and their respective classification confidences are appended to form new sample data $\{\tilde{x}^{(m)} \mid 1 \le m \le M\}$, where M is the total number of new samples.
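The thresholding and confidence-augmentation step can be sketched as below. The function name and the default threshold of 0.5 are illustrative assumptions (0.5 is the value used in the experiment reported later), and treating a confidence exactly equal to the threshold as a non-server is also an assumption.

```python
def build_cluster_inputs(samples, confidences, threshold=0.5):
    """Keep samples classified as servers and append each one's confidence as a feature."""
    return [
        list(x) + [c]
        for x, c in zip(samples, confidences)
        if c > threshold   # ties count as non-server (assumption)
    ]

# Hypothetical normalized samples and the confidences a trained DNN might output.
samples = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
confidences = [0.9, 0.2, 0.7]
new_samples = build_cluster_inputs(samples, confidences)
```

The resulting rows have one more dimension than the originals, so the clustering step can use the confidence alongside the traffic statistics.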
Subsequently, the number of clusters K is determined, so that K cluster centers $\{c_k \mid 1 \le k \le K\}$ are defined, and the cluster centers are randomly initialized. Next, the Euclidean distance from each sample to each cluster center is calculated, as shown in equation 6:
$$d\left(\tilde{x}^{(i)}, c_j\right) = \left\lVert \tilde{x}^{(i)} - c_j \right\rVert_2 \tag{6}$$
wherein $\tilde{x}^{(i)}$ represents new sample data and $c_j$ is a cluster center.
The distances from each sample to the cluster centers are compared in turn, and each sample is assigned to the cluster of its nearest cluster center, giving K clusters $\{s_k \mid 1 \le k \le K\}$.
After the clusters are obtained, the K-means algorithm updates the positions of the cluster centers from the clusters: each new cluster center is the mean, in every dimension, of the samples in its cluster, namely:
$$c_k' = \frac{1}{\lvert s_k \rvert}\sum_{\tilde{x} \in s_k} \tilde{x} \tag{7}$$
wherein $c_k'$ is the updated k-th cluster center. The algorithm is repeated until convergence, after which the samples belonging to servers are divided into K classes.
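The assign-then-update loop described above can be sketched as follows. Initialising the centers from randomly chosen samples, stopping when assignments no longer change, and keeping the old center when a cluster becomes empty are assumptions; the text only specifies random initialisation and repetition until convergence.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def k_means(samples, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centers = [list(s) for s in rng.sample(samples, k)]  # random initial centers
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each sample joins the cluster of its nearest center.
        new_assignment = [
            min(range(k), key=lambda j: euclidean(s, centers[j])) for s in samples
        ]
        if new_assignment == assignment:   # converged: nothing moved
            break
        assignment = new_assignment
        # Update step: each center becomes the per-dimension mean of its members.
        for j in range(k):
            members = [s for s, a in zip(samples, assignment) if a == j]
            if members:                    # keep the old center if the cluster is empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

# Two well-separated toy groups.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, assignment = k_means(points, k=2)
```

On data like this the loop settles within a few iterations; on real traffic samples the result can depend on the initial centers, so running several seeds is a common precaution.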
The optimal K value of the K-means clustering algorithm can be determined through experiments, and the most common method is the elbow method. First, the sum of squared errors (SSE) from the samples in each cluster to the cluster center is calculated; SSE is a commonly used index of the classification quality of the samples in a cluster, as shown in equation 8:
$$SSE = \sum_{i=1}^{K}\sum_{p \in s_i} \left\lVert p - c_i \right\rVert^2 \tag{8}$$
wherein p is a sample in cluster $s_i$ and $c_i$ is the corresponding cluster center. A smaller SSE indicates better classification quality of the samples in the cluster.
As the number of clusters K increases, the samples are partitioned more finely and the SSE of each cluster becomes correspondingly smaller. When K is smaller than the optimal number of clusters, increasing K greatly improves the classification quality of each cluster, so SSE falls sharply; once K reaches the optimal number of clusters, the gain in classification quality from increasing K shrinks rapidly, so the decline of SSE flattens out. To determine the optimal K value, K is taken as 1, 2, 3, … in turn, a graph is plotted with K as the independent variable and the average SSE as the dependent variable, and the inflection point where the slope of the curve changes from a rapid drop to a gentle drop is found; the K value at that point is the optimal K value.
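The text reads the elbow off a plotted curve. As a rough sketch of how that reading could be automated, the code below computes SSE for a given clustering and picks the K at which the curve bends most sharply, using the largest second difference as the bend measure. That heuristic is an assumption, not something stated in the source, and the SSE values in the example are made up for illustration.

```python
def sse(samples, centers, assignment):
    """Sum of squared distances from each sample to its assigned cluster center."""
    total = 0.0
    for s, a in zip(samples, assignment):
        total += sum((u - v) ** 2 for u, v in zip(s, centers[a]))
    return total

def pick_elbow(sse_by_k):
    """Return the K where the SSE curve bends most sharply.

    sse_by_k[i] is the SSE for K = i + 1. The bend is measured as the drop
    before K minus the drop after K (a heuristic assumption).
    """
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(sse_by_k) - 1):
        bend = (sse_by_k[i - 1] - sse_by_k[i]) - (sse_by_k[i] - sse_by_k[i + 1])
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k

# SSE of a single cluster centered between two points.
example_sse = sse([[0, 0], [2, 0]], centers=[[1, 0]], assignment=[0, 0])

# Made-up SSE values for K = 1..6: steep drops up to K = 3, nearly flat afterwards.
curve = [1000.0, 700.0, 150.0, 120.0, 100.0, 90.0]
best_k = pick_elbow(curve)
```

An automated rule like this is only a convenience; when the curve has no clear bend, inspecting the plot as the text describes remains the safer choice.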
For the DNN classification model, the commonly used indicators are precision and recall, with the following formulas:
$$\text{precision} = \frac{tp}{tp + fp}, \qquad \text{recall} = \frac{tp}{tp + fn}$$
wherein tp represents the number of true positive examples, namely the number of samples that are actually positive and are predicted to be positive; fp represents the number of false positive examples, namely the number of samples that are actually negative but are predicted to be positive; fn represents the number of false negative examples, namely the number of samples that are actually positive but are predicted to be negative. Precision and recall measure the performance of the model along two dimensions. In experiments, different threshold values δ are usually taken as sample points to draw a precision-recall (P-R) curve, and the quality of a model is judged by the height of the break-even point of the curve.
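Precision and recall at a given threshold can be computed by counting tp, fp and fn, as in this sketch. Returning 0.0 when a denominator is zero is a convention assumed here, and the labels and confidences are toy values.

```python
def precision_recall(labels, confidences, threshold=0.5):
    """Precision and recall when confidences above `threshold` are predicted positive."""
    tp = fp = fn = 0
    for y, c in zip(labels, confidences):
        predicted_positive = c > threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0   # 0.0 when nothing is predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0      # 0.0 when there are no positives
    return precision, recall

labels = [1, 1, 0, 0, 1]
confidences = [0.9, 0.4, 0.8, 0.1, 0.7]
precision, recall = precision_recall(labels, confidences)

# Sweeping the threshold yields the points of a P-R curve.
pr_points = [precision_recall(labels, confidences, t) for t in (0.3, 0.5, 0.75)]
```

Raising the threshold typically trades recall for precision, which is why the curve over several thresholds says more about a model than a single operating point.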
To evaluate the K-means clustering algorithm, the silhouette coefficient can be used as an evaluation index in addition to the sum of squared errors (SSE) mentioned above. For a sample x, the silhouette coefficient is given by:
$$SC(x) = \frac{b(x) - a(x)}{\max\{a(x),\, b(x)\}}$$
where a(x) is called the intra-cluster dissimilarity, which is the sum of the distances from sample x to the other samples in its cluster; b(x) is the inter-cluster dissimilarity, which is the sum of the distances from sample x to the samples in other clusters. The value range of SC is [-1,1], and the closer it is to 1, the better the clustering performance.
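The sketch below computes a silhouette coefficient per sample. Note that it follows the commonly used definition, in which a(x) is the mean distance to the other samples of the same cluster and b(x) is the smallest mean distance to any other cluster; this is an assumption that differs from the summed distances described in the text. Scoring a sample 0.0 when it is alone in its cluster, or when only one cluster exists, is likewise a convention assumed here.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def silhouette(samples, assignment, index):
    """Silhouette coefficient of samples[index] (mean-distance definition)."""
    x, own = samples[index], assignment[index]
    distances = {}                       # cluster id -> distances from x to its members
    for i, (s, a) in enumerate(zip(samples, assignment)):
        if i != index:
            distances.setdefault(a, []).append(euclidean(x, s))
    same = distances.pop(own, [])
    if not same or not distances:
        return 0.0                       # singleton cluster or only one cluster
    a_x = sum(same) / len(same)                                 # intra-cluster dissimilarity
    b_x = min(sum(d) / len(d) for d in distances.values())      # nearest other cluster
    denom = max(a_x, b_x)
    return (b_x - a_x) / denom if denom > 0 else 0.0

samples = [[0.0], [1.0], [10.0], [11.0]]
assignment = [0, 0, 1, 1]
scores = [silhouette(samples, assignment, i) for i in range(len(samples))]
mean_score = sum(scores) / len(scores)
```

Averaging the per-sample scores gives a single number for the whole clustering, which can be compared across candidate values of K alongside the SSE curve.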
The invention uses network data collected from an actual power system network to validate the proposed method. Power network traffic data covering 25 days in total were selected, with timestamps divided by hour. First, IPs known to be intranet servers and IPs known not to be servers were screened, giving 2257429 pieces of original data in total; after labeling, integration, and normalization, 162952 training data samples were obtained to form the training set. Then the remaining 2784335 pieces of unknown data underwent the same preprocessing, except for labeling, to obtain 6988459 test data samples, forming the test set.
The method builds a model on the training set, uses it to predict the test set, and divides the results at the threshold δ = 0.5, obtaining 591 suspected IPs. Because the test set is unlabeled data, manual verification was carried out; it found that some of these IPs belong to real service servers and some to server-like devices, such as monitoring devices, IAD devices, and softswitch devices. The specific experimental results are shown in Table 1:
TABLE 1 test set Classification results
In Table 1, the ratio is the proportion of a given device type among all suspected IPs, and "other devices" refers to devices that are not internal to the grid. Most suspected IPs are unreported devices, which account for 70.38%; service servers account for 25.92%, and the average confidence is high, reaching 97.59%, which indicates that the model can discover suspected servers fairly accurately.
Cluster analysis of the data shows that the service servers also fall into different classes. Among the suspected IPs found, 66.67% of the service servers were clustered into one class, and the remaining 4 classes also contained different service servers. For the server-like devices, the cluster analysis successfully distinguished the various devices; the specific results are shown in Table 2.
TABLE 2 test set Cluster analysis results
The ratio in Table 2 is the proportion of devices clustered into a class relative to the total number of devices of that type. It can be seen that, for the server-like devices, most devices of each type are clustered into the same class, which indicates that the cluster analysis has some ability to distinguish device types.
The above embodiments are merely illustrative of the technical concepts and features of the present invention, and the purpose of the embodiments is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.
Claims (8)
1. A DNN and K-means based power system network flow identification method is characterized by comprising the following steps:
step 1: screening original data, and selecting data items which can provide more information for classification to form data samples, wherein the data samples comprise server IP addresses, ports, protocols, client IP addresses, byte numbers, bit rates, packet numbers, session total numbers and time;
step 2: performing integration operation and normalization operation on the sample data to perform preprocessing;
step 3: iteratively training a DNN network model, using a preprocessed training set to train the DNN network model, and performing preliminary classification on the network traffic of the power system to obtain a classification confidence and positive and negative example results;
step 4: using a K-means algorithm to classify the samples that are judged to be suspected servers after processing by the DNN network model.
2. The method for identifying network traffic of electric power system based on DNN and K-means as claimed in claim 1, wherein in step 1, some IP of electric power system network are known as server of electric power system, and some IP determined not to be server of electric power system, and sample data corresponding to said IP are extracted as positive and negative examples of training set respectively.
3. The DNN and K-means based power system network traffic identification method according to claim 1, wherein the integrating operation and the normalizing operation in the step 2 specifically comprise:
and integrating the sample data of the same server IP, the same port, the same protocol and the same time period into one piece of data by using four-tuple of the server IP, the port, the protocol and the time as an index, and then performing max-min normalization processing on each sample data to map the sample value to a [0,1] interval.
4. The DNN and K-means based power system network traffic identification method according to claim 1, wherein the DNN network model structure in the step 3 is: an input layer receives an n-dimensional data sample; the data features pass through three fully connected layers of the hidden layer to an output layer; the output layer outputs a logit value, which a Sigmoid activation function converts into a predicted confidence value; the three fully connected layers use the nonlinear activation function ReLU; real-time test data are input into the network to obtain the confidence of the predicted result, and the probability value output by the DNN network model is then judged to obtain a positive or negative example.
5. The DNN and K-means based power system network traffic identification method of claim 4, wherein the numbers of neurons of the three fully connected layers of the hidden layer are 512, 256 and 128, respectively.
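The structure described in claims 4 and 5 can be sketched as a forward pass in plain Python. This is a minimal, untrained illustration only: the weight initialization, the nine input features and the 0.5 decision threshold are assumptions that the claims do not state.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map W x + b for one fully connected layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def build_model(n_features, seed=0):
    """Hidden layers of 512, 256 and 128 neurons, followed by one output neuron."""
    rng = random.Random(seed)
    sizes = [n_features, 512, 256, 128, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_confidence(model, x):
    """ReLU on the three hidden layers, Sigmoid on the output logit."""
    h = x
    for layer in model[:-1]:
        h = relu(dense(h, layer))
    logit = dense(h, model[-1])[0]
    return sigmoid(logit)


model = build_model(n_features=9)      # assumed number of input features
confidence = predict_confidence(model, [0.5] * 9)
is_server = confidence >= 0.5          # assumed decision threshold
```

Because the Sigmoid squashes the logit into (0, 1), `confidence` can be read directly as the predicted probability that the sample is a server.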
6. The DNN and K-means based power system network traffic identification method according to claim 4 or 5, wherein the preliminary classification of the power system network traffic in the step 3 to obtain the classification confidence and the positive and negative case results comprises the following specific operations:
step 3.1: acquiring data, namely acquiring a set of power system network traffic training data $\{(x^{(n)}, y^{(n)}) \mid 1 \le n \le N\}$, wherein x is a network traffic sample comprising statistical information such as the total packet count, packets per second, total byte count and bytes per second of the IP at one end; y is the sample label, a manually labeled value indicating whether the training sample is a server; and N is the total number of training samples;
step 3.2: for the DNN network output function f, inputting the data into the network to obtain the classification result:
$$\hat{y}^{(n)} = f\left(x^{(n)}; \theta\right)$$
wherein $\hat{y}^{(n)}$ is the confidence of the classification result, and $\theta$ denotes the network parameters of the DNN network model;
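step 3.3: iteratively training the DNN network model, using the binary cross-entropy function as the training loss function:
$$L(\theta) = -\frac{1}{N}\sum_{n=1}^{N}\left[\, y^{(n)} \log f\left(x^{(n)}; \theta\right) + \left(1 - y^{(n)}\right) \log\left(1 - f\left(x^{(n)}; \theta\right)\right) \right]$$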
step 3.4: optimizing and updating the network parameters θ, with stochastic gradient descent selected as the optimizer.
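The training loop of claim 6 pairs a binary cross-entropy loss with stochastic gradient descent. The sketch below shows that pairing on a single Sigmoid neuron, which stands in for the full DNN to keep the example short. The learning rate, epoch count and toy data are assumptions, and the loss is written in its standard mean form.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(theta, x):
    """Confidence f(x; theta) of a single Sigmoid neuron (stand-in for the DNN)."""
    w, b = theta
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce(y, y_hat, eps=1e-12):
    """Binary cross-entropy for one sample, clipped to avoid log(0)."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


def mean_loss(theta, data):
    return sum(bce(y, forward(theta, x)) for x, y in data) / len(data)


def sgd_train(data, n_features, lr=0.5, epochs=200, seed=0):
    """Update theta one sample at a time, in a freshly shuffled order each epoch."""
    rng = random.Random(seed)
    w, b = [0.0] * n_features, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            # For a Sigmoid output with BCE loss, dL/d(logit) = y_hat - y.
            grad = forward((w, b), x) - y
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
            b -= lr * grad
    return w, b


# Toy data: label 1 = server, label 0 = not a server (values are illustrative only).
data = [([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.1, 0.2], 0), ([0.2, 0.1], 0)]
initial_loss = mean_loss(([0.0, 0.0], 0.0), data)
theta = sgd_train(data, n_features=2)
final_loss = mean_loss(theta, data)
```

With all-zero parameters every confidence is 0.5, so the initial loss equals log 2; training on the separable toy data drives the loss below that value.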
7. The DNN and K-means-based power system network traffic identification method of claim 1, wherein the K-means algorithm used for classification in the step 4 comprises the following specific steps:
step 4.1: determining a threshold and dividing the classification results, a result above the threshold being regarded as a server and a result below the threshold as not a server; finding all original traffic samples whose result is classified as a server, and adding their respective classification confidences to the original traffic samples to form new sample data, M being the total number of new samples;
step 4.2: determining the number of clusters K, thereby determining K cluster centers $\{c^{(k)} \mid 1 \le k \le K\}$, and randomly initializing the cluster centers; calculating the Euclidean distance from each sample to each cluster center, comparing these distances in turn, and assigning each sample to the cluster whose center is nearest, to obtain K clusters $\{s^{(k)} \mid 1 \le k \le K\}$;
step 4.3: after the clusters are obtained, the K-means algorithm updates the positions of the cluster centers from the clusters, each new cluster center being the mean, in every dimension, of the samples in its cluster.
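Steps 4.1 to 4.3 can be sketched in Python as follows. The threshold of 0.5, the choice to draw the initial centers at random from the samples, the iteration cap and the toy values are assumptions; the claim only requires a threshold, random initialization, nearest-center assignment and mean updates.

```python
import math
import random


def suspected_servers(samples, confidences, threshold=0.5):
    """Step 4.1: keep samples above the threshold and append the confidence as a feature."""
    return [list(s) + [c] for s, c in zip(samples, confidences) if c > threshold]


def assign(samples, centers):
    """Step 4.2: put each sample in the cluster of its nearest center (Euclidean)."""
    clusters = [[] for _ in centers]
    for s in samples:
        distances = [math.dist(s, c) for c in centers]
        clusters[distances.index(min(distances))].append(s)
    return clusters


def kmeans(samples, k, iterations=100, seed=0):
    rng = random.Random(seed)
    # Assumption: random initialization picks k distinct samples as centers.
    centers = [list(s) for s in rng.sample(samples, k)]
    for _ in range(iterations):
        clusters = assign(samples, centers)
        # Step 4.3: each new center is the per-dimension mean of its cluster;
        # an empty cluster keeps its previous center.
        updated = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)] if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, assign(samples, centers)


raw = [[0.1, 0.1], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9], [0.5, 0.5]]
conf = [0.9, 0.95, 0.6, 0.65, 0.2]
new_samples = suspected_servers(raw, conf)
centers, clusters = kmeans(new_samples, k=2)
```

The last raw sample falls below the threshold and is dropped, so M = 4 samples enter the clustering, each carrying its confidence as a third dimension.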
8. The DNN and K-means-based power system network flow identification method of claim 7, wherein the specific method for determining the K value in the step 4.2 is: calculating the sum of squared errors (SSE) from the samples in each cluster to the cluster center, taking K = 1, 2, 3, … in turn, plotting the average SSE as the dependent variable against K as the independent variable, and finding the inflection point at which the slope of the curve changes from a rapid decrease to a gentle decrease; the K value at that point is the optimal K value.
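Claim 8 locates the inflection point by inspecting a plot. As a hedged illustration, the sketch below computes the average SSE for K = 1 to 6 and picks the K with the largest change in slope, a heuristic that is an assumption standing in for the visual inspection. For reproducibility, the centers start at the first K samples rather than being randomly initialized as in step 4.2, and the sample data are synthetic.

```python
import math


def kmeans(samples, k, iterations=100):
    # Assumption: deterministic start from the first k samples (the claims use
    # random initialization); this keeps the example reproducible.
    centers = [list(s) for s in samples[:k]]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for s in samples:
            distances = [math.dist(s, c) for c in centers]
            clusters[distances.index(min(distances))].append(s)
        updated = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)] if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return centers, clusters


def average_sse(samples, k):
    """Sum of squared distances from each sample to its cluster center, per sample."""
    centers, clusters = kmeans(samples, k)
    total = sum(math.dist(s, c) ** 2 for c, cl in zip(centers, clusters) for s in cl)
    return total / len(samples)


def elbow(curve):
    """curve[i] holds the average SSE for K = i + 1; return the K with the sharpest bend."""
    best_k, best_bend = None, float("-inf")
    for i in range(1, len(curve) - 1):
        bend = (curve[i - 1] - curve[i]) - (curve[i] - curve[i + 1])
        if bend > best_bend:
            best_k, best_bend = i + 1, bend
    return best_k


# Synthetic data: three tight groups of four points each.
offsets = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2)]
anchors = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
samples = [[ax + ox, ay + oy] for ox, oy in offsets for ax, ay in anchors]

curve = [average_sse(samples, k) for k in range(1, 7)]
best_k = elbow(curve)
```

On this synthetic data the average SSE falls steeply up to K = 3 and is nearly flat afterwards, so the heuristic selects K = 3.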
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2022104962630 | 2022-05-07 | ||
CN202210496263 | 2022-05-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115348063A (en) | 2022-11-15 |
Family
ID=83949752
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210882066.2A Pending CN115348063A (en) | 2022-05-07 | 2022-07-25 | DNN and K-means-based power system network flow identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115348063A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111209563A (en) * | 2019-12-27 | 2020-05-29 | 北京邮电大学 | Network intrusion detection method and system |
CN111404911A (en) * | 2020-03-11 | 2020-07-10 | 国网新疆电力有限公司电力科学研究院 | Network attack detection method and device and electronic equipment |
CN111565156A (en) * | 2020-04-27 | 2020-08-21 | 南京烽火星空通信发展有限公司 | Method for identifying and classifying network traffic |
CN111832647A (en) * | 2020-07-10 | 2020-10-27 | 上海交通大学 | Abnormal flow detection system and method |
CN113343587A (en) * | 2021-07-01 | 2021-09-03 | 国网湖南省电力有限公司 | Flow abnormity detection method for electric power industrial control network |
CN114372536A (en) * | 2022-01-13 | 2022-04-19 | 中国人民解放军国防科技大学 | Unknown network flow data identification method and device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104601565B (en) | A kind of network invasion monitoring sorting technique of intelligent optimization rule | |
CN110309302B (en) | Unbalanced text classification method and system combining SVM and semi-supervised clustering | |
CN111211994B (en) | Network traffic classification method based on SOM and K-means fusion algorithm | |
CN111786951B (en) | Traffic data feature extraction method, malicious traffic identification method and network system | |
CN107579846B (en) | Cloud computing fault data detection method and system | |
CN111507385B (en) | Extensible network attack behavior classification method | |
CN112488226B (en) | Terminal abnormal behavior identification method based on machine learning algorithm | |
CN111343171B (en) | Intrusion detection method based on mixed feature selection of support vector machine | |
CN115811440B (en) | Real-time flow detection method based on network situation awareness | |
CN113269647A (en) | Graph-based transaction abnormity associated user detection method | |
CN116805051A (en) | Double convolution dynamic domain adaptive equipment fault diagnosis method based on attention mechanism | |
CN115801374A (en) | Network intrusion data classification method and device, electronic equipment and storage medium | |
CN117478390A (en) | Network intrusion detection method based on improved density peak clustering algorithm | |
CN114124437B (en) | Encrypted flow identification method based on prototype convolutional network | |
CN115348063A (en) | DNN and K-means-based power system network flow identification method | |
CN116541792A (en) | Method for carrying out group partner identification based on graph neural network node classification | |
CN113609480B (en) | Multipath learning intrusion detection method based on large-scale network flow | |
CN115879030A (en) | Network attack classification method and system for power distribution network | |
CN108540474B (en) | Computer network defense decision-making system | |
CN113010673A (en) | Vulnerability automatic classification method based on entropy optimization support vector machine | |
Lu et al. | An Alert Aggregation Algorithm Based on K-means and Genetic Algorithm | |
Luo et al. | Network attack classification and recognition using hmm and improved evidence theory | |
CN115208703B (en) | Industrial control equipment intrusion detection method and system of fragment parallelization mechanism | |
CN116192765B (en) | Attention mechanism-based early identification method for flow of Internet of things equipment | |
CN115580472B (en) | Industrial control network attack flow classification method based on heuristic clustering algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||