CN106375156A - Power network traffic anomaly detection method and device - Google Patents

Power network traffic anomaly detection method and device

Info

Publication number
CN106375156A
Authority
CN
China
Prior art keywords
distance
data
data traffic
packet
traffic packet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610874427.3A
Other languages
Chinese (zh)
Inventor
邢宁哲
纪雨彤
赵庆凯
张宁池
刘识
王宇
段寒硕
闫中平
马跃
彭柏
聂正璞
李信
申昉
叶青
田宇
常海娇
徐鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
State Grid Beijing Electric Power Co Ltd
Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Beijing Electric Power Co Ltd, Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Priority to CN201610874427.3A
Publication of CN106375156A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/12 Network monitoring probes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a power network traffic anomaly detection method and device. The power network traffic anomaly detection method comprises the steps of collecting a data traffic packet of a power network, wherein the data traffic packet is composed of data of multiple fields; and establishing a k-d tree based on the data traffic packet, and carrying out anomaly detection on the data traffic packet. According to the method and the device, specific classification does not need to be carried out on power network traffic when detection is carried out; the detection difficulty is reduced; the method and device can be well adapted to various newly occurring anomalies; and after improvement is carried out through utilization of the k-d tree, the detection time complexity is reduced, and the time cost is clearly reduced.

Description

Method and device for detecting abnormal flow of power network
Technical Field
The present application relates to a technology for detecting abnormal traffic in an electrical data network, and in particular, to a method and an apparatus for detecting abnormal traffic in an electrical data network.
Background
With the construction of the smart grid, the power data network and the service systems it carries have developed rapidly, and a large amount of network traffic is generated every day. Some of this traffic is abnormal. Mixed in with normal traffic, abnormal traffic can seriously damage the network and sharply degrade network service quality, which is a very serious problem for a power data network with extremely high reliability requirements. Therefore, detecting abnormal traffic is an important part of the operation and maintenance of the power data network.
Several related traffic anomaly detection schemes are introduced below:
Scheme 1: the article "Community flow anomaly detection based on wavelet decomposition" (electronic measurement and instrument report, 2010, pp.24(4): 365-.) introduces a community concept into the anomaly detection field, aiming at the problems of mass data processing and the low anomaly detection rate of large-scale networks.
Scheme 2: the article "Network abnormal traffic detection method based on active entropy" (communication science report, 2013, pp.34(z2):51-57.) proposes a network traffic analysis method based on entropy theory. It uses the long correlation characteristics of information units in the traffic space to improve on entropy theory, and provides several measures, such as information entropy, conditional entropy and active entropy, to detect traffic anomalies.
Scheme 3: patent No. CN201210560973.1 proposes an anomaly detection method based on network traffic analysis. The method mainly comprises the following steps: (1) firstly, data preprocessing is carried out: the method comprises the steps of obtaining host internet traffic, then carrying out data preprocessing on the host internet traffic according to an initial characteristic set and a preset time window length, and extracting initial characteristic values of the host internet traffic in each time interval to form a sample set. (2) Feature selection is then performed. (3) And finally, carrying out anomaly detection: and classifying the unknown samples by using the selected feature subset and a Bayesian classification algorithm, and prompting if the classification result is abnormal.
Scheme 4: patent number CN201010224404.0 proposes a method for detecting network traffic anomaly rapidly, the technical scheme utilizes Hurst index describing network traffic fractal characteristics to judge the occurrence of anomaly, and the main steps include: by sampling the latest flow data, iteratively solving the Hurst index by using the data, establishing an abnormality judgment threshold value through the change of the Hurst index, directly detecting flow abnormality and detecting network flow abnormality in real time.
Scheme 5: patent No. CN201510513055.7 proposes a network traffic anomaly detection method combining a dynamic baseline and a fixed threshold. The method mainly comprises the following steps: receiving a message; recording the number of the messages; calculating the current unit time quantity of the message according to the difference between the current message quantity and the historical message quantity before the preset historical time period; and judging whether the network flow is abnormal or not according to the unit time quantity by combining the dynamic baseline and the fixed threshold.
In the process of implementing the invention, the inventor finds that the prior art at least has the following problems:
Although the anomaly detection method in scheme 1 optimizes the detection target by using the community concept, its detection stage still uses a relatively simple, fixed sliding deviation value as the threshold. Such a threshold adapts poorly to the complexity and variability of abnormal traffic characteristics, and the method has relatively high missed-detection and false-detection rates.
The anomaly detection method in scheme 2 treats all traffic uniformly: it does not consider the traffic distribution in different time periods and does not distinguish peak periods from off-peak periods, so it is difficult to serve both well and the method lacks adaptivity. Entropy-based methods also perform poorly on traffic whose distribution changes greatly.
The anomaly detection method of scheme 3 applies data mining to network traffic anomaly detection, but the chosen Bayesian classification algorithm requires prior probabilities, and the prior probability of an unknown anomaly cannot be determined, so the method is not suitable here. The Bayesian model also assumes that the attributes are independent of each other, which rarely holds in practice, so actual performance falls short of the theory; and even under ideal conditions the Bayesian model has a certain classification decision error rate.
The anomaly detection method in scheme 4 exploits the self-similarity and long-range dependence of network traffic and uses the Hurst index to detect traffic anomalies. Its threshold judgment is relatively simple, which makes it fast but sacrifices accuracy, so it is difficult to apply to a power data network.
The anomaly detection method of scheme 5 combines the dynamic baseline with the fixed threshold, which improves adaptability, but it only analyzes the number of messages, ignores much key information, has difficulty finding deeper anomalies, and cannot meet the reliability requirements of the power data network.
Therefore, a method for detecting network traffic anomaly is needed to solve the problems in the prior art, reduce the damage of traffic anomaly to the network, and reduce the degradation of network service quality.
Disclosure of Invention
The embodiment of the application provides a method and a device for detecting abnormal flow of a power network, which are used for reducing the detection difficulty and the detection time and are suitable for various new abnormal flow.
In order to achieve the above object, an embodiment of the present invention provides a method for detecting an abnormal flow of an electrical power network, where the method for detecting an abnormal flow of an electrical power network includes:
collecting a data traffic packet of a power network, wherein the data traffic packet is composed of data of a plurality of fields;
and establishing a k-d tree based on the data traffic packet, and carrying out anomaly detection on the data traffic packet.
In an embodiment, before building the k-d tree based on the data traffic packet, the method for detecting the abnormal traffic of the power network further includes: selecting data of at least one field related to the flow size of the data flow packet from the data flow packet as available data;
establishing a k-d tree based on the data traffic packet, and performing anomaly detection on the data traffic packet, wherein the anomaly detection comprises the following steps: and establishing a k-d tree based on a data traffic packet at least comprising the available data, and carrying out anomaly detection on the data traffic packet.
In one embodiment, collecting data traffic packets of a power network includes: and collecting the data traffic packet from a router or a switch through a probe, and storing the data traffic packet.
In an embodiment, the creating a k-d tree based on the data traffic packet after selecting the available data, and performing anomaly detection on the data traffic packet includes:
establishing a k-d tree by taking each data flow packet as an object, and calculating a local abnormal factor of each object;
and comparing the local abnormal factor of each object with a preset value, and detecting whether the data traffic packet corresponding to the object is abnormal or not.
In one embodiment, calculating the local anomaly factor for each of the objects comprises:
calculating a k-distance for each object;
calculating a corresponding k-distance neighborhood according to the k-distance of each object;
calculating the reachable distance of each object from objects in its k-distance neighborhood;
calculating corresponding local reachable densities according to the reachable distance of each object from objects in the k-distance neighborhood of the object;
and calculating a corresponding local abnormal factor according to the local reachable density of each object.
In one embodiment, for an object p, the k-distance neighborhood of the object p is the set of objects whose distance from the object p does not exceed the k-distance of p. The k-distance neighborhood $N_{k\text{-}dis}(p)$ of the object p is:
$$N_{k\text{-}dis}(p) = \{\, q \mid d(p, q) \le k\text{-}dis(p) \,\}$$
wherein q is an object whose distance from the object p does not exceed the k-distance of the object p, d(p, q) is the distance from the object p to the object q, and k-dis(p) is the k-distance of the object p.
In one embodiment, the reachable distance $r\text{-}dis_k(p, o)$ of the object p relative to an object o in its neighborhood is:
$$r\text{-}dis_k(p, o) = \max\{\, k\text{-}dis(o),\ d(p, o) \,\}$$
where k-dis(o) is the k-distance of the object o and d(p, o) is the distance from the object p to the object o.
In one embodiment, the local reachable density $lrd_{k\text{-}dis}(p)$ of the object p is the inverse of the average reachable distance between the object p and the objects in its k-distance neighborhood:
$$lrd_{k\text{-}dis}(p) = \frac{1}{\dfrac{\sum_{o \in N_{k\text{-}dis}(p)} r\text{-}dis_k(p, o)}{|N_{k\text{-}dis}(p)|}}$$
In one embodiment, the local anomaly factor LOF(p) of the object p is:
$$LOF(p) = \frac{\sum_{o \in N_{k\text{-}dis}(p)} \dfrac{lrd_{k\text{-}dis}(o)}{lrd_{k\text{-}dis}(p)}}{|N_{k\text{-}dis}(p)|}$$
In order to achieve the above object, an embodiment of the present invention further provides a device for detecting an abnormal flow in an electrical network, where the device for detecting an abnormal flow in an electrical network includes:
a traffic packet acquisition unit, configured to collect a data traffic packet of a power network, wherein the data traffic packet is composed of data of a plurality of fields;
and the anomaly detection unit is used for establishing a k-d tree based on the data traffic packet and carrying out anomaly detection on the data traffic packet.
In one embodiment, the apparatus for detecting abnormal flow in an electrical power network further includes a field selection unit, configured to select, from the data traffic packet, data of at least one field related to the traffic size of the data traffic packet as available data;
the abnormality detection unit is specifically configured to: and establishing a k-d tree based on a data traffic packet at least comprising the available data, and carrying out anomaly detection on the data traffic packet.
In one embodiment, the traffic packet acquisition unit is specifically configured to: and collecting the data traffic packet from a router or a switch through a probe, and storing the data traffic packet.
In one embodiment, the abnormality detection unit includes:
the local abnormal factor calculation module is used for establishing a k-d tree by taking each data flow packet as an object and calculating a local abnormal factor of each object;
and the anomaly detection module is used for comparing the local anomaly factor of each object with a preset value and detecting whether the data traffic packet corresponding to the object is abnormal or not.
In one embodiment, the local anomaly factor calculation module includes:
a k-distance calculation submodule for calculating a k-distance of each object;
the k-distance neighborhood submodule is used for calculating a corresponding k-distance neighborhood according to the k-distance of each object;
a reachable distance calculation submodule for calculating the reachable distance of each object from objects in its k-distance neighborhood;
the local reachable density calculation submodule is used for calculating corresponding local reachable density according to the reachable distance between each object and the object in the k-distance neighborhood of the object;
and the local abnormal factor calculation submodule is used for calculating a corresponding local abnormal factor according to the local reachable density of each object.
The method does not need to specifically classify the flow of the power network during detection, reduces the detection difficulty, and has good adaptability to various newly-appeared abnormalities; after the k-d tree is used for improvement, the time complexity of detection is reduced, and the time cost is obviously reduced.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a method for detecting abnormal flow in an electrical power network according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of an architecture of a method for detecting abnormal traffic in an electrical network according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for performing anomaly detection on a data traffic packet according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for calculating local anomaly factors for each of the objects according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a detection result according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating a detection result according to another embodiment of the present invention;
FIG. 7 is a diagram illustrating a detection result according to another embodiment of the present invention;
FIG. 8 is a time comparison graph for different values of k according to the present invention;
fig. 9A is a block diagram of a power network traffic anomaly detection device according to an embodiment of the present invention;
fig. 9B is a block diagram of a power network traffic anomaly detection device according to another embodiment of the present invention;
FIG. 10 is a block diagram of an anomaly detection unit according to an embodiment of the present invention;
fig. 11 is a block diagram of a local abnormal factor calculation module according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a flowchart of a method for detecting abnormal traffic in an electrical network according to an embodiment of the present invention. As shown in fig. 1, the method for detecting abnormal traffic in an electrical network includes:
S101: collecting a data traffic packet of a power network, wherein the data traffic packet is composed of data of a plurality of fields;
S102: establishing a k-d tree based on the data traffic packet, and carrying out anomaly detection on the data traffic packet.
The execution main body of the power network flow anomaly detection method can be a server, and as can be seen from the flow shown in fig. 1, the method firstly collects data flow packets of the power network, establishes a k-d tree based on the collected data flow packets, performs anomaly detection on the data flow packets, and detects the abnormal data flow packets. The method does not need to specifically classify the flow of the power network, reduces the detection difficulty and has good adaptability to various newly-appeared abnormalities; after the k-d tree is used for improvement, the time complexity of detection is reduced, and the time cost is obviously reduced.
Fig. 2 is a schematic diagram of an architecture of a method for detecting abnormal traffic in an electrical network according to an embodiment of the present invention, and as shown in fig. 2, when a data traffic packet of the electrical network is collected, the data traffic packet may be collected from a router or a switch (the number of the router and the switch may be multiple) through a probe, and the collected data traffic packet is sent to a database (which may be in a server) for storage.
In an embodiment, after the data traffic packet of the power network is acquired, it may be preprocessed as follows: data of at least one field related to the traffic size of the data traffic packet is selected from the data traffic packet as available data. For example, if the data traffic packet includes a fields, data of b fields related to the traffic size of the data traffic packet may be selected, where 1 ≤ b ≤ a.
After the available data is obtained through preprocessing, a k-d tree can be established based on a data traffic packet at least comprising the available data, and the data traffic packet is subjected to anomaly detection. By preprocessing the data traffic packet, the complexity of establishing the k-d tree can be reduced, and the detection efficiency is improved.
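The preprocessing step can be sketched as a simple projection of each record onto the traffic-size fields. This is only an illustration: the four field names are the ones listed in Table 1 of the worked example, while the record layout (a dictionary), the extra fields SrcPort and Protocol, and the function name select_available_data are assumptions made here.

```python
from typing import Dict, List, Sequence

# Field names taken from Table 1 of the worked example.
SIZE_FIELDS = ("PacketsIn", "PacketsOut", "BytesIn", "BytesOut")


def select_available_data(record: Dict[str, float],
                          fields: Sequence[str] = SIZE_FIELDS) -> List[float]:
    """Keep only the traffic-size fields of one record, in a fixed order."""
    return [float(record[name]) for name in fields]


# A made-up record; SrcPort and Protocol stand in for fields that are
# not related to the traffic size and are therefore dropped.
raw = {"SrcPort": 443, "Protocol": 6, "PacketsIn": 12, "PacketsOut": 9,
       "BytesIn": 8400, "BytesOut": 1300}
point = select_available_data(raw)
print(point)
```

Each resulting list is one point in a b-dimensional space (here b = 4), which is the form the k-d tree construction expects.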
Based on the data traffic packet collected at S101 shown in fig. 1, or based on the data traffic packet at least including the available data, a k-d tree may be established, and the data traffic packet may be subjected to anomaly detection.
In one embodiment, as shown in fig. 3, the performing anomaly detection on the data traffic packet includes:
S301: establishing a k-d tree by taking each data flow packet as an object, and calculating a local abnormal factor of each object;
S302: comparing the local abnormal factor of each object with a preset value, and detecting whether the data traffic packet corresponding to the object is abnormal or not. The preset value can be set according to specific detection conditions.
In the present embodiment, the anomaly detection performed on the data traffic packet (fig. 3) may be referred to as local anomaly factor (Local Outlier Factor, LOF) detection, as shown in fig. 2.
In one embodiment, as shown in fig. 4, calculating the local anomaly factor of each of the objects includes:
S401: calculating a k-distance for each object;
A k-d tree is a data structure that partitions a k-dimensional data space. (In "k-distance" and "k-th nearest", by contrast, k denotes the rank of the neighbor.) The k-d tree can quickly find the k-th nearest neighbor, which makes the k-distance easy to calculate in the next step; the k nearest neighbors are also recorded during the calculation, so that the k-distance neighborhood needed later is readily obtained.
The k-d tree is essentially a binary tree in which each node represents a spatial range; in the present invention each node corresponds to a data traffic packet of the power data network. Building the k-d tree is a recursive process that expands step by step through splitting. A node is a split point (split_point) and can be split into a left son (left_son) and a right son (right_son); that is, the split point is a parent node of the binary tree, and the left son and right son are its left and right child nodes. The split mode (split-method) of the split point is the key attribute in establishing a k-d tree.
The splitting process of the k-d tree is as follows: first, the variance of each dimension (i.e., field) is calculated and the dimension a with the largest variance is found; all nodes are sorted in ascending order on dimension a, and the median point is set as split_point; recursion over all points smaller than the median yields left_son, and recursion over all points larger than the median yields right_son. In the present embodiment the variance is used as the criterion because a large variance indicates that the data are more scattered along that coordinate axis, so dividing the data in that direction gives better resolution.
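The splitting process described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the class name Node, the function name build_kd_tree, and the handling of ties (points equal to the median on the split dimension may end up on either side) are choices made here; the attribute names split_point, left_son and right_son follow the text.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import List, Optional, Sequence

Point = Sequence[float]


@dataclass
class Node:
    split_point: Point           # object stored at this node
    split_dim: int               # dimension (field) used for the split
    left_son: Optional["Node"]   # points at or below the median on split_dim
    right_son: Optional["Node"]  # points at or above the median on split_dim


def build_kd_tree(points: List[Point]) -> Optional[Node]:
    """Recursively split on the dimension with the largest variance."""
    if not points:
        return None
    dims = len(points[0])
    # Dimension whose values are most scattered.
    split_dim = max(range(dims),
                    key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[split_dim])
    mid = len(ordered) // 2      # median position after sorting
    return Node(ordered[mid], split_dim,
                build_kd_tree(ordered[:mid]),
                build_kd_tree(ordered[mid + 1:]))


# Six made-up two-dimensional points.
points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_kd_tree(points)
print(root.split_point, root.split_dim)
```

For these six points the first dimension has the larger variance, so the root splits on it and stores the median point (7, 2).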
When calculating the k-distance, the embodiment of the invention abstracts a data traffic packet into an object p. For an arbitrary natural number k, the k-distance k-dis(p) of an object p is defined as the distance d(p, o) between the object p and an object o, where D is a set of objects that includes p, D \ {p} denotes the set D with p removed, and the object o satisfies:
there are at least k objects o' ∈ D \ {p} such that d(p, o') ≤ d(p, o); and there are at most k-1 objects q ∈ D \ {p} such that d(p, q) < d(p, o), where d(p, o') is the distance from the object p to the object o' and d(p, q) is the distance from the object p to the object q.
Using the established k-d tree, the nearest neighbors of an object can be queried efficiently. When the k-th nearest neighbor is queried, an array holding the k-1 nearest distances found so far can be used to decide whether a visited node updates the k-th nearest distance; once the k-th nearest neighbor has been found, the k-distance is obtained.
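A k-distance query over a k-d tree might look like the sketch below. It is an illustration only: instead of the array mentioned in the text it keeps the k best distances in a bounded max-heap (a substitution made here for brevity), the tree is stored as nested tuples, and the names build and k_distance are hypothetical. The sketch assumes k is smaller than the number of objects and skips exactly one stored copy of p.

```python
import heapq
import math
from statistics import pvariance


def build(points):
    """Return a nested tuple (point, dim, left, right), or None if empty."""
    if not points:
        return None
    dim = max(range(len(points[0])),
              key=lambda d: pvariance([p[d] for p in points]))
    pts = sorted(points, key=lambda p: p[dim])
    mid = len(pts) // 2
    return (pts[mid], dim, build(pts[:mid]), build(pts[mid + 1:]))


def k_distance(tree, p, k):
    """Distance from p to its k-th nearest neighbor among the other objects."""
    heap = []         # negated distances: heap[0] is the worst of the k best
    skipped = False   # p itself is stored in the tree; skip one copy of it

    def visit(node):
        nonlocal skipped
        if node is None:
            return
        point, dim, left, right = node
        if point == p and not skipped:
            skipped = True
        else:
            d = math.dist(p, point)
            if len(heap) < k:
                heapq.heappush(heap, -d)
            elif d < -heap[0]:
                heapq.heapreplace(heap, -d)
        diff = p[dim] - point[dim]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)
        # The far side can only matter if the splitting plane lies within
        # the current k-th best distance (or fewer than k were found).
        if len(heap) < k or abs(diff) <= -heap[0]:
            visit(far)

    visit(tree)
    return -heap[0]


pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(k_distance(tree, (7, 2), 2))
```

The pruning test is what lets the query skip whole subtrees; the result is the same as sorting all distances and taking the k-th smallest.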
S402: a corresponding k-distance neighborhood is computed from the k-distance of each object.
In one embodiment, for an object p (which may correspond to any data traffic packet acquired), the k-distance neighborhood of the object p is the set of objects whose distance from the object p does not exceed the k-distance of p. The k-distance neighborhood $N_{k\text{-}dis}(p)$ of the object p is:
$$N_{k\text{-}dis}(p) = \{\, q \mid d(p, q) \le k\text{-}dis(p) \,\} \qquad (1)$$
where q is an object whose distance from the object p does not exceed the k-distance of the object p, and k-dis(p) is the k-distance of the object p.
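Formula (1) can be illustrated with a brute-force sketch. The function names and sample points are made up, and the object p itself is excluded from its own neighborhood, which is an assumption based on the usual LOF definition and on the use of D \ {p} in the k-distance definition. The example also shows that ties can make the neighborhood larger than k.

```python
import math


def k_distance(points, i, k):
    """k-distance of points[i], by sorting distances to all other objects."""
    dists = sorted(math.dist(points[i], q)
                   for j, q in enumerate(points) if j != i)
    return dists[k - 1]


def k_neighborhood(points, i, k):
    """Indices of objects q != p with d(p, q) <= k-dis(p)."""
    radius = k_distance(points, i, k)
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= radius]


# Three points at distance 1 from the origin, plus one far point.
points = [(0, 0), (1, 0), (0, 1), (-1, 0), (5, 5)]
neigh = k_neighborhood(points, 0, 2)
print(neigh)
```

With k = 2 the k-distance of the origin is 1, and all three points at that distance belong to the neighborhood.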
S403: the reachable distance of each object from objects within its k-distance neighborhood is calculated.
In one embodiment, given a natural number k, the reachable distance $r\text{-}dis_k(p, o)$ of the object p relative to an object o in its k-distance neighborhood is:
$$r\text{-}dis_k(p, o) = \max\{\, k\text{-}dis(o),\ d(p, o) \,\} \qquad (2)$$
where k-dis(o) is the k-distance of the object o and d(p, o) is the distance from the object p to the object o.
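As a small numeric illustration of formula (2), the sketch below assumes the k-distance of o has already been computed; the function name reach_dist and the sample values are hypothetical.

```python
import math


def reach_dist(k_dis_o, p, o):
    """Reachable distance of p relative to o: max{k-dis(o), d(p, o)}."""
    return max(k_dis_o, math.dist(p, o))


o = (0.0, 0.0)
k_dis_o = 1.0                                 # assumed k-distance of o
close = reach_dist(k_dis_o, (0.5, 0.0), o)    # p inside the k-distance of o
far = reach_dist(k_dis_o, (3.0, 4.0), o)      # p outside the k-distance of o
print(close, far)
```

For a point closer to o than k-dis(o) the reachable distance is raised to k-dis(o); for a point farther away it equals the actual distance.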
S404: the corresponding local reachable density is calculated from the reachable distance of each object from objects in its k-distance neighborhood.
In one embodiment, the local reachable density $lrd_{k\text{-}dis}(p)$ of the object p is the inverse of the average reachable distance between the object p and the objects in its k-distance neighborhood:
$$lrd_{k\text{-}dis}(p) = \frac{1}{\dfrac{\sum_{o \in N_{k\text{-}dis}(p)} r\text{-}dis_k(p, o)}{|N_{k\text{-}dis}(p)|}} \qquad (3)$$
wherein $\dfrac{\sum_{o \in N_{k\text{-}dis}(p)} r\text{-}dis_k(p, o)}{|N_{k\text{-}dis}(p)|}$ is the average reachable distance between the object p and the objects in its k-distance neighborhood.
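Formula (3) reduces to one division once the reachable distances to the neighborhood are known. The sketch below takes those distances as input; the function name and the sample values are made up. It does not guard against a zero sum, which would occur if every neighbor coincided with p and had a k-distance of zero.

```python
def local_reach_density(reach_dists):
    """Inverse of the mean reachable distance to the k-distance neighborhood."""
    return len(reach_dists) / sum(reach_dists)


# Assumed reachable distances from p to the three objects in its neighborhood.
lrd_p = local_reach_density([1.0, 1.0, 2.0])
print(lrd_p)
```

Here the average reachable distance is 4/3, so the local reachable density is 3/4.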
S405: and calculating a corresponding local abnormal factor according to the local reachable density of each object.
In one embodiment, the local anomaly factor LOF(p) of the object p is:
$$LOF(p) = \frac{\sum_{o \in N_{k\text{-}dis}(p)} \dfrac{lrd_{k\text{-}dis}(o)}{lrd_{k\text{-}dis}(p)}}{|N_{k\text{-}dis}(p)|} \qquad (4)$$
In formula (4), $lrd_{k\text{-}dis}(o)$ is the local reachable density of the object o.
The degree of abnormality of the object p can be represented by its local anomaly factor. A point whose local anomaly factor is close to 1 has a density consistent with that of the surrounding points and can be judged to be normal. The larger the local anomaly factor, the more the density of the point differs from that of the surrounding points; when the factor exceeds a certain threshold, the point is regarded as an anomaly. The threshold can be set according to experience or the application field, and the invention is not limited in this respect.
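Steps S401 to S405 and the threshold comparison can be put together in a short brute-force sketch. It computes all pairwise distances directly rather than through a k-d tree, so it illustrates formulas (1) to (4) only and not the speed-up; the function name lof_scores, the sample points and the threshold of 2.5 (borrowed from Example 3) are assumptions for illustration. It also assumes no two objects coincide, so that no reachable-distance sum is zero.

```python
import math


def lof_scores(points, k):
    """Local anomaly factor of every object, following formulas (1) to (4)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    k_dis, neigh = [], []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        k_dis.append(others[k - 1])                        # k-distance
        neigh.append([j for j in range(n)                  # formula (1)
                      if j != i and dist[i][j] <= k_dis[i]])
    lrd = []
    for i in range(n):
        # Sum of reachable distances, formula (2), over the neighborhood.
        total = sum(max(k_dis[o], dist[i][o]) for o in neigh[i])
        lrd.append(len(neigh[i]) / total)                  # formula (3)
    return [sum(lrd[o] / lrd[i] for o in neigh[i]) / len(neigh[i])
            for i in range(n)]                             # formula (4)


# Four corners of a unit square plus one distant point.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
scores = lof_scores(points, k=2)
threshold = 2.5                      # illustrative value only
abnormal = [i for i, s in enumerate(scores) if s > threshold]
print(scores, abnormal)
```

The four corner points have the same density as their neighbors and score 1, while the distant point scores far above the threshold and is the only one flagged.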
By using the method for detecting the abnormal flow of the power network, the flow of the power network does not need to be specifically classified during detection, the detection difficulty is reduced, and the method has good adaptability to various newly-appeared abnormalities; after the k-d tree is used for improvement, the time complexity of detection is reduced, and the time cost is obviously reduced.
For a better understanding of the present invention, the following description is given in conjunction with specific examples:
Taking the detection of continuous data traffic packets from the data network of a certain power company in March 2016 as an example, the specific detection steps are as follows:
1) acquiring an original data traffic packet by using traffic acquisition equipment (such as a probe) arranged on a network node of the power data network through a bypass to obtain data with 25 fields in total;
2) data preprocessing is performed on the acquired data traffic packet to obtain available data with 4 fields in total, as shown in table 1 below:
Table 1: Fields finally obtained
PacketsIn | PacketsOut | BytesIn | BytesOut
In table 1, PacketsIn is the number of downloaded packets, PacketsOut is the number of uploaded packets, BytesIn is the number of downloaded bytes, and BytesOut is the number of uploaded bytes.
The data of the 4 fields in table 1 are all related to the traffic size of the data traffic packet and are called available data. The present invention can select the available data of at least one of the four fields to build a k-d tree; in this embodiment, the k-d tree is built using only the available data of these 4 fields.
3) Establishing a k-d tree by taking each data traffic packet (including available data of 4 fields) as an object;
4) calculating k-distance and k-distance neighborhoods for each object;
5) calculating the reachable distance of each object from objects in its k-distance neighborhood;
6) calculating the local reachable density and the local abnormal factor of each object;
7) establishing a threshold according to the calculation result and comparing each local abnormal factor with the threshold; if the local abnormal factor is greater than the threshold, the data traffic packet corresponding to it is judged to be abnormal.
It should be noted that step 2) is an optional step, and when the present invention is implemented, the step may be removed, and step 3) may be directly performed, and each acquired data traffic packet is used as an object to establish a k-d tree.
Example 1
400 consecutive data traffic packets are randomly selected, and the detection result is shown in fig. 5, in which packets whose LOF value exceeds 3.2 are abnormal traffic packets.
Example 2
3000 consecutive data traffic packets are randomly selected, and the detection result is shown in fig. 6, in which packets whose LOF value exceeds 2.7 are abnormal traffic packets.
Example 3
10000 consecutive data traffic packets are randomly selected, and the detection result is shown in fig. 7, in which packets whose LOF value exceeds 2.5 are abnormal traffic packets.
The invention is compared and analyzed with the prior art as follows:
When calculating the k-distance with the existing enumeration (traversal) method, all objects are traversed to find the nearest neighbors, so the time complexity of calculating the LOF value of a single object is at least O(n). The invention uses the k-d tree data structure for optimization. Because a binary tree is built, a query can descend directly into the preferred child node even though the distribution of the specific samples is unknown. A query may still need to visit a subtree more than once, so some uncertainty exists, but the time complexity is reduced, and the time cost falls even when efficiency is at its worst, which suits a power data network with high timeliness requirements. Moreover, the uncertainty concerns only the time consumed and does not affect the reliability of the power data network.
In the experiment, k values of 5, 10, 15 and 20 are respectively taken, the same samples are analyzed, and a time comparison graph is obtained and shown in FIG. 8.
Based on the same inventive concept as the above power network traffic anomaly detection method, the present application also provides a power network traffic anomaly detection device, as described in the following embodiments. Because the device solves the problem on a principle similar to that of the method, the implementation of the device can refer to the implementation of the method, and repeated parts are not described again.
Fig. 9A shows a power network traffic anomaly detection device provided by an embodiment of the present invention. The device includes: a traffic packet collecting unit 901 and an anomaly detection unit 903.
The traffic packet collecting unit 901 is configured to collect a data traffic packet of the power network, where the data traffic packet is composed of data of multiple fields;
the anomaly detection unit 903 is configured to establish a k-d tree based on the data traffic packet, and perform anomaly detection on the data traffic packet.
In an embodiment, the traffic packet collecting unit 901 is specifically configured to collect the data traffic packet from a router or a switch through a probe, and store the data traffic packet.
In an embodiment, as shown in fig. 9B, the power network traffic anomaly detection device further includes: a field selecting unit 902, configured to select, from the data traffic packet, data of at least one field related to the traffic size of the data traffic packet as available data. The anomaly detection unit 903 may establish a k-d tree based on a data traffic packet including at least the available data, and perform anomaly detection on the data traffic packet.
In one embodiment, as shown in fig. 10, the anomaly detection unit 903 comprises:
a local abnormal factor calculation module 1001, configured to establish a k-d tree with each data traffic packet as an object, and calculate a local abnormal factor of each object.
The anomaly detection module 1002 is configured to compare the local anomaly factor of each object with a preset value, and detect whether a data traffic packet corresponding to the object is abnormal.
In one embodiment, as shown in fig. 11, the local abnormal factor calculation module 1001 includes: a k-distance calculation submodule 1101, a k-distance neighborhood submodule 1102, a reachable distance calculation submodule 1103, a local reachable density calculation submodule 1104, and a local abnormal factor calculation submodule 1105.
The k-distance calculation sub-module 1101 is used to calculate the k-distance of each object.
A k-d tree is a data structure that partitions a k-dimensional data space; in the k-distance calculation, k denotes the k-th nearest neighbor. The k-d tree can quickly find the k-th nearest point, which makes the k-distance easy to calculate in the next step, and the k nearest points are recorded during the calculation so that the k-distance neighborhood needed later is readily obtained.
A k-d tree is essentially a binary tree in which each node represents a spatial range; in the present invention, each node represents a data traffic packet of the power data network. Building the k-d tree is a recursive process that expands step by step through splitting. A node serves as a split point (split_point) and can be split into a left son (left_son) and a right son (right_son); that is, the split point is a parent node of the binary tree, and the left son and right son are its left and right child nodes. The split method (split_method) of the split point is the key attribute in building a k-d tree.
The splitting process of the k-d tree is as follows: first, calculate the variance of each dimension (namely, each field) and find the dimension a with the largest variance; sort all nodes in ascending order on dimension a and set the median point as split_point; recurse on all points smaller than the median to obtain left_son, and recurse on all points larger than the median to obtain right_son. In this embodiment the variance is used as the criterion because a large variance indicates that the data are more scattered along that coordinate axis, so dividing the data in that direction gives better resolution.
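The splitting process described above can be sketched as follows. This is one possible reading under stated assumptions: objects are tuples of numbers, the population variance is used, and points that tie with the median on the split dimension may land on either side. The names `Node` and `build_kd_tree` are illustrative.

```python
from statistics import pvariance

class Node:
    """One k-d tree node: a split point plus its two sons."""
    def __init__(self, point, axis, left=None, right=None):
        self.point = point    # split_point stored at this node
        self.axis = axis      # dimension a used for the split
        self.left = left      # left_son: points before the median on axis
        self.right = right    # right_son: points after the median on axis

def build_kd_tree(points):
    """Recursively build a k-d tree, splitting on the max-variance dimension."""
    if not points:
        return None
    dims = len(points[0])
    # Choose the dimension whose values are most scattered.
    axis = max(range(dims), key=lambda a: pvariance([p[a] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2   # median position becomes the split point
    return Node(ordered[mid], axis,
                build_kd_tree(ordered[:mid]),
                build_kd_tree(ordered[mid + 1:]))

def count_nodes(node):
    """Count stored points; every input point occupies exactly one node."""
    return 0 if node is None else 1 + count_nodes(node.left) + count_nodes(node.right)

sample = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_kd_tree(sample)
```

For this sample the first coordinate has the larger variance, so the root splits on dimension 0 at the median point (7, 2).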
When calculating the k-distance, the embodiment of the invention abstracts a data traffic packet into an object p. For an arbitrary natural number k, the k-distance k-dis(p) of an object p is defined as the distance d(p, o) between the object p and an object o, where the object o needs to satisfy:
there are at least k objects $o' \in D \setminus \{p\}$ such that $d(p, o') \le d(p, o)$, and at most k-1 objects $q \in D \setminus \{p\}$ such that $d(p, q) < d(p, o)$. Here D is the set of objects (which includes p), $D \setminus \{p\}$ denotes the set D with p removed, d(p, o') is the distance from object p to object o', and d(p, q) is the distance from object p to object q.
With the established k-d tree, the nearest neighbor of an object can be queried easily. When querying the k-th nearest neighbor, an array holding the (k-1) nearest distances found so far can be used to decide whether a node can update the k-th nearest distance; once the k-th nearest neighbor has been found, the k-distance is obtained.
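A minimal sketch of such a query is given below. It deviates from the text in one labeled respect: instead of a plain array it keeps the best candidates in a bounded max-heap, which plays the same role. The tree is stored as nested tuples, the distance is assumed to be Euclidean, and `k_distance` assumes the queried object is itself stored in the tree, so it asks for k+1 neighbors and skips the object at distance 0.

```python
import heapq
from math import dist
from statistics import pvariance

def build(points):
    """Build a k-d tree as nested tuples (point, axis, left, right)."""
    if not points:
        return None
    axis = max(range(len(points[0])),
               key=lambda a: pvariance([p[a] for p in points]))
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis, build(pts[:mid]), build(pts[mid + 1:]))

def k_nearest(tree, query, k):
    """Return the k stored points nearest to query as sorted (distance, point)."""
    heap = []  # max-heap via negated distances: best k candidates so far

    def visit(node):
        if node is None:
            return
        point, axis, left, right = node
        d = dist(query, point)
        if len(heap) < k:
            heapq.heappush(heap, (-d, point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, point))
        diff = query[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        visit(near)  # descend into the preferred child first
        # Visit the other side only if the split plane is closer than
        # the current k-th nearest distance (or fewer than k found).
        if len(heap) < k or abs(diff) < -heap[0][0]:
            visit(far)

    visit(tree)
    return sorted((-neg_d, p) for neg_d, p in heap)

def k_distance(tree, p, k):
    """k-distance of an object p that is stored in the tree."""
    return k_nearest(tree, p, k + 1)[k][0]

sample = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(sample)
kd = k_distance(tree, (7, 2), 2)
```

The pruning test is what lets the query skip subtrees; when it fails, the far subtree is searched as well, which is the source of the run-time uncertainty mentioned earlier.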
The k-distance neighborhood sub-module 1102 is used to compute a corresponding k-distance neighborhood from the k-distance of each object.
In one embodiment, for an object p (which may correspond to any data traffic packet acquired), the k-distance neighborhood of the object p is the set of objects whose distance from p does not exceed the k-distance of p, and the k-distance neighborhood $N_{k\text{-dis}}(p)$ of the object p is shown in formula (1).
The reachable distance computation sub-module 1103 is used to compute the reachable distance of each object from objects within its k-distance neighborhood.
In one embodiment, given a natural number k, the reachable distance $r\text{-dis}_k(p, o)$ of object p relative to an object o in its k-distance neighborhood is shown in equation (2).
The local reachable density calculation sub-module 1104 is used to calculate the corresponding local reachable density from the reachable distance of each object from objects in its k-distance neighborhood.
In one embodiment, the local reachable density $\mathrm{lrd}_{k\text{-dis}}(p)$ of object p is the inverse of the average reachable distance of object p from the objects in its k-distance neighborhood, as shown in equation (3).
The local abnormal factor calculation submodule 1105 is configured to calculate a corresponding local abnormal factor according to the local reachable density of each object.
In one embodiment, the local abnormal factor LOF(p) of the object p is shown in equation (4).
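The chain from k-distance to local abnormal factor can be sketched end to end as below. For brevity this sketch finds neighbors by enumeration rather than with a k-d tree, assumes Euclidean distance, and assumes no object has k or more exact duplicates (otherwise a reachable distance sum could be zero). The function name `lof_scores` and the sample points are illustrative.

```python
from math import dist

def lof_scores(points, k):
    """Return the local abnormal factor (LOF) of every object in points."""
    n = len(points)
    k_dis, nbrs = [], []
    for i, p in enumerate(points):
        # Distances from p to every other object, nearest first.
        d = sorted((dist(p, q), j) for j, q in enumerate(points) if j != i)
        kd = d[k - 1][0]                       # k-distance of p
        k_dis.append(kd)
        # k-distance neighborhood: objects no farther than the k-distance.
        nbrs.append([j for dj, j in d if dj <= kd])

    def reach(i, j):
        # Reachable distance of object i relative to neighbor j:
        # max{k-dis(j), d(i, j)}.
        return max(k_dis[j], dist(points[i], points[j]))

    # Local reachable density: inverse of the mean reachable distance.
    lrd = [len(nbrs[i]) / sum(reach(i, j) for j in nbrs[i]) for i in range(n)]
    # LOF: mean ratio of the neighbors' density to the object's own density.
    return [sum(lrd[j] for j in nbrs[i]) / (lrd[i] * len(nbrs[i]))
            for i in range(n)]

# Four tightly grouped objects and one isolated object (illustrative data).
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(pts, k=2)
```

Objects inside the group score close to 1, while the isolated object scores well above the thresholds used in the examples, which is the behavior the comparison step relies on.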
With this power network traffic anomaly detection device, power network traffic does not need to be specifically classified during detection, which reduces the detection difficulty and gives good adaptability to newly emerging anomalies; with the k-d tree improvement, the time complexity of detection is reduced and the time cost drops significantly.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principle and the implementation mode of the present application are explained by applying specific embodiments in the present application, and the description of the above embodiments is only used to help understanding the method and the core idea of the present application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims (14)

1. A method for detecting abnormal flow of a power network is characterized by comprising the following steps:
collecting a data traffic packet of a power network, wherein the data traffic packet is composed of data of a plurality of fields;
and establishing a k-d tree based on the data traffic packet, and carrying out anomaly detection on the data traffic packet.
2. The method according to claim 1, further comprising, before building a k-d tree based on the data traffic packet: selecting data of at least one field related to the flow size of the data flow packet from the data flow packet as available data;
establishing a k-d tree based on the data traffic packet, and performing anomaly detection on the data traffic packet, wherein the anomaly detection comprises the following steps: and establishing a k-d tree based on a data traffic packet at least comprising the available data, and carrying out anomaly detection on the data traffic packet.
3. The method according to claim 1, wherein the collecting data traffic packets of the power network comprises: and collecting the data traffic packet from a router or a switch through a probe, and storing the data traffic packet.
4. The method for detecting the abnormal traffic of the power network according to claim 1 or 2, wherein the establishing of the k-d tree based on the data traffic packet and the abnormal detection of the data traffic packet comprise:
establishing a k-d tree by taking each data flow packet as an object, and calculating a local abnormal factor of each object;
and comparing the local abnormal factor of each object with a preset value, and detecting whether the data traffic packet corresponding to the object is abnormal or not.
5. The method according to claim 4, wherein calculating the local anomaly factor for each of the objects comprises:
calculating a k-distance for each object;
calculating a corresponding k-distance neighborhood according to the k-distance of each object;
calculating the reachable distance of each object from objects in its k-distance neighborhood;
calculating corresponding local reachable densities according to the reachable distance of each object from objects in the k-distance neighborhood of the object;
and calculating a corresponding local abnormal factor according to the local reachable density of each object.
6. The method of claim 5, wherein for an object p, the k-distance neighborhood of the object p is the set of objects whose distances from the object p do not exceed the k-distance of the object p, and the k-distance neighborhood $N_{k\text{-dis}}(p)$ of the object p is:
$$N_{k\text{-dis}}(p) = \{\, q \mid d(p, q) \le k\text{-dis}(p) \,\}$$
wherein q is an object whose distance from the object p does not exceed the k-distance of the object p, d(p, q) is the distance from the object p to the object q, and k-dis(p) is the k-distance of the object p.
7. The method according to claim 6, wherein the reachable distance $r\text{-dis}_k(p, o)$ of the object p with respect to an object o in its k-distance neighborhood is:
$$r\text{-dis}_k(p, o) = \max\{\, k\text{-dis}(o),\ d(p, o) \,\}$$
where k-dis(o) is the k-distance of object o and d(p, o) is the distance of object p from object o.
8. The method of claim 7, wherein the local reachable density $\mathrm{lrd}_{k\text{-dis}}(p)$ of object p is the inverse of the average reachable distance of object p from the objects in its k-distance neighborhood:
$$\mathrm{lrd}_{k\text{-dis}}(p) = \frac{1}{\dfrac{\sum_{o \in N_{k\text{-dis}}(p)} r\text{-dis}_k(p, o)}{\left| N_{k\text{-dis}}(p) \right|}}$$
9. The method according to claim 8, wherein the local abnormal factor LOF(p) of the object p is:
$$\mathrm{LOF}(p) = \frac{\sum_{o \in N_{k\text{-dis}}(p)} \dfrac{\mathrm{lrd}_{k\text{-dis}}(o)}{\mathrm{lrd}_{k\text{-dis}}(p)}}{\left| N_{k\text{-dis}}(p) \right|}$$
10. An apparatus for detecting abnormal traffic of a power network, comprising:
a traffic packet collecting unit, configured to collect a data traffic packet of a power network, wherein the data traffic packet is composed of data of a plurality of fields;
and the anomaly detection unit is used for establishing a k-d tree for the data traffic packet and carrying out anomaly detection on the data traffic packet.
11. The power network traffic anomaly detection device according to claim 10, further comprising: a field selecting unit, configured to select, from the data traffic packet, data of at least one field related to a traffic size of the data traffic packet as available data;
the abnormality detection unit is specifically configured to: and establishing a k-d tree based on a data traffic packet at least comprising the available data, and carrying out anomaly detection on the data traffic packet.
12. The device for detecting the abnormal flow of the power network according to claim 10, wherein the flow packet collecting unit is specifically configured to: and collecting the data traffic packet from a router or a switch through a probe, and storing the data traffic packet.
13. The power network traffic abnormality detection device according to claim 10, characterized in that the abnormality detection unit includes:
the local abnormal factor calculation module is used for establishing a k-d tree by taking each data flow packet as an object and calculating a local abnormal factor of each object;
and the anomaly detection module is used for comparing the local anomaly factor of each object with a preset value and detecting whether the data traffic packet corresponding to the object is abnormal or not.
14. The power network traffic anomaly detection device according to claim 13, wherein said local anomaly factor calculation module comprises:
a k-distance calculation submodule for calculating a k-distance of each object;
the k-distance neighborhood submodule is used for calculating a corresponding k-distance neighborhood according to the k-distance of each object;
an reachable distance computation sub-module for computing the reachable distance of each object from objects in its k-distance neighborhood;
the local reachable density calculation submodule is used for calculating corresponding local reachable density according to the reachable distance between each object and the object in the k-distance neighborhood of the object;
and the local abnormal factor calculation submodule is used for calculating a corresponding local abnormal factor according to the local reachable density of each object.
CN201610874427.3A 2016-09-30 2016-09-30 Power network traffic anomaly detection method and device Pending CN106375156A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610874427.3A CN106375156A (en) 2016-09-30 2016-09-30 Power network traffic anomaly detection method and device

Publications (1)

Publication Number Publication Date
CN106375156A true CN106375156A (en) 2017-02-01

Family

ID=57894731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610874427.3A Pending CN106375156A (en) 2016-09-30 2016-09-30 Power network traffic anomaly detection method and device

Country Status (1)

Country Link
CN (1) CN106375156A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556601A (en) * 2009-03-12 2009-10-14 华为技术有限公司 Method and device for searching k neighbor

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
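应斐昊,邢宁哲,纪雨彤,李文璟: "LOF-based service traffic anomaly detection for power data networks" (基于LOF的电力数据网业务流量异常检测), Program and Proceedings of the 2016 National Conference on Communication Software (《2016年全国通信软件学术会议程序册与交流文集》) *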

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107257351A (en) * 2017-07-28 2017-10-17 广东电网有限责任公司云浮供电局 Traffic anomaly detection system based on grey LOF and detection method thereof
CN107257351B (en) * 2017-07-28 2020-08-04 广东电网有限责任公司云浮供电局 Traffic anomaly detection system based on grey LOF and detection method thereof
CN107454097A (en) * 2017-08-24 2017-12-08 深圳中兴网信科技有限公司 The detection method of abnormal access, system, computer equipment, readable storage medium storing program for executing
CN109660517A (en) * 2018-11-19 2019-04-19 北京天融信网络安全技术有限公司 Anomaly detection method, device and equipment
CN109660517B (en) * 2018-11-19 2021-05-07 北京天融信网络安全技术有限公司 Abnormal behavior detection method, device and equipment
CN110098983A (en) * 2019-05-28 2019-08-06 上海优扬新媒信息技术有限公司 A kind of detection method and device of abnormal flow
CN110098983B (en) * 2019-05-28 2021-06-04 上海优扬新媒信息技术有限公司 Abnormal flow detection method and device
CN113806204A (en) * 2020-06-11 2021-12-17 北京威努特技术有限公司 Method, device, system and storage medium for evaluating message field correlation
CN113806204B (en) * 2020-06-11 2023-07-25 北京威努特技术有限公司 Method, device, system and storage medium for evaluating message segment correlation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170201
