CN108200005A - Electric power secondary system network traffic anomaly detection method based on unsupervised learning - Google Patents
- Publication number
- CN108200005A CN201710828411.3A
- Authority
- CN
- China
- Prior art keywords
- neuron
- network flow
- secondary system
- som
- input vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/028—Capturing of monitoring data by filtering
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Theoretical Computer Science (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning, comprising the following steps: S1, collecting log information from devices in the secondary system and preprocessing it to obtain historical training data; S2, training SOM networks with the historical training data and obtaining the final detection model through cross-validation; S3, collecting log information from devices in the secondary system in real time to obtain an input vector, feeding the input vector into the final detection model, and deriving the state value of the current network traffic from the state value of the input vector. The method detects network traffic anomalies promptly and effectively, improves the efficiency of handling network anomalies, and reduces the losses they cause.
Description
Technical field
The present invention relates to a network traffic detection method, and in particular to a method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning; it belongs to the technical field of power communication security.
Background art
An electric power secondary system is the system formed by power monitoring systems at all levels, the dispatching data network (SPDnet), management information systems at all levels, and the power data communication network (SPTnet). It is an important part of power system security and is closely tied to the safe operation of grid dispatching and control systems. The secondary system contains a large number of security devices and business systems; the resulting network is large and complex and carries a large volume of data communication traffic. Monitoring network traffic in the secondary system in real time, and promptly finding when and where traffic anomalies occur, is therefore essential to the safe and stable operation of the system.
At present, the traditional approach to traffic anomaly detection is to plot a traffic curve (or baseline) and use an algorithm to compare the historical normal traffic curve against the monitored curve in order to decide whether traffic is anomalous. In recent years, with the rapid development of machine learning and data mining, many researchers have explored applying these techniques to network traffic detection.
In 《Power Information Network Traffic Anomaly Detection Mechanism Based on Big Data》 (《Telecommunication Science》, 2017, 33(3): 134-141), Jiang Honghong, Zhang Tao, Zhao Xinjian et al. disclose a big-data-based traffic anomaly detection mechanism that introduces the density-based local outlier factor (LOF) learning method and the distance-based support vector data description (SVDD) learning method to detect anomalies in power information network traffic data. In 《A Traffic Detection Model for Power System Data Networks》 (《Information Communication》, 2013(7): 45-46), Wu Liu, Zhang Situo, Lei Tong and Cai Yaoguang propose, based on dynamic baseline theory and using tools such as SNMP, NetFlow and DPI, a network traffic detection model for power systems that analyzes network traffic along different dimensions. In 《An Abnormal Traffic Detection Model of Temporal Correlation》 (《Journal of Shandong University》: Edition, 2017, 52(3): 68-73), Zhuan Zhengmao, Chen Xingshu, Shao Guolin and Ye Xiaoming propose a clustering algorithm that combines distribution rate, cluster deviation and closeness to build a temporal-correlation traffic anomaly detection model. Experiments show that the model helps network administrators find anomalies in time, but it needs a large amount of historical data and is not suited to network environments with large fluctuations.
The above works have made some progress on traffic anomaly detection for power system data networks, but they do not examine the causes of the anomalies. To find network traffic anomalies promptly and locate their causes, it is necessary to apply an unsupervised machine learning method based on the self-organizing map (SOM) to traffic anomaly detection and anomaly cause analysis in the electric power secondary system.
Summary of the invention
In view of the deficiencies of the prior art, the technical problem to be solved by the invention is to provide a method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning.
To achieve the above object, the invention adopts the following technical solution:
A method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning, comprising the following steps:
S1, collecting log information from devices in the secondary system and preprocessing it to obtain historical training data;
S2, training SOM networks with the historical training data and obtaining the final detection model through cross-validation;
S3, collecting log information from devices in the secondary system in real time to obtain an input vector, feeding the input vector into the final detection model, and deriving the state value of the current network traffic from the state value of the input vector.
Preferably, in step S1, preprocessing the collected log information comprises the following steps:
S11, according to the characteristics of the log information, extracting from the device log information of the secondary system the data records that carry specific log-type identifiers;
S12, cleaning the extracted data, and selecting the number of TCP messages received, TCP messages sent, UDP packets received and UDP packets sent per unit time within the measurement period to form an array;
S13, computing the differences between same-type elements of the arrays for adjacent unit times to generate the historical training data.
Preferably, in step S2, training SOM networks with the historical training data and obtaining the detection model through cross-validation comprises the following steps:
S21, dividing the historical training data into K parts, denoted D1, D2, ..., DK, where K is a positive integer;
S22, in round i, using Di as test data and the remaining K-1 parts as training data, and training an SOM network with the training data, so that K SOM detection models are obtained, where i = 1, 2, ..., K;
S23, computing the accuracy of the i-th SOM detection model with test data Di in turn, and selecting the SOM detection model with the highest accuracy as the final detection model.
Preferably, in step S22, training an SOM network with the training data comprises the following steps:
S221, taking a test vector from the training data and finding the current training neuron corresponding to the test vector;
S222, finding the neurons adjacent to the current training neuron corresponding to the test vector, and updating the values of the current training neuron and its adjacent neurons;
S223, repeating steps S221 to S222 until every test vector in the training data has been used, yielding a preliminary SOM detection model;
S224, computing the neighborhood area of each neuron in the preliminary SOM model; if the neighborhood area is smaller than the neighborhood threshold, marking the neuron as normal, and otherwise marking it as abnormal, yielding an SOM detection model.
Preferably, in step S221, finding the current training neuron corresponding to the test vector comprises the following steps:
computing the Euclidean distance between the weight vector of each neuron in the SOM network and the measurement vector;
sorting these Euclidean distances in ascending order to obtain the neuron with the smallest Euclidean distance;
selecting the neuron with the smallest Euclidean distance as the current training neuron.
Preferably, in step S222, the values of the adjacent neurons are updated with the following formula:
W(t+1) = W(t) + N(v, t) L(t) (D(t) - W(t));
where W(t) is the weight vector at time t; D(t) is the measurement vector at time t; N(v, t) is the distance function of the adjacent neurons; and L(t) is the learning coefficient.
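Preferably, in step S225, the neighborhood area computed for a neuron is the sum of the Manhattan distances from that neuron to its upper, lower, left and right adjacent neurons;
the Manhattan distance between two neurons $N_i$ and $N_j$ is computed from their weight vectors $W_i = [\omega_{1,i}, \ldots, \omega_{k,i}]$ and $W_j = [\omega_{1,j}, \ldots, \omega_{k,j}]$ with the following formula:
$$M(N_i, N_j) = \sum_{l=1}^{k} \left| \omega_{l,i} - \omega_{l,j} \right|$$
where $M(N_i, N_j)$ is the Manhattan distance between $N_i$ and $N_j$, and $\omega_{l,i}$ and $\omega_{l,j}$ are the $l$-th values of the weight vectors of $N_i$ and $N_j$, respectively.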
Preferably, in step S3, collecting log information from devices in the secondary system in real time to obtain an input vector, feeding the input vector into the final detection model, and deriving the state value of the current network traffic from the state value of the input vector comprises the following steps:
S31, collecting log information from devices in the secondary system in real time to obtain an input vector, feeding the input vector into the final detection model, and finding the current training neuron corresponding to the input vector;
S32, obtaining the state value of the current training neuron corresponding to the input vector, and determining the state value of the input vector from it;
S33, deriving the state value of the current network traffic from the state value of the input vector.
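Steps S31 to S33 can be sketched as a nearest-neuron lookup. This is a minimal illustration, not the patent's implementation: the flat list layout, the state labels, and the sample weight vectors are all assumptions made for the example.

```python
import math

def classify(weights, states, vector):
    """Return the state of the neuron whose weight vector is closest to `vector`.

    weights: list of neuron weight vectors (hypothetical flat layout)
    states:  list of state labels, one per neuron
    """
    # S31: find the neuron with the smallest Euclidean distance to the input.
    winner = min(range(len(weights)), key=lambda i: math.dist(weights[i], vector))
    # S32/S33: the input vector, and hence the current traffic, takes that neuron's state.
    return states[winner]

# Hypothetical two-neuron model used only for illustration.
model_weights = [[0, 2, 4, 2], [40, 30, 5, 2]]
model_states = ["normal", "abnormal"]

state_a = classify(model_weights, model_states, [1, 2, 4, 3])
state_b = classify(model_weights, model_states, [38, 29, 6, 2])
```

Here `state_a` is the label of the first neuron and `state_b` that of the second, since each input lies closest to that neuron's weight vector.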
Preferably, the method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning further comprises the following step:
S4, if the state value of the current network traffic is abnormal, obtaining the cause of the current network traffic anomaly by finding the indicators on which the neuron corresponding to the input vector differs from its adjacent normal neurons.
Preferably, in step S4, if the state value of the current network traffic is abnormal, obtaining the cause of the anomaly by finding the indicators on which the neuron corresponding to the input vector differs from its adjacent normal neurons comprises the following steps:
S41, if the state value of the current network traffic is abnormal, computing the Euclidean distance from the neuron to its adjacent normal neurons;
S42, computing the difference between the abnormal neuron and a normal neuron on each indicator of the weight vector, and sorting the differences in descending order to obtain a single-indicator ranking list;
S43, aggregating the single-indicator ranking lists to determine the indicator that caused the current network traffic anomaly.
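The ranking in S42 and the aggregation in S43 can be sketched as follows. The source does not say how the ranking lists are aggregated, so this sketch assumes a simple vote on the top-ranked indicator; it also assumes absolute differences, and the indicator names and sample vectors are hypothetical.

```python
from collections import Counter

# Hypothetical names for the four vector components described in the text.
METRICS = ["tcp_recv", "tcp_sent", "udp_recv", "udp_sent"]

def rank_metrics(abnormal_w, normal_w):
    """S42: indicator indices sorted by descending absolute difference (assumption)."""
    diffs = [abs(a - n) for a, n in zip(abnormal_w, normal_w)]
    return sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)

def likely_cause(abnormal_w, normal_neighbors):
    """S43: aggregate the ranking lists by voting on the top indicator (assumption)."""
    votes = Counter(rank_metrics(abnormal_w, n)[0] for n in normal_neighbors)
    return METRICS[votes.most_common(1)[0][0]]

# Illustrative weight vectors only.
cause = likely_cause([90, 5, 4, 2], [[10, 6, 4, 2], [12, 4, 5, 2]])
```

In this example the first component differs most from both normal neighbors, so `cause` names that indicator. Other aggregation rules, such as summing rank positions, would fit the same description.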
In the method provided by the invention, SOM networks are trained with historical training data and the final detection model is obtained through cross-validation. In actual use, log information is collected from devices in the secondary system in real time to obtain an input vector, the input vector is fed into the final detection model, and the state value of the current network traffic is derived from the state value of the input vector. The method detects network traffic anomalies promptly and effectively and reduces the losses they cause. In addition, if the state value of the current network traffic is abnormal, the cause of the anomaly is obtained by finding the indicators on which the neuron corresponding to the input vector differs from its adjacent normal neurons. The cause is provided to decision makers for reference, so that network anomalies can be handled promptly and effectively and the efficiency of handling them is improved.
Brief description of the drawings
Fig. 1 is a flowchart of the method provided by the invention for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning;
Fig. 2 is a schematic diagram of training an SOM network with training data in one embodiment provided by the invention.
Detailed description
The technical content of the invention is described in detail below with reference to the drawings and specific embodiments.
To analyze the traffic behavior of the many devices and servers in the secondary system, an unsupervised machine learning algorithm based on SOM networks is used to analyze the device logs collected in the secondary system, thereby achieving traffic anomaly detection and inference of anomaly causes.
As shown in Fig. 1, the method provided by the invention comprises the following steps: first, log information is collected from devices in the secondary system and preprocessed to obtain historical training data; then, SOM networks are trained with the historical training data and the final detection model is obtained through cross-validation; finally, log information is collected from devices in the secondary system in real time to obtain an input vector, the input vector is fed into the final detection model, and the state value of the current network traffic is derived from the state value of the input vector. This process is described in detail below.
S1, log information is collected from devices in the secondary system and preprocessed to obtain historical training data.
In the embodiment provided by the invention, the data used are the log information collected from devices in the secondary system, with a collection interval of 2 min.
The collected log information is then preprocessed, which comprises the following steps:
S11, according to the characteristics of the log information, the data records that carry specific log-type identifiers are extracted from the device log information of the secondary system.
Traffic-related logs are extracted from the massive volume of secondary-system logs. Only data records whose log type is identified as "nettcp" or "netudp" are selected here. The nettcp and netudp data are traffic statistics; their values are large and keep growing, and their trend of change better reflects the current state of network traffic. The collected data are shown in Table 1.
Table 1: Collected data information
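The selection in S11 amounts to keeping only records with the two traffic-related type identifiers. The sketch below assumes each log record is a dictionary with a `type` field; the source does not specify a record layout, so the field names and sample records are hypothetical.

```python
# Log-type identifiers named in the text.
FLOW_LOG_TYPES = {"nettcp", "netudp"}

def filter_flow_logs(records):
    """Keep only records whose log-type identifier is nettcp or netudp."""
    return [r for r in records if r.get("type") in FLOW_LOG_TYPES]

# Hypothetical records used only for illustration.
sample_logs = [
    {"type": "nettcp", "host": "host-a", "recv": 1200, "sent": 900},
    {"type": "cpu", "host": "host-a", "load": 0.4},
    {"type": "netudp", "host": "host-a", "recv": 300, "sent": 310},
]
flow_logs = filter_flow_logs(sample_logs)
```

Records of any other type, such as the `cpu` entry here, are dropped before cleaning.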
S12, the extracted data are cleaned, and the number of TCP messages received, TCP messages sent, UDP packets received and UDP packets sent per unit time within the measurement period are selected to form an array.
The cleaning method may be the one provided in application No. 201510129479.3, "A data cleaning method based on ETL", the one provided in application No. 201710103779.3, "Distributed data cleaning system and method based on data analysis", or any existing data cleaning method; no specific limitation is imposed here. To make it easier to monitor device traffic, only "number of TCP messages received", "number of TCP messages sent", "number of UDP packets received" and "number of UDP packets sent" per unit time within the measurement period are taken here. In the embodiment provided by the invention, the unit time may be set to 1 minute.
S13, the differences between same-type elements of the arrays for adjacent unit times are computed to generate the historical training data.
Because the nettcp and netudp data are traffic statistics whose values are large and keep growing, and whose trend of change better reflects the current state of network traffic, the sent and received packet (segment) counts in the two types of data are extracted to form a new array, and the differences between same-type elements of the arrays for adjacent unit times are computed. The resulting historical training data serve as the new training or test arrays. A sample of the processed data is shown in Table 2.
Table 2: Training data sample
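The differencing in S13 can be sketched as below. The function name and the sample counter values are illustrative assumptions; only the component order (TCP received, TCP sent, UDP received, UDP sent) follows the text.

```python
def counter_deltas(samples):
    """Turn cumulative counters into per-interval changes.

    samples: list of [tcp_recv, tcp_sent, udp_recv, udp_sent] arrays,
             one per unit time, in chronological order.
    Returns one difference vector per pair of adjacent unit times.
    """
    return [
        [curr - prev for prev, curr in zip(a, b)]
        for a, b in zip(samples, samples[1:])
    ]

# Made-up cumulative counters for three consecutive unit times.
cumulative = [
    [1000, 800, 200, 150],
    [1040, 830, 205, 152],
    [1100, 845, 230, 160],
]
training_vectors = counter_deltas(cumulative)
```

Three samples yield two difference vectors, here `[40, 30, 5, 2]` and `[60, 15, 25, 8]`. Working on differences rather than raw counters keeps the vectors bounded even though the counters grow without limit.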
S2, SOM networks are trained with the historical training data, and the final detection model is obtained through cross-validation. Preferably, the K SOM detection models obtained by training are ranked by accuracy through cross-validation, and the SOM detection model with the highest accuracy is selected as the final detection model.
At present, machine learning frameworks built on distributed computing platforms mainly include Mahout on the Hadoop platform, MLlib on the Spark platform, and the graph computation framework GraphLab. Mahout is based on the MapReduce computation model, so running jobs on the Hadoop platform requires frequent disk reads and writes, and its computing performance is poor. GraphLab mainly targets graph computation models, so its range of application is narrow. MLlib is built on the Spark platform, which is designed for iterative in-memory computation and is therefore better suited to machine learning scenarios, where MLlib has a clear performance advantage. By introducing the RDD (resilient distributed datasets) model, the Spark platform greatly increases processing speed and far outperforms Hadoop in interactive and iterative computation; this strength in iterative computation makes the Spark processing platform well suited to data mining on big data. The embodiment provided by the invention runs on the Ubuntu system, and the distributed computing environment uses Spark MLlib.
Biological research shows that in the sensory pathways of the human brain the neurons are organized in an orderly way. When particular external information is input, a particular region of the cerebral cortex is excited, and similar external information is represented contiguously within that region. This response characteristic of cortical neurons is not innate but is formed through self-organization during learning after birth. In 1981, Professor T. Kohonen of Finland proposed a self-organizing feature map network with self-organization properties similar to those of the human brain, abbreviated as the SOM network. The SOM algorithm needs no manually labeled training set and can map a high-dimensional input space onto a low-dimensional map space (usually two-dimensional) while preserving the topological properties of the original input space. SOM can therefore learn multivariate behavior without losing any typical behavior. In the embodiment provided by the invention, the log data continuously collected from each business host or device are preprocessed to generate the measurement vector D(t) = [x1, x2, x3, x4], where x1 is the change in TCP messages received, x2 is the change in TCP messages sent, x3 is the change in UDP packets received, and x4 is the change in UDP packets sent. This vector is used as the input to train the SOM network and obtain the SOM detection model. Here, one SOM is created for each business host or device to learn its network traffic characteristics.
Because the weight vectors are initialized with random numbers, in some maps the weight vectors may represent only a subset of the training data. As a result, only a small fraction of the neurons in such a map get trained, which degrades the quality of the map. The embodiment provided by the invention therefore uses cross-validation to solve this problem. Training SOM networks with the historical training data and obtaining the detection model through cross-validation comprises the following steps:
S21, the historical training data is divided into K parts, denoted D1, D2, ..., DK, where K is a positive integer.
Generating the detection models by cross-validation takes K rounds to complete.
S22, in round i, Di is the test data and the remaining K-1 parts are the training data; an SOM network is trained with the training data, so that K SOM detection models are obtained, where i = 1, 2, ..., K.
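The K rounds of S21 and S22 can be sketched as a generator of training and test splits. The source does not say how the data is partitioned, so the interleaved split below is an assumption, as are the function name and the toy data.

```python
def k_fold_rounds(data, k):
    """Yield (round_number, training_data, test_data) for each of the K rounds."""
    # Assumption: an interleaved split; any partition into K parts would do.
    folds = [data[i::k] for i in range(k)]
    for i in range(k):
        # Round i+1 holds out fold i and trains on the other K-1 folds.
        train = [v for j, fold in enumerate(folds) if j != i for v in fold]
        yield i + 1, train, folds[i]

# Toy data: ten items standing in for measurement vectors.
rounds = list(k_fold_rounds(list(range(10)), k=5))
```

Each round would train one SOM on its training part and score it on the held-out part, giving K candidate models.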
In round i, Di is selected as the test data and the other K-1 parts serve as the training data. Training an SOM network with the training data comprises the following steps:
S221, a test vector is taken from the training data, and the current training neuron corresponding to the test vector is found.
An SOM network consists of a set of neurons, as shown in Fig. 2. Each neuron is represented by a weight vector and a map coordinate; for example, the weight vector of the neuron labeled 1 in Fig. 2 is [0, 2, 4, 2]. A weight vector has the same length as the measurement vector D(t) and is updated dynamically according to the values of the measurement vectors in the training data.
In the training phase, the SOM network adjusts the weight vectors of the different neurons through a competitive learning process. A test vector is taken from the training data and the corresponding current training neuron is found, which comprises the following steps:
The Euclidean distance between the weight vector of each neuron in the SOM network and the measurement vector is computed.
The Euclidean distance between points $x = (x_1, \ldots, x_4)$ and $y = (y_1, \ldots, y_4)$ is:
$$d(x, y) = \sqrt{\sum_{i=1}^{4} (x_i - y_i)^2} \qquad (1)$$
where x is the weight vector and y is the measurement vector.
The Euclidean distances between the weight vector of each neuron in the SOM network and the measurement vector are sorted in ascending order to obtain the neuron with the smallest Euclidean distance.
The neuron with the smallest Euclidean distance is selected as the current training neuron.
For example, Fig. 2 shows an SOM network of 9 neurons being trained with the measurement vector A = [0, 2, 4, 2] as input. The Euclidean distance from A to each neuron is computed first. Since neuron 1 has the smallest Euclidean distance to A, neuron 1 is chosen as the current training neuron.
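The selection of the current training neuron can be sketched as a minimum search over Euclidean distances. The weight vectors of neurons 1 and 2 are the ones quoted in the text; the third neuron and the dictionary layout are assumptions added for illustration.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def find_training_neuron(weights, d):
    """Return the id of the neuron whose weight vector is closest to d.

    weights: dict mapping neuron id -> weight vector (hypothetical layout)
    """
    return min(weights, key=lambda n: euclidean(weights[n], d))

# Neurons 1 and 2 use the weights from the text; neuron 3 is made up.
weights = {1: [0, 2, 4, 2], 2: [4, 2, 4, 2], 3: [9, 9, 9, 9]}
winner = find_training_neuron(weights, [0, 2, 4, 2])
```

Taking the minimum directly gives the same neuron as sorting all distances in ascending order and reading off the first entry.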
S222, the neurons adjacent to the current training neuron corresponding to the test vector are found, and the values of the current training neuron and its adjacent neurons are updated.
The values of neuron 1 and its adjacent neurons are updated. In the embodiment provided by the invention, the radius r that defines adjacency between neurons is 1, so the neurons adjacent to neuron 1 are neurons 2, 4 and 5. The weight vector of a neuron at time t is updated according to formula (2):
W(t+1) = W(t) + N(v, t) L(t) (D(t) - W(t)) (2)
where W(t) and D(t) are the weight vector and the measurement vector at time t, respectively. N(v, t) is a function of the distance between adjacent neurons. L(t) is the learning coefficient, which determines how much each weight vector changes during learning.
In the learning process shown in Fig. 2, the learning coefficient L(t) in formula (2) is 1 and the distance function N(v, t) is 1/4. The embodiment provided by the invention uses a simple distance function; a more complex distance function may be used in practice as needed. In the example, the weight vector of neuron 2 is [4, 2, 4, 2] and the input vector is [0, 2, 4, 2]. The difference between the input vector and the weight vector is [-4, 0, 0, 0]; multiplying by 1/4 and 1, that is, computing (D(t) - W(t)) * N(v, t) * L(t), gives [-1, 0, 0, 0]; adding the initial weight vector [4, 2, 4, 2] gives [3, 2, 4, 2], which is the updated weight vector of neuron 2. In Fig. 2, all neurons to be updated are shown in bold. In this way the training neuron and its adjacent neurons converge toward the space of the input measurement vectors.
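The worked example for neuron 2 can be reproduced with a direct transcription of the update formula W(t+1) = W(t) + N(v, t) L(t) (D(t) - W(t)). The function and parameter names below are illustrative; the numbers are those given in the text.

```python
def update_weights(w, d, n, l):
    """Apply W(t+1) = W(t) + N * L * (D(t) - W(t)) component by component.

    w: current weight vector, d: measurement vector,
    n: value of the distance function N(v, t), l: learning coefficient L(t)
    """
    return [wi + n * l * (di - wi) for wi, di in zip(w, d)]

# Neuron 2 from the example: weights [4, 2, 4, 2], input [0, 2, 4, 2], N = 1/4, L = 1.
updated = update_weights([4, 2, 4, 2], [0, 2, 4, 2], n=0.25, l=1)
```

The result matches the updated weight vector [3, 2, 4, 2] stated in the example: only the first component moves, by a quarter of its distance to the input.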
S223, steps S221 to S222 are repeated until every test vector in the training data has been used, yielding a preliminary SOM detection model.
In S223, step S222 is repeated for the updated adjacent neurons until all neurons in the SOM network have been updated.
Steps S221 to S222 are repeated until all neurons in the SOM network have been updated. Once every input measurement vector has been used to update the neural network several times, the learning phase is complete. At that point the weight vectors of the neurons represent a generalization of the whole measurement vector space. The SOM network can therefore capture the traffic behavior of the system under different workloads.
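Putting S221 to S223 together, a training pass over a 3 x 3 map might look like the sketch below. Several choices are assumptions not fixed by the source: neurons are stored row by row in a flat list (index 0 is neuron 1), radius 1 includes diagonal neighbors (consistent with neurons 2, 4 and 5 being adjacent to neuron 1), the winning neuron is updated with factor 1 and its neighbors with factor 1/4, and the initial weights other than those of neurons 1 and 2 are made up.

```python
import math

ROWS, COLS = 3, 3  # 9 neurons, as in the Fig. 2 example

def neighbors(idx, radius=1):
    """Indices of neurons within `radius` grid steps of idx, excluding idx itself."""
    r, c = divmod(idx, COLS)
    return [
        rr * COLS + cc
        for rr in range(max(0, r - radius), min(ROWS, r + radius + 1))
        for cc in range(max(0, c - radius), min(COLS, c + radius + 1))
        if (rr, cc) != (r, c)
    ]

def train(weights, vectors, learning=1.0, neighbor_factor=0.25):
    """One pass over the training vectors (S221 to S223)."""
    for d in vectors:
        # S221: the current training neuron is the one closest to d.
        winner = min(range(len(weights)), key=lambda i: math.dist(weights[i], d))
        # S222: update the winner (assumed factor 1) and its neighbors (factor 1/4).
        targets = [(winner, 1.0)] + [(j, neighbor_factor) for j in neighbors(winner)]
        for i, n in targets:
            weights[i] = [w + n * learning * (x - w) for w, x in zip(weights[i], d)]
    return weights

initial = [[0, 2, 4, 2], [4, 2, 4, 2]] + [[8, 8, 8, 8] for _ in range(7)]
trained = train(initial, [[0, 2, 4, 2]])
```

After this single step, neuron 2 (index 1) holds [3, 2, 4, 2] as in the worked example, while neurons outside the radius, such as neuron 3, are unchanged. In practice the pass would be repeated over many vectors.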
S224, the neighborhood area of each neuron in the preliminary SOM model is computed; if the neighborhood area is smaller than the neighborhood threshold, the neuron is marked as normal, and otherwise it is marked as abnormal, yielding an SOM detection model.
In any given period, traffic behavior can be divided into three phases: normal, pre-abnormal, and abnormal. Whenever the weight vector of a neuron is updated, the weight vectors of its adjacent neurons are updated at the same time. During learning, a frequently trained neuron modifies the weight vectors of its adjacent neurons with the same measurement vectors. In the end, the weight vector of a frequently trained neuron becomes similar to those of its adjacent neurons. Because system traffic is usually in the normal state, neurons that represent the normal state are trained far more often than neurons that represent the pre-failure and failure states. A cluster of neurons representing different normal traffic behaviors therefore emerges.
The neighborhood area is computed by examining the directly adjacent neurons of each neuron. The topology of the neuron network here is two-dimensional, so the neurons above, below, left and right of each neuron are examined. The Manhattan distance between two neurons $N_i$ and $N_j$ is computed from their weight vectors $W_i = [\omega_{1,i}, \ldots, \omega_{k,i}]$ and $W_j = [\omega_{1,j}, \ldots, \omega_{k,j}]$ as follows:
$$M(N_i, N_j) = \sum_{l=1}^{k} \left| \omega_{l,i} - \omega_{l,j} \right|$$
The neighborhood area of a neuron $N_i$ is defined as the sum of the Manhattan distances from the neuron to its upper, lower, left and right adjacent neurons, denoted $N_T$, $N_B$, $N_L$ and $N_R$ respectively, that is, $M(N_i, N_T) + M(N_i, N_B) + M(N_i, N_L) + M(N_i, N_R)$.
Whether a neuron is abnormal is determined from its neighborhood area. In the embodiment provided by the invention, the neighborhood threshold is obtained by analyzing and mining a large amount of log information. If the neighborhood area is smaller than the neighborhood threshold, the neuron is close to the other neurons and is marked as normal. If the neighborhood area is larger than the neighborhood threshold, the neuron is not close to the other neurons and is marked as abnormal.
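The labeling step can be sketched as follows, with the map stored as a two-dimensional grid of weight vectors. Two points are assumptions: neurons on the edge of the map simply sum over the neighbors that exist, and the threshold value of 30 and the sample weights are made up for the example (the text derives the threshold from log analysis).

```python
def manhattan(wi, wj):
    """Manhattan distance between two weight vectors."""
    return sum(abs(a - b) for a, b in zip(wi, wj))

def neighborhood_area(grid, r, c):
    """Sum of Manhattan distances to the top, bottom, left and right neighbors.

    Assumption: neighbors that fall outside the map are skipped.
    """
    total = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(grid) and 0 <= cc < len(grid[0]):
            total += manhattan(grid[r][c], grid[rr][cc])
    return total

def label_neurons(grid, threshold):
    """Mark a neuron normal if its area is below the threshold, abnormal otherwise."""
    return [
        ["normal" if neighborhood_area(grid, r, c) < threshold else "abnormal"
         for c in range(len(grid[0]))]
        for r in range(len(grid))
    ]

# Hypothetical 2 x 2 map: three similar neurons and one outlier.
grid = [
    [[0, 2, 4, 2], [1, 2, 4, 2]],
    [[0, 3, 4, 2], [9, 9, 9, 9]],
]
labels = label_neurons(grid, threshold=30)
```

The three similar neurons have small areas and are labeled normal; the outlier in the bottom-right corner is far from both of its neighbors and is labeled abnormal.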
In round i, Di is the test data and the remaining K-1 parts are the training data; the SOM network is trained with the training data by carrying out steps S221 to S225 above, which yields one SOM detection model. With i incremented one by one from 1 to K and steps S221 to S225 repeated, K SOM detection models are obtained.
S23, the accuracy of the i-th SOM detection model is computed with test data Di in turn, and the SOM detection model with the highest accuracy is selected as the final detection model.
The accuracy of each SOM network can be estimated by collecting statistics on correct and incorrect classifications. In the embodiment provided by the present invention, because the method is unsupervised, the SOM network is trained with unlabeled normal data. In round i, $D_i$ is the test data; the test data is input into the i-th SOM detection model and the accuracy of the i-th SOM detection model is calculated. The accuracy A of each SOM detection model is calculated from the following four counts.
$N_{fp}$ is the number of false positives: an anomaly is reported but none actually occurred. $N_{fn}$ is the number of false negatives: no anomaly is reported but one actually occurred. $N_{tp}$ is the number of true positives: an anomaly is reported and one actually occurred. $N_{tn}$ is the number of true negatives: no anomaly is reported and none actually occurred. Since the training data is all normal data, $N_{fn} = N_{tp} = 0$. However, because deviations may arise while the SOM detection models are being built, not every SOM detection model has an accuracy A of 1. The calculated accuracies of the K SOM detection models are ranked, and the SOM detection model with the highest accuracy is selected as the final detection model.
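The selection of the final model can be sketched as follows. The accuracy formula itself is not reproduced in this text, so the sketch assumes the conventional definition, correct classifications divided by all classifications, which is consistent with the four counts defined above and with A = 1 when there are no false positives. The function names and the sample counts are hypothetical.

```python
def accuracy(n_tp, n_tn, n_fp, n_fn):
    """Assumed conventional accuracy: (N_tp + N_tn) / (all four counts)."""
    total = n_tp + n_tn + n_fp + n_fn
    return (n_tp + n_tn) / total if total else 0.0


def select_final_model(counts_per_model):
    """Return (index, accuracy) of the most accurate model.

    counts_per_model holds one (n_tp, n_tn, n_fp, n_fn) tuple per
    SOM detection model, in round order.
    """
    scores = [accuracy(*counts) for counts in counts_per_model]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical counts for K = 3 models tested on normal data only,
# so N_tp = N_fn = 0 and only false positives lower the accuracy.
example_counts = [(0, 90, 10, 0), (0, 97, 3, 0), (0, 95, 5, 0)]
best_index, best_accuracy = select_final_model(example_counts)
```

With normal-only test data the ranking reduces to picking the model that raises the fewest false alarms.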
S3, log information of the devices in the secondary system is collected in real time to obtain an input vector, the input vector is input into the final detection model, and the state value of the current network traffic is obtained from the state value of the input vector.
Log information of the devices in the secondary system is collected in real time to obtain the change in the number of TCP messages received, the change in the number of TCP messages sent, the change in the number of UDP packets received and the change in the number of UDP packets sent. The vector formed from these values is used as the input vector and is input into the final detection model obtained above. The state value of the current network traffic, and hence whether the traffic is normal, is obtained from the state value of the input vector (normal, pre-abnormal or abnormal). This specifically comprises the following steps:
S31, log information of the devices in the secondary system is collected in real time to obtain an input vector, the input vector is input into the final detection model, and the current training neuron corresponding to the input vector is found.
Finding the current training neuron corresponding to the input vector specifically comprises the following steps:
Calculating the Euclidean distance between the weight vector of each neuron in the final detection model and the input vector;
Sorting these Euclidean distances in ascending order to obtain the neuron with the smallest Euclidean distance;
Selecting the neuron with the smallest Euclidean distance as the current training neuron.
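The lookup in the three steps above can be sketched as follows. Instead of sorting all distances in ascending order, the sketch keeps a running minimum in a single pass, which selects the same neuron; the grid layout and the function names are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def find_current_neuron(grid, input_vector):
    """Return ((row, col), distance) of the neuron whose weight vector
    is closest to the input vector. Ties keep the first neuron found,
    which is an assumption."""
    best_position, best_distance = None, float("inf")
    for r, row in enumerate(grid):
        for c, weights in enumerate(row):
            distance = euclidean(weights, input_vector)
            if distance < best_distance:
                best_position, best_distance = (r, c), distance
    return best_position, best_distance


# Hypothetical 2x2 map and a two-dimensional input vector.
example_map = [
    [[0, 0], [5, 5]],
    [[10, 0], [3, 4]],
]
position, distance = find_current_neuron(example_map, [3, 3])
```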
S32, the state value of the current training neuron corresponding to the input vector is obtained, and the state value of the input vector is determined from the state value of the current training neuron; that is:
If the state value of the current training neuron is normal, the state value of the input vector is normal; if the state value of the current training neuron is pre-abnormal, the state value of the input vector is pre-abnormal; if the state value of the current training neuron is abnormal, the state value of the input vector is abnormal.
S33, the state value of the current network traffic is obtained from the state value of the input vector, that is:
If the state value of the input vector is normal, the state value of the current network traffic is normal; if the state value of the input vector is pre-abnormal, the state value of the current network traffic is pre-abnormal; if the state value of the input vector is abnormal, the state value of the current network traffic is abnormal.
S4, if the state value of the current network traffic is not normal, the cause of the current network traffic anomaly is obtained by finding the indexes that differ between the neuron corresponding to the input vector and the adjacent normal neurons.
In the embodiment provided by the present invention, the cause of the traffic anomaly is inferred by indicating which index (number of TCP messages received, number of TCP messages sent, number of UDP packets received, number of UDP packets sent) contributes most to the anomaly. Although this does not identify the direct cause of the anomaly, it gives a hint about the angle from which to start looking for it. Because the final detection model preserves the topological properties of the measurement vectors, the faulty index that causes the anomaly can be identified from the model by calculation. The embodiment provided by the present invention compares normal neurons with the abnormal neuron and finds the index that differs most in the weight vectors, thereby obtaining the cause of the current network traffic anomaly. If the state value of the current network traffic is not normal, obtaining the cause of the anomaly by finding the indexes that differ between the neuron corresponding to the input vector and the adjacent normal neurons specifically comprises the following steps:
S41, if the state value of the current network traffic is not normal, the Euclidean distance from the neuron to the adjacent normal neurons is calculated.
When the input vector is mapped to an abnormal neuron, the Euclidean distances from that neuron to its adjacent normal neurons are calculated. The method needs to avoid comparing against abnormal neurons, because an abnormal neuron is itself in an uncertain state and can easily produce a wrong index hint. In the embodiment provided by the present invention, the neighborhood area of each neighbor can be calculated first; if it exceeds the set normal threshold, that neighbor is judged to be an abnormal neuron and another adjacent neuron is used for the comparison instead. Alternatively, the state value of each neuron (normal, pre-abnormal or abnormal) can be recorded directly during training. The embodiment places no particular restriction on this: the neighborhood area can be calculated in real time to obtain the state value of a neuron, or the state value can be stored in advance during training. In this embodiment a comparison threshold is set to ensure that enough normal neurons are compared with the current non-normal neuron, so that the accuracy of the result meets practical needs. If no normal neuron is found among the neighbors of the abnormal neuron, or the number of normal neurons is less than the comparison threshold, the search distance is expanded to include more neurons in the SOM network. To ensure correctness, in the embodiment provided by the present invention the comparison threshold is denoted Q, and Q normal neurons are compared with the abnormal neuron (in this embodiment, Q = 5).
S42, the difference between the non-normal neuron and each normal neuron is calculated for every index in the weight vector, and the differences are sorted in descending order to obtain single-index ranking lists.
Once the set of normal neurons has been determined, the difference between each normal neuron and the abnormal (non-normal) neuron is calculated for every index in the weight vector. In the embodiment provided by the present invention, the difference can be obtained by subtraction or by any algorithm that reflects how much the two values differ; no specific limitation is imposed here. When subtraction is used, the results can be positive or negative, so the absolute value of each result is taken. The results calculated for the different indexes are then sorted from largest to smallest to generate one single-index ranking list. In this way, after the abnormal neuron has been compared with each normal neuron in the set, Q single-index ranking lists are obtained.
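The construction of the single-index ranking lists can be sketched as follows, using absolute differences as described above. The index labels, the function names and the sample weight vectors are hypothetical and chosen only for illustration.

```python
# Hypothetical labels for the four indexes, in weight-vector order.
INDEX_NAMES = ("tcp_received", "tcp_sent", "udp_received", "udp_sent")


def rank_indexes(abnormal_weights, normal_weights, names=INDEX_NAMES):
    """Return index names sorted by absolute difference, largest first."""
    differences = [
        (abs(a - n), name)
        for a, n, name in zip(abnormal_weights, normal_weights, names)
    ]
    # Python's sort is stable, so equal differences keep vector order.
    differences.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in differences]


def build_rank_lists(abnormal_weights, normal_weight_vectors):
    """One ranking list per normal neuron in the comparison set."""
    return [rank_indexes(abnormal_weights, w) for w in normal_weight_vectors]


# Hypothetical weight vectors: one abnormal neuron, two normal neurons.
example_lists = build_rank_lists(
    [50, 10, 3, 20],
    [[10, 12, 3, 5], [48, 11, 3, 2]],
)
```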
S43, the single-index ranking lists are aggregated to determine the index that causes the current network traffic anomaly.
That is, the indexes are counted according to their positions in each single-index ranking list, the indexes are then sorted in descending order of count, and the index with the highest count is determined to be the index that causes the current network traffic anomaly.
Each ranking list is examined to determine the final index ranking list. Majority voting is used to select it: each list votes for the index that differs most within that list, the index with the most votes becomes the first item in the final ranking list, and the second, third and later items are determined in the same way. For example, suppose that of 5 ranking lists, three show the change in the number of TCP messages received as the largest cause of the anomaly and the other two show the change in the number of UDP packets sent as the largest cause. The change in the number of TCP messages received is then judged to be the most likely cause of the anomaly.
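The majority vote can be sketched as follows, reproducing the five-list example above. The sketch only counts the top entry of each list; how the lower positions of the final list are filled when votes are tied is not specified in the text, so the tie order here (first seen wins) is an assumption, as are the index labels.

```python
from collections import Counter


def majority_vote(rank_lists):
    """Each ranking list votes for its first (most different) index.

    Returns the voted indexes ordered by vote count, highest first.
    Assumption: ties keep the order in which indexes were first seen.
    """
    votes = Counter(ranking[0] for ranking in rank_lists if ranking)
    return [name for name, _ in votes.most_common()]


# Five hypothetical ranking lists: three led by the change in TCP
# messages received, two led by the change in UDP packets sent.
example_rankings = [
    ["tcp_received", "udp_sent", "tcp_sent", "udp_received"],
    ["udp_sent", "tcp_received", "tcp_sent", "udp_received"],
    ["tcp_received", "tcp_sent", "udp_sent", "udp_received"],
    ["udp_sent", "tcp_received", "udp_received", "tcp_sent"],
    ["tcp_received", "udp_sent", "udp_received", "tcp_sent"],
]
final_ranking = majority_vote(example_rankings)
```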
In summary, in the method provided by the present invention for detecting network traffic anomalies in a power secondary system based on unsupervised learning, log information of the devices in the secondary system is collected and preprocessed to obtain historical training data; the SOM network is then trained with the historical training data and a final detection model is obtained through cross-validation; finally, log information of the devices in the secondary system is collected in real time to obtain an input vector, the input vector is input into the final detection model, and the state value of the current network traffic is obtained from the state value of the input vector. The method collects the massive logs generated by the devices and business systems running in the power secondary system, extracts the traffic-related logs by filtering, and applies an unsupervised machine learning algorithm, so that network traffic anomalies can be found in a timely and effective manner and the losses caused by network anomalies are effectively reduced. In addition, if the state value of the current network traffic is not normal, the cause of the current network traffic anomaly is obtained by finding the indexes that differ between the neuron corresponding to the input vector and the adjacent normal neurons. The cause obtained is provided to decision-makers for reference, so that network anomalies can be handled promptly and effectively and the efficiency of handling them is improved.
The method provided by the present invention for detecting network traffic anomalies in a power secondary system based on unsupervised learning has been described in detail above. For those of ordinary skill in the art, any obvious change made to it without departing from the true spirit of the present invention will constitute an infringement of the patent right of the present invention and will incur the corresponding legal liability.
Claims (10)
1. A method for detecting network traffic anomalies in a power secondary system based on unsupervised learning, characterized by comprising the following steps:
S1, collecting log information of the devices in the secondary system and preprocessing it to obtain historical training data;
S2, training an SOM network with the historical training data and obtaining a final detection model through cross-validation;
S3, collecting log information of the devices in the secondary system in real time to obtain an input vector, inputting the input vector into the final detection model, and obtaining the state value of the current network traffic from the state value of the input vector.
2. The method for detecting network traffic anomalies in a power secondary system based on unsupervised learning according to claim 1, characterized in that in step S1, preprocessing the collected log information comprises the following steps:
S11, according to the characteristics of the log information, filtering the log information of the devices in the secondary system to extract the data record logs identified by a specific log type;
S12, cleaning the filtered data and, for each unit time in the statistical period, forming an array from the number of TCP messages received, the number of TCP messages sent, the number of UDP packets received and the number of UDP packets sent;
S13, calculating the differences between elements of the same kind in the arrays of adjacent unit times to generate the historical training data.
3. The method for detecting network traffic anomalies in a power secondary system based on unsupervised learning according to claim 1, characterized in that in step S2, training the SOM network with the historical training data and obtaining the detection model through cross-validation comprises the following steps:
S21, dividing the historical training data into K parts, denoted $D_1, D_2, \ldots, D_K$ respectively, where K is a positive integer;
S22, in round i, using $D_i$ as the test data and the remaining K-1 parts as the training data, and training the SOM network with the training data, so as to obtain K SOM detection models, where i = 1, 2, …, K;
S23, calculating the accuracy of the i-th SOM detection model in turn using the test data $D_i$, and selecting the SOM detection model with the highest accuracy as the final detection model.
4. The method for detecting network traffic anomalies in a power secondary system based on unsupervised learning according to claim 3, characterized in that in step S22, training the SOM network with the training data comprises the following steps:
S221, obtaining a test vector from the training data and finding the current training neuron corresponding to the test vector;
S222, finding the neurons adjacent to the current training neuron corresponding to the test vector, and updating the values of the current training neuron and the adjacent neurons;
S223, repeating steps S221 to S222 until no test vector in the training data remains unprocessed, to obtain a preliminary SOM detection model;
S224, calculating the neighborhood area of each neuron in the preliminary SOM model; if the neighborhood area is less than the adjacency threshold, marking the neuron as a normal value, and otherwise marking the neuron as an abnormal value, to obtain one SOM detection model.
5. The method for detecting network traffic anomalies in a power secondary system based on unsupervised learning according to claim 4, characterized in that in step S221, finding the current training neuron corresponding to the test vector comprises the following steps:
Calculating the Euclidean distance between the weight vector of each neuron in the SOM network and the measurement vector;
Sorting the Euclidean distances between the weight vectors of the neurons in the SOM network and the measurement vector in ascending order to obtain the neuron with the smallest Euclidean distance;
Selecting the neuron with the smallest Euclidean distance as the current training neuron.
6. The method for detecting network traffic anomalies in a power secondary system based on unsupervised learning according to claim 4, characterized in that:
In step S222, the values of the adjacent neurons are updated using the following formula:

$$W(t+1) = W(t) + N(v, t)\, L(t)\, \bigl(D(t) - W(t)\bigr)$$

where $W(t)$ is the weight vector at time t; $D(t)$ is the measurement vector at time t; $N(v, t)$ is the neighborhood distance function of the adjacent neurons; and $L(t)$ is the learning coefficient.
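For illustration only, one application of the update formula in claim 6 can be sketched as below. The concrete forms of $N(v, t)$ and $L(t)$ are not given in this text, so the sketch takes their values at time t as plain numbers; the function and parameter names are hypothetical.

```python
def update_weights(weights, measurement, neighborhood, learning_coefficient):
    """One step of W(t+1) = W(t) + N(v, t) * L(t) * (D(t) - W(t)).

    `neighborhood` and `learning_coefficient` are the values of N(v, t)
    and L(t) at the current step; how they are computed is left open.
    """
    return [
        w + neighborhood * learning_coefficient * (d - w)
        for w, d in zip(weights, measurement)
    ]


# The winning neuron (N = 1.0) moves halfway toward the measurement
# vector when L = 0.5; a neighbor with N = 0.5 moves a quarter of the way.
winner = update_weights([0.0, 10.0], [10.0, 0.0], 1.0, 0.5)
neighbor = update_weights([0.0, 10.0], [10.0, 0.0], 0.5, 0.5)
```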
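7. The method for detecting network traffic anomalies in a power secondary system based on unsupervised learning according to claim 1, characterized in that:
In step S225, the neighborhood area of a neuron is calculated as the sum of the Manhattan distances from the neuron to its upper, lower, left and right neighbors;
The Manhattan distance between two neurons $N_i$ and $N_j$ is calculated from the weight vectors $W_i = [\omega_{1,i}, \ldots, \omega_{k,i}]$ and $W_j = [\omega_{1,j}, \ldots, \omega_{k,j}]$ using the following formula:

$$M(N_i, N_j) = \sum_{l=1}^{k} \left| \omega_{l,i} - \omega_{l,j} \right|$$

where $M(N_i, N_j)$ is the Manhattan distance between the two neurons $N_i$ and $N_j$, and $\omega_{l,i}$ and $\omega_{l,j}$ are the values in the weight vectors of the two neurons $N_i$ and $N_j$ respectively.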
8. The method for detecting network traffic anomalies in a power secondary system based on unsupervised learning according to claim 1, characterized in that in step S3, collecting log information of the devices in the secondary system in real time to obtain an input vector, inputting the input vector into the final detection model, and obtaining the state value of the current network traffic from the state value of the input vector comprises the following steps:
S31, collecting log information of the devices in the secondary system in real time to obtain an input vector, inputting the input vector into the final detection model, and finding the current training neuron corresponding to the input vector;
S32, obtaining the state value of the current training neuron corresponding to the input vector, and determining the state value of the input vector from the state value of the current training neuron;
S33, obtaining the state value of the current network traffic from the state value of the input vector.
9. The method for detecting network traffic anomalies in a power secondary system based on unsupervised learning according to claim 1, characterized by further comprising the following step:
S4, if the state value of the current network traffic is not normal, obtaining the cause of the current network traffic anomaly by finding the indexes that differ between the neuron corresponding to the input vector and the adjacent normal neurons.
10. The method for detecting network traffic anomalies in a power secondary system based on unsupervised learning according to claim 9, characterized in that in step S4, if the state value of the current network traffic is not normal, obtaining the cause of the current network traffic anomaly by finding the indexes that differ between the neuron corresponding to the input vector and the adjacent normal neurons comprises the following steps:
S41, if the state value of the current network traffic is not normal, calculating the Euclidean distance from the neuron to the adjacent normal neurons;
S42, calculating the difference between the non-normal neuron and each normal neuron for every index in the weight vector, and sorting the differences in descending order to obtain single-index ranking lists;
S43, aggregating the single-index ranking lists to determine the index that causes the current network traffic anomaly.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710828411.3A CN108200005A (en) | 2017-09-14 | 2017-09-14 | Electric power secondary system network flow abnormal detecting method based on unsupervised learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108200005A true CN108200005A (en) | 2018-06-22 |
Family
ID=62572779
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710828411.3A Pending CN108200005A (en) | 2017-09-14 | 2017-09-14 | Electric power secondary system network flow abnormal detecting method based on unsupervised learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108200005A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180622 |