CN108200005A - Electric power secondary system network flow abnormal detecting method based on unsupervised learning - Google Patents

Electric power secondary system network flow abnormal detecting method based on unsupervised learning

Info

Publication number
CN108200005A
Authority
CN
China
Prior art keywords
neuron
network flow
secondary system
som
input vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710828411.3A
Other languages
Chinese (zh)
Inventor
龚向阳
胡铁军
戚军
谢宏
章杜锡
江昊
周飞
焦旭明
周媛
马骁
吕超
王景
高明慧
梁野
董晨晖
林祺蓉
王俏俏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kedong Electric Power Control System Co Ltd
Ningbo Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd
Original Assignee
Beijing Kedong Electric Power Control System Co Ltd
Ningbo Power Supply Co of State Grid Zhejiang Electric Power Co Ltd
Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kedong Electric Power Control System Co Ltd, Ningbo Power Supply Co of State Grid Zhejiang Electric Power Co Ltd, Jinan Power Supply Co of State Grid Shandong Electric Power Co Ltd
Priority to CN201710828411.3A
Publication of CN108200005A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/02Capturing of monitoring data
    • H04L43/028Capturing of monitoring data by filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/142Network analysis or design using statistical or mathematical methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning, comprising the following steps: S1, collecting log information from the devices in the secondary system and preprocessing it to obtain historical training data; S2, training a SOM network with the historical training data and obtaining a final detection model through cross-validation; S3, collecting log information from the devices in the secondary system in real time to obtain an input vector, feeding the input vector into the final detection model, and deriving the state value of the current network traffic from the state value of the input vector. The method detects network traffic anomalies promptly and effectively, improves the efficiency with which network anomalies are handled, and reduces the losses they cause.

Description

Electric power secondary system network flow abnormal detecting method based on unsupervised learning
Technical field
The present invention relates to a network traffic detection method, and in particular to a method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning. It belongs to the technical field of power communication security.
Background
An electric power secondary system is the system formed by the power monitoring systems at all levels, the dispatching data network (SPDnet), the management information systems at all levels, and the electric power data communication network (SPTnet). It is an important part of power system security and is closely tied to the safe operation of grid dispatching and control systems. A secondary system contains a large number of security devices and business systems; the resulting network is large and complex and carries a large volume of data communication traffic. Monitoring the network traffic of the secondary system in real time, and promptly discovering when and where traffic anomalies occur, is therefore essential to the safe and stable operation of the system.
At present, the traditional approach to traffic anomaly detection is to draw a traffic curve (or baseline) and use an algorithm to compare the historical normal traffic curve with the monitored curve in order to judge whether the traffic is abnormal. With the rapid development of machine learning, data mining, and related methods in recent years, many researchers have explored applying these techniques to network traffic detection.
In "Power Information Network Traffic Anomaly Detection Mechanism Based on Big Data" (Telecommunication Science, 2017, 33(3): 134-141), Jiang Honghong, Zhang Tao, Zhao Xinjian et al. disclose a traffic anomaly detection mechanism based on big data. The mechanism uses a density-based local outlier factor (LOF) learning method and a distance-based support vector data description (SVDD) learning method to detect anomalies in power information network traffic data. In "A Traffic Detection Model for Power System Data Networks" (Information Communication, 2013(7): 45-46), Wu Liu, Zhang Situo, Lei Tong, and Cai Yaoguang propose a network traffic detection model for power systems that is based on dynamic baseline theory, uses tools such as SNMP, NetFlow, and DPI, and analyzes network traffic along several dimensions. In "An Abnormal Traffic Detection Model Based on Temporal Correlation" (Journal of Shandong University, 2017, 52(3): 68-73), Zhuan Zhengmao, Chen Xingshu, Shao Guolin, and Ye Xiaoming propose a temporal-correlation traffic anomaly detection model built on a clustering algorithm that combines distribution rate, cluster deviation, and closeness. Experiments show that the model can help network administrators discover anomalies in time, but it requires a large amount of historical data and is not suitable for network environments with large fluctuations.
The above literature has achieved certain results in traffic anomaly detection for power system data networks, but it does not examine the causes of the anomalies. To discover network traffic anomalies promptly and locate their causes, it is necessary to apply an unsupervised machine learning method based on the self-organizing map (SOM) to traffic anomaly detection and anomaly cause analysis in electric power secondary systems.
Summary of the invention
In view of the deficiencies of the prior art, the technical problem to be solved by the present invention is to provide a method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning.
To achieve the above object, the present invention adopts the following technical solution:
A method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning, comprising the following steps:
S1: collect log information from the devices in the secondary system and preprocess it to obtain historical training data;
S2: train a SOM network with the historical training data and obtain a final detection model through cross-validation;
S3: collect log information from the devices in the secondary system in real time to obtain an input vector, feed the input vector into the final detection model, and derive the state value of the current network traffic from the state value of the input vector.
Preferably, in step S1, preprocessing the collected log information comprises the following steps:
S11: according to the characteristics of the log information, filter out of the device log information of the secondary system the data records that carry specific log type identifiers;
S12: clean the filtered data and, for each unit time within the measurement period, form an array from the number of TCP segments received, the number of TCP segments sent, the number of UDP packets received, and the number of UDP packets sent;
S13: calculate the difference between elements of the same kind in the arrays of adjacent unit times to generate the historical training data.
Preferably, in step S2, training the SOM network with the historical training data and obtaining the detection model through cross-validation comprises the following steps:
S21: divide the historical training data into K parts, denoted D1, D2, ..., DK, where K is a positive integer;
S22: in the i-th round, use Di as test data and the remaining K-1 parts as training data, and train the SOM network with the training data; over all rounds this yields K SOM detection models, where i = 1, 2, ..., K;
S23: calculate the accuracy of the i-th SOM detection model with the test data Di in turn, and select the SOM detection model with the highest accuracy as the final detection model.
Preferably, in step S22, training the SOM network with the training data comprises the following steps:
S221: take a measurement vector from the training data and find the current training neuron corresponding to it;
S222: find the neurons adjacent to the current training neuron, and update the values of the current training neuron and its adjacent neurons;
S223: repeat steps S221 to S222 until every measurement vector in the training data has been used, obtaining a preliminary SOM detection model;
S224: calculate the neighborhood area of each neuron in the preliminary SOM model; if the neighborhood area is smaller than the adjacency threshold, mark the neuron as a normal value, otherwise mark it as an abnormal value, obtaining a SOM detection model.
Preferably, in step S221, finding the current training neuron corresponding to the measurement vector comprises the following steps:
calculate the Euclidean distance between the weight vector of each neuron in the SOM network and the measurement vector;
sort these Euclidean distances in ascending order to obtain the neuron with the smallest Euclidean distance;
select the neuron with the smallest Euclidean distance as the current training neuron.
Preferably, in step S222, the values of the adjacent neurons are updated using the following formula:
W(t+1) = W(t) + N(v, t) L(t) (D(t) - W(t));
where W(t) is the weight vector at time t; D(t) is the measurement vector at time t; N(v, t) is the distance function of the adjacent neurons; and L(t) is the learning coefficient.
Preferably, in step S224, the neighborhood area of a neuron is the sum of the Manhattan distances from the neuron to its top, bottom, left, and right adjacent neurons;
the Manhattan distance between two neurons N_i and N_j is calculated from their weight vectors W_i = [ω_{1,i}, …, ω_{k,i}] and W_j = [ω_{1,j}, …, ω_{k,j}] using the following formula:
$$M(N_i, N_j) = \sum_{l=1}^{k} \left| \omega_{l,i} - \omega_{l,j} \right|$$
where M(N_i, N_j) is the Manhattan distance between the two neurons N_i and N_j, and ω_{l,i} and ω_{l,j} are the l-th components of the weight vectors of N_i and N_j respectively.
Preferably, in step S3, collecting log information from the devices in the secondary system in real time to obtain an input vector, feeding the input vector into the final detection model, and deriving the state value of the current network traffic from the state value of the input vector comprises the following steps:
S31: collect log information from the devices in the secondary system in real time to obtain an input vector, feed the input vector into the final detection model, and find the current training neuron corresponding to the input vector;
S32: obtain the state value of the current training neuron corresponding to the input vector, and determine the state value of the input vector from it;
S33: derive the state value of the current network traffic from the state value of the input vector.
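Steps S31 to S33 can be sketched as a nearest-neuron lookup. This is only an illustration under stated assumptions: the function name `classify`, the flat list layout of the model, and the example weight vectors and flags are hypothetical and do not come from the patent.

```python
import numpy as np

def classify(weights, abnormal_flags, input_vector):
    """Return (index of the matching neuron, True if it is marked abnormal).

    weights        : one weight vector per neuron of the final detection model
    abnormal_flags : one boolean per neuron (True = abnormal value)
    input_vector   : [x1, x2, x3, x4] obtained from the real-time logs
    """
    weights = np.asarray(weights, dtype=float)
    vector = np.asarray(input_vector, dtype=float)
    # S31: the matching neuron is the one with the smallest Euclidean distance.
    distances = np.linalg.norm(weights - vector, axis=1)
    winner = int(np.argmin(distances))
    # S32/S33: the state of that neuron is taken as the state of the input
    # vector, and therefore of the current network traffic.
    return winner, bool(abnormal_flags[winner])

# Hypothetical two-neuron model: neuron 0 is normal, neuron 1 is abnormal.
weights = [[0, 2, 4, 2], [40, 30, 20, 10]]
flags = [False, True]

normal_result = classify(weights, flags, [1, 2, 4, 2])
abnormal_result = classify(weights, flags, [38, 29, 20, 12])
print(normal_result, abnormal_result)
```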
Preferably, the method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning further comprises the following step:
S4: if the state value of the current network traffic is abnormal, obtain the cause of the current traffic anomaly by finding the indicators in which the neuron corresponding to the input vector differs from its adjacent normal neurons.
Preferably, in step S4, obtaining the cause of the current traffic anomaly when the state value of the current network traffic is abnormal comprises the following steps:
S41: if the state value of the current network traffic is abnormal, calculate the Euclidean distance from the corresponding neuron to its adjacent normal neurons;
S42: for each indicator in the weight vector, calculate the difference between the abnormal neuron and a normal neuron, and sort the differences in descending order to obtain a per-neuron indicator ranking list;
S43: aggregate the indicator ranking lists to determine the indicator that causes the current traffic anomaly.
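One possible reading of steps S42 and S43 is sketched below. The patent does not say how the ranking lists are aggregated, so summing rank positions across the normal neighbors is an assumption made here for illustration, as are the indicator names in `METRICS` and the example weight vectors. Step S41 (the Euclidean distance to the normal neighbors) is not used in this sketch.

```python
import numpy as np

# Hypothetical names for the four components x1..x4 of the weight vector.
METRICS = ["tcp_in", "tcp_out", "udp_in", "udp_out"]

def rank_causes(abnormal_weights, normal_neighbors):
    """Rank indicators by how strongly they separate the abnormal neuron
    from its adjacent normal neurons (most suspicious first)."""
    abnormal = np.asarray(abnormal_weights, dtype=float)
    rank_totals = np.zeros(len(abnormal))
    for neighbor in normal_neighbors:
        # S42: per-indicator difference, sorted in descending order.
        diff = np.abs(abnormal - np.asarray(neighbor, dtype=float))
        order = np.argsort(-diff, kind="stable")
        ranks = np.empty_like(order)
        ranks[order] = np.arange(len(order))  # position of each indicator
        rank_totals += ranks
    # S43 (assumed aggregation): the lowest total rank position comes first.
    return [METRICS[i] for i in np.argsort(rank_totals, kind="stable")]

# Hypothetical abnormal neuron and two adjacent normal neurons.
causes = rank_causes([50, 5, 3, 2], [[5, 4, 3, 2], [6, 8, 3, 1]])
print(causes)
```

In this example the received-TCP component dominates the difference against both neighbors, so it is ranked first.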
In the method provided by the present invention, a SOM network is trained with historical training data and a final detection model is obtained through cross-validation. In actual use, log information is collected from the devices in the secondary system in real time to obtain an input vector, the input vector is fed into the final detection model, and the state value of the current network traffic is derived from the state value of the input vector. The method detects network traffic anomalies promptly and effectively and reduces the losses they cause. In addition, if the state value of the current network traffic is abnormal, the cause of the anomaly is obtained by finding the indicators in which the neuron corresponding to the input vector differs from its adjacent normal neurons. The cause can be given to decision makers for reference so that the network anomaly is handled promptly and effectively, which improves the efficiency of anomaly handling.
Description of the drawings
Fig. 1 is a flowchart of the method provided by the present invention for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning;
Fig. 2 is a schematic diagram of training the SOM network with training data in one embodiment of the present invention.
Detailed description
The technical content of the present invention is described in detail below with reference to the drawings and specific embodiments.
To analyze the traffic behavior of the many devices and servers in the secondary system, an unsupervised machine learning algorithm based on the SOM network is used to analyze the device logs collected from the secondary system, thereby achieving traffic anomaly detection and inference of anomaly causes.
As shown in Fig. 1, the method provided by the present invention comprises the following steps. First, log information is collected from the devices in the secondary system and preprocessed to obtain historical training data. Then, a SOM network is trained with the historical training data and a final detection model is obtained through cross-validation. Finally, log information is collected from the devices in the secondary system in real time to obtain an input vector, the input vector is fed into the final detection model, and the state value of the current network traffic is derived from the state value of the input vector. This process is described in detail below.
S1: collect log information from the devices in the secondary system and preprocess it to obtain historical training data.
Log information is collected from the devices in the secondary system. In this embodiment, the data used is the collected device log information, with a collection interval of 2 min.
The collected log information is then preprocessed, specifically as follows:
S11: according to the characteristics of the log information, filter out of the device log information of the secondary system the data records that carry specific log type identifiers.
Traffic-related logs are filtered out of the massive logs of the secondary system. Here only the data records whose log type identifier is "nettcp" or "netudp" are selected. The nettcp and netudp data are traffic statistics whose values are large and grow continuously, so their trend of change better reflects the current state of the network traffic. The collected data is shown in Table 1.
Table 1: Collected data
S12: clean the filtered data and, for each unit time within the measurement period, form an array from the number of TCP segments received, the number of TCP segments sent, the number of UDP packets received, and the number of UDP packets sent.
The filtered data is cleaned. The cleaning method may be the one provided in application No. 201510129479.3, "A data cleaning method based on ETL", the one provided in application No. 201710103779.3, "Distributed data cleaning system and method based on data analysis", or any existing data cleaning method; no specific limitation is imposed here. To make it easier to monitor device traffic, only the "number of TCP segments received", "number of TCP segments sent", "number of UDP packets received", and "number of UDP packets sent" per unit time within the measurement period are kept. In this embodiment, the unit time may be set to 1 minute.
S13: calculate the difference between elements of the same kind in the arrays of adjacent unit times to generate the historical training data.
The nettcp and netudp data are traffic statistics whose values are large and grow continuously, and their trend of change better reflects the current state of the network traffic. The numbers of packets (segments) sent and received are therefore extracted from the two types of data to form a new array, and the difference between elements of the same kind in the arrays of adjacent unit times is calculated. The result is the historical training data, used as a new training array or test array. A sample of the processed data is shown in Table 2.
Table 2: Sample of the training data
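The differencing in step S13 can be illustrated with a short sketch. The counter values below are hypothetical and the function name `counter_deltas` is introduced only for this example; the patent does not prescribe an implementation.

```python
import numpy as np

def counter_deltas(samples):
    """Turn cumulative counters into per-interval changes.

    samples: rows of [tcp_received, tcp_sent, udp_received, udp_sent],
             one row per unit time, in chronological order.
    Returns one row fewer than the input: row t minus row t-1.
    """
    counters = np.asarray(samples, dtype=np.int64)
    return np.diff(counters, axis=0)

# Hypothetical cumulative counters for three consecutive unit times.
raw = [
    [1000, 800, 50, 40],
    [1012, 805, 50, 43],
    [1030, 811, 52, 43],
]
deltas = counter_deltas(raw)
print(deltas)
```

Each row of `deltas` corresponds to one measurement vector of the form [x1, x2, x3, x4].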
S2: train a SOM network with the historical training data and obtain a final detection model through cross-validation. Preferably, the SOM network is trained with the historical training data, the K SOM detection models obtained through cross-validation are ranked by accuracy, and the SOM detection model with the highest accuracy is selected as the final detection model.
At present, the main machine learning frameworks built on distributed computing platforms are Mahout on the Hadoop platform, MLlib on the Spark platform, and the graph computation framework GraphLab. Mahout is based on the MapReduce computation model, so jobs on the Hadoop platform require frequent disk reads and writes and computing performance is poor. GraphLab mainly targets graph computation models, so its range of application is narrow. MLlib is implemented on the Spark platform, which is designed for iterative in-memory computation, so it suits machine learning scenarios better and has a clear performance advantage in them. By introducing the RDD (resilient distributed datasets) model, the Spark platform greatly increases processing speed and far outperforms Hadoop in interactive and iterative computation; its strength in iterative computation makes the Spark platform well suited to data mining on big data. The embodiment provided by the present invention runs on Ubuntu, and the distributed computing environment uses Spark MLlib.
Biological research shows that in the sensory pathways of the human brain the neurons are organized in an orderly way. When a specific piece of external information arrives, a specific region of the cerebral cortex is excited, and similar external information maps to neighboring positions within that region. This response characteristic of cortical neurons is not innate but is formed through self-organization during learning. In 1981, Professor T. Kohonen of Finland proposed a self-organizing feature map network that resembles this self-organizing property of the brain, called the SOM network for short. The SOM algorithm needs no manually labeled training set and maps a high-dimensional input space to a low-dimensional (usually two-dimensional) map space while preserving the topological properties of the original input space. A SOM can therefore learn multivariate behavior without losing any typical behavior. In this embodiment, the log data collected continuously from each business host or device is preprocessed into a measurement vector D(t) = [x1, x2, x3, x4], where x1 is the change in the number of TCP segments received, x2 is the change in the number of TCP segments sent, x3 is the change in the number of UDP packets received, and x4 is the change in the number of UDP packets sent. This vector is used as input to train the SOM network and obtain a SOM detection model. One SOM is created for each business host or device to learn its network traffic characteristics.
Because the weight vectors are initialized with random numbers, in some maps the weight vectors represent only a subset of the training data. In that case only a small fraction of the neurons in the map are trained, which degrades the quality of the map. This embodiment solves the problem with cross-validation: the SOM network is trained with the historical training data and the detection model is obtained through cross-validation, specifically as follows:
S21: divide the historical training data into K parts, denoted D1, D2, ..., DK, where K is a positive integer.
The historical training data is divided into K parts D1, D2, ..., DK, and generating the detection model through cross-validation takes K rounds.
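A minimal sketch of the K-part split follows. The patent does not say whether the data is shuffled before splitting, so contiguous parts in the original order are assumed here; the function name `k_fold_rounds` is hypothetical.

```python
import numpy as np

def k_fold_rounds(data, k):
    """Yield (training_data, test_data) for rounds i = 1..K.

    The data is cut into K contiguous parts D1..DK (an assumption); in
    round i, part Di is the test data and the other K-1 parts are the
    training data.
    """
    if k < 2:
        raise ValueError("K must be at least 2 for cross-validation")
    data = np.asarray(data)
    parts = np.array_split(np.arange(len(data)), k)
    for i in range(k):
        train_idx = np.concatenate([parts[j] for j in range(k) if j != i])
        yield data[train_idx], data[parts[i]]

# Ten hypothetical measurement vectors split into K = 5 parts.
vectors = np.arange(40).reshape(10, 4)
rounds = list(k_fold_rounds(vectors, 5))
print([(len(train), len(test)) for train, test in rounds])
```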
S22: in the i-th round, use Di as test data and the remaining K-1 parts as training data, and train the SOM network with the training data; over all rounds this yields K SOM detection models, where i = 1, 2, ..., K.
In the i-th round, Di is selected as test data and the other K-1 parts are used as training data. Training the SOM network with the training data comprises the following steps:
S221: take a measurement vector from the training data and find the current training neuron corresponding to it.
A SOM network consists of a set of neurons, as shown in Fig. 2. Each neuron is represented by a weight vector and a map coordinate; for example, the weight vector of the neuron labeled 1 in Fig. 2 is [0,2,4,2]. A weight vector has the same length as the measurement vector D(t) and is updated dynamically according to the values of the measurement vectors in the training data.
In the training stage, the SOM network adjusts the weight vectors of the different neurons through a competitive learning process. A measurement vector is taken from the training data and the current training neuron corresponding to it is found as follows:
Calculate the Euclidean distance between the weight vector of each neuron in the SOM network and the measurement vector;
The Euclidean distance between the points x = (x_1, ..., x_4) and y = (y_1, ..., y_4) is:
$$d(x, y) = \sqrt{\sum_{i=1}^{4} (x_i - y_i)^2}$$
where x is the weight vector and y is the measurement vector.
Sort the Euclidean distances between the weight vectors of the neurons in the SOM network and the measurement vector in ascending order to obtain the neuron with the smallest Euclidean distance;
Select the neuron with the smallest Euclidean distance as the current training neuron.
For example, Fig. 2 shows a SOM network of 9 neurons being trained with the measurement vector A = [0,2,4,2] as input. The Euclidean distance from A to each neuron is calculated first. Because neuron 1 has the smallest Euclidean distance to A, neuron 1 is chosen as the current training neuron.
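The neuron selection in step S221 can be sketched as follows. Only the weight vectors of neuron 1 ([0,2,4,2]) and neuron 2 ([4,2,4,2]) come from the text; the other seven vectors are hypothetical placeholders, as is the function name `best_matching_neuron`.

```python
import numpy as np

def best_matching_neuron(weights, measurement):
    """Return (index of the closest neuron, all Euclidean distances)."""
    weights = np.asarray(weights, dtype=float)
    vector = np.asarray(measurement, dtype=float)
    distances = np.linalg.norm(weights - vector, axis=1)
    return int(np.argmin(distances)), distances

# 3x3 map flattened row by row; index 0 is neuron 1, index 1 is neuron 2.
weights = [
    [0, 2, 4, 2], [4, 2, 4, 2], [6, 3, 5, 1],   # rows 3..9 are made up
    [2, 5, 4, 3], [5, 5, 2, 2], [7, 6, 3, 1],
    [3, 8, 1, 4], [6, 7, 2, 5], [9, 9, 0, 0],
]
A = [0, 2, 4, 2]
winner, distances = best_matching_neuron(weights, A)
print(winner + 1, distances[1])  # neuron number, distance from A to neuron 2
```

Using `argmin` gives the same neuron as sorting the distances in ascending order and taking the first entry.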
S222: find the neurons adjacent to the current training neuron, and update the values of the current training neuron and its adjacent neurons.
The values of neuron 1 and its adjacent neurons are then updated. In this embodiment the radius r that defines adjacency is 1, so the adjacent neurons of neuron 1 are neurons 2, 4, and 5. The weight vector of a neuron at time t is updated according to formula (2):
W(t+1) = W(t) + N(v, t) L(t) (D(t) - W(t))    (2)
where W(t) and D(t) are the weight vector and the measurement vector at time t respectively, N(v, t) is a function of the distance to the adjacent neuron, and L(t) is the learning coefficient, which determines how much each weight vector changes during learning.
In the learning process shown in Fig. 2, the learning coefficient L(t) in formula (2) is 1 and the distance function N(v, t) is 1/4. This embodiment uses a simple distance function; a more complex one may be used in practice as needed. In the example, the weight vector of neuron 2 is [4,2,4,2] and the input vector is [0,2,4,2]. The difference between the input vector and the weight vector, D(t) - W(t), is [-4,0,0,0]. Multiplying it by 1/4 and by 1, that is, (D(t) - W(t)) * N(v, t) * L(t), gives [-1,0,0,0], and adding the initial weight vector [4,2,4,2] gives [3,2,4,2], which is the updated weight vector of neuron 2. In Fig. 2, all neurons to be updated are shown in bold. In this way the training neuron and its adjacent neurons move closer to the input measurement vector.
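The worked example for neuron 2 can be reproduced with a few lines. The function name `update_weights` is hypothetical; the values of N(v, t) and L(t) are the ones stated for Fig. 2.

```python
import numpy as np

def update_weights(weights, measurement, neighborhood, learning_coefficient):
    """Apply W(t+1) = W(t) + N(v,t) * L(t) * (D(t) - W(t))."""
    w = np.asarray(weights, dtype=float)
    d = np.asarray(measurement, dtype=float)
    return w + neighborhood * learning_coefficient * (d - w)

# Neuron 2 from the example: W(t) = [4,2,4,2], D(t) = [0,2,4,2],
# N(v,t) = 1/4 and L(t) = 1.
new_weights = update_weights([4, 2, 4, 2], [0, 2, 4, 2], 0.25, 1.0)
print(new_weights)
```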
S223: repeat steps S221 to S222 until every measurement vector in the training data has been used, obtaining a preliminary SOM detection model.
Step S222 is repeated for the updated adjacent neurons until all neurons in the SOM network have been updated.
Steps S221 to S222 are repeated until all neurons in the SOM network have been updated. Once every input measurement vector has been used to update the network multiple times, the learning stage is complete. At this point the weight vectors of the neurons represent a generalization of the entire measurement vector space, so the SOM network can capture the traffic behavior of the system under different workloads.
S224: calculate the neighborhood area of each neuron in the preliminary SOM model; if the neighborhood area is smaller than the adjacency threshold, mark the neuron as a normal value, otherwise mark it as an abnormal value, obtaining a SOM detection model.
In any given period, traffic behavior can be in one of three states: normal, pre-anomaly, and anomaly. Whenever the weight vector of a neuron is updated, the weight vectors of its adjacent neurons are updated at the same time. Over the course of learning, a frequently trained neuron modifies the weight vectors of its adjacent neurons with the same measurement vectors, so its weight vector ends up similar to those of its neighbors. Because system traffic is usually in the normal state, neurons that represent the normal state are trained far more often than neurons that represent the pre-anomaly and anomaly states. The map therefore contains clusters of neurons that represent different normal traffic behaviors.
The neighborhood area is computed by examining the direct neighbors of each neuron. The topology of the neural network here is two-dimensional, so the neurons directly above, below, to the left of, and to the right of each neuron are examined. The Manhattan distance between two neurons $N_i$ and $N_j$ is computed from their weight vectors $W_i = [\omega_{1,i}, \dots, \omega_{k,i}]$ and $W_j = [\omega_{1,j}, \dots, \omega_{k,j}]$ as follows: $M(N_i, N_j) = \sum_{l=1}^{k} |\omega_{l,i} - \omega_{l,j}|$.
The neighborhood area of a neuron $N_i$ is defined as the sum of the Manhattan distances from the neuron to its top, bottom, left, and right neighbors, denoted $N_T$, $N_B$, $N_L$, and $N_R$ respectively, that is, $M(N_i, N_T) + M(N_i, N_B) + M(N_i, N_L) + M(N_i, N_R)$.
Whether a neuron is abnormal is determined by its neighborhood area. In the embodiment provided by the present invention, the neighborhood threshold is obtained by analyzing and mining a large volume of log information. If the neighborhood area is smaller than the neighborhood threshold, the neuron is close to the other neurons and is marked as normal. If the neighborhood area is larger than the neighborhood threshold, the neuron is not close to the other neurons and is marked as abnormal.
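The neighborhood-area labeling of step S224 can be illustrated with a short Python sketch. The function names are hypothetical, and two details are assumptions that the text does not specify: neurons on the edge of the map simply skip the neighbors they do not have, and an area exactly equal to the threshold is treated as abnormal (following the "otherwise" wording of step S224).

```python
def manhattan(w_i, w_j):
    """Manhattan distance between two weight vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(w_i, w_j))


def neighborhood_area(weights, r, c):
    """Sum of Manhattan distances from neuron (r, c) to its direct neighbors."""
    rows, cols = len(weights), len(weights[0])
    area = 0.0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # top, bottom, left, right
        nr, nc = r + dr, c + dc
        # Assumption: edge neurons skip neighbors that fall outside the map.
        if 0 <= nr < rows and 0 <= nc < cols:
            area += manhattan(weights[r][c], weights[nr][nc])
    return area


def label_neurons(weights, threshold):
    """Mark each neuron 'normal' if its area is below threshold, else 'abnormal'."""
    return [
        ["normal" if neighborhood_area(weights, r, c) < threshold else "abnormal"
         for c in range(len(weights[0]))]
        for r in range(len(weights))
    ]
```

In practice the threshold would come from the log analysis described above; the sketch takes it as a plain parameter.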
In round $i$, $D_i$ serves as the test data and the remaining K-1 parts serve as the training data. The SOM network is trained on the training data by carrying out steps S221 to S225 above, which yields one SOM detection model. As $i$ increases one at a time from 1 to K, steps S221 to S225 are repeated, which yields K SOM detection models.
S23: Use the test data $D_i$ in turn to compute the accuracy of the $i$-th SOM detection model, and select the SOM detection model with the highest accuracy as the final detection model.
The accuracy of each SOM network can be estimated by collecting statistics on correct and incorrect classifications. In one embodiment provided by the present invention, because the method is unsupervised, the SOM network is trained with unlabeled normal data. In round $i$, $D_i$ is the test data; it is fed into the $i$-th SOM detection model and the accuracy of that model is computed. Each SOM detection model computes its accuracy with the following formula: $A = \dfrac{N_{tp} + N_{tn}}{N_{tp} + N_{tn} + N_{fp} + N_{fn}}$
where $N_{fp}$ is the number of false positives (an anomaly was reported but none actually occurred); $N_{fn}$ is the number of false negatives (no anomaly was reported but one actually occurred); $N_{tp}$ is the number of true positives (an anomaly was reported and one actually occurred); and $N_{tn}$ is the number of true negatives (no anomaly was reported and none actually occurred). Because the training data are all normal data, $N_{fn} = N_{tp} = 0$. However, deviations may arise while the SOM detection models are being built, so not every SOM detection model has an accuracy $A$ of 1. The accuracies of the K SOM detection models are ranked, and the model with the highest accuracy is selected as the final detection model.
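The K-round selection of steps S21 to S23 can be sketched as follows. The sketch assumes that accuracy is the usual ratio of correct classifications to all classifications, which is an assumption about the formula referenced above. The callables `train` and `predict` are hypothetical placeholders for SOM training and SOM lookup, and splitting the data into contiguous parts is likewise a choice made for the example.

```python
def accuracy(n_tp, n_tn, n_fp, n_fn):
    """Share of correct classifications among all classifications."""
    total = n_tp + n_tn + n_fp + n_fn
    return (n_tp + n_tn) / total if total else 0.0


def split_into_parts(data, k):
    """Split data into k contiguous parts whose sizes differ by at most one."""
    size, extra = divmod(len(data), k)
    parts, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        parts.append(data[start:end])
        start = end
    return parts


def select_final_model(data, k, train, predict):
    """Train k models, test model i on part D_i, and return the most accurate one.

    train(vectors) -> model and predict(model, vector) -> state string are
    hypothetical callables supplied by the caller.
    """
    parts = split_into_parts(data, k)
    best_model, best_acc = None, -1.0
    for i, test_part in enumerate(parts):
        training = [v for j, part in enumerate(parts) if j != i for v in part]
        model = train(training)
        # All data are normal, so N_tp = N_fn = 0: every vector reported as
        # normal is a true negative and every other report is a false positive.
        n_tn = sum(1 for v in test_part if predict(model, v) == "normal")
        n_fp = len(test_part) - n_tn
        acc = accuracy(0, n_tn, n_fp, 0)
        if acc > best_acc:
            best_model, best_acc = model, acc
    return best_model, best_acc
```

When several models tie for the highest accuracy, this sketch keeps the first one; the text does not say how ties are broken.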
S3: Collect the log information of the devices in the secondary system in real time to obtain an input vector, feed the input vector into the final detection model, and derive the state value of the current network traffic from the state value of the input vector.
The log information of the devices in the secondary system is collected in real time to obtain the change in the number of TCP messages received, the change in the number of TCP messages sent, the change in the number of UDP packets received, and the change in the number of UDP packets sent. The vector formed from these values is fed as the input vector into the final detection model, and whether the state of the current network traffic is normal is determined from the state value of the input vector (normal, pre-abnormal, or abnormal). This specifically includes the following steps:
S31: Collect the log information of the devices in the secondary system in real time to obtain an input vector, feed the input vector into the final detection model, and find the current training neuron corresponding to the input vector.
The process of finding the current training neuron corresponding to the input vector specifically includes the following steps:
Compute the Euclidean distance between the weight vector of each neuron in the final detection model and the input vector;
Sort the Euclidean distances between the weight vectors of the neurons in the final detection model and the input vector in ascending order to obtain the neuron with the smallest Euclidean distance;
Select the neuron with the smallest Euclidean distance as the current training neuron.
S32: Obtain the state value of the current training neuron corresponding to the input vector, and determine the state value of the input vector from the state value of the current training neuron, namely:
If the state value of the current training neuron is normal, the state value of the input vector is normal; if the state value of the current training neuron is pre-abnormal, the state value of the input vector is pre-abnormal; if the state value of the current training neuron is abnormal, the state value of the input vector is abnormal.
S33: Derive the state value of the current network traffic from the state value of the input vector, namely:
If the state value of the input vector is normal, the state value of the current network traffic is normal; if the state value of the input vector is pre-abnormal, the state value of the current network traffic is pre-abnormal; if the state value of the input vector is abnormal, the state value of the current network traffic is abnormal.
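Steps S31 to S33 amount to a nearest-neuron lookup followed by reading off a stored state. A minimal Python sketch, assuming the trained model is held as a grid of weight vectors plus a parallel grid of state strings (a hypothetical layout, as is the function name `classify`):

```python
import math


def classify(weights, states, vector):
    """Return (state, (row, col)) for the neuron closest to the input vector.

    weights[r][c] is a weight vector; states[r][c] is 'normal',
    'pre-abnormal', or 'abnormal'.
    """
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = math.dist(w, vector)  # Euclidean distance
            if dist < best_dist:
                best, best_dist = (r, c), dist
    r, c = best
    # The state of the input vector, and therefore of the current traffic,
    # is the state stored for the current training neuron.
    return states[r][c], best
```

The text describes sorting all distances in ascending order and taking the first entry; a single pass that tracks the minimum gives the same neuron without the sort.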
S4: If the state value of the current network traffic is not normal, determine the cause of the current network traffic anomaly by finding the metrics in which the neuron corresponding to the input vector differs from its adjacent normal neurons.
In the embodiment provided by the present invention, the cause of a traffic anomaly is inferred by indicating which metric (number of TCP messages received, number of TCP messages sent, number of UDP packets received, or number of UDP packets sent) contributes most to the anomaly. Although this cannot identify the direct cause of the anomaly, it indicates from which angle to start looking for that cause. Because the final detection model preserves the topological properties of the measurement vectors, the faulty metrics that caused the anomaly can be identified from it by computation. The embodiment compares normal neurons with the abnormal neuron and finds the metrics that differ most in their weight vectors, which yields the cause of the current network traffic anomaly. If the state value of the current network traffic is not normal, determining the cause of the current network traffic anomaly by finding the metrics in which the neuron corresponding to the input vector differs from its adjacent normal neurons specifically includes the following steps:
S41: If the state value of the current network traffic is not normal, compute the Euclidean distance from the neuron to its adjacent normal neurons.
When the input vector maps to an abnormal neuron, the Euclidean distance from that neuron to its adjacent normal neurons is computed. The method must avoid comparing against abnormal neurons, because an abnormal neuron is itself in an uncertain state and would easily produce misleading metric hints. In the embodiment provided by the present invention, the neighborhood area of each adjacent neuron can be computed first; if it exceeds the configured normal threshold, that neuron is judged abnormal and another adjacent neuron is compared instead. Alternatively, the state value of each neuron (normal, pre-abnormal, or abnormal) can be recorded directly during training. The embodiment does not limit this choice: the state value of a neuron may be obtained by computing its neighborhood area in real time, or it may be stored in advance during training. In this embodiment, a comparison threshold is set to ensure that enough normal neurons are compared with the current non-normal neuron, so that the accuracy of the result meets practical needs. If no normal neuron is found among the adjacent neurons of the abnormal neuron, or the number of normal neurons is below the comparison threshold, the search distance is expanded to include more neurons in the SOM network. To ensure correctness, the embodiment denotes the comparison threshold by Q and compares Q normal neurons with the abnormal neuron (in this embodiment, Q = 5).
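One way to realize the expanding search of step S41 is sketched below. It is an illustration under assumptions rather than the claimed procedure: the search radius is measured in grid steps (row offset plus column offset), the radius grows by one step at a time, the candidates are ordered by Euclidean distance between weight vectors, and the function name and the pre-stored `states` grid are hypothetical.

```python
import math


def nearest_normal_neurons(weights, states, pos, q=5):
    """Return up to q grid positions of normal neurons near the neuron at pos."""
    rows, cols = len(weights), len(weights[0])
    r0, c0 = pos
    found = []
    # Widen the search one grid step at a time until q normal neurons are
    # found or the whole map has been covered.
    for radius in range(1, rows + cols - 1):
        found = [
            (r, c)
            for r in range(rows)
            for c in range(cols)
            if 0 < abs(r - r0) + abs(c - c0) <= radius
            and states[r][c] == "normal"
        ]
        if len(found) >= q:
            break
    # Keep the q candidates whose weight vectors are closest to the abnormal one.
    found.sort(key=lambda rc: math.dist(weights[rc[0]][rc[1]], weights[r0][c0]))
    return found[:q]
```

If the map contains fewer than Q normal neurons, the sketch returns all of them; how the embodiment handles that case is not stated.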
S42: Compute the difference between the non-normal neuron and each normal neuron for every metric in their weight vectors, and sort the differences in descending order to obtain a single-metric ranking list.
Once the set of normal neurons has been determined, the difference between each normal neuron and the abnormal (non-normal) neuron is computed for every metric in their weight vectors. In the embodiment provided by the present invention, the difference may be computed by subtraction or by any algorithm that reflects how much the two values differ; no specific limitation is imposed here. When subtraction is used, the result may be positive or negative, so its absolute value is taken. The results for the different metrics are then sorted from largest to smallest to produce a single-metric ranking list. After the abnormal neuron has been compared with every normal neuron in the set, Q single-metric ranking lists are obtained.
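The ranking in step S42 is small enough to show directly. In this sketch the difference is the absolute value of a subtraction, which is the option the text describes; the metric names in `METRICS` and the function name are hypothetical labels for the four components of the weight vector.

```python
# Hypothetical labels for the four components of a weight vector.
METRICS = ("tcp_received", "tcp_sent", "udp_received", "udp_sent")


def rank_metrics(abnormal_w, normal_w, metrics=METRICS):
    """Rank metric names by |abnormal - normal|, largest difference first."""
    diffs = [
        (abs(a - n), name)
        for a, n, name in zip(abnormal_w, normal_w, metrics)
    ]
    # Sort on the difference only; metrics with equal differences keep
    # their original order.
    diffs.sort(key=lambda d: d[0], reverse=True)
    return [name for _, name in diffs]
```

Calling `rank_metrics` once per normal neuron in the comparison set produces the Q ranking lists that the next step aggregates.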
S43: Aggregate the single-metric ranking lists and determine the metric that caused the current network traffic anomaly.
The single-metric ranking lists are aggregated to determine the metric that caused the current network traffic anomaly. That is, the metrics are counted according to their positions in each single-metric ranking list, the metrics are then sorted in descending order of count, and the metric with the highest count is determined to be the one that caused the current network traffic anomaly.
Each ranking list is examined to determine the final metric ranking list, which is selected by majority voting. Each list votes for the metric that differs most in that list; the metric with the most votes becomes the first entry in the final ranking list, and the second, third, and later entries are chosen in the same way. For example, suppose that of 5 ranking lists, three show the change in TCP messages received as the largest contributor to the anomaly and the other two show the change in UDP packets sent as the largest contributor. The change in TCP messages received is then judged to be the most likely cause of the anomaly.
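The majority vote can be sketched as a repeated election. This is one possible reading of "second, third, and so on", stated here as an assumption: in each round every list votes for its highest-ranked metric that has not yet been placed. The function name is hypothetical, and all lists are assumed to rank the same set of metrics.

```python
from collections import Counter


def majority_vote(rank_lists):
    """Merge several ranking lists into one final ranking by repeated voting."""
    remaining = set(rank_lists[0])
    final = []
    while remaining:
        # Each list votes for its top metric among those not yet placed.
        votes = Counter(
            next(m for m in ranks if m in remaining) for ranks in rank_lists
        )
        winner = votes.most_common(1)[0][0]
        final.append(winner)
        remaining.remove(winner)
    return final
```

With three lists headed by the TCP-received metric and two headed by the UDP-sent metric, as in the example above, the TCP-received metric wins the first round and becomes the first entry of the final list.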
In conclusion the electric power secondary system exception of network traffic detection side provided by the present invention based on unsupervised learning Method by acquiring the log information of equipment in electrical secondary system, and is pre-processed, obtains historic training data;Then, using going through History training data is trained SOM nets, and passes through crosscheck and obtain final detection model;Finally, secondary system is acquired in real time The log information of equipment obtains input vector in system, input vector is inputted final detection model, according to the state of input vector It is worth to the state value of current network flow.This method is generated for the equipment and operation system run in electric power secondary system Massive logs are collected, and are filtered outflow correlation log, using non-supervisory machine learning algorithm, can timely and effectively be found Exception of network traffic, the effective loss for reducing Network Abnormal and bringing.In addition to this, if the state value of current network flow is It is improper, by finding the difference index of the corresponding neuron of input vector and adjacent normal neurons, obtain current network stream Measure abnormal cause.The abnormal network cause of acquisition is referred to for decision-maker, can timely and effectively to Network Abnormal into Row processing improves the efficiency of Network Abnormal processing.
Above to the electric power secondary system network flow abnormal detecting method provided by the present invention based on unsupervised learning It is described in detail.It is right under the premise of without departing substantially from true spirit for those of ordinary skill in the art Any obvious change that it is done will all form to infringement of patent right of the present invention, will undertake corresponding legal liabilities.

Claims (10)

1. A method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning, characterized by comprising the following steps:
S1: Collect the log information of the devices in the secondary system and preprocess it to obtain historical training data;
S2: Train the SOM network with the historical training data, and obtain the final detection model through cross-validation;
S3: Collect the log information of the devices in the secondary system in real time to obtain an input vector, feed the input vector into the final detection model, and derive the state value of the current network traffic from the state value of the input vector.
2. The method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning according to claim 1, characterized in that in step S1, preprocessing the collected log information comprises the following steps:
S11: According to the characteristics of the log information, filter out, from the log information of the devices in the secondary system, the data record logs that carry a specific log type identifier;
S12: Clean the filtered data and select, for each unit time in the statistical period, the number of TCP messages received, the number of TCP messages sent, the number of UDP packets received, and the number of UDP packets sent to form an array;
S13: Compute the differences between elements of the same kind in the arrays of adjacent unit times to generate the historical training data.
3. The method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning according to claim 1, characterized in that in step S2, training the SOM network with the historical training data and obtaining the detection model through cross-validation comprises the following steps:
S21: Divide the historical training data into K parts, denoted $D_1, D_2, \dots, D_K$, where K is a positive integer;
S22: In round $i$, $D_i$ serves as the test data and the remaining K-1 parts serve as the training data; train the SOM network with the training data to obtain K SOM detection models, where $i = 1, 2, \dots, K$;
S23: Use the test data $D_i$ in turn to compute the accuracy of the $i$-th SOM detection model, and select the SOM detection model with the highest accuracy as the final detection model.
4. The method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning according to claim 3, characterized in that in step S22, training the SOM network with the training data comprises the following steps:
S221: Take a test vector from the training data and find the current training neuron corresponding to the test vector;
S222: Find the adjacent neurons of the current training neuron corresponding to the test vector, and update the values of the current training neuron and its adjacent neurons;
S223: Repeat steps S221 to S222 until every test vector in the training data has been taken, obtaining a preliminary SOM detection model;
S224: Compute the neighborhood area of each neuron in the preliminary SOM model; if the neighborhood area is smaller than the neighborhood threshold, mark the neuron as normal; otherwise, mark it as abnormal, obtaining a SOM detection model.
5. The method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning according to claim 4, characterized in that in step S221, finding the current training neuron corresponding to the test vector comprises the following steps:
Compute the Euclidean distance between the weight vector of each neuron in the SOM network and the measurement vector;
Sort the Euclidean distances between the weight vectors of the neurons in the SOM network and the measurement vector in ascending order to obtain the neuron with the smallest Euclidean distance;
Select the neuron with the smallest Euclidean distance as the current training neuron.
6. The method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning according to claim 4, characterized in that:
In step S222, the values of the adjacent neurons are updated using the following formula:
$W(t+1) = W(t) + N(v, t)\,L(t)\,(D(t) - W(t))$;
where $W(t)$ is the weight vector at time $t$; $D(t)$ is the measurement vector at time $t$; $N(v, t)$ is the distance function of the adjacent neurons; and $L(t)$ is the learning coefficient.
7. The method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning according to claim 1, characterized in that:
In step S225, the neighborhood area of a neuron is computed as the sum of the Manhattan distances from the neuron to its top, bottom, left, and right adjacent neurons;
The Manhattan distance between two neurons $N_i$ and $N_j$ is computed from the weight vectors $W_i = [\omega_{1,i}, \dots, \omega_{k,i}]$ and $W_j = [\omega_{1,j}, \dots, \omega_{k,j}]$ using the following formula: $M(N_i, N_j) = \sum_{l=1}^{k} |\omega_{l,i} - \omega_{l,j}|$
where $M(N_i, N_j)$ is the Manhattan distance between the two neurons $N_i$ and $N_j$, and $\omega_{1,i}$ and $\omega_{1,j}$ are values in the weight vectors of $N_i$ and $N_j$ respectively.
8. The method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning according to claim 1, characterized in that in step S3, collecting the log information of the devices in the secondary system in real time to obtain an input vector, feeding the input vector into the final detection model, and deriving the state value of the current network traffic from the state value of the input vector comprises the following steps:
S31: Collect the log information of the devices in the secondary system in real time to obtain an input vector, feed the input vector into the final detection model, and find the current training neuron corresponding to the input vector;
S32: Obtain the state value of the current training neuron corresponding to the input vector, and determine the state value of the input vector from the state value of the current training neuron;
S33: Derive the state value of the current network traffic from the state value of the input vector.
9. The method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning according to claim 1, characterized by further comprising the following step:
S4: If the state value of the current network traffic is not normal, determine the cause of the current network traffic anomaly by finding the metrics in which the neuron corresponding to the input vector differs from its adjacent normal neurons.
10. The method for detecting network traffic anomalies in an electric power secondary system based on unsupervised learning according to claim 9, characterized in that in step S4, if the state value of the current network traffic is not normal, determining the cause of the current network traffic anomaly by finding the metrics in which the neuron corresponding to the input vector differs from its adjacent normal neurons comprises the following steps:
S41: If the state value of the current network traffic is not normal, compute the Euclidean distance from the neuron to its adjacent normal neurons;
S42: Compute the difference between the non-normal neuron and each normal neuron for every metric in their weight vectors, and sort the differences in descending order to obtain a single-metric ranking list;
S43: Aggregate the single-metric ranking lists and determine the metric that caused the current network traffic anomaly.
CN201710828411.3A 2017-09-14 2017-09-14 Electric power secondary system network flow abnormal detecting method based on unsupervised learning Pending CN108200005A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710828411.3A CN108200005A (en) 2017-09-14 2017-09-14 Electric power secondary system network flow abnormal detecting method based on unsupervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710828411.3A CN108200005A (en) 2017-09-14 2017-09-14 Electric power secondary system network flow abnormal detecting method based on unsupervised learning

Publications (1)

Publication Number Publication Date
CN108200005A true CN108200005A (en) 2018-06-22

Family

ID=62572779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710828411.3A Pending CN108200005A (en) 2017-09-14 2017-09-14 Electric power secondary system network flow abnormal detecting method based on unsupervised learning

Country Status (1)

Country Link
CN (1) CN108200005A (en)

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180622
