Background Art
As construction of the national smart grid advances, smart substations have entered a stage of full-scale construction. One important difference between a smart substation and a conventional or digital substation is that the smart substation integrates a large number of intelligent electronic devices. Its level of informatization, digitization, and interactivity is higher, which brings a substantial increase in data volume. The integrated monitoring system of a smart substation is built on a station-wide unified database, in which the data of the different subsystems and devices are stored. Although the amount of information has grown, no effective data processing method has been proposed for this mass of data. At present, information classification and retrieval in smart substation integrated monitoring systems mostly rely on filtering and screening by time, by bay, by alarm level, and so on. When comprehensive information analysis and intelligent alarm analysis are performed, only individual data items can be obtained from the database, and the correlations between data cannot be obtained by conventional means. As a result, the advanced applications of smart substations still rest on fixed rule bases and cannot provide efficient and accurate intelligent analysis and alarm functions.
With the deepening of State Grid Corporation of China's "Three Intensifications and Five Major Systems" program, and especially the development of "Big Operation" and "Big Maintenance", higher requirements for intelligence are placed on master station systems. These new requirements also greatly change what data the master station needs from substations. The need is no longer limited to traditional "four-remote" information and control commands. The master station also needs panoramic substation information (such as real-time grid operation information, condition monitoring information for primary and secondary equipment, equipment ledgers and configuration information, and model and graphics files) and a large amount of preprocessed result information (such as intelligent alarm information and remote information). The new requirements in turn place new demands on the many channels through which the master station acquires data. In the traditional approach, each type of data is stored in its own device, so the data are hard to obtain and what is obtained is very limited. The distributed way of obtaining data urgently needs to be replaced by unified acquisition from a single data center, which requires optimizing and restructuring the architecture of the existing substation monitoring system. The unified database of the smart substation arose for this reason. In addition, the number of monitoring and alarm signals generated by the intelligent devices themselves has grown substantially, so the amount of information in a smart substation database has taken a leap compared with before. The data processing methods of conventional and digital substations cannot meet the application needs of the integrated monitoring system, which has led to the idea of a data processing approach based on data mining.
Data mining refers to the non-trivial process of revealing implicit, previously unknown, and potentially valuable information from the mass data of a database. It is a decision support process based mainly on artificial intelligence, machine learning, pattern recognition, statistics, databases, and visualization techniques. It analyzes enterprise data in a highly automated way, makes inductive inferences, and uncovers potential patterns that help decision makers adjust strategies, reduce risk, and make correct decisions. In the field of power automation, background systems accumulate massive substation operation data month after month, including grid operation data, equipment alarm information, fault action data, fault recorder data, and basic information on power equipment. Processing and analyzing these data effectively is the key to raising the intelligence level of substations at the present stage, and data mining methods originating in other fields meet the urgent need of the smart substation integrated monitoring system for correlations among its data.
Data mining attracted wide attention across industries as soon as it appeared. It has also been studied extensively for power systems, and various mining algorithms and application models have been proposed for the characteristics of power system data. For substations, however, no related research on data mining has been carried out. With the spread of integrated monitoring systems in smart substations, the unified database they establish brings both opportunities and challenges for comprehensive analysis of smart substation information. Processing the data with a data mining algorithm to obtain the data belonging to the same event provides an effective data basis for the comprehensive information analysis and intelligent alarm functions of the integrated monitoring system. Mining smart substation data also makes it possible to build a self-learning knowledge base for advanced application functions.
Summary of the invention
To overcome the above deficiencies of the prior art, the present invention provides a data mining method applicable to a smart substation integrated monitoring system. The method provides reliable same-event data for the advanced applications of the smart substation and improves the safe operation level of the smart substation.
To achieve the above object, the present invention adopts the following technical solution:
A data mining method applicable to a smart substation integrated monitoring system is provided, the method comprising the following steps:
Step 1: classify and preprocess the substation data;
Step 2: divide the preprocessed substation data into discrete time periods to form data sets;
Step 3: mine the substation data to obtain the association rules among data members in the historical database;
Step 4: mine the newly added substation data to obtain the association rules between the newly added data and the data members in the historical database;
Step 5: output the association rules among data members to external consumers.
Step 1 comprises the following steps:
Step 1-1: divide the substation data into state quantities and measurements;
Step 1-2: unify the single-point and double-point state quantities so that each becomes an independent data item expressing a state such as "occurred" or "cleared"; and generalize the value range of each measurement so that each becomes an independent data item labeled as normal, out-of-limit, seriously out-of-limit, or abnormal.
In step 1-2, the single-point and double-point state quantities are unified as follows:
D1 = {0:01, 1:10, 2:00|11} or D1 = {0:0, 1:1}
where D1 is a given state quantity.
The value range of a measurement is generalized as follows:
F1 = {0: F_llimint - F_hlimint, 1: F_lllimint - F_llimint | F_hlimint - F_hhlimint, 2: < F_zero | > F_max, 3: < F_lllimint | > F_hhlimint}
where F1 is a given measurement, F_llimint is the lower limit of the measurement value range, F_lllimint is the lower-lower limit, F_hlimint is the upper limit, F_hhlimint is the upper-upper limit, F_zero is the zero threshold of the value range, and F_max is the abnormal upper limit of the value range.
In step 2, the preprocessed substation data are divided into discrete time periods according to the substation operating time and the scale of the historical database, forming data sets.
In step 3, the data sets are iterated over to calculate the support count and the weight count of each substation data item, and the association rules among data members in the historical database are then mined to a predetermined association mining depth.
Here, the support count of a substation data item is the total number of data sets that contain it, denoted support(X).
The number of substation data items contained in data set A is counted as itemset A, and the number of substation data items in data set B is counted as itemset B. The weight count of a substation data item I is then:
weight count = (count of itemsets A containing data I) + (count of itemsets B containing data I).
In step 4, the newly added substation data are mined to extract their association rules among data members, which are compared with the association rules among data members in the historical database. The original data sets containing the differing data sets are computed and analyzed, the support factor of each substation data item in each data set is analyzed, and the historical database is scanned iteratively to obtain the association rules among data members in the newly added data. This realizes the mining of association rules between the newly added substation data and the data members in the historical database.
The support factor of a substation data item I is the product of the count of itemsets A and itemsets B in the historical database that both contain I and the share of the weight count of I in the total number of itemsets, that is:
support factor = (count of itemsets A and itemsets B containing data I) × (weight count of data I / total number of itemsets).
In step 5, once data mining has produced the association rules between the newly added substation data and the data members in the historical database, a service interface is provided through which these association rules are output externally.
Compared with the prior art, the beneficial effects of the present invention are:
1) the data are clearly partitioned and the steps are loosely coupled, which eases upgrading and modification of the program;
2) the generated data model and the mining objects are clear, and modeling follows the type classification;
3) measurements are processed by graded value ranges, which matches substation operating regulations, greatly simplifies the calculation, and optimizes the conversion algorithm;
4) the mining algorithm builds on the Apriori algorithm and adds a method for counting data member weights, which gives the strength of a data rule a quantitative index and eases subsequent analysis and processing;
5) newly added data are mined by a separate, dynamic method that operates on the newly added data block independently, which greatly improves the efficiency of the algorithm and lets the rules in the knowledge base be learned and updated automatically.
Detailed description of the invention
The present invention is described in further detail below with reference to the accompanying drawings.
As shown in Fig. 1, the present invention provides a data mining method applicable to a smart substation integrated monitoring system, implemented in the following steps:
1) Substation data classification and preprocessing
The information system represented by a smart substation database is mostly real-valued, so the discrete state values commonly used in data mining cannot be applied to it directly. Real values should first be converted into discrete values that are easy to understand and process, for example converting a current value into "high" or "low", or a humidity value into "humid" or "dry".
The database of a smart substation integrated monitoring system can be regarded as an information system. An information system is a data set, usually expressed as a data table. Each row of the table represents an object, which may be an instance, an event, and so on. Each column of the table is an attribute of the objects, which may be a feature, a state, a metric, and so on.
Because the state quantities and measurements in a substation have their own particular attributes and storage methods, they can be encoded by combining an ID number with a state attribute. The unique number in the database stands for the information name. State quantities are expressed with three states, [0 (cleared), 1 (occurred), 2 (fault)]; measurements are expressed with [0 (normal), 1 (out-of-limit), 2 (abnormal), 3 (seriously out-of-limit)]. This encoding is simple and easy to implement, and it is also convenient for the mining algorithm to operate on. For example:
U1 = {0:99-121, 1:55-99|121-165, 2:<0|>9999, 3:<55|>165}.
For measurements, U1 represents the voltage measurement of a 110 kV bus. Values in the range 99-121 are normal and are represented by 0; values in the ranges 55-99 or 121-165 are out-of-limit and are represented by 1; values below 0 or above 9999 are abnormal and are represented by 2; values from 0 to 55 or above 165 are seriously out-of-limit and are represented by 3.
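The value-range generalization of U1 can be illustrated with a minimal Python sketch. The names `Limits`, `generalize`, and `U1_LIMITS` are hypothetical, and two details are assumptions not stated in the text: the abnormal check is applied before the seriously out-of-limit check (otherwise a value below 0 would match both), and the boundary values 99 and 121 are treated as normal.

```python
from typing import NamedTuple


class Limits(NamedTuple):
    """Thresholds of one measurement (field names are illustrative)."""
    lll: float   # lower-lower limit
    ll: float    # lower limit
    hl: float    # upper limit
    hhl: float   # upper-upper limit
    zero: float  # zero threshold; values below it are abnormal
    fmax: float  # abnormal upper limit


def generalize(value: float, lim: Limits) -> int:
    """Map a raw measurement to 0 normal, 1 out-of-limit,
    2 abnormal, 3 seriously out-of-limit."""
    if value < lim.zero or value > lim.fmax:
        return 2  # abnormal is checked first (assumption)
    if value < lim.lll or value > lim.hhl:
        return 3  # seriously out-of-limit
    if lim.ll <= value <= lim.hl:
        return 0  # normal, boundaries inclusive (assumption)
    return 1      # out-of-limit: between lll and ll, or between hl and hhl


# Thresholds of the 110 kV bus voltage U1 from the example above.
U1_LIMITS = Limits(lll=55, ll=99, hl=121, hhl=165, zero=0, fmax=9999)
```

Under these assumptions, a reading of 110 is coded 0, 130 is coded 1, -5 is coded 2, and 30 is coded 3.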
For state quantities, A1 represents the position of a switch. Substations generally use a double-point switch position to determine the position state: 00 and 11 represent a fault position, 01 represents the open position, and 10 represents the closed position. A2 represents a signal state: 0 means the signal is cleared and 1 means the signal has occurred. State quantities are unified here so that all of them express their state as "occurred" or "cleared". For a double-point switch, 01 is represented by 0, 10 by 1, and 00 or 11 by 2. For example:
A1 = {0:01, 1:10, 2:00|11};
A2 = {0:0, 1:1}.
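The unification of single-point and double-point state quantities amounts to two lookup tables. The following Python sketch is illustrative only; the names `DOUBLE_POINT`, `SINGLE_POINT`, and `unify_state` are hypothetical, and representing the raw status as a bit string is an assumption.

```python
# Unified codes: 0 = cleared/open, 1 = occurred/closed, 2 = fault.
DOUBLE_POINT = {"01": 0, "10": 1, "00": 2, "11": 2}  # mapping A1
SINGLE_POINT = {"0": 0, "1": 1}                      # mapping A2


def unify_state(raw: str) -> int:
    """Return the unified state code for a raw single- or double-point status."""
    table = DOUBLE_POINT if len(raw) == 2 else SINGLE_POINT
    return table[raw]  # raises KeyError for an unknown pattern
```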
2) Discrete time division of the preprocessed substation data
The preprocessed substation data are divided into discrete time periods according to the substation operating time and the scale of the historical database, forming data sets. A data set is the smallest unit of the substation database and is denoted T. For example, the set T = {I1, I2, ..., Ik} is an itemset, where each I is a single data item in the substation historical database. If the number of items I in T is k, T is called a k-itemset.
In mining the data sets, the mining scope covers the data of the whole station. Viewed by time section, many events in a substation affect more than a single bay, and some events are related to other bays or devices. The data sets therefore need to be divided by time period. In the database, each time period serves as a data set, and the time periods are denoted T1, T2, T3, ..., Tn. The attributes of the data sets divided by time interval are shown in Table 1:
Table 1
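The division into time periods T1, T2, ..., Tn can be sketched in Python as follows. This is a minimal illustration under assumptions: each record is taken to be a (timestamp in seconds, item ID) pair, and a fixed period length is used, whereas the text only says that the period is chosen from the operating time and the database scale. The function name `divide_by_period` is hypothetical.

```python
from collections import defaultdict


def divide_by_period(records, period_seconds):
    """Group (timestamp, item_id) records into consecutive time periods.

    Returns one list of item IDs per non-empty period, in time order.
    """
    periods = defaultdict(list)
    for timestamp, item_id in records:
        # Records whose timestamps fall in the same window share a data set.
        periods[int(timestamp // period_seconds)].append(item_id)
    return [periods[key] for key in sorted(periods)]


records = [(0, "I11"), (30, "I12"), (65, "I12"), (130, "I21")]
datasets = divide_by_period(records, period_seconds=60)
```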
Under the aggregation mode of discrete time intervals, the state quantities and measurements in the database are segmented by time period. Ix denotes a single event data item, and the signals in each period form the items of the data set, using the item ID list of Table 1. The data set list is shown in Table 2:
Table 2
After the data set list is complete, counting is done according to the number of times each signal actually occurs in the substation database. For the signals in each time period, the item ID list of the table above gives the items of the data set, and the number of occurrences of each item forms a sub-item of the data set. The data sets and counts are shown in Table 3:
Table 3
The table above describes both the item IDs and the number of times each item ID occurs. An entry I12:2 means that signal I12 occurred 2 times in the first record T1, and so on for the other entries.
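The per-period occurrence counts (entries of the form I12:2) can be obtained with a simple counter. A minimal Python sketch follows; the function name `count_items` and the sample data are hypothetical.

```python
from collections import Counter


def count_items(dataset):
    """Count how many times each item ID occurs within one time period."""
    return dict(Counter(dataset))


# Signals observed in one period; I12 occurs twice.
t1_counts = count_items(["I11", "I12", "I12", "I13"])
```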
For the special information system of a smart substation, the number of occurrences of each state quantity is counted according to what actually happened, whereas measurements must first be generalized using the value ranges above. As shown in Table 4, the fields give each data set ID and its occurrence counts. For example, I14 = {0:2, 1:1, 2:2, 3:4} gives the counts for I14 in time period T1: the voltage was between 99 and 121 twice, between 55 and 99 or between 121 and 165 once, above 9999 or below 0 twice, and below 55 or above 165 four times. The other fields follow by analogy.
Table 4
Once the number of changes of each state quantity, measurement, and remote control in the substation database has been obtained, a data classification table such as the one below is built, which distinguishes the data by the different states of each signal. For example, the normal (0) state of I14 is written I14-0, its out-of-limit state I14-1, its abnormal state I14-2, and its seriously out-of-limit state I14-3, as shown in Table 5:
Table 5
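Splitting one signal into per-state items such as I14-0 to I14-3 can be sketched as below. The function name `split_by_state` is hypothetical, and dropping states with a zero count is an assumption made for this illustration.

```python
def split_by_state(item_id, state_counts):
    """Turn {state code: count} for one signal into independent items
    named '<item_id>-<state code>', keeping only states that occurred."""
    return {
        f"{item_id}-{state}": count
        for state, count in sorted(state_counts.items())
        if count > 0
    }


# I14 = {0:2, 1:1, 2:2, 3:4} from the example for time period T1.
items = split_by_state("I14", {0: 2, 1: 1, 2: 2, 3: 4})
```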
At this point the classification and preprocessing of the data in the substation database are complete, and the state quantities and measurements in each of their states all take part in data mining as independent data items. Ix-0, Ix-1, Ix-2, and Ix-3 denote the four states of a measurement signal Ix, and state quantities are treated in the same way. To keep the description simple and easy to follow, the algorithm derivation and method analysis below still use the notation Ix and do not carry the more complex suffixes. The data model used by the algorithm is shown in Table 6.
Table 6
Property name | Description |
Information_Attribute | Basic signal information |
Information_ID | Data code |
Information_TID | Time interval to which the data belong |
Information_time | Signal generation time |
Information_type | Signal type |
Information_state | Signal state |
Information_serialnumber | Unique data code |
Information_value | Data value |
Information_Substation | Station to which the signal belongs |
Substation_id | Station serial number |
Substation_name | Station name |
Substation_voltage | Station voltage level |
Information_Equipment | Information on the device corresponding to the signal |
Equipment_id | Device number |
Equipment_name | Device name |
Equipment_type | Device type |
Equipment_voltage | Device voltage level |
Information_Fault | Fault attributes when the signal occurred |
Fault_type | Fault category |
Fault_id | Fault number |
Fault_name | Fault name |
Information_Environment | Environment attributes when the signal occurred |
Environment_humidity | Ambient humidity |
Environment_temp | Ambient temperature |
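The data model of Table 6 could be represented in code roughly as follows. This Python sketch is only an illustration: the class names, the grouping into nested records, and all field types are assumptions, since the table gives only property names and descriptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Substation:
    id: int
    name: str
    voltage: str       # station voltage level, e.g. "110kV"


@dataclass
class Equipment:
    id: int
    name: str
    type: str
    voltage: str       # device voltage level


@dataclass
class Fault:
    type: str
    id: int
    name: str


@dataclass
class Environment:
    humidity: float
    temp: float


@dataclass
class Signal:
    id: str            # Information_ID, data code such as "I14"
    tid: str           # Information_TID, time interval such as "T1"
    time: float        # Information_time, signal generation time
    type: str          # Information_type, e.g. "state" or "measurement"
    state: int         # Information_state, unified state code
    serialnumber: str  # Information_serialnumber, unique data code
    value: float       # Information_value
    substation: Substation
    equipment: Equipment
    fault: Optional[Fault] = None
    environment: Optional[Environment] = None
```

Keeping the fault and environment attributes optional reflects the assumption that not every signal is recorded together with a fault or with environmental readings.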
3) Data mining
Taking the data of Table 3 as an example, the data mining algorithm of the present invention proceeds as follows. First, the database is scanned and the support and weight count of each data set are calculated. The resulting data set supports and weight counts are shown in Table 7 (items such as I12 in the table physically denote Ix-0, and likewise below):
Table 7
Here the minimum weight count is set to 4. The candidate weight counts are compared with the minimum weight count, and the 1-data sets whose weight count is greater than 5 are shown in Table 8:
Table 8
Candidate data sets are produced from Table 8 by merging the data sets, giving the 2-data set member list of Table 9:
Table 9
On the basis of Table 9, the support of each candidate data set and the weight counts of its members are calculated, giving the 2-data set support and weight count table, Table 10:
Table 10
The minimum weight count for 2-data sets is set to 10. The candidate weight counts in Table 10 are compared with this minimum, and the 3-data set member list is generated as shown in Table 11:
Table 11
The historical database is scanned a third time, and the support of each candidate data set and the frequency of its members are calculated, giving the 3-data set supports and weight counts of Table 12:
Table 12
This yields the association rules derived from Table 3, among which {I12:I21:I14:I11} is a strong association rule.
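The level-wise procedure above (scan, count, filter by a minimum weight count, merge survivors into larger candidates) can be sketched in Python. This is not the patented algorithm itself but a simplified Apriori-style illustration under stated assumptions: each data set is a mapping from item ID to occurrence count, the weight count of a candidate is taken to be the summed occurrences of its members over the data sets containing all of them, and candidates are generated as plain combinations of surviving items. The names `support_and_weight` and `mine`, the thresholds, and the sample data are hypothetical.

```python
from itertools import combinations


def support_and_weight(itemset, datasets):
    """Return (support count, per-member weight counts) of one candidate."""
    support = 0
    weights = {item: 0 for item in itemset}
    for data in datasets:                 # data maps item ID -> occurrences
        if all(item in data for item in itemset):
            support += 1
            for item in itemset:
                weights[item] += data[item]
    return support, weights


def mine(datasets, min_weight, depth):
    """Level-wise mining; min_weight[k - 1] is the threshold for k-data sets."""
    items = sorted({item for data in datasets for item in data})
    candidates = [(item,) for item in items]
    result = {}
    for k in range(1, depth + 1):
        kept = {}
        for cand in candidates:
            support, weights = support_and_weight(cand, datasets)
            if support > 0 and sum(weights.values()) >= min_weight[k - 1]:
                kept[cand] = (support, weights)
        if not kept:
            break
        result.update(kept)
        # Merge the surviving members into candidates one item larger.
        survivors = sorted({item for cand in kept for item in cand})
        candidates = list(combinations(survivors, k + 1))
    return result


datasets = [
    {"I11": 1, "I12": 2, "I21": 1},
    {"I12": 3, "I21": 2},
    {"I11": 2, "I12": 1},
    {"I31": 1},
]
rules = mine(datasets, min_weight=[3, 5], depth=2)
```

With this sample data the pair (I12, I21) is kept with support 2 and member weights I12:5 and I21:3, while (I11, I21) is dropped because its total weight count of 2 is below the threshold of 5.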
4) Mining of newly added data
For the data sets and member lists shown in Table 13, the database consists mainly of two parts: the earlier data sets {T1, T2, T3, T4, ..., Tn} and the newly added data sets {Tn+1, Tn+2, Tn+3, ...}. With a dynamic incremental mining algorithm, the previous mining results can be fully reused. Only the newly added part needs to be mined, and organically combining the mining results of the two parts completes the mining and analysis of the whole updated database.
Table 13
While scanning the database, the algorithm counts both the support of each data set and the occurrences of its members, so it mines the quantitative relationships among the members of a data set at the same time as the data sets themselves. When a candidate data set is first checked against the database, the weight count and support count over all of its original data sets are calculated. A minimum support count is set, and the data sets whose count is greater than or equal to it are retained by filtering as candidate data sets. When association rules and data sets are mined from a large database, the work can be done in stages: the database records are first divided into blocks by time, and each block is then mined separately.
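Combining the result of a newly mined block with the stored historical result can be sketched as a merge of counts. The Python sketch below is an assumption-laden illustration, not the method as claimed: both results are taken to map a data set (a tuple of item IDs) to a pair of support count and per-member weight counts, and the function name `merge_counts` is hypothetical.

```python
def merge_counts(history, new_block):
    """Add the support and weight counts of a new block to the stored ones."""
    merged = {key: (s, dict(w)) for key, (s, w) in history.items()}
    for itemset, (support, weights) in new_block.items():
        if itemset in merged:
            old_support, old_weights = merged[itemset]
            merged[itemset] = (
                old_support + support,
                {i: old_weights.get(i, 0) + weights[i] for i in weights},
            )
        else:
            # Seen only in the new block: counts cover the new block alone.
            merged[itemset] = (support, dict(weights))
    return merged
```

For a data set that appears only in the new block, the merged counts cover the new block alone; obtaining exact totals for it would require scanning the historical database again, which is consistent with the iterative rescan described for step 4.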
Data mining of the newly added part of the database proceeds in three steps.
First, the newly added part of the database is obtained and arranged according to the data model built above, as shown in Table 14:
Table 14
The newly added part is scanned and the number of occurrences of each candidate item is counted, as shown in Table 15:
Table 15
With the minimum support factor set to 0.4, the table above reduces to the 1-data sets whose support factor exceeds 0.4, shown in Table 16:
Table 16
Second, a 2-data set scan is performed on the newly added part and the number of occurrences of each candidate item is counted. The 2-data set weight counts and support factors are shown in Table 17:
Table 17
Finally, a 3-data set scan is performed on the newly added part and the number of occurrences of each candidate item is counted. The 3-data set support counts and weight counts are shown in Table 18:
Table 18
The 3-data set result consists of a single data set, {I11:5, I12:12, I31:6}, whose support factor is 3. This indicates that the support rate of this data set is the highest and that it is the most likely to become a strong association rule. Within it, I11 accounts for 22%, I12 for 52%, and I31 for 26%. Thus, mining with data sets based on quantitative indicators not only finds the association rules among the data in the database but also allows the relationships among the data to be analyzed from their proportions, for use by application functions.
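The member proportions quoted above follow directly from the weight counts 5, 12, and 6, whose total is 23. A short Python check, with the hypothetical helper name `member_ratios`:

```python
def member_ratios(weights):
    """Share of each member in the total weight count, in whole percent."""
    total = sum(weights.values())
    return {item: round(100 * count / total) for item, count in weights.items()}


ratios = member_ratios({"I11": 5, "I12": 12, "I31": 6})
# 5/23, 12/23 and 6/23 round to 22, 52 and 26 percent.
```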
5) Service interface
Once data mining has produced the association rules between the newly added substation data and the data members in the historical database, a service interface is provided through which these association rules are output externally.
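The text does not specify the form of the service interface. Purely as an illustration, the following Python sketch serializes mined rules to JSON so that another application could consume them. The function name `export_rules`, the JSON layout, and the rule representation (a tuple of item IDs mapped to a support count and member weight counts) are all assumptions.

```python
import json


def export_rules(rules):
    """Serialize mined rules to a JSON string for external consumers."""
    payload = [
        {"members": list(itemset), "support": support, "weights": weights}
        for itemset, (support, weights) in sorted(rules.items())
    ]
    return json.dumps(payload, ensure_ascii=False)


text = export_rules({("I11", "I12"): (2, {"I11": 3, "I12": 3})})
```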
Finally, it should be noted that the above embodiment only illustrates the technical solution of the present invention and does not limit it. Although the present invention has been described in detail with reference to the above embodiment, those of ordinary skill in the art should understand that the specific embodiments of the present invention may still be modified or equivalently replaced, and any modification or equivalent replacement that does not depart from the spirit and scope of the present invention shall fall within the scope of the claims of the present invention.