A classification method for high-speed rail power quality data based on a support vector machine
Technical field
The present invention relates to methods for classifying power quality data, and in particular to a classification method for high-speed rail power quality data based on a support vector machine.
Background technology
With the accelerating construction of a strong smart grid that takes the ultra-high-voltage grid as its backbone and develops power grids at all levels in a coordinated way, the sources of power quality disturbances in the grid keep increasing, and the power quality problems the grid faces are becoming more serious. To strengthen power quality management, the electric power companies of some provinces and municipalities in China, such as Shanghai, Jiangsu, Fujian, Shanxi, Liaoning, Jiangxi, Henan and Hunan, have successively built province-wide (city-wide) power quality monitoring systems, and some provincial companies have also built several prefecture-level monitoring systems within their provinces. These systems now largely provide basic functions such as collecting basic power quality data and generating reports, but they lack advanced analysis techniques for mining power quality information, so the massive monitoring data they collect is not fully used and is largely wasted. Power quality data mining techniques are therefore urgently needed to extract useful information from the data, so that monitoring systems can better serve grid planning, operation and maintenance. However, power quality monitoring data is a massive multidimensional data set. It comprises RMS (effective-value) data and waveform data, and covers many indices, including voltage deviation, frequency deviation, harmonics, interharmonics, voltage fluctuation and flicker, three-phase unbalance, voltage swell, voltage sag and short-time voltage interruption, which makes mining it a considerable challenge.
At present, data mining in power quality analysis is still at an early stage. The classification and identification of power quality disturbances has been studied extensively and is of considerable practical use. In contrast, for the analysis of disturbance source characteristics, there has been little research on extracting the data recorded while a disturbance source is operating from massive monitoring data, or on identifying the type of disturbance source from the features of the monitoring data.
Summary of the invention
In view of the deficiencies of the prior art, the object of the present invention is to provide a classification method for high-speed rail power quality data based on a support vector machine. The method identifies the power quality features of the high-speed rail load by support-vector-machine recursive feature elimination (SVM-RFE): it selects features of the high-speed rail load's power quality, trains a support vector machine on the optimal feature subset, and finally classifies power quality data according to whether a high-speed rail load is operating. The invention extracts the data recorded during high-speed rail operation from massive monitoring data and selects the power quality features that reflect high-speed rail operation, providing an approach and a method for the mining and classification of power quality data.
The object of the invention is achieved by the following technical solution:
The invention provides a classification method for high-speed rail power quality data based on a support vector machine, the improvement being that the method comprises the following steps:
(1) obtaining high-speed rail power quality monitoring data from an online power quality monitor and, after pre-processing, using it as the training sample set;
(2) training a support vector machine (SVM) with the training sample set to obtain the SVM model and its classification accuracy on the training sample set and the test sample set;
(3) selecting features of the high-speed rail load's power quality by recursive feature elimination based on the support vector machine;
(4) evaluating the quality of each feature subset by jointly considering its classification accuracy and its number of features, and determining the optimal feature subset;
(5) training the SVM with the optimal feature subset to obtain the final support vector machine model;
(6) identifying sample data of high-speed rail power quality with the final support vector machine model to obtain the classification of the high-speed rail power quality data.
Further, in step (1), the high-speed rail power quality monitoring data comprises voltage deviation, current RMS value, frequency deviation, active power, reactive power, apparent power, electric energy, harmonics, interharmonics, voltage fluctuation and flicker, three-phase unbalance, voltage swell, voltage sag and short-time voltage interruption;
the harmonics are those of orders 0 to 50.
Further, in step (1), the pre-processing comprises:
1. averaging every 20 samples of the high-speed rail power quality monitoring data, which is sampled every 3 seconds, to obtain one sample $x'_i$ per minute;
2. normalizing the sample data feature by feature. Let $x'_i = (x'_{i,1}, x'_{i,2}, \ldots, x'_{i,m})$ be the $i$-th raw data sample with $m$ features, where $x'_{i,j}$ is the value of the $j$-th feature of the $i$-th sample. The normalization expression is:
$$x_{i,j} = \frac{x'_{i,j} - x'_{mid}}{(x'_{max} - x'_{min})/2}, \quad i = 1, 2, \ldots, N \tag{1}$$
where $N$ is the total number of samples; $x'_{max}$ and $x'_{min}$ are the maximum and minimum of the data vector of the $j$-th feature; $x'_{mid}$ is the mean of that maximum and minimum; and $x_{i,j}$ is the normalized value. The $i$-th data sample after normalization is $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,m})$.
Further, in step (1), the training sample set pairs each sample $x_i$ with the high-speed rail running state $y_i \in \{+1, -1\}$: power quality data samples recorded while high-speed rail is running are assigned the class label "+1", and the remaining data samples are assigned the class label "-1". The sample set finally formed after normalization is $\{(x_i, y_i)\}_{i=1}^{n}$, where $n$ is the number of samples.
Further, in step (2), the support vector machine (SVM) is a binary (two-class) classifier; building the SVM model means finding the optimal separating hyperplane, expressed as:
$$f(x) = \operatorname{sgn}\left(\sum_{i} \alpha_i^* y_i K(x, x_i) + b^*\right) \tag{2}$$
where $K(x, x_i)$ is the kernel function, $x_i$ is a support vector, $\alpha_i^*$ is the weight coefficient of the support vector machine, and $b^*$ is the bias.
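As a rough illustration of how a decision function of this kind is evaluated, the sketch below sums the kernel responses of the support vectors, adds the bias and takes the sign. The function names, the toy support vectors and the convention that a score of exactly zero maps to +1 are assumptions made for this example only.

```python
def linear_kernel(u, v):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_decide(x, support_vectors, alphas, labels, bias, kernel=linear_kernel):
    """Return +1 or -1 from sgn(sum_i alpha_i * y_i * K(x, x_i) + b).

    A score of exactly 0 is mapped to +1 (an assumption of this sketch).
    """
    score = bias
    for alpha, y, sv in zip(alphas, labels, support_vectors):
        score += alpha * y * kernel(x, sv)
    return 1 if score >= 0 else -1


# Toy values, chosen only to exercise the function.
svs = [[1.0, 1.0], [-1.0, -1.0]]
print(svm_decide([2.0, 0.0], svs, [0.5, 0.5], [1, -1], 0.0))
print(svm_decide([-3.0, 1.0], svs, [0.5, 0.5], [1, -1], 0.0))
```

Passing a different `kernel` callable is enough to evaluate the same expression for a nonlinear SVM.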
Further, in step (3), selecting features of the high-speed rail load's power quality by SVM-based recursive feature elimination comprises the following steps:
a. formulating a feature ranking criterion from the optimal separating hyperplane of the SVM model, that is, from the support vectors and weight coefficients in expression (2), and computing the ranking criterion score of each feature in the feature subset of the training samples;
b. sorting the features in descending order of ranking criterion score, keeping the top-ranked features and eliminating the single lowest-ranked feature to form a new feature subset;
c. updating the training sample set according to the new feature subset;
d. returning to step (2) and retraining the support vector machine with the updated training sample set;
e. iterating steps (a) to (d) until only one feature remains in the feature subset.
Further, in step (a), formulating the feature ranking criterion from the information in the SVM model means defining the criterion by each feature's contribution to the classification. The ranking criterion is as follows: for a linear SVM, the ranking criterion score of the $i$-th feature is defined as $DJ(i) = (w_i)^2$, where $w = \sum_k \alpha_k^* y_k x_k$;
for a nonlinear SVM, the ranking criterion score of the $i$-th feature is defined as $DJ(i) = \frac{1}{2}\alpha^T H \alpha - \frac{1}{2}\alpha^T H(-i)\alpha$, where $\alpha$ is a column vector whose elements are the weight coefficients $\alpha_k^*$; $H$ is a matrix whose $(h, k)$ element is $y_h y_k K(x_h, x_k)$; and $H(-i)$ is the matrix $H$ computed with the $i$-th feature removed.
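The following sketch evaluates a kernel-based ranking score of the form commonly used in SVM-RFE, DJ(i) = ½·αᵀHα − ½·αᵀH(−i)α, where H(−i) is recomputed with feature i removed. This form is assumed here, as are the function names and the toy data. With a linear kernel the score reduces to ½·w_i², which ranks features exactly as (w_i)² does.

```python
def linear_kernel(u, v):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def quad_form(X, y, alphas, kernel):
    """alpha^T H alpha, with H[h][k] = y_h * y_k * K(x_h, x_k)."""
    n = len(X)
    return sum(
        alphas[h] * alphas[k] * y[h] * y[k] * kernel(X[h], X[k])
        for h in range(n)
        for k in range(n)
    )


def ranking_scores(X, y, alphas, kernel=linear_kernel):
    """Score every feature by how much removing it changes alpha^T H alpha."""
    full = quad_form(X, y, alphas, kernel)
    scores = []
    for i in range(len(X[0])):
        # Drop column i from every sample before rebuilding H.
        X_minus_i = [row[:i] + row[i + 1:] for row in X]
        scores.append(0.5 * full - 0.5 * quad_form(X_minus_i, y, alphas, kernel))
    return scores


# Toy support vectors; here w = sum_k alpha_k * y_k * x_k = [2.0, 0.5].
X = [[2.0, 1.0], [-2.0, 0.0]]
print(ranking_scores(X, [1, -1], [0.5, 0.5]))
```

In this toy case the scores equal half the squared components of w, so the first feature ranks above the second under either form of the criterion.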
Further, in step (4), the feature subset with the highest classification accuracy is selected as the optimal feature subset; if several feature subsets have identical or similar classification accuracy, the one with the smallest dimension is taken as the optimal feature subset.
Further, in step (6), the sample data of high-speed rail power quality is obtained by assembling the high-speed rail power quality data according to the optimal feature subset and normalizing it; the assembled, normalized data is then input to the final support vector machine model as sample data.
Compared with the prior art, the beneficial effects of the present invention are:
1. The invention can obtain, from power quality data comprising on the order of a hundred feature indices, a power quality feature set that reflects the operating characteristics of the high-speed rail load, and at the same time improves the accuracy of classifying power quality data by whether a high-speed rail load is operating.
2. The invention extracts the data recorded during high-speed rail operation from massive monitoring data and selects the power quality features that reflect high-speed rail operation. The proposed method can also be applied to the power quality characteristic analysis of other loads, providing an approach and a method for the mining and classification of power quality data.
Brief description of the drawings
Fig. 1 is a flow chart of the classification of high-speed rail load power quality data provided by the invention;
Fig. 2 is a flow chart of the feature selection of high-speed rail load power quality based on recursive feature elimination (SVM-RFE) provided by the invention;
Fig. 3 shows the results of feature selection and classification prediction for high-speed rail power quality provided by the invention.
Detailed description of the embodiments
The specific embodiments of the invention are described in further detail below with reference to the accompanying drawings.
The invention provides a classification method for high-speed rail power quality data based on a support vector machine. Its flow chart is shown in Fig. 1, and it comprises the following steps:
(1) Pre-process the power quality monitoring data of the high-speed rail traction substation and build a valid sample set.
The power quality monitoring data of the high-speed rail traction substation comprises 68 feature indices, including active power ($P$), reactive power ($Q$), current ($I$), harmonics ($I_h$, with $I_1$ denoting the fundamental current), interharmonics ($I_{ih}$), voltage ($U$), flicker ($P_{st}$) and negative sequence ($\mu_2$); that is, each sample $x_i$ is a 68-dimensional vector ($d = 68$).
Monitoring data from two days at a high-speed rail traction substation, sampled once every 3 seconds, is used as the raw experimental data. Because per-minute data is sufficient to capture how the power quality data varies during high-speed rail operation, and to simplify prediction, the 20 samples in each minute are averaged into one sample $x'_i$ per minute. Each sample $x'_i$ is paired with the high-speed rail running state $y_i \in \{+1, -1\}$: power quality data samples recorded while high-speed rail is running are assigned the class label "+1", and the remaining data samples are assigned the class label "-1". After removing the power quality data from the early-morning period of day 1, when no trains run at the station, 1024 samples remain as training data, of which 522 belong to class "+1" and 502 to class "-1". The 1440 power quality samples covering the whole of day 2 are used as test data.
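The per-minute averaging can be sketched as follows. At one sample every 3 seconds, 20 consecutive samples span one minute, so each group of 20 feature vectors is averaged feature by feature. The function name and the choice to drop a trailing incomplete group are assumptions of this sketch.

```python
def minute_averages(samples, group_size=20):
    """Average consecutive groups of `group_size` samples, feature by feature.

    `samples` is a list of equal-length feature vectors taken every 3 s.
    A trailing incomplete group is dropped (an assumption of this sketch).
    """
    averaged = []
    for start in range(0, len(samples) - group_size + 1, group_size):
        block = samples[start:start + group_size]
        n_features = len(block[0])
        averaged.append(
            [sum(row[j] for row in block) / group_size for j in range(n_features)]
        )
    return averaged


# Toy input: 40 samples with two features yield two per-minute samples.
raw = [[float(k), 2.0] for k in range(40)]
print(minute_averages(raw))
```

A full day of 3-second data contains 28800 samples, which this grouping turns into 1440 per-minute samples, matching the size of the day-2 test set.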
To improve the generalization ability of the model and reduce training time, the sample data is normalized. Here all data is mapped to the interval $[-1, +1]$. Let $x'_i = (x'_{i,1}, x'_{i,2}, \ldots, x'_{i,m})$ be the $i$-th raw data sample with $m$ features, where $x'_{i,j}$ is the value of the $j$-th feature of the $i$-th sample. The normalization expression is:
$$x_{i,j} = \frac{x'_{i,j} - x'_{mid}}{(x'_{max} - x'_{min})/2}, \quad i = 1, 2, \ldots, N \tag{1}$$
where $N$ is the total number of samples; $x'_{max}$ and $x'_{min}$ are the maximum and minimum of the data vector of the $j$-th feature; $x'_{mid}$ is the mean of that maximum and minimum; and $x_{i,j}$ is the normalized value. The $i$-th data sample after normalization is $x_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,m})$. This completes the construction of the training sample set $\{(x_i, y_i)\}_{i=1}^{n}$.
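A minimal sketch of this per-feature normalization is given below. It assumes the mapping to [-1, +1] subtracts the midpoint of each feature's range and divides by half that range, which follows from the definitions of the maximum, minimum and their mean. The function name, the returned scaling parameters and the handling of a constant feature are assumptions.

```python
def normalize_features(rows):
    """Map every feature column to [-1, +1] using its own min and max.

    Returns (normalized_rows, mins, maxs) so the same scaling can be reused
    on new data. A constant column is mapped to 0.0 (an assumption; the
    text does not cover that case).
    """
    n_features = len(rows[0])
    mins = [min(r[j] for r in rows) for j in range(n_features)]
    maxs = [max(r[j] for r in rows) for j in range(n_features)]
    normalized = []
    for r in rows:
        out = []
        for j in range(n_features):
            mid = (maxs[j] + mins[j]) / 2.0
            half_range = (maxs[j] - mins[j]) / 2.0
            out.append(0.0 if half_range == 0 else (r[j] - mid) / half_range)
        normalized.append(out)
    return normalized, mins, maxs


# Toy data: the first feature spans 0..10, the second is constant.
scaled, lo, hi = normalize_features([[0.0, 5.0], [10.0, 5.0], [5.0, 5.0]])
print(scaled)
```

Returning the per-feature minima and maxima lets later data be scaled with the training-set range rather than its own.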
(2) Train the support vector machine (SVM) with the training sample set and obtain the classification accuracy.
A single SVM is in essence a binary (two-class) classifier, and its classification decision function is:
$$f(x) = \operatorname{sgn}\left(\sum_{i} \alpha_i^* y_i K(x, x_i) + b^*\right) \tag{2}$$
The main kernel functions are the linear kernel $K(x, x_i) = x \cdot x_i$, the polynomial kernel $K(x, x_i) = [x \cdot x_i + 1]^q$, the radial basis kernel function, and others.
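For illustration, the two kernels whose expressions are given above can be written as plain functions. The function names are hypothetical, and the radial basis kernel is left out because its parameterization is not stated in the text.

```python
def linear_kernel(x, xi):
    """K(x, x_i) = x . x_i"""
    return sum(a * b for a, b in zip(x, xi))


def polynomial_kernel(x, xi, q=2):
    """K(x, x_i) = (x . x_i + 1) ** q, with q the polynomial degree."""
    return (linear_kernel(x, xi) + 1) ** q


print(linear_kernel([1.0, 2.0], [3.0, 4.0]))
print(polynomial_kernel([1.0, 2.0], [3.0, 4.0], q=2))
```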
The present invention uses the linear kernel, so the classification decision function is $f(x) = \operatorname{sgn}\left(\sum_{i} \alpha_i^* y_i (x_i \cdot x) + b^*\right)$. Training the SVM is the process of finding the support vectors $x_i$, their weight coefficients $\alpha_i^*$ and the bias $b^*$.
With the 68-dimensional feature set, the trained SVM model misclassifies 2 training samples and 3 test samples, giving classification accuracies of 99.80% and 99.79% respectively. This shows, on the one hand, that the 68-dimensional feature set reflects the power quality characteristics of high-speed rail and, on the other, that an SVM with a linear kernel can reliably determine whether a high-speed rail traction load is operating; the model's prediction performance is good.
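The two accuracy figures follow directly from the misclassification counts and the sample set sizes given above (1024 training samples and 1440 test samples):

```latex
\[
\text{accuracy}_{\text{train}} = 1 - \frac{2}{1024} \approx 0.9980 = 99.80\%,
\qquad
\text{accuracy}_{\text{test}} = 1 - \frac{3}{1440} \approx 0.9979 = 99.79\%
\]
```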
(3) Select features of the high-speed rail load's power quality by SVM-based recursive feature elimination (SVM-RFE), which comprises the following steps:
a. Formulate a feature ranking criterion from the information in the SVM model, and compute the ranking criterion score of every feature under this criterion.
Because the linear kernel is used, the ranking criterion score of the $i$-th feature is defined as $DJ(i) = (w_i)^2$, where $w = \sum_k \alpha_k^* y_k x_k$. The ranking criterion scores of all features are computed under this criterion.
b. Sort the features in descending order of ranking criterion score, keep the top-ranked features and eliminate the single lowest-ranked feature to form a new feature subset.
c. Update the sample set according to the new feature subset.
d. Return to step (2) and retrain the support vector machine with the updated training samples.
e. Iterate steps (a) to (d) until only one feature remains in the feature subset.
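Steps (a) to (e) can be sketched as a short loop. The trainer below is a stand-in, not the solver used in the invention: it fits a linear SVM by plain sub-gradient descent on the L2-regularized hinge loss so that the example stays self-contained. All function names, hyperparameters and the toy data are assumptions.

```python
def train_linear_svm(X, y, lam=0.01, eta=0.01, epochs=200):
    """Stand-in linear SVM trainer (sub-gradient descent on the hinge loss).

    Returns the weight vector w and bias b.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Shrink step coming from the L2 penalty.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:  # margin violated: follow the hinge sub-gradient
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def svm_rfe(X, y, train=train_linear_svm):
    """Rank feature indices from most important to first eliminated."""
    remaining = list(range(len(X[0])))
    eliminated = []
    while len(remaining) > 1:
        # Steps c and d: restrict the samples to the current subset, retrain.
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train(X_sub, y)
        # Step a: linear-kernel ranking score DJ(i) = w_i ** 2.
        scores = [wj * wj for wj in w]
        # Step b: eliminate the single lowest-scoring feature.
        worst = min(range(len(scores)), key=scores.__getitem__)
        eliminated.append(remaining.pop(worst))
    return remaining + eliminated[::-1]


# Toy data: feature 0 equals the label, feature 1 is always zero,
# feature 2 is a weak copy of the label.
X = [[1.0, 0.0, 0.1], [-1.0, 0.0, -0.1], [1.0, 0.0, 0.1], [-1.0, 0.0, -0.1]]
y = [1, -1, 1, -1]
print(svm_rfe(X, y))
```

Recording the elimination order gives the nested feature subsets needed for the next step: the first k entries of the returned ranking form the subset of size k.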
The features of high-speed rail power quality are selected with the SVM-RFE method. The feature selection process and the classification accuracy corresponding to each feature combination are shown in Fig. 3.
(4) Evaluate the quality of these feature subsets by jointly considering classification accuracy and the number of features, and determine the optimal feature subset.
SVM-RFE performs this process iteratively and finally yields a series of nested feature subsets. An SVM is trained on each of these subsets, and the classification accuracy of the SVM is used to assess the quality of the subset, from which the optimal feature subset is obtained. In general, the feature subset with the highest accuracy is selected as the optimal feature subset. If several feature subsets have similar accuracy, the one with the smaller dimension is taken as the optimal feature subset.
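The selection rule can be sketched as follows: keep the candidates whose accuracy is within a tolerance of the best accuracy, then take the one with the fewest features. The `tolerance` parameter is a hypothetical way to express "similar accuracy", and the candidate subsets and accuracies are placeholder values, not results from the embodiment.

```python
def select_optimal_subset(candidates, tolerance=0.0):
    """Pick the best (features, accuracy) pair from nested candidate subsets.

    Candidates whose accuracy is within `tolerance` of the best accuracy
    are treated as equally good; among them the smallest subset wins.
    """
    best_acc = max(acc for _, acc in candidates)
    close = [c for c in candidates if best_acc - c[1] <= tolerance]
    return min(close, key=lambda c: len(c[0]))


# Placeholder candidates (feature names and accuracies are illustrative only).
candidates = [(["a", "b", "c"], 0.998), (["a", "b"], 0.998), (["a"], 0.95)]
print(select_optimal_subset(candidates))
print(select_optimal_subset(candidates, tolerance=0.05))
```

With a zero tolerance the two-feature subset is chosen because it ties for the best accuracy with fewer features; a looser tolerance trades a little accuracy for an even smaller subset.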
As shown in Fig. 3, the present invention defines the optimal feature subset as the one that minimizes the total number of misclassified training and test samples. During the optimization, the minimum total number of misclassifications is 2, and this minimum is also reached when the feature subset has 13 dimensions, so the optimal feature set is $[P, Q, I, I_1, I_4, I_9, I_{13}, I_{45}, I_{47}, I_{49}, I_{ih2}, I_{ih38}, \mu_{u2}]$. SVM-RFE thus reduces the 68 features of the original samples to 13, optimizing the sample feature set. In addition, Fig. 3 shows that features such as $P$, $Q$ and $I_1$ appear very frequently, with $P$ the most important. This indicates that the quantities corresponding to these features contribute most to correctly identifying whether a high-speed rail load is operating at this traction substation; they form the salient feature set. When a high-speed train is actually running on a supply arm, the current supplied by the corresponding traction substation directly reflects the train's running speed, and the power characteristics of the traction load reflect whether the train is braking or driving. The salient feature set $P$, $Q$, $I_1$ identified by the invention agrees with these facts, which shows that the SVM-RFE algorithm effectively completes the feature selection for high-speed rail power quality and that the selected features effectively reflect the power quality characteristics of high-speed rail at this station.
(5) Train the final support vector machine model on the sample data set restricted to the optimal feature subset.
The final support vector machine model is trained by the method described in step (2).
(6) Classify high-speed rail power quality data with the final support vector machine model.
The power quality data of interest at this high-speed rail station is assembled according to the optimal feature subset, normalized, and input to the final support vector machine model as sample data, which completes the classification of power quality data by whether a high-speed rail load is operating.
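A minimal sketch of this final step is given below, assuming a trained linear model given by a weight vector `w` (ordered like the selected features) and a bias `b`, and assuming new data is scaled with the per-feature minima and maxima of the training set. These names and the reuse of the training-set range are assumptions; the text only states that the data is normalized.

```python
def classify_sample(raw, feature_idx, mins, maxs, w, b):
    """Return +1 (high-speed rail load operating) or -1 for one raw sample.

    raw         : full raw feature vector
    feature_idx : indices of the optimal feature subset within `raw`
    mins, maxs  : per-feature training-set minima and maxima (same indexing as raw)
    w, b        : linear SVM weights (ordered like feature_idx) and bias
    """
    x = []
    for j in feature_idx:
        mid = (maxs[j] + mins[j]) / 2.0
        half_range = (maxs[j] - mins[j]) / 2.0
        # Same [-1, +1] mapping as in training; constant features map to 0.0.
        x.append(0.0 if half_range == 0 else (raw[j] - mid) / half_range)
    score = sum(wk * xk for wk, xk in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy numbers only: three raw features, of which features 0 and 2 are selected.
mins, maxs = [0.0, 0.0, 0.0], [10.0, 200.0, 4.0]
print(classify_sample([8.0, 100.0, 3.0], [0, 2], mins, maxs, [1.0, -1.0], 0.0))
print(classify_sample([2.0, 100.0, 3.0], [0, 2], mins, maxs, [1.0, -1.0], 0.0))
```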
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the invention and do not limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the specific embodiments of the invention may still be modified or equivalently replaced, and that any modification or equivalent replacement that does not depart from the spirit and scope of the invention shall fall within the scope of the claims of the invention.