A joint clustering method and device for selecting data for high-speed railway power quality analysis
Technical field
The present invention relates to the field of power quality data analysis, and in particular to a joint clustering method and device for selecting data for high-speed railway power quality analysis.
Background art
Measuring equipment at a traction substation generates, at fixed time intervals, records of electrical parameters such as harmonic currents. These records provide the basic data for studying the impact of high-speed railways on grid power quality. Researchers have built probabilistic models from historically accumulated harmonic current data to represent the power quality problems reflected by the harmonic characteristics of trains. However, because the historical data come from several types of train with different harmonic characteristics, simply building one probabilistic model of harmonic current from all of the historical data is unreasonable. In theory, a separate model should be created from the data of each train type, but because the association between the data and the train type is not explicitly recorded, screening the data by train type is difficult in practice.
The passage of a train through a traction power supply arm can be roughly divided into three stages: the speed-raising stage just after the train enters the supply arm, the intermediate steady-running stage, and the acceleration stage just before the train leaves the supply arm. These are referred to as the traction, steady, and boost stages. Trains in different running stages have different harmonic characteristics, so the stage a train is in should be treated separately during modeling. However, the correlation between the data and the running stage is unknown, so screening train data by running stage is hardly feasible.
Therefore, a joint clustering method is needed to address the above problems and to select the data required for high-speed railway power quality analysis.
Summary of the invention
To meet the needs of development of the prior art, the present invention provides a joint clustering method for selecting data for high-speed railway power quality analysis.
In the joint clustering method for selecting data for high-speed railway power quality analysis provided by the invention, the improvement is that the method includes:
selecting characteristic harmonics from the given power quality monitoring records;
arranging the raw data into matrix form to obtain representative vectors, and normalizing the data with respect to the 95% statistical value of active power, the average speed, and the characteristic harmonics;
dividing the normalized representative vectors into groups and clusters with the K-means algorithm, and determining the power quality monitoring data corresponding to the steady-running stage of the train.
Further, the electrical parameters of a power quality monitoring record include: RMS voltage, RMS current, active power, reactive power, apparent power, fundamental voltage, the harmonic voltages of each order, fundamental current, and the harmonic currents of each order;
each power quality monitoring record also includes a field recording the specific time at which it was generated.
Further, selecting the characteristic harmonics includes:
sorting the harmonics of every power quality monitoring record in descending order of harmonic current, recording the five harmonic orders with the largest currents, and merging them into a single set; counting how often each harmonic order occurs in the set, and designating the five most frequent harmonic orders as the characteristic harmonics.
Further, arranging the raw data into matrix form includes: picking out each period during which one and only one train passes through the traction power supply arm, and obtaining the data block corresponding to that period;
taking the set of statistical values of the monitoring data of each data block as a representative vector, and arranging all representative vectors into a matrix in chronological order.
Further, the representative vector includes: a unique identifier of the data block; the 95% statistical value of the current of each characteristic harmonic; the 95% statistical value of active power; and the start time and end time of the data block together with the average travel speed of the corresponding train.
Further, normalizing the data with respect to the 95% statistical value of active power, the average speed, and the characteristic harmonics includes: normalizing each of these values in a representative vector by its ratio to the maximum value of the same quantity over all representative vectors.
Further, dividing the normalized representative vectors into groups and clusters includes:
dividing the normalized representative vectors into clusters based on the 95% statistical value of active power and the average speed, and assigning a cluster number to each normalized representative vector; dividing the normalized representative vectors into groups based on the harmonic currents of the characteristic harmonics, and assigning a group number to each normalized representative vector;
dividing the representative vectors into K classes based on the two parameters, cluster number and group number, each class corresponding to one train type.
Further, determining the power quality monitoring data corresponding to the steady-running stage of the train includes:
for each representative vector in each class, tracing back to the corresponding power quality monitoring records, taking in chronological order the records of the second of the three segments of the train's passage through the supply arm, and arranging them into a matrix, thereby obtaining the power quality monitoring data of one train type that correspond to the steady stage of its passage through the traction power supply arm.
A joint clustering device for selecting data for high-speed railway power quality analysis, the device including:
a data selection unit, configured to select the characteristic harmonics from the given data set after the power quality monitoring records containing null values have been deleted;
a data processing unit, configured to arrange the raw data into matrix form and to normalize the data with respect to the 95% statistical value of active power, the average speed, and the characteristic harmonics;
a data dividing unit, configured to divide the normalized representative vectors into groups and clusters with the K-means algorithm, and to determine the power quality monitoring data corresponding to the steady-running stage of the train.
Further, the data selection unit includes a data deletion subunit, configured to delete, from the records generated by the monitoring device, the power quality monitoring records that contain null values;
and a characteristic harmonic selection subunit, configured to select as characteristic harmonics the five harmonic orders that occur most frequently in the set formed from the five largest harmonic currents of every power quality monitoring record.
Compared with the closest prior art, the technical solution provided by the invention has the following advantages:
1) Train type is associated with running speed, active power, and the currents of the characteristic harmonics. Classifying the monitoring data by joint clustering takes all three parameters into account while avoiding the problems caused by improperly set weights. This improves the accuracy of data classification and provides accurate data support for high-speed railway power quality analysis.
2) The severity of the power quality problems caused by a train differs across the stages of its passage through the supply arm. The data extraction method provided can quickly extract the monitoring data corresponding to the steady-running stage, so the data can be distinguished by both train type and running stage, achieving accurate classification of the data.
Description of the drawings
Fig. 1 is a flow chart of the joint clustering method for data selection provided by the invention;
Fig. 2 is a flow chart of the data conversion provided by the invention.
Detailed description of the embodiments
The technical solution provided by the invention is described in detail below by way of specific embodiments, with reference to the accompanying drawings.
In the data set given for high-speed railway power quality analysis, each record corresponds to the data on electrical parameters such as harmonic currents collected by the measuring equipment at the traction substation at a particular point in time. The invention first uses joint clustering to divide these records automatically into several classes, and then selects from the records of each class the data corresponding to the steady-running stage of the train. Based on the data selected by the invention, researchers can build a separate harmonic current probabilistic model for each train type, and so better study power quality and other problems of high-speed railways.
Fig. 1 shows the flow chart of the joint clustering method for selecting data for high-speed railway power quality analysis. The technical solution provided by the invention specifically includes:
(1) Delete all power quality monitoring records that contain null values.
The monitoring device at the traction substation generates one record every three seconds, describing the RMS voltage, RMS current, active power, reactive power, apparent power, fundamental voltage, harmonic voltages of each order, fundamental current, harmonic currents of each order, and other electrical parameters at that point in time. Each record also contains an extra field identifying the specific time at which it was generated. Instability of the communication network or of the measuring equipment itself can cause individual parameters of some records to be missing, so every parameter of each monitoring record is traversed to determine whether the record contains a null value. If a monitoring record is found to contain a missing value, it is removed from the data set directly. The raw data consist of such power quality monitoring records, taken every three seconds over a period of time and stored in chronological order. Data A in Fig. 2 illustrates the form of these data.
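As a minimal sketch of step (1), the following assumes that each monitoring record is held as a Python dictionary and that a missing parameter appears as None or NaN. The field names and the representation of missing values are illustrative assumptions; the source does not specify the record format.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing values (an assumed convention)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete(records):
    """Keep only the records in which every parameter has a value."""
    return [r for r in records if not any(is_missing(v) for v in r.values())]


# Hypothetical records sampled every three seconds; field names are illustrative.
records = [
    {"time": 0, "active_power": 5.1, "i_h3": 12.0},
    {"time": 3, "active_power": None, "i_h3": 11.5},          # rejected
    {"time": 6, "active_power": 5.4, "i_h3": float("nan")},   # rejected
    {"time": 9, "active_power": 5.2, "i_h3": 12.3},
]
clean = drop_incomplete(records)
```

Only the records at times 0 and 9 remain in clean, and their chronological order is preserved.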
(2) Select the characteristic harmonics from the harmonics of each order.
First, for each record, sort its harmonics in descending order of harmonic current and note the five harmonic orders with the largest currents. Then perform the same operation on all records and merge the results into a single set. Finally, count how often each harmonic order occurs in the set, and designate the five most frequent harmonic orders as the characteristic harmonics.
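The selection in step (2) can be sketched as follows, assuming each record is reduced to a dictionary mapping harmonic order to harmonic current. The function name, the data layout, and the rule for breaking ties (lower order first) are assumptions made for illustration; the source does not say how ties are resolved.

```python
from collections import Counter


def characteristic_harmonics(records, top_n=5):
    """records: list of dicts mapping harmonic order -> harmonic current."""
    pooled = []
    for currents in records:
        # Harmonic orders of this record, largest current first.
        ranked = sorted(currents, key=currents.get, reverse=True)
        pooled.extend(ranked[:top_n])
    counts = Counter(pooled)
    # Most frequent orders first; ties go to the lower order (assumed rule).
    return sorted(counts, key=lambda h: (-counts[h], h))[:top_n]


# Small hypothetical example with top_n=2 to keep the data readable.
records = [
    {3: 10.0, 5: 8.0, 7: 2.0, 9: 1.0},
    {3: 9.0, 5: 1.0, 7: 6.0, 9: 0.5},
    {3: 7.0, 5: 6.5, 7: 1.0, 9: 3.0},
]
selected = characteristic_harmonics(records, top_n=2)
```

Here the 3rd harmonic is among the two largest in all three records and the 5th in two of them, so selected is [3, 5].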
(3) Traverse the raw data in chronological order, pick out each period during which one and only one train passes through the traction power supply arm, and compute the statistical values of the monitoring data for each such period. For brevity, a segment of data recorded while one and only one train passes through the supply arm is called a data block below; data B in Fig. 2 highlights three data blocks. The set of statistical values of the monitoring data of a data block is called its representative vector. Each representative vector contains: a unique identifier of the data block, the 95% statistical value of the current of each characteristic harmonic, the 95% statistical value of active power, the start time and end time of the data block, and the average travel speed of the corresponding train.
Suppose the start time of a representative vector is T_i and its end time is T_j, where T_j must be greater than T_i, and suppose the length of the supply arm is S. The average speed of the train corresponding to this representative vector is then S / (T_j - T_i).
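Building one representative vector can be sketched as below. The source does not define the 95% statistical value; the sketch assumes the convention common in power quality assessment, in which the largest 5% of the samples are discarded and the largest remaining sample is taken. The field names, the units (seconds and metres), and the helper names are likewise assumptions for illustration.

```python
def p95(values):
    """Assumed 95% value: drop the largest 5% of samples, take the next largest."""
    ordered = sorted(values, reverse=True)
    discard = len(ordered) * 5 // 100   # number of largest samples to drop
    return ordered[discard]


def representative_vector(block_id, block, harmonics, arm_length):
    """block: chronologically ordered records of one data block."""
    t_start, t_end = block[0]["time"], block[-1]["time"]
    if t_end <= t_start:
        raise ValueError("end time must be later than start time")
    vec = {
        "id": block_id,
        "t_start": t_start,
        "t_end": t_end,
        "p95_power": p95([r["active_power"] for r in block]),
        # Average speed S / (T_j - T_i) over the supply arm.
        "avg_speed": arm_length / (t_end - t_start),
    }
    for h in harmonics:
        vec[f"p95_i_h{h}"] = p95([r["harmonic_current"][h] for r in block])
    return vec


# Hypothetical block: 21 records, one every 3 s, on a 3000 m supply arm.
block = [
    {"time": 3 * k, "active_power": float(k), "harmonic_current": {3: 2.0 * k}}
    for k in range(21)
]
vec = representative_vector("block-1", block, harmonics=[3], arm_length=3000.0)
```

With these inputs the block spans 60 s, so the average speed is 50.0 m/s, and with 21 samples exactly one sample is discarded before the 95% value is taken.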
(4) Arrange all representative vectors into a matrix in chronological order, and normalize the following fields of the matrix: the 95% statistical value of active power, the average speed, and the harmonic currents of the characteristic harmonics.
The normalization of speed is taken as an example. Suppose the maximum speed over all representative vectors is V_max; then the speed V_X of any representative vector X is normalized as V_X / V_max. The other fields are normalized in the same way. Data C in Fig. 2 illustrates the matrix formed by the normalized representative vectors.
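The max-ratio normalization of step (4) can be sketched as follows, assuming the representative vectors are dictionaries and that every normalized field has a positive maximum. The field names are illustrative.

```python
def normalize_by_max(vectors, fields):
    """Return copies of the vectors with each listed field divided by its maximum."""
    maxima = {f: max(v[f] for v in vectors) for f in fields}
    return [{**v, **{f: v[f] / maxima[f] for f in fields}} for v in vectors]


# Hypothetical representative vectors; only two fields are shown.
vectors = [
    {"id": "a", "avg_speed": 50.0, "p95_power": 4.0},
    {"id": "b", "avg_speed": 80.0, "p95_power": 8.0},
    {"id": "c", "avg_speed": 100.0, "p95_power": 6.0},
]
normalized = normalize_by_max(vectors, ["avg_speed", "p95_power"])
```

Fields that are not listed, such as the identifier, are copied unchanged, and the input vectors are not modified.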
(5) Based on the 95% statistical value of active power and the average speed, use the K-means algorithm to divide all normalized representative vectors into several clusters, then assign each normalized representative vector a value representing its cluster number, such as 1, 2, or 3: Label1 = {1, 2, 3}.
According to investigation, the trains passing this traction substation comprise M models, and each model has N possible lengths, so for simplicity there can be regarded as M*N train types in total. Research shows that train type is correlated with power and speed, so all normalized representative vectors can be divided into M*N clusters.
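The clustering in step (5) can be illustrated with a plain implementation of Lloyd's K-means iteration. This is a sketch under stated assumptions rather than the implementation used by the invention: the random initialization with a fixed seed, the iteration cap, and the sample points are all illustrative, and a library routine could be substituted.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of equal-length tuples.

    Returns one label per point, numbered from 1 as in Label1.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # assumed initialization: k random points
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:        # assignments stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:                 # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return [l + 1 for l in labels]


# Hypothetical normalized (95% active power, average speed) pairs, with M*N = 2.
points = [(0.30, 0.40), (0.32, 0.42), (0.31, 0.41),
          (0.95, 0.90), (1.00, 0.92), (0.97, 0.95)]
label1 = kmeans(points, k=2)
```

The first three points end up in one cluster and the last three in the other. Which cluster receives number 1 depends on the initialization, so only the grouping is meaningful, not the numeric label.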
(6) Based on the harmonic currents of the characteristic harmonics, use the K-means algorithm again to divide all normalized representative vectors into several groups, then assign each normalized representative vector a value representing its group number, such as 1 or 2: Label2 = {1, 2}.
According to investigation, harmonic current is also correlated with the train model, but it has no obvious relation to train length, so all representative vectors can be divided into M groups based on harmonic current. This step uses the term "group" rather than the term "cluster" used in step (5) in order to distinguish the two.
(7) After steps (5) and (6), each representative vector has been assigned a cluster number and a group number. Based only on these two parameters, all representative vectors are divided into K classes using the K-means algorithm. In the subsequent operations, each class is regarded as corresponding to one train type: Label3 = {1, 2, ..., K}. Fig. 2 illustrates, by way of example, two classes generated through data conversion 3.
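The way the two labels jointly determine a class can be illustrated with a simplified stand-in for step (7). The sketch below does not run K-means; it assigns one class number to each distinct (cluster number, group number) pair, which amounts to assuming that K equals the number of distinct pairs. The source states no such relation, so this is only an illustration of the joint use of the two labels.

```python
def joint_classes(cluster_ids, group_ids):
    """Assign a class number (1, 2, ...) to each distinct (cluster, group) pair."""
    class_of_pair = {}
    labels = []
    for pair in zip(cluster_ids, group_ids):
        if pair not in class_of_pair:
            # First time this pair is seen: give it the next class number.
            class_of_pair[pair] = len(class_of_pair) + 1
        labels.append(class_of_pair[pair])
    return labels


# Hypothetical labels for six representative vectors.
label1 = [1, 1, 2, 3, 2, 1]   # cluster numbers from step (5)
label2 = [1, 1, 2, 2, 2, 1]   # group numbers from step (6)
label3 = joint_classes(label1, label2)
```

Vectors that share both a cluster number and a group number receive the same class, while vectors that differ in either label are kept apart.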
(8) Given a class, for each representative vector in the class, trace back to its corresponding power quality monitoring records; data E in Fig. 2 illustrates the power quality monitoring data of one such class. Take the middle third of these records in chronological order, because the middle third of the monitoring data corresponds to the steady stage of the train's passage through the traction power supply arm. Perform the same operation for all representative vectors and arrange the results into a matrix; data F in Fig. 2 shows the power quality monitoring data of one class that correspond to the steady-running stage of the train.
A matrix as described above is obtained for each class according to step (8), and each such matrix is regarded as the power quality monitoring data of one train type.
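Step (8) can be sketched as follows, assuming each data block of a class is available as a chronologically ordered list of records. How a block whose length is not divisible by three is split is not specified in the source; the integer division used here is an assumption.

```python
def middle_third(block):
    """Return the middle third of a chronologically ordered list of records."""
    n = len(block)
    return block[n // 3 : 2 * n // 3]


def steady_stage_records(blocks):
    """Stack the middle thirds of all data blocks of one class, in order."""
    rows = []
    for block in blocks:
        rows.extend(middle_third(block))
    return rows


# Hypothetical class with two data blocks; each record is reduced to its timestamp.
blocks = [
    list(range(0, 27, 3)),      # 9 records at t = 0, 3, ..., 24
    list(range(100, 118, 3)),   # 6 records at t = 100, 103, ..., 115
]
steady = steady_stage_records(blocks)
```

The first block contributes its fourth to sixth records and the second its third and fourth, giving the timestamps 9, 12, 15, 106, and 109.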
A joint clustering device for selecting data for high-speed railway power quality analysis, the device including:
a data selection unit, configured to select the characteristic harmonics from the given data set after the power quality monitoring records containing null values have been deleted;
a data processing unit, configured to arrange the raw data into matrix form and to normalize the data with respect to the 95% statistical value of active power, the average speed, and the characteristic harmonics;
a data dividing unit, configured to divide the normalized representative vectors into groups and clusters with the K-means algorithm, and to determine the power quality monitoring data corresponding to the steady-running stage of the train.
Further, the data selection unit includes a data deletion subunit, configured to delete, from the records generated by the monitoring device, the power quality monitoring records that contain null values;
and a characteristic harmonic selection subunit, configured to select as characteristic harmonics the five harmonic orders that occur most frequently in the set formed from the five largest harmonic currents of every power quality monitoring record.
The above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art may still modify or equivalently replace specific embodiments of the invention. Any such modification or equivalent replacement that does not depart from the spirit and scope of the invention falls within the scope of the claims of the pending application.