CN106096324A - Missing-data imputation algorithm for load data of main power transmission and transformation equipment based on k-nearest-neighbor regression - Google Patents
- Publication number
- CN106096324A (application CN201610743642.XA)
- Authority
- CN
- China
- Prior art keywords
- subset
- vector
- neighbour
- power transmission
- load data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16Z—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS, NOT OTHERWISE PROVIDED FOR
- G16Z99/00—Subject matter not provided for in other main groups of this subclass
Abstract
A missing-data imputation algorithm for load data of power transmission and transformation equipment based on k-nearest-neighbor (kNN) regression, characterized in that the imputation steps are as follows. The data set D is divided into two subsets, Dm and Dc. Each vector x in subset Dm is partitioned as x = [xo; xm]. The Euclidean distance between vector xo and every vector in subset Dc is computed over the corresponding time points. The vectors in subset Dc are sorted in ascending order of distance to obtain subset D'c. The first k vectors (y1, y2, …, yk) of subset D'c are taken. The weighted kNN regression value of x at the missing value of the i-th time point is computed. Steps one to six are repeated until all vectors in subset Dm have been processed. The beneficial effects are: no training on a training set is needed, and the nearest-neighbor algorithm does not require the distribution function of the samples to be predicted to be known in advance, so the method is intuitive, needs no prior statistical knowledge, and is characterized as unsupervised learning.
Description
Technical field
The present invention relates to the field of big-data mining for power systems, and in particular to a missing-data imputation algorithm for load data of main power transmission and transformation equipment.
Background art
With the continuing advance of power-sector informatization and the rapid growth of power big data, research on algorithms suited to power big-data mining and the construction of effective knowledge-discovery models are of great significance to the innovation and development of smart-grid business models.
In a power system, the load data of main power transmission and transformation equipment collected by the various data acquisition and monitoring systems are the basis of dispatching and operation, security and stability analysis, and equipment condition and risk assessment. In actual operation, however, causes such as data acquisition channel errors and remote terminal unit failures can make the observed data abnormal, so that they are inconsistent with the majority of observations. In addition, special events (such as line maintenance, load shedding and outages, or the impact of major events) cause abnormal load changes, which likewise make the observed data deviate from the normal pattern. Furthermore, failures of data metering or storage devices can cause part of the load data to be missing. Therefore, before load data are analyzed and modeled, the abnormal data in the raw load data must be filled in and corrected accordingly.
Existing research at home and abroad on handling missing power system load data shares some common problems. First, the methods in the literature all target small data sets and are computationally inefficient on large data sets. Second, these methods handle single, discrete bad data points well but perform only moderately on contiguous runs of bad data.
Summary of the invention
The object of the present invention is to solve the above problems by providing a missing-data imputation algorithm for load data of power transmission and transformation equipment based on k-nearest-neighbor regression. The specific design is as follows:
The imputation steps are:
Step one: divide the data set D into two subsets, Dm and Dc.
Step two: partition each vector x in subset Dm as x = [xo; xm].
Step three: compute, over the corresponding time points, the Euclidean distance between vector xo and every vector in subset Dc.
Step four: sort the vectors in subset Dc in ascending order of distance to obtain subset D'c.
Step five: take the first k vectors (y1, y2, …, yk) of subset D'c.
Step six: compute the weighted k-nearest-neighbor regression value of x at the missing value of the i-th time point, where wj is the weight of vector yj.
Step seven: repeat steps one to six until all vectors in subset Dm have been processed.
In step one, subset Dm is the set of load curves that contain missing values, and subset Dc is the set of load curves that contain no missing values.
In step two, vector xo is the part of x without missing values (the observed part), and vector xm is the missing part.
In step six, vectors close to xo are given larger weights, and vectors far from xo are given smaller weights.
In step six, the weight wj is computed by a weight function of dist(X1, Yj), where dist(X1, Yj) denotes the Euclidean distance between vectors X1 and Yj.
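The regression formula of step six is not reproduced in the text. A common form of weighted k-nearest-neighbor regression, given here only as an assumption about its likely shape, is the normalized weighted mean of the neighbors' values at the missing time point. The symbols $y_{j,i}$ (value of neighbor $y_j$ at time point $i$) and $O$ (index set of the observed time points of $x$) are introduced for illustration and do not appear in the source:

```latex
\hat{x}_i = \frac{\sum_{j=1}^{k} w_j \, y_{j,i}}{\sum_{j=1}^{k} w_j},
\qquad
\operatorname{dist}(x_o, y_j) = \sqrt{\sum_{t \in O} \left( x_t - y_{j,t} \right)^2}
```

For instance, under this assumed form, with k = 2, weights (2, 1) and neighbor values (10, 13) at the missing time point, the estimate would be (2·10 + 1·13) / 3 = 11.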
The missing-data imputation algorithm for load data of power transmission and transformation equipment based on k-nearest-neighbor regression, obtained through the above technical solution, has the following beneficial effects:
The k-nearest-neighbor algorithm is a lazy learning algorithm: it needs no training on a training set, and its time complexity is O(n), where n is the number of samples in the training set. Its advantage is that the distribution function of the samples to be predicted need not be known in advance, so it is intuitive, needs no prior statistical knowledge, and is characterized as unsupervised learning.
Brief description of the drawings
Fig. 1 shows the imputation result obtained by mean filling for a single missing value (t=23);
Fig. 2 shows the imputation result obtained by kNN regression filling with weight function w3 for a single missing value (t=23);
Fig. 3 shows the imputation result obtained by mean filling for consecutive missing values (t=21-25);
Fig. 4 shows the imputation result obtained by kNN regression filling with weight function w3 for consecutive missing values (t=21-25).
Detailed description of the invention
The present invention is described in detail below with reference to the accompanying drawings.
The imputation steps are:
Step one: divide the data set D into two subsets, Dm and Dc.
Step two: partition each vector x in subset Dm as x = [xo; xm].
Step three: compute, over the corresponding time points, the Euclidean distance between vector xo and every vector in subset Dc.
Step four: sort the vectors in subset Dc in ascending order of distance to obtain subset D'c.
Step five: take the first k vectors (y1, y2, …, yk) of subset D'c.
Step six: compute the weighted k-nearest-neighbor regression value of x at the missing value of the i-th time point, where wj is the weight of vector yj.
Step seven: repeat steps one to six until all vectors in subset Dm have been processed.
In step one, subset Dm is the set of load curves that contain missing values, and subset Dc is the set of load curves that contain no missing values.
In step two, vector xo is the part of x without missing values (the observed part), and vector xm is the missing part.
In step six, vectors close to xo are given larger weights, and vectors far from xo are given smaller weights.
In step six, the weight wj is computed by a weight function of dist(X1, Yj), where dist(X1, Yj) denotes the Euclidean distance between vectors X1 and Yj.
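The seven steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's own implementation: missing values are assumed to be encoded as NaN, the regression value is assumed to be the normalized weighted mean of the k neighbors, and the default weight `exp(-dist)` is a hypothetical choice of weight function. The name `knn_regression_fill` is illustrative.

```python
import numpy as np


def knn_regression_fill(D, k=5, weight=lambda d: np.exp(-d)):
    """Fill NaN entries of D (rows = curves, columns = time points).

    Assumptions (not taken from the patent text): missing values are NaN,
    the estimate is the normalized weighted mean of the k nearest complete
    curves, and the default weight exp(-dist) is only a placeholder.
    """
    D = np.asarray(D, dtype=float)
    has_nan = np.isnan(D).any(axis=1)
    Dc = D[~has_nan]                           # step 1: complete curves (Dc)
    filled = D.copy()
    for row in np.flatnonzero(has_nan):        # step 7: every curve in Dm
        x = D[row]
        miss = np.isnan(x)                     # step 2: x = [xo; xm]
        # step 3: Euclidean distance over the observed time points only
        dist = np.sqrt(((Dc[:, ~miss] - x[~miss]) ** 2).sum(axis=1))
        nearest = np.argsort(dist)[:k]         # steps 4-5: sort, take first k
        w = weight(dist[nearest])
        # step 6: weighted regression value at each missing time point
        filled[row, miss] = w @ Dc[nearest][:, miss] / w.sum()
    return filled
```

Because the complete set Dc is searched directly for each incomplete curve, no training phase is involved. Note that with the placeholder weight, very large distances can underflow to zero weights, so a real implementation would likely need to scale the distances first.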
Embodiment 1
The load data of 185 items of main equipment in a provincial power grid over the 365 days of one year are taken, giving 67,525 load curves in total. Each load curve records 48 points over a full day, for a total of 3,241,200 data points. Missing data are artificially created in two of the curves: curve 1 has a single missing value, at t=23; curve 2 has consecutive missing values, at t=21-25.
Data set D is the full set of 67,525 load curves. It is divided into two load-curve sets, Dm and Dc: Dm is the set of the two curves with missing values, and Dc is the set of the remaining curves without missing values.
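The data-set sizes and the partition into Dm and Dc can be checked with a short sketch. The load values below are synthetic stand-ins, since the provincial data are not available; the NaN encoding, the 1-based reading of t, and the choice of rows 0 and 1 as curves 1 and 2 are assumptions made for illustration.

```python
import numpy as np

N_DEVICES, N_DAYS, N_POINTS = 185, 365, 48    # figures stated in the embodiment

n_curves = N_DEVICES * N_DAYS                 # one daily curve per device per day
n_values = n_curves * N_POINTS                # total number of data points

rng = np.random.default_rng(0)
# Synthetic stand-in for the real load curves (arbitrary value range).
D = rng.uniform(50.0, 100.0, size=(n_curves, N_POINTS))

# Assuming t is 1-based, t=23 is column 22 and t=21-25 is columns 20..24.
D[0, 22] = np.nan                             # curve 1: single missing value
D[1, 20:25] = np.nan                          # curve 2: consecutive missing values

has_missing = np.isnan(D).any(axis=1)
Dm, Dc = D[has_missing], D[~has_missing]      # incomplete / complete curves
```

With these figures, `n_curves` is 67525 and `n_values` is 3241200, matching the counts given in the text, and Dm holds exactly the two altered curves.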
Consider curve X1, which has a single missing value. It is partitioned into an observed part and a missing part: X1 = [x1, …, x22, x24, …, x48; x23].
Compute the Euclidean distance between X1 and every curve in subset Dc; for the load curves in Dc, all values except x23 are used.
Sort the vectors in Dc in ascending order of distance to obtain D'c.
Take the first k vectors (Y1, Y2, …, Yk) according to the value of k.
Compute the weighted k-nearest-neighbor regression value of X1 at the missing value of the i-th time point.
Three weight functions w are considered in this embodiment, where dist(X1, Yj) denotes the Euclidean distance between vectors X1 and Yj.
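The three weight formulas are not reproduced in the text; the document only states elsewhere that w3 is an exponential weight function. The sketch below therefore shows three commonly used candidates, all of them hypothetical: a uniform weight (the unweighted baseline), an inverse-distance weight, and an exponential weight. The function names and the `eps` guard are illustrative.

```python
import numpy as np


def w_uniform(dist):
    """Hypothetical candidate: every neighbor gets the same weight."""
    return np.ones_like(np.asarray(dist, dtype=float))


def w_inverse(dist, eps=1e-12):
    """Hypothetical candidate: weight falls off as 1 / dist.

    eps avoids division by zero when a neighbor matches exactly.
    """
    return 1.0 / (np.asarray(dist, dtype=float) + eps)


def w_exponential(dist):
    """Hypothetical candidate for an exponential weight: exp(-dist)."""
    return np.exp(-np.asarray(dist, dtype=float))
```

The inverse-distance and exponential candidates both give nearer neighbors larger weights, which is the property the method requires of its weight function.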
After the above steps of this embodiment are completed, the missing data of the provincial grid described in this embodiment are also filled by the existing mean-filling method.
Fig. 1 shows the imputation result obtained by mean filling for a single missing value (t=23); Fig. 2 shows the imputation result obtained by kNN regression filling with weight function w3 for a single missing value (t=23). As Figs. 1 and 2 show, for a single discrete missing value the filling effect of k-nearest-neighbor regression is clearly better than that of mean filling.
Fig. 3 shows the imputation result obtained by mean filling for consecutive missing values (t=21-25); Fig. 4 shows the imputation result obtained by kNN regression filling with weight function w3 for consecutive missing values (t=21-25). For contiguous runs of consecutive missing values, the filling effect of k-nearest-neighbor regression is clearly better than that of mean filling.
Embodiment 2
The load data of 185 items of main equipment in a provincial power grid over the 365 days of one year are taken, giving 67,525 load curves in total. Each load curve records 48 points over a full day, for a total of 3,241,200 data points. Missing data are artificially created in two of the curves: curve 1 has a single missing value, at t=23; curve 2 has consecutive missing values, at t=21-25.
Data set D is the full set of 67,525 load curves. It is divided into two load-curve sets, Dm and Dc: Dm is the set of the two curves with missing values, and Dc is the set of the remaining curves without missing values.
Consider curve X1, which has a single missing value. It is partitioned into an observed part and a missing part: X1 = [x1, …, x22, x24, …, x48; x23].
Compute the Euclidean distance between X1 and every curve in subset Dc; for the load curves in Dc, all values except x23 are used.
Sort the vectors in Dc in ascending order of distance to obtain D'c.
Take the first k vectors (Y1, Y2, …, Yk) according to the value of k.
Compute the weighted k-nearest-neighbor regression value of X1 at the missing value of the i-th time point.
Three weight functions w are considered in this embodiment, where dist(X1, Yj) denotes the Euclidean distance between vectors X1 and Yj.
After the above steps of this embodiment are completed, the missing data of the provincial grid described in this embodiment are also filled by the existing mean-filling method;
after the above steps of this embodiment are completed, the missing data of the provincial grid described in this embodiment are also filled by the existing linear interpolation method;
after the above steps of this embodiment are completed, the missing data of the provincial grid described in this embodiment are also filled by the existing spline interpolation method.
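Two of the baseline methods can be sketched with NumPy alone. The sketch assumes that missing values are encoded as NaN and that "mean filling" replaces them with the mean of the curve's own observed values; the text does not say which mean is used, so that choice is an assumption. The function names are illustrative. Spline interpolation is omitted here to keep the example free of extra dependencies.

```python
import numpy as np


def mean_fill(x):
    """Replace NaNs with the mean of the observed values of the same curve."""
    x = np.asarray(x, dtype=float).copy()
    x[np.isnan(x)] = np.nanmean(x)
    return x


def linear_fill(x):
    """Replace NaNs by linear interpolation between observed neighbors."""
    x = np.asarray(x, dtype=float).copy()
    miss = np.isnan(x)
    t = np.arange(x.size)
    # np.interp evaluates the piecewise-linear curve through the observed points.
    x[miss] = np.interp(t[miss], t[~miss], x[~miss])
    return x
```

Mean filling ignores the position of the gap within the daily curve, whereas linear interpolation uses only the values adjacent to it; neither draws on other curves in the data set.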
The missing values are filled by mean filling, linear interpolation, cubic spline interpolation and k-nearest-neighbor regression (with each of the three weight functions), and the filling effect on load curve X1 is evaluated with the mean absolute percentage error (MAPE), computed as
$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{x_i - \hat{x}_i}{x_i}\right| \times 100\%$$
where $x_i$ is the actual value, $\hat{x}_i$ is the predicted value, and $n$ is the number of values compared. The smaller the MAPE, the higher the accuracy of the prediction.
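A minimal sketch of the MAPE computation, assuming the standard definition (the mean of |actual - predicted| / |actual|, expressed in percent). The function name `mape` is illustrative, and actual values are assumed to be non-zero.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error in percent (standard definition)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    # Each term is the absolute error relative to the actual value.
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100.0)
```

For example, actual values (100, 200) with predictions (110, 180) have relative errors of 10% each, so the MAPE is 10%.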
Table 1 gives the MAPE of the various imputation methods for a single missing value (t=23).
Table 2 gives the MAPE of the various imputation methods for consecutive missing values (t=21-25).
As Tables 1 and 2 show:
1. For a single discrete missing value, the filling effect of k-nearest-neighbor regression is clearly better than that of the other methods.
2. For contiguous runs of consecutive missing values, the filling effect of k-nearest-neighbor regression is better than that of the other methods.
3. The prediction error with the exponential weight function w3 is smaller than with the other two weight functions.
The above technical solution embodies only the preferred technical solution of the present invention. Any variations that those skilled in the art may make to parts of it embody the principle of the present invention and fall within its scope of protection.
Claims (6)
1. A missing-data imputation algorithm for load data of power transmission and transformation equipment based on k-nearest-neighbor regression, characterized in that the imputation steps are:
Step one: divide the data set D into two subsets, Dm and Dc.
Step two: partition each vector x in subset Dm as x = [xo; xm].
Step three: compute, over the corresponding time points, the Euclidean distance between vector xo and every vector in subset Dc.
Step four: sort the vectors in subset Dc in ascending order of distance to obtain subset D'c.
Step five: take the first k vectors (y1, y2, …, yk) of subset D'c.
Step six: compute the weighted k-nearest-neighbor regression value of x at the missing value of the i-th time point, where wj is the weight of vector yj.
Step seven: repeat steps one to six until all vectors in subset Dm have been processed.
2. The missing-data imputation algorithm for load data of power transmission and transformation equipment based on k-nearest-neighbor regression according to claim 1, characterized in that, in step one, subset Dm is the set of load curves that contain missing values, and subset Dc is the set of load curves that contain no missing values.
3. The missing-data imputation algorithm for load data of power transmission and transformation equipment based on k-nearest-neighbor regression according to claim 1, characterized in that, in step two, vector xo is the part without missing values, and vector xm is the missing part.
4. The missing-data imputation algorithm for load data of power transmission and transformation equipment based on k-nearest-neighbor regression according to claim 1, characterized in that, in step six, vectors close to xo are given larger weights, and vectors far from xo are given smaller weights.
5. The missing-data imputation algorithm for load data of power transmission and transformation equipment based on k-nearest-neighbor regression according to claim 1, characterized in that, in step six, the weights are computed by a weight function formula.
6. The missing-data imputation algorithm for load data of power transmission and transformation equipment based on k-nearest-neighbor regression according to claim 5, characterized in that, in the weight function formula, dist(X1, Yj) denotes the Euclidean distance between vectors X1 and Yj.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610743642.XA CN106096324A (en) | 2016-08-26 | 2016-08-26 | Missing-data imputation algorithm for load data of main power transmission and transformation equipment based on k-nearest-neighbor regression |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106096324A (en) | 2016-11-09 |
Family
ID=57223807
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610743642.XA (Pending) CN106096324A (en) | Missing-data imputation algorithm for load data of main power transmission and transformation equipment based on k-nearest-neighbor regression | 2016-08-26 | 2016-08-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106096324A (en) |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20161109 |