CN107301499B - Distribution feeder statistical line loss rate data cleaning method based on AMI data - Google Patents

Distribution feeder statistical line loss rate data cleaning method based on AMI data Download PDF

Info

Publication number
CN107301499B
CN107301499B CN201710395527.2A CN201710395527A CN107301499B CN 107301499 B CN107301499 B CN 107301499B CN 201710395527 A CN201710395527 A CN 201710395527A CN 107301499 B CN107301499 B CN 107301499B
Authority
CN
China
Prior art keywords
line
data
line loss
distribution feeder
statistical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710395527.2A
Other languages
Chinese (zh)
Other versions
CN107301499A (en)
Inventor
王守相
董鹏飞
田英杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
State Grid Shanghai Electric Power Co Ltd
Original Assignee
Tianjin University
State Grid Shanghai Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University and State Grid Shanghai Electric Power Co Ltd
Priority to CN201710395527.2A
Publication of CN107301499A
Application granted
Publication of CN107301499B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06315 Needs-based resource requirements planning or analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention relates to a method for cleaning statistical line loss rate data of distribution feeders based on AMI data, comprising the following steps: extracting, from AMI data, the line characteristic parameters, transformer characteristic parameters, theoretical line loss rate and statistical line loss rate related to distribution feeder line loss, and constructing a distribution feeder line loss characteristic database; detecting missing values and abnormal values in the statistical line loss data of the distribution feeders, and dividing all data into data to be cleaned and normal data, the latter serving as training data; establishing an XGBoost-based estimation model of the distribution feeder statistical line loss and determining the model parameters using the training data; and correcting the data to be cleaned using the estimation model.

Description

Distribution feeder statistical line loss rate data cleaning method based on AMI data
Technical Field
The invention belongs to the field of line loss management of an electric power system.
Background
The line loss rate is a comprehensive technical and economic index that reflects power grid planning, design and operation management, and it provides important guidance for grid optimization, energy conservation and loss reduction. The loss of the 10 kV medium-voltage distribution network (i.e. the distribution feeders) accounts for 24.7% of the total loss of the distribution network, the highest share of any voltage class, which makes it the heaviest-loss layer. Studying the condition of distribution feeder statistical line loss data is therefore of great significance for line loss management in the power system.
An analysis of the distribution feeder statistical line loss in AMI data provided by a certain region shows that, of 44172 data samples, 15283 (34.6%) contain missing values and 35378 (80.1%) contain outliers. The quality of the distribution feeder statistical line loss data is therefore very poor, with serious data loss and data abnormality.
Disclosure of Invention
In view of the large data volume, many data types and complex relationships among multi-source data that characterize smart power distribution and utilization big data, the invention provides a method for cleaning distribution feeder statistical line loss data that is based on AMI data and takes the influence of the theoretical line loss rate into account. The technical scheme is as follows:
a distribution feeder statistical line loss rate data cleaning method based on AMI data comprises the following steps:
step one: extracting line characteristic parameters, transformer characteristic parameters, theoretical line loss rate and statistical line loss rate related to the line loss of the distribution feeder from AMI data, and constructing a distribution feeder line loss characteristic database;
step two: carrying out missing value and abnormal value detection on the statistical line loss data of the power distribution feeder line, and dividing all the data into data to be cleaned and normal data, namely training data;
step three: establishing an XGBoost-based distribution feeder statistical line loss estimation model, and determining the model parameters using the training data, as follows:
1) setting an initial estimated value and iteration times of the statistical line loss rate of the distribution feeder, and repeating the following steps from 2) to 4) for each iteration;
2) calculating the first and second derivatives $g_i$ and $h_i$ of the loss function, i.e.
$g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\big(y_i, \hat{y}_i^{(t-1)}\big)$
$h_i = \partial^2_{\hat{y}_i^{(t-1)}}\, l\big(y_i, \hat{y}_i^{(t-1)}\big)$
where $l$ is the loss function, and $y_i$ and $\hat{y}_i^{(t-1)}$ are, respectively, the actual value of the statistical line loss rate of the $i$-th distribution feeder and the model's estimate at iteration $(t-1)$;
Figure BDA0001307243000000023
3) traversing the candidate tree structures with a greedy algorithm to find the tree structure $f_t(x_i)$ that minimizes the following objective function $Obj$, and calculating the optimal weight $w_j^*$ of each leaf node:
$Obj = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$
$w_j^* = -\frac{G_j}{H_j + \lambda}$
where
$G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i$
$I_j$ is the set of distribution feeder statistical line loss instances in the $j$-th leaf node; $T$ is the number of leaf nodes; $\lambda$ and $\gamma$ are adjustable coefficients;
4) adding $f_t(x_i)$ from the previous step to the model, i.e.
$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$
5) superimposing the decision trees established in each iteration to obtain the estimation model of the distribution feeder statistical line loss rate, i.e.
$\hat{y}_i = \sum_{t=1}^{K} f_t(x_i)$, where $K$ is the number of iterations set in step 1).
Step four: correcting the data to be cleaned using the estimation model.
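Sub-steps 1) to 5) of step three can be illustrated with a small sketch. It is not the patented implementation: it assumes the squared-error loss l = 1/2 (y - ŷ)² (the patent's loss function is not recoverable from the text), limits each tree to a single split on one feature, and omits the learning rate. The names `leaf_weight`, `leaf_score`, `fit_stump` and `boost`, and the values of `LAMBDA` and `GAMMA`, are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

LAMBDA = 1.0  # assumed value of the adjustable coefficient lambda
GAMMA = 0.0   # assumed value of the adjustable coefficient gamma


def leaf_weight(G: float, H: float) -> float:
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + LAMBDA)


def leaf_score(G: float, H: float) -> float:
    """One leaf's share of the objective: -1/2 * G^2 / (H + lambda) + gamma."""
    return -0.5 * G * G / (H + LAMBDA) + GAMMA


def fit_stump(x: Sequence[float], g: Sequence[float],
              h: Sequence[float]) -> Tuple[float, float, float]:
    """Greedy search over thresholds; returns (threshold, w_left, w_right)."""
    best = None
    for thr in sorted(set(x))[:-1]:  # candidate split: x <= thr
        left = [i for i, v in enumerate(x) if v <= thr]
        right = [i for i, v in enumerate(x) if v > thr]
        GL, HL = sum(g[i] for i in left), sum(h[i] for i in left)
        GR, HR = sum(g[i] for i in right), sum(h[i] for i in right)
        obj = leaf_score(GL, HL) + leaf_score(GR, HR)
        if best is None or obj < best[0]:
            best = (obj, thr, leaf_weight(GL, HL), leaf_weight(GR, HR))
    if best is None:  # all x equal: fall back to a single leaf
        w = leaf_weight(sum(g), sum(h))
        return float("inf"), w, w
    return best[1], best[2], best[3]


def boost(x: Sequence[float], y: Sequence[float], rounds: int = 50,
          init: float = 0.0) -> Callable[[float], float]:
    pred = [init] * len(y)  # 1) initial estimate and iteration count
    trees: List[Tuple[float, float, float]] = []
    for _ in range(rounds):
        g = [p - yi for p, yi in zip(pred, y)]  # 2) first derivative of 1/2 (y - p)^2
        h = [1.0] * len(y)                      #    second derivative
        thr, wl, wr = fit_stump(x, g, h)        # 3) best structure and leaf weights
        trees.append((thr, wl, wr))
        pred = [p + (wl if v <= thr else wr)    # 4) add f_t to the model
                for p, v in zip(pred, x)]

    def model(v: float) -> float:               # 5) sum of all trees
        return init + sum(wl if v <= thr else wr for thr, wl, wr in trees)

    return model
```

With real feeder data each tree would split on several of the characteristic parameters rather than on one feature; the single-feature stump only keeps the example short.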
The line characteristic parameters related to distribution feeder line loss comprise the line type, total line length, line power supply amount and line commissioning time; the transformer parameters related to distribution feeder line loss comprise the distribution transformer rated capacity, short-circuit loss, no-load loss and commissioning time; the theoretical line loss rate of the distribution feeder is usually calculated by the equivalent resistance method.
The XGBoost algorithm lends itself to distributed and parallel computation and is suitable for large-scale data sets with many data types and complex data relationships. Applying the XGBoost algorithm to the cleaning of distribution feeder statistical line loss rate data based on AMI data improves the accuracy of data cleaning and effectively improves data quality.
Drawings
Fig. 1 is a flow chart of a distribution feeder line loss characteristic database construction.
Fig. 2 is a flow chart of XGBoost-based statistical line loss estimation for a distribution feeder.
Fig. 3 is a flow chart of the cleaning process for distribution feeder statistical line loss data based on AMI data, taking the influence of the theoretical line loss rate into account and applying the XGBoost algorithm.
Fig. 4 is an example of a decision tree constructed in the XGBoost-based statistical line loss estimation model of the distribution feeder according to an embodiment of the present invention.
Fig. 5 is the estimation result, on the training set, of the XGBoost-based statistical line loss estimation model of the distribution feeder according to an embodiment of the present invention.
Detailed Description
The invention discloses a cleaning method for statistical line loss data of a power distribution feeder line, which comprises the following steps:
Step one: extracting, from the AMI data, the line characteristic parameters, transformer characteristic parameters, theoretical line loss rate and statistical line loss rate related to distribution feeder line loss, to construct a distribution feeder line loss characteristic database. The specific steps are as follows:
1) Data extraction. The line equipment ID, line equipment name, date, line power supply amount, line loss rate and theoretical line loss rate of each distribution feeder (10 kV line) are extracted from the line statistical line loss database; the line type, total line length and line commissioning time are extracted from the line parameter table in the equipment ledger database via the line equipment name; and the IDs of the distribution transformers governed by the distribution feeder are determined via the line equipment ID, after which the rated capacity, short-circuit loss, no-load loss and commissioning time of the distribution transformers are extracted from the distribution transformer parameter table in the equipment ledger database.
2) Construction of the distribution feeder line loss characteristic database. The line statistical line loss data, line parameter data and transformer parameter data are associated via the line equipment ID; each commissioning time is converted into the number of months between commissioning and the line loss statistics date; irrelevant variables are eliminated; and a distribution feeder line loss characteristic database is constructed that contains only ten variables: line type, total line length, line commissioning time, line power supply amount, distribution transformer rated capacity, distribution transformer short-circuit loss, distribution transformer no-load loss, distribution transformer commissioning time, statistical line loss rate and theoretical line loss rate.
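The association and month-difference conversion described above might look like the following sketch. The field names, the dictionary-based tables and the assumption of one (already aggregated) transformer record per feeder are all hypothetical; the patent does not say how several transformers on one feeder are combined.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole-month difference from start to end."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def build_feature_row(loss: dict, line: dict, transformer: dict) -> dict:
    """Combine one line loss record with its line and transformer parameters."""
    stat_date = loss["date"]
    return {
        "line_type": line["line_type"],
        "total_line_length": line["total_length"],
        "line_commissioning_months": months_between(line["commissioned"], stat_date),
        "line_power_supply": loss["power_supply"],
        "transformer_rated_capacity": transformer["rated_capacity"],
        "transformer_short_circuit_loss": transformer["short_circuit_loss"],
        "transformer_no_load_loss": transformer["no_load_loss"],
        "transformer_commissioning_months": months_between(
            transformer["commissioned"], stat_date),
        "statistical_line_loss_rate": loss["stat_loss_rate"],
        "theoretical_line_loss_rate": loss["theo_loss_rate"],
    }


def build_database(loss_rows, lines_by_id, transformers_by_line_id):
    """Join the three sources on line equipment ID; unmatched records are skipped."""
    rows = []
    for loss in loss_rows:
        line_id = loss["line_id"]
        if line_id in lines_by_id and line_id in transformers_by_line_id:
            rows.append(build_feature_row(
                loss, lines_by_id[line_id], transformers_by_line_id[line_id]))
    return rows
```

Each returned row holds exactly the ten variables listed above, with both commissioning times expressed in months.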
Step two: detecting missing values and abnormal values in the statistical line loss data of the distribution feeders, and dividing all data into data to be cleaned and normal data (i.e. training data).
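The text does not state the rule used to flag abnormal values, so the sketch below is only one possible reading of step two: a record is treated as missing when its statistical line loss rate is absent or NaN, and as abnormal when the rate falls outside an assumed plausible range. The bounds `low` and `high`, the field name `stat_loss_rate` and the function name are hypothetical.

```python
import math


def split_records(records, low=0.0, high=15.0):
    """Split records into (data to be cleaned, training data).

    low/high are assumed plausibility bounds in percent, not values from the patent.
    """
    to_clean, training = [], []
    for rec in records:
        rate = rec.get("stat_loss_rate")
        missing = rate is None or (isinstance(rate, float) and math.isnan(rate))
        abnormal = (not missing) and not (low <= rate <= high)
        (to_clean if missing or abnormal else training).append(rec)
    return to_clean, training
```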
Step three: establishing an XGBoost-based distribution feeder statistical line loss estimation model, and determining the model parameters using the training data. The specific steps are as follows:
1) Optimal selection of model parameters. All training data are used as input to the XGBoost model to optimize its parameters. First, starting from the initial parameters, the learning rate (eta) and the corresponding optimal number of decision trees (nrounds) are determined by cross-validation; a smaller eta can improve the robustness of the model, but nrounds then increases, which slows computation. Second, for the given eta and nrounds, the values of max_depth, min_child_weight, gamma, subsample and colsample_bytree are determined in turn; reasonable values of these parameters increase the robustness of the model and prevent over-fitting and under-fitting. Third, the regularization parameter is optimized; the lambda parameter characterizes the complexity of the model and can effectively prevent over-fitting. The main parameters of the XGBoost model are shown in Table 1.
TABLE 1
Figure BDA0001307243000000031
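The three-stage tuning order can be sketched as a staged grid search in which each stage fixes the values found by the earlier ones. The candidate grids, the grouping of parameters into stages and the scoring callable `cv_rmse` (which would wrap a cross-validation run) are assumptions for illustration; only the parameter names come from the text.

```python
from itertools import product

# Assumed candidate values; the patent's actual grids are not given in the text.
STAGES = [
    {"eta": [0.3, 0.1, 0.05], "nrounds": [50, 100, 200]},       # first stage
    {"max_depth": [3, 5, 7], "min_child_weight": [1, 3, 5]},    # second stage ...
    {"gamma": [0.0, 0.1, 0.2]},
    {"subsample": [0.6, 0.8, 1.0], "colsample_bytree": [0.6, 0.8, 1.0]},
    {"lambda": [0.1, 1.0, 10.0]},                               # third stage
]
BASE = {"eta": 0.3, "nrounds": 50, "max_depth": 3, "min_child_weight": 1,
        "gamma": 0.0, "subsample": 1.0, "colsample_bytree": 1.0, "lambda": 1.0}


def staged_search(cv_rmse, stages=STAGES, base=BASE):
    """Tune one group of parameters at a time, keeping earlier choices fixed."""
    params = dict(base)
    for grid in stages:
        names = list(grid)
        best = min(
            product(*(grid[n] for n in names)),
            key=lambda combo: cv_rmse({**params, **dict(zip(names, combo))}),
        )
        params.update(zip(names, best))
    return params
```

Tuning in stages evaluates far fewer combinations than a full grid over all eight parameters, at the cost of possibly missing interactions between stages.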
2) Model training and verification. All training data are randomly divided into a training set (80%) and a test set (20%); the model is trained on the training set under the optimized model parameters, which determines the structures of all decision trees (126 in this example), and is then verified on the test set. The invention measures the accuracy of the model by the root mean square error (RMSE), i.e.
$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^2}$
where $n$ is the number of samples, and $y_i$ and $\hat{y}_i$ are the actual and estimated statistical line loss rates of the $i$-th sample.
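A minimal sketch of the random 80/20 split and the RMSE metric follows. The function names and the fixed seed are illustrative choices, not part of the patent.

```python
import math
import random


def rmse(actual, estimated):
    """Root mean square error between actual and estimated values."""
    n = len(actual)
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(actual, estimated)) / n)


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Randomly split rows into (training set, test set)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```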
Fig. 4 shows an example of a decision tree in the model. The boxes represent non-leaf nodes, each of which is split on one characteristic parameter; all characteristic parameters and their abbreviations are shown in Table 2. Each node also carries Gain and Cover information: the Gain value is the basis for splitting the node and is analogous to the information gain in a traditional decision tree model; Cover represents the number of samples the node contains.
TABLE 2
Figure BDA0001307243000000042
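For reference, the Gain of a candidate split in the XGBoost formulation of Chen and Guestrin (listed under Non-Patent Citations) can be written as below. This is the standard formula from that paper, given here as an assumption about what the Gain value in Fig. 4 reports rather than as a quotation from the patent; $G_L, H_L$ and $G_R, H_R$ denote the sums of first and second derivatives over the samples sent to the left and right child.

```latex
\mathrm{Gain} = \frac{1}{2}\left[
    \frac{G_L^2}{H_L + \lambda}
  + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}
\right] - \gamma
```

A split is worthwhile only when this value is positive, which is how $\gamma$ acts as a pruning threshold. In XGBoost, Cover is the sum of second derivatives $h_i$ in a node; it coincides with the sample count when $h_i = 1$, as is the case for the squared-error loss $\tfrac{1}{2}(y_i - \hat{y}_i)^2$.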
Fig. 5 shows the estimation results of the XGBoost-based distribution feeder statistical line loss estimation model on the training set. To make the fit easier to observe, the samples are renumbered in increasing order of the actual statistical line loss rate, and the actual and estimated values are plotted together. As the figure shows, the algorithm fits the actual statistical line loss values well; the RMSE of the model is 0.508, indicating high estimation accuracy.
Step four: correcting the data to be cleaned using the estimation model. The characteristic parameters associated with the data to be cleaned are extracted from the distribution feeder line loss characteristic database and used as input to the XGBoost-based distribution feeder statistical line loss estimation model, whose output corrects the data to be cleaned. Table 3 shows the original and corrected statistical line loss values for part of the data to be cleaned.
TABLE 3
Figure BDA0001307243000000043
Figure BDA0001307243000000051
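Step four amounts to replacing each flagged statistical line loss rate with the model's estimate while keeping the original value for comparison, as Table 3 does. In the sketch below, `model` stands for any trained estimator exposed as a callable on a feature list; that interface, the function name and the field names are assumptions.

```python
def correct_records(to_clean, model, feature_names, target="stat_loss_rate"):
    """Return copies of the flagged records with the target replaced by the estimate."""
    corrected = []
    for rec in to_clean:
        features = [rec[name] for name in feature_names]
        fixed = dict(rec)
        fixed["original_" + target] = rec.get(target)  # keep the raw value
        fixed[target] = model(features)                # corrected value
        corrected.append(fixed)
    return corrected
```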

Claims (2)

1. A distribution feeder statistical line loss rate data cleaning method based on AMI data comprises the following steps:
step one: extracting line characteristic parameters, transformer characteristic parameters, theoretical line loss rate and statistical line loss rate related to the line loss of the distribution feeder from AMI data, and constructing a distribution feeder line loss characteristic database;
step two: carrying out missing value and abnormal value detection on the statistical line loss data of the power distribution feeder line, and dividing all the data into data to be cleaned and normal data, wherein the normal data is training data;
step three: establishing an XGBoost-based distribution feeder statistical line loss estimation model, and determining the model parameters using the training data, as follows:
1) setting an initial estimated value and iteration times of the statistical line loss rate of the distribution feeder, and repeating the following steps from 2) to 4) for each iteration;
2) calculating the first and second derivatives $g_i$ and $h_i$ of the loss function, i.e.
$g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\big(y_i, \hat{y}_i^{(t-1)}\big)$
$h_i = \partial^2_{\hat{y}_i^{(t-1)}}\, l\big(y_i, \hat{y}_i^{(t-1)}\big)$
where $l$ is the loss function, and $y_i$ and $\hat{y}_i^{(t-1)}$ are, respectively, the actual value of the statistical line loss rate of the $i$-th distribution feeder and the model's estimate at iteration $(t-1)$;
Figure FDA0002533194800000014
3) traversing the candidate tree structures with a greedy algorithm to find the tree structure $f_t(x_i)$ that minimizes the following objective function $Obj$, and calculating the optimal weight $w_j^*$ of each leaf node:
$Obj = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$
$w_j^* = -\frac{G_j}{H_j + \lambda}$
where
$G_j = \sum_{i \in I_j} g_i, \quad H_j = \sum_{i \in I_j} h_i$
$I_j$ is the set of distribution feeder statistical line loss instances in the $j$-th leaf node; $T$ is the number of leaf nodes; $\lambda$ and $\gamma$ are adjustable coefficients;
4) adding $f_t(x_i)$ from the previous step to the model, i.e.
$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$
5) superimposing the decision trees established in each iteration to obtain the estimation model of the distribution feeder statistical line loss rate, i.e.
$\hat{y}_i = \sum_{t=1}^{K} f_t(x_i)$, where $K$ is the number of iterations set in step 1);
$x_i$ is the $i$-th line characteristic parameter related to the line loss of the distribution feeder, and $f_t(x_i)$ is the decision tree model trained at the $t$-th iteration on the $i$-th line characteristic parameter related to the line loss of the distribution feeder;
step four: correcting the data to be cleaned using the estimation model.
2. The cleaning method according to claim 1, wherein the line characteristic parameters related to the line loss of the distribution feeder comprise the line type, total line length, line power supply amount and line commissioning time; the transformer parameters related to the line loss of the distribution feeder comprise the distribution transformer rated capacity, short-circuit loss, no-load loss and commissioning time; and the theoretical line loss rate of the distribution feeder is calculated by the equivalent resistance method.
CN201710395527.2A 2017-05-27 2017-05-27 Distribution feeder statistical line loss rate data cleaning method based on AMI data Active CN107301499B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710395527.2A CN107301499B (en) 2017-05-27 2017-05-27 Distribution feeder statistical line loss rate data cleaning method based on AMI data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710395527.2A CN107301499B (en) 2017-05-27 2017-05-27 Distribution feeder statistical line loss rate data cleaning method based on AMI data

Publications (2)

Publication Number Publication Date
CN107301499A CN107301499A (en) 2017-10-27
CN107301499B true CN107301499B (en) 2020-09-15

Family

ID=60137366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710395527.2A Active CN107301499B (en) 2017-05-27 2017-05-27 Distribution feeder statistical line loss rate data cleaning method based on AMI data

Country Status (1)

Country Link
CN (1) CN107301499B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111612053B (en) * 2020-05-14 2023-06-27 国网河北省电力有限公司电力科学研究院 Calculation method for reasonable interval of line loss rate
CN111860605B (en) * 2020-06-24 2022-12-13 广州明珞汽车装备有限公司 Process beat processing method, system, device and storage medium
CN112232667A (en) * 2020-10-16 2021-01-15 国家电网有限公司 Load moment-based method for quantizing cost of synchronous line loss of power distribution network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102231144A (en) * 2011-06-03 2011-11-02 中国电力科学研究院 Method for predicting theoretical line loss of power distribution network based on Boosting algorithm
CN103177341A (en) * 2013-03-29 2013-06-26 山东电力集团公司 Line loss lean comprehensive management system and method
CN106127387A (en) * 2016-06-24 2016-11-16 中国电力科学研究院 A kind of platform district based on BP neutral net line loss per unit appraisal procedure
CN106339811A (en) * 2016-08-26 2017-01-18 中国电力科学研究院 Low-voltage distribution network precise line loss analysis method
CN106372747A (en) * 2016-08-27 2017-02-01 天津大学 Random forest-based zone area reasonable line loss rate estimation method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Extreme Gradient Boosting as a Method for Quantitative Structure–Activity Relationships; Robert P. Sheridan et al.; 《Journal of Chemical Information and Modeling》; 20161123; pp. 2353-2360 *
XGBoost: A Scalable Tree Boosting System; Tianqi Chen et al.; 《KDD '16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining》; 20160831; pp. 785–794 *
Accurate line loss analysis of low-voltage distribution networks using AMI data; 赵磊 (Zhao Lei) et al.; 《电网技术》 (Power System Technology); 20151130; Vol. 39, No. 11; pp. 3189-3194 *

Also Published As

Publication number Publication date
CN107301499A (en) 2017-10-27

Similar Documents

Publication Publication Date Title
CN109308571B (en) Distribution line variable relation detection method
CN110082699A (en) A kind of low-voltage platform area intelligent electric energy meter kinematic error calculation method and its system
CN107301499B (en) Distribution feeder statistical line loss rate data cleaning method based on AMI data
CN106779277B (en) Classified evaluation method and device for network loss of power distribution network
CN105656031B (en) The methods of risk assessment of power system security containing wind-powered electricity generation based on Gaussian Mixture distribution characteristics
CN108599172B (en) Transmission and distribution network global load flow calculation method based on artificial neural network
CN109659973B (en) Distributed power supply planning method based on improved direct current power flow algorithm
CN109818369B (en) Distributed power supply planning method considering output fuzzy randomness
CN111159638A (en) Power distribution network load missing data recovery method based on approximate low-rank matrix completion
CN105046584A (en) K-MEANS algorithm-based ideal line loss rate calculation method
CN112149873A (en) Low-voltage transformer area line loss reasonable interval prediction method based on deep learning
CN104716641B (en) Method for assessing power supply capacity of power distribution network provided with distributed generation
CN114629128B (en) User low-voltage management method and system based on marketing and distribution data fusion
CN108596514A (en) Power equipment mixing Weibull Reliability Modeling based on fuzzy genetic algorithm
CN103279661B (en) Substation capacity Optimal Configuration Method based on Hybrid quantum inspired evolution algorithm
CN110910026B (en) Cross-provincial power transmission line loss intelligent management and decision method and system
CN110889565B (en) Distribution network routing inspection period calculation method based on multi-dimensional matrix decision
Song et al. Stochastic processes in renewable power systems: From frequency domain to time domain
CN111091141B (en) Photovoltaic backboard fault diagnosis method based on layered Softmax
CN115860797B (en) Electric quantity demand prediction method suitable for new electricity price reform situation
CN109918612A (en) A kind of platform area topological structure method of calibration based on sparse study
CN111552911B (en) Quantitative analysis method for technical line loss influence factors based on multi-scene generation
CN108564249B (en) Power distribution network confidence peak clipping benefit evaluation method considering distributed photovoltaic randomness
CN110571791B (en) Optimal configuration method for power transmission network planning under new energy access
CN108899905B (en) Identification method and device for key nodes in complex power grid

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant