CN110690701A

CN110690701A - Analysis method for influence factors of abnormal line loss

Info

Publication number: CN110690701A
Application number: CN201910983595.XA
Authority: CN
Inventors: 周静龙; 贾黎亮; 李春华; 李强仁; 王宗宝; 虎爱燕; 杨平礼
Original assignee: Individual
Current assignee: Individual
Priority date: 2019-10-16
Filing date: 2019-10-16
Publication date: 2020-01-14

Abstract

The invention discloses a method for analyzing the influence factors of abnormal line loss, which comprises the following steps: S1, selecting a technical route; S2, preparing data; and S3, mining and analyzing the data. On the one hand, the person in charge of operation and inspection at a pilot application unit can set a reasonable threshold based on the current line loss prediction, and carry out an inspection when the line loss exceeds the acceptable range; on the other hand, the model results show how strongly a change in each factor affects the line loss, so that corresponding loss reduction measures can be formulated and the transmission loss of the power grid, particularly the variable loss, can be reduced. In screening the feature factors, an algorithm that combines feature correlation and feature importance is adopted, which makes the selection of line loss influence factors more reasonable. In addition, because the random forest algorithm samples both the training set and the features at random, it is suited to high-dimensional data and predicts more accurately.

Description

Analysis method for influence factors of abnormal line loss

Technical Field

The invention relates to the technical field of electric power, in particular to an analysis method for influence factors of abnormal line loss.

Background

Line loss is the electrical energy lost during transmission through the power grid owing to resistance effects, magnetic field effects and management factors. Line loss comprehensively reflects the planning and design of the power grid, the quality of its operating state, and the level of its management and operation. Improper management or technology makes the line loss rate abnormal and wastes energy, so line loss management must be carried out to raise both the level of line loss management and the precision of power grid operation management. Within line loss management, the analysis of abnormal line loss occupies a very important position. This analysis classifies abnormal line loss effectively and gives a deeper understanding of the cause, nature and component proportions of a unit's line loss, so that the main factors affecting the loss can be identified, targeted measures can be taken, and a larger loss reduction effect and economic benefit can be obtained with less investment.

The traditional line loss management method relies on cooperation among several professional systems and on the fusion of their data resources: the distribution network is partitioned into zones, component partitions and distribution areas, every part is monitored online, and candidate factors are then eliminated one by one. With this approach, power grid management grows more difficult by the day, and line loss rectification has become a problem that urgently needs to be solved.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a method for analyzing the influence factors of abnormal line loss. Based on line loss prediction and influence factor analysis, the method supports technical loss reduction on lines through measures such as adjusting the power grid operation mode, controlling reactive power and adjusting voltage, provides decision support for such loss reduction, and helps the power grid operate efficiently and economically.

In order to achieve the purpose, the invention adopts the following technical scheme:

an analysis method for influence factors of abnormal line loss comprises the following steps:

S1, selecting a technical route: the technical route comprises a data storage part, a data calculation and processing part, and a data mining and analysis part;

the data storage part takes the project's data collection and storage requirements into account: an Oracle database is used for storage in the early stage, and Hive (the data warehouse of the big data platform) is used in the later stage, depending on the data volume;

the data calculation and processing part computes, processes and trains on the data using PyCharm, Anaconda and other related integrated tool environments;

the data mining and analysis part proceeds in four steps: first, data preparation and data processing;

second, exploring regularities in the data through data distribution, trend analysis, feature correlation analysis and similar means, to provide a basis for subsequent modeling;

third, analyzing the line loss influence factors based on feature correlation and feature importance;

fourth, constructing a line loss prediction model based on a random forest regression algorithm, and finally integrating the results into an online application;

S2, preparing data;

preprocessing the data, as follows:

conductor cross-sectional area that is empty or inconsistent with the conductor model: first obtain the conductor model of each such record, then parse the model to obtain the corresponding cross-sectional area, and finally fill the missing or abnormal values according to this correspondence;

tower span that is empty or 0: fill the missing values with the median;

tower altitude that is empty: fill with the mean;

insulator core rod diameter that is empty or 0: fill or replace according to the core rod diameter of the corresponding manufacturer and model;

missing voltage and current values: fill the gaps with a smoothing method so that a complete 96-point electricity consumption curve is fitted; a value that lies outside the normal reasonable range and shows a sudden rise or fall is treated as a pulse value, which is cleared and then filled in the same way;

finally, delete the fields of the wide data table that are irrelevant to line loss analysis, such as component number, component name, line primary key and date;
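The filling rules above can be sketched with pandas. This is only an illustration under stated assumptions: the column names `tower_span` and `tower_altitude` are hypothetical, a pulse value is detected by a simple range check (the sudden-change condition is left out), and linear interpolation stands in for the smoothing method, which the text does not name.

```python
import numpy as np
import pandas as pd


def fill_equipment_fields(df: pd.DataFrame) -> pd.DataFrame:
    """Fill tower span and altitude gaps (hypothetical column names)."""
    out = df.copy()
    # Tower span: empty or 0 counts as missing; fill with the median of valid values.
    span = out["tower_span"].replace(0, np.nan)
    out["tower_span"] = span.fillna(span.median())
    # Tower altitude: empty values are filled with the mean.
    out["tower_altitude"] = out["tower_altitude"].fillna(out["tower_altitude"].mean())
    return out


def fill_96_point_curve(values, lower: float, upper: float) -> pd.Series:
    """Clear out-of-range readings, then fill every gap in the daily curve."""
    s = pd.Series(values, dtype="float64")
    # Assumption: readings outside [lower, upper] are treated as pulse values.
    s = s.where((s >= lower) & (s <= upper))
    # Assumption: linear interpolation replaces the unspecified smoothing method.
    return s.interpolate(method="linear", limit_direction="both")
```

In practice `fill_96_point_curve` would be applied to each line's 96 quarter-hour readings per day, with `lower` and `upper` chosen from the rated voltage or current of the line.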

checking normality: a normality test is applied to the line loss wide table after the irrelevant fields (component number and so on) have been deleted. The null hypothesis is that the sample follows a normal distribution, and the decision threshold is 0.001: if the test result (the p-value) is below 0.001, the null hypothesis is rejected, that is, the data do not follow a normal distribution. The results for all fields in the wide table are below the threshold, so no field is normally distributed, and correlation measures that assume normality cannot be used when computing feature correlation;
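A minimal sketch of this check follows. The text does not say which normality test was used; `scipy.stats.normaltest` (the D'Agostino-Pearson test) is an assumption made here for illustration, and the function name is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.001  # decision threshold stated in the text


def non_normal_fields(df: pd.DataFrame, alpha: float = ALPHA) -> list:
    """Return the numeric fields for which normality is rejected."""
    rejected = []
    for col in df.select_dtypes(include="number").columns:
        values = df[col].dropna().to_numpy()
        # Null hypothesis: the sample comes from a normal distribution.
        _, p_value = stats.normaltest(values)
        if p_value < alpha:
            rejected.append(col)
    return rejected
```

If every field is returned, as the text reports for the wide table, a rank-based correlation measure is the safer choice for the later steps.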

S3, data mining and analysis: based on four types of parameter data (weather, equipment, operation and power outage), a wide table of line loss influence factors is generated by integration; the influence factors are analyzed from the two aspects of feature correlation and feature importance; the data items ranked highest in each are selected; and the final line loss influence factors are thereby determined.

Preferably, in step S2, regarding the data sources: based on the analysis of the business requirements, the internal power data used in the analysis are obtained from power information systems such as PMS2.0, marketing, same-period line loss, and electricity consumption information collection; the data cover 2018 to June 2019, and there are about 1.52 million records in total.

Preferably, step S2 further includes: in terms of reliability, the line equipment data come from the PMS2.0 system and the operating parameter data come from the electricity consumption information collection system; although some abnormal collected data exist, they do not affect subsequent data mining; in terms of completeness, the line equipment data have the following problems: conductor cross-sectional area that is empty or inconsistent with the conductor model, tower span that is empty or 0, tower altitude that is empty, and core rod diameter that is empty or 0; these can be remedied through subsequent data processing.

The invention has the following beneficial effects:

1. On the one hand, the person in charge of operation and inspection at a pilot application unit can set a reasonable threshold based on the current line loss prediction, and carry out an inspection when the line loss exceeds the acceptable range; on the other hand, the model results show how strongly a change in each factor affects the line loss, so that corresponding loss reduction measures can be formulated and the transmission loss of the power grid, particularly the variable loss, can be reduced.

2. In screening the feature factors, an algorithm that combines feature correlation and feature importance is adopted, which makes the selection of line loss influence factors more reasonable. In addition, because the random forest algorithm samples both the training set and the features at random, it is suited to high-dimensional data and predicts more accurately.

Detailed Description

In order that the above objects, features and advantages of the present invention will be readily understood, numerous specific details are set forth in the following description in order to provide a thorough understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, but rather should be construed as broadly as the present invention is capable of modification in various respects, all without departing from the spirit and scope of the present invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.

the data mining analysis part is used for data preparation and data processing;

S2, preparing data;

In step S2, regarding the data sources: based on the analysis of the business requirements, the internal power data used in the analysis are obtained from power information systems such as PMS2.0, marketing, same-period line loss, and electricity consumption information collection; the data cover 2018 to June 2019, and there are about 1.52 million records in total.

Step S2 further includes: in terms of reliability, the line equipment data come from the PMS2.0 system and the operating parameter data come from the electricity consumption information collection system; although some abnormal collected data exist, they do not affect subsequent data mining; in terms of completeness, the line equipment data have the following problems: conductor cross-sectional area that is empty or inconsistent with the conductor model, tower span that is empty or 0, tower altitude that is empty, and core rod diameter that is empty or 0; these can be remedied through subsequent data processing.

Supplementary explanation:

feature correlation analysis

Line loss classification

The data in the wide table are classified by line loss. The classification criteria and category names are as follows:

Table 1 Line loss classification

Classification criteria	Category name
Line loss < -0.8%	Negative loss
-0.8% <= line loss < 0	Normal 0
0 <= line loss <= 3%	Normal 1
Line loss > 3%	High loss
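The classification rule can be written as a small function. This is a sketch with one assumption: the boundary values -0.8% and 0 are assigned to "Normal 0" and "Normal 1" respectively, so that every line loss value falls into exactly one category. The function name is hypothetical.

```python
def classify_line_loss(loss_pct: float) -> str:
    """Map a line loss rate in percent to its category name from Table 1."""
    if loss_pct < -0.8:
        return "Negative loss"
    if loss_pct < 0:
        # Assumption: -0.8 itself belongs to this category.
        return "Normal 0"
    if loss_pct <= 3:
        # Assumption: 0 itself belongs to this category.
        return "Normal 1"
    return "High loss"
```

Applying this function to the line loss column of the wide table yields the category label used to split the data for the per-category analysis.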

Computing feature correlations

Data exploration shows that the data do not follow a normal distribution, so the Spearman correlation is used to compute the correlation between fields.
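As an illustration, pandas can compute the Spearman rank correlation between all numeric fields directly. The function name and the toy columns in the usage note are hypothetical.

```python
import pandas as pd


def spearman_matrix(df: pd.DataFrame) -> pd.DataFrame:
    """Pairwise Spearman rank correlation between the numeric fields."""
    # Spearman works on ranks, so it does not assume normally distributed data.
    return df.select_dtypes(include="number").corr(method="spearman")
```

Because the measure depends only on ranks, a monotonic but nonlinear relation such as `y = x ** 3` still yields a coefficient of 1, which is why it remains usable when the normality test fails.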

Feature importance analysis

Deleting strongly correlated fields

Based on the feature correlation results for each category, the fields whose feature correlation exceeds 0.9 are identified, and only one field from each group of strongly correlated fields is retained, chosen according to business requirements.
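One possible way to apply the 0.9 rule is a greedy pass over the columns. This is a sketch with two assumptions: the absolute value of the Spearman coefficient is compared with the threshold, and column order stands in for the business priority, so the first field of each strongly correlated group is the one kept.

```python
import pandas as pd


def drop_strongly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Keep one field from each group of strongly correlated fields."""
    corr = df.corr(method="spearman").abs()
    kept = []
    for col in corr.columns:
        # Keep a field only if it is not strongly correlated with one already kept.
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    return df[kept]
```

Reordering the columns before the call changes which field of a correlated pair survives, which is where the business requirement would enter.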

Algorithm selection

Currently popular methods for computing feature importance include decision tree, random forest and XGBoost. A single decision tree is prone to overfitting and to settling on a locally optimal solution; the other two algorithms mitigate these defects by building many decision trees. Between random forest and XGBoost, the random forest samples not only the training records but also the features at random, which gives it a stronger resistance to overfitting. In business practice, some features in the existing samples may be unrelated to line loss; compared with XGBoost, which trains on all features, the random feature selection of the random forest may therefore give better results. Random forest is therefore selected as the algorithm for computing feature importance.

Computing feature importance

Feature importance is computed with a random forest algorithm, after the necessary parameter tuning, on the data from which the strongly correlated fields have been deleted.
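A minimal sketch of this step, assuming scikit-learn's `RandomForestRegressor` and its impurity-based `feature_importances_` attribute; the text does not state which implementation or importance measure was used, and the function name is hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rank_feature_importance(X: pd.DataFrame, y, n_estimators: int = 100,
                            random_state: int = 0) -> pd.Series:
    """Fit a random forest and return feature importances, highest first."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    model.fit(X, y)
    # Impurity-based importances; they are normalized to sum to 1.
    importance = pd.Series(model.feature_importances_, index=X.columns)
    return importance.sort_values(ascending=False)
```

Here `X` would hold the wide-table fields that survived the correlation filter and `y` the line loss of the same records.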

Feature merging

The fields are screened by feature correlation and by feature importance, and the two resulting sets of fields are merged. Strongly correlated fields are then deleted from the merged set to obtain the feature merging result.

Feature selection

Feature importance is computed again on the feature merging result; the fields with lower importance are deleted, and the remaining fields are taken as the line loss influence factor analysis result.
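The merging and final selection can be sketched in plain Python. The function name, the cut-off `min_importance` and the importance values in the usage note are all hypothetical; the text gives no numeric cut-off for "lower feature importance", and the recomputed importances are assumed to be supplied by the caller.

```python
def merge_and_select(correlation_fields, importance_fields, importance, min_importance):
    """Merge two screened field lists, then drop fields with low importance.

    `importance` maps each merged field to its recomputed importance.
    """
    # Order-preserving union of the two screened lists.
    merged = list(dict.fromkeys(list(correlation_fields) + list(importance_fields)))
    # Fields without a recomputed importance are treated as unimportant.
    return [f for f in merged if importance.get(f, 0.0) >= min_importance]
```

For example, with made-up importances `{"load rate": 0.4, "average voltage": 0.3, "line length": 0.01}` and a cut-off of 0.05, the call keeps "load rate" and "average voltage" and drops "line length".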

Table 2 Line loss influence factor analysis results

Category name	Fields
Overall	Input electric quantity, voltage difference between two sides, average reactive power, load rate, average voltage, line length and line running time
Negative loss	Input electric quantity, voltage difference between two sides, average reactive power, load rate, average voltage, daytime wind force level and average conductor operation time
Normal 0	Input electric quantity, voltage difference between two sides, average reactive power, load rate and average conductor operation time
Normal 1	Input electric quantity, voltage difference between two sides, average reactive power, load rate, average voltage, line length and line running time
High loss	Input electric quantity, voltage difference between two sides, average reactive power, average voltage, line running time and average conductor operation time

Line loss prediction model based on random forest

Using the data corresponding to the line loss influence factors obtained for each category, the model is built with the random forest algorithm, which was selected by cross-validation from three candidates: decision tree, random forest and XGBoost. The necessary parameters are tuned by grid search, finally yielding the line loss prediction models for the categories.

Sample selection

The total data set is used as the sample for the line loss prediction model: because the category of a future line loss value cannot be known in advance, a category-specific prediction model cannot be applied, so only a prediction model trained on all the data can be built.

Algorithm selection

Line loss prediction is a regression problem. Commonly used regression algorithms include linear regression, SVR, decision tree, random forest and XGBoost. The algorithms were evaluated by cross-validation on the total data set with default parameters; the resulting errors, together with the advantages and disadvantages of each algorithm, are as follows:

Table 3 Algorithm selection

Algorithm	Advantages	Disadvantages	Error
Decision tree	Simple and intuitive; needs essentially no data preprocessing	Prone to overfitting and to local optima	3.0870
Random forest	Handles large, high-dimensional data	Prone to overfitting on very noisy data	0.2947
XGBoost	Strong resistance to overfitting; supports parallelization	High memory usage	0.4994
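A comparison of this kind can be sketched with scikit-learn's `cross_val_score`. The error metric is not stated in the text; root mean squared error is assumed here. XGBoost is left out only to keep the sketch free of an extra dependency, and the function name is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor


def compare_regressors(X, y, cv: int = 5) -> dict:
    """Cross-validate candidate regressors with their default parameters."""
    candidates = {
        "Decision tree": DecisionTreeRegressor(random_state=0),
        "Random forest": RandomForestRegressor(random_state=0),
    }
    errors = {}
    for name, model in candidates.items():
        # scikit-learn reports negated MSE so that larger is always better.
        scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
        # Assumption: the reported error is the root of the mean fold MSE.
        errors[name] = float(np.sqrt(-scores.mean()))
    return errors
```

The algorithm with the smallest returned error would be carried forward to parameter tuning.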

Parameter training

For the n_estimators and min_weight_fraction_leaf parameters, grid search is used for tuning; default values are used for the other parameters.

After tuning, n_estimators is 100 and min_weight_fraction_leaf is 0; the training set error is 0.2080 and the test set error is 0.2600.
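The two parameter names match scikit-learn's `RandomForestRegressor`, so the tuning step can be sketched with `GridSearchCV`. The candidate values in the grid and the scoring metric are assumptions for illustration; the text reports only the selected values.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV


def tune_random_forest(X, y, cv: int = 3):
    """Grid-search the two parameters named in the text; others stay at defaults."""
    param_grid = {
        "n_estimators": [50, 100],                # hypothetical candidates
        "min_weight_fraction_leaf": [0.0, 0.05],  # hypothetical candidates
    }
    search = GridSearchCV(
        RandomForestRegressor(random_state=0),
        param_grid,
        cv=cv,
        scoring="neg_mean_squared_error",  # assumed metric
    )
    search.fit(X, y)
    return search.best_params_, search.best_estimator_
```

The returned estimator is refitted on all of `X` and `y` with the best parameter combination, so it can be used for prediction directly.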

Results of the model

The actual prediction results of the model are as follows:

Table 4 Prediction results

Input electric quantity	Voltage difference between two sides	Average reactive power	Load rate	Average voltage	Line length	Line running time	Line loss	Predicted line loss
14000	-13.62	0.34	15.25	34686.39	20.07	35.23	-0.50	-0.28
29960	-4.04	1.14	11.36	35407.54	6.66	11.89	0.00	0.08
108360	228.13	5.19	27.36	36451.37	4.48	76.85	0.39	0.47
38220	128.46	1.21	40.35	39311.33	8.86	9.01	0.00	0.14
113120	749.14	3.76	45.32	39666.85	20.61	7.83	1.73	1.36
21560	0.00	0.61	16.44	36065.54	28.20	1.70	1.30	1.04
201600	253.17	4.11	121.40	36174.69	3.09	16.40	0.35	0.42
12320	-299.87	0.43	18.30	36858.28	20.00	28.56	-0.57	-0.27
135520	97.34	3.67	58.91	38040.62	2.58	18.81	0.21	0.26

On the one hand, the person in charge of operation and inspection at a pilot application unit can set a reasonable threshold based on the current line loss prediction, and carry out an inspection when the line loss exceeds the acceptable range; on the other hand, the model results show how strongly a change in each factor affects the line loss, so that corresponding loss reduction measures can be formulated and the transmission loss of the power grid, particularly the variable loss, can be reduced. In screening the feature factors, an algorithm that combines feature correlation and feature importance is adopted, which makes the selection of line loss influence factors more reasonable. In addition, because the random forest algorithm samples both the training set and the features at random, it is suited to high-dimensional data and predicts more accurately.
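The threshold-based inspection described above can be sketched using the actual and predicted line loss values from Table 4. The tolerance of 0.25 is a hypothetical choice, since the text leaves the threshold to the person in charge, and the function names are illustrative.

```python
# (actual line loss, predicted line loss) pairs copied from Table 4.
ROWS = [
    (-0.50, -0.28), (0.00, 0.08), (0.39, 0.47),
    (0.00, 0.14), (1.73, 1.36), (1.30, 1.04),
    (0.35, 0.42), (-0.57, -0.27), (0.21, 0.26),
]


def mean_absolute_error(rows) -> float:
    """Average absolute gap between actual and predicted line loss."""
    return sum(abs(actual - predicted) for actual, predicted in rows) / len(rows)


def flag_for_inspection(rows, tolerance: float) -> list:
    """Indices of rows whose actual loss deviates from the prediction by more than tolerance."""
    return [
        i for i, (actual, predicted) in enumerate(rows)
        if abs(actual - predicted) > tolerance
    ]


mae = mean_absolute_error(ROWS)
flagged = flag_for_inspection(ROWS, tolerance=0.25)  # hypothetical tolerance
```

A tighter tolerance flags more lines for inspection; the value would in practice be set from the prediction error that the unit considers acceptable.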

The above is only a preferred embodiment of the present invention, and the scope of protection of the invention is not limited to it. Any equivalent substitution or change that a person skilled in the art makes, within the technical scope disclosed by the invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. An analysis method for influence factors of abnormal line loss is characterized by comprising the following steps:

the data storage part takes the project's data collection and storage requirements into account: an Oracle database is used for storage in the early stage, and Hive on the big data platform is used in the later stage, depending on the data volume;

the data mining analysis part is used for data preparation and data processing;

S2, preparing data;

2. The method as claimed in claim 1, wherein in step S2, based on the analysis of the business requirements, the internal power data used in the analysis are obtained from power information systems such as PMS2.0, marketing, same-period line loss, and electricity consumption information collection; the data cover 2018 to June 2019, and there are about 1.52 million records in total.

3. The method for analyzing abnormal line loss influence factors according to claim 1, wherein step S2 further includes: in terms of reliability, the line equipment data come from the PMS2.0 system and the operating parameter data come from the electricity consumption information collection system; although some abnormal collected data exist, they do not affect subsequent data mining; in terms of completeness, the line equipment data have the following problems: conductor cross-sectional area that is empty or inconsistent with the conductor model, tower span that is empty or 0, tower altitude that is empty, and core rod diameter that is empty or 0; these can be remedied through subsequent data processing.