AU2021105453A4

AU2021105453A4 - Method for forecasting line loss rate in low-voltage station area based on extreme gradient lifting decision tree

Info

Publication number: AU2021105453A4
Application number: AU2021105453A
Authority: AU
Inventors: Biyun Chen; Jiateng CHEN; Lianqiong Gan; Huiying LAN; Bin Li; Peijie LI; Junchao Liang; Yiran Zeng; Chi Zhang; Yun Zhu
Original assignee: Guangxi University
Current assignee: Guangxi University
Priority date: 2021-08-13
Filing date: 2021-08-13
Publication date: 2021-10-14
Anticipated expiration: 2029-08-13

Abstract

The invention discloses a method for predicting the line loss rate of a low-voltage station area based on an extreme gradient lifting decision tree, which comprises the following steps: collecting original data of the low-voltage station area and preprocessing it to obtain target data of the low-voltage station area; based on the target data, screening key features by feature engineering and constructing a characteristic index system of the low-voltage station area; and constructing a second GS-XGBoost prediction model, predicting the line loss rate of the classified low-voltage station areas with it, and analyzing and evaluating the prediction results. The method can calculate the line loss rate of the low-voltage station area accurately and quickly, improve the capability for targeted loss reduction, realize lean management of line loss, and provide a basis for power supply enterprises to formulate reasonable loss reduction measures.

Description

Figure 1: S1, collecting the original data of the low-voltage station area; S2, preprocessing the original data of the low-voltage station area; S3, screening key features by feature engineering and constructing a feature index system for the low-voltage station area; S4, classifying the low-voltage station areas; S5, establishing the GS-XGBoost line loss rate prediction model to predict the line loss rate of each type of low-voltage station area, and analyzing and evaluating the prediction results.

Method for forecasting line loss rate in low-voltage station area based on extreme gradient lifting decision tree

TECHNICAL FIELD

The invention belongs to the technical field of distribution network line loss calculation, and particularly relates to a method for predicting the line loss rate of a low-voltage station area based on an extreme gradient lifting decision tree, i.e. an extreme gradient boosting (XGBoost) decision tree.

BACKGROUND

With steady economic development and continuously rising living standards, the load on the power grid keeps increasing. The 10 kV and 0.4 kV networks account for the largest share of losses: the medium- and low-voltage distribution network accounts for 55% of the total power loss, and the 10 kV network alone accounts for 26.28%. Line losses in station areas are serious, and the line loss problem is becoming increasingly prominent. There are three main causes of line loss in low-voltage distribution station areas:

(1) Fixed losses, including the resistance loss and excitation loss of the transformer windings and iron core; the resistance loss of the cable lines used for power transmission; the energy loss of capacitors and reactors deployed in the network; the energy loss of protection devices; dielectric loss; and the loss of metering devices;

(2) Management causes, mainly meter-reading problems and insufficient control of electricity theft;

(3) Technical causes, mainly inconsistent business accounting data and inconsistent household-transformer relationships (the records of which customers are supplied by which transformer).

Traditional line loss calculation methods, such as the equivalent resistance method, the voltage loss method, the average current method and the root-mean-square current method, are now widely used in the production practice of power enterprises. In actual grid operation, however, the low-voltage network is the "hardest hit area" of the power grid: it is numerous, its lines are badly aged, its power supply modes are varied, and its load is irregularly distributed along the lines. Line loss calculation therefore faces bottlenecks, and the traditional methods cannot extract valuable information from historical data. The traditional assessment of the line loss qualification rate can no longer meet the requirements of lean line loss management. Power supply enterprises urgently need an effective method to calculate line loss, dynamically predict a reasonable line loss for each station area, and provide a basis for energy saving, loss reduction, and grid planning and transformation.

Therefore, providing a fast and accurate method for calculating line loss in station areas is an urgent technical problem.

SUMMARY

In view of this, the purpose of the present invention is to provide a method for predicting the line loss rate of a low-voltage station area based on an extreme gradient lifting decision tree. The method applies feature engineering and a machine learning algorithm to line loss rate prediction in low-voltage station areas; through an accurate line loss prediction model it improves the capability for targeted loss reduction and realizes lean management of line loss, solves the problems described in the background, simplifies the line loss calculation process, and improves calculation efficiency and accuracy.

To achieve the above purpose, the present invention provides the following scheme:

A method for predicting the line loss rate of a low-voltage station area based on an extreme gradient lifting decision tree comprises:

Collecting original data of the low-voltage station area, and preprocessing the original data to obtain target data of the low-voltage station area;

Selecting key features through feature engineering based on the target data of the low-voltage station area, constructing a low-voltage station area characteristic index system, and classifying the low-voltage station areas based on this characteristic index system;

Establishing a second GS-XGBoost prediction model (XGBoost with parameters tuned by grid search, GS), predicting the line loss rate of the classified low-voltage station areas with the second GS-XGBoost prediction model, and analyzing and evaluating the prediction results.

Preferably, collecting the original data of the low-voltage station area includes obtaining the cross-sectional area of the trunk line, the total number of low-voltage meters, the power supply quantity, the average load rate, the total line length, the distribution transformer capacity and the power factor, which reflect the station area and load characteristics.

Preferably, the preprocessing comprises: processing missing values of the original data of the low-voltage station area based on a sparse matrix to obtain first data; performing abnormal data detection on the first data to obtain second data; and extracting characteristic data based on the second data and standardizing the characteristic data to obtain the target data of the low-voltage station area.

Preferably, screening key features by feature engineering comprises: evaluating the feature index weights of the original data of the low-voltage station area by the F-test filtering method and the mutual information method, and forming feature sets that are evaluated with MSE; inputting each feature set into the first GS-XGBoost prediction model and calculating its mean square error; and comparing the results to select the feature set with the smallest mean square error as the feature index system of the low-voltage station area.

Preferably, constructing the low-voltage station area characteristic index system further comprises determining the number of key indexes of the characteristic index system.

Preferably, classifying the low-voltage station areas comprises: inputting a low-voltage station area data set into the characteristic index system to determine the number of categories to be clustered and the cluster centers; calculating the distance from each low-voltage station area sample to each cluster center to find the nearest one; and assigning the sample to its nearest cluster center to complete the classification of the low-voltage station areas.

Preferably, predicting the line loss rate of the low-voltage station area comprises: constructing the second GS-XGBoost prediction model based on the first GS-XGBoost prediction model and the extreme gradient lifting decision tree, and inputting the low-voltage station area data set into the second GS-XGBoost prediction model to obtain a line loss rate prediction result.

Preferably, the prediction result is analyzed and evaluated by the mean square error MSE, the mean absolute error MAE and the root mean square error RMSE;

The mean square error MSE is the mean of the squared prediction errors; minimizing the sum of squared errors is the cost function used in linear regression model fitting.

The invention discloses the following technical effects:

The invention discloses a method for predicting the line loss rate of a low-voltage station area based on an extreme gradient lifting decision tree. Data preprocessing ensures the rationality of the data and improves data quality; feature engineering eliminates redundant features and reduces the burden of data collection; classification gives each type of low-voltage station area clear practical characteristics and significance; and a line loss rate prediction model combined with grid search is constructed to predict the line loss rate of low-voltage station areas, improving model performance and greatly improving prediction accuracy.

With this method, the seven mainstream characteristic factors of the low-voltage station area are reduced to four main factors, so that the data characteristics are still covered, the analysis is simplified, and the key characteristic indexes of line loss in the low-voltage station area are extracted. Mining the line loss data of low-voltage station areas reveals the nonlinear relationship between the electrical characteristic indexes and the line loss rate. By analyzing and evaluating line loss data with an accurate line loss rate prediction model, the line loss rate of low-voltage station areas can be calculated accurately and quickly, which provides a theoretical basis and decision support for rapid evaluation, accurate calculation and loss reduction planning, improves the capability for targeted loss reduction, realizes lean management of line loss, and thus effectively improves the standardization of line loss management in low-voltage station areas.

BRIEF DESCRIPTION OF THE FIGURES

To explain the embodiments of the present invention or the technical scheme in the prior art more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; persons of ordinary skill in the art can obtain other drawings from them without creative effort.

Figure 1 is a flow diagram of the method for predicting the line loss rate of a low-voltage station area based on an extreme gradient lifting decision tree provided by the present invention;

Figure 2 is a graph of the filtering results of the F-test and mutual information methods in an embodiment of the present invention;

Figure 3 is a line chart of the mean square error values for different numbers of features in an embodiment of the present invention;

Figure 4 is a structural schematic diagram of the GS-XGBoost line loss prediction model in an embodiment of the present invention;

Figure 5 is a comparison graph of line loss prediction results in an embodiment of the present invention;

Figure 6 is a graph of the line loss rate prediction results of an extreme gradient boosting decision tree (XGBoost) without parameter tuning in an embodiment of the present invention;

Figure 7 is a graph of the line loss rate prediction results of a random forest (RF) model without parameter tuning in an embodiment of the present invention.

DESCRIPTION OF THE INVENTION

The technical scheme in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

To make the above objects, features and advantages of the present invention more apparent and easier to understand, the present invention is further explained in detail below with reference to the drawings and specific embodiments.

As shown in Figure 1, the invention provides a method for predicting the line loss rate of a low-voltage station area based on an extreme gradient lifting decision tree, which comprises the following steps:

S1: Collecting the original data of the low-voltage station area;

S2: Preprocessing the original data of the low-voltage station area;

S3: Screening key features by feature engineering, and constructing a feature index system of the low-voltage station area;

S4: Classifying the low-voltage station areas;

S5: Establishing a GS-XGBoost line loss rate prediction model, predicting the line loss rates of each type of low-voltage station area, and analyzing and evaluating the prediction results.

Collecting the original data of the low-voltage station area specifically includes the following:

From the line loss management system and the automatic metering acquisition system, the line loss rate data and the seven main electrical characteristics that best reflect the station area and load characteristics are obtained, namely the cross-sectional area of the trunk line, the total number of low-voltage meters, the power supply quantity, the average load rate, the total line length, the distribution transformer capacity and the power factor.

The original data of the low-voltage station area are preprocessed to ensure the rationality of the data, improve data quality, make the data conform to a normal distribution, overcome the weight differences caused by the different magnitudes of the characteristic indexes, and facilitate modeling. The preprocessing specifically comprises the following steps:

(1) Missing values are handled with a sparse matrix. XGBoost processes missing values automatically: when evaluating a split, the samples with missing values are placed in the left subtree and in the right subtree in turn, the loss is calculated for each case, and the better direction is taken as the default split direction for missing values, so that the sample data set is fully used;

(2) Abnormal data detection uses the isolation forest algorithm on continuous data, identifying points that are scattered, of low density and far from high-density regions as outliers in the station area data;

(3) Feature data are extracted and standardized.
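Step (2) names the isolation forest algorithm but gives no parameters. The following is a minimal pure-Python sketch of isolation-forest scoring, assuming the usual formulation (random axis-parallel splits, anomaly score $s(x)=2^{-E[h(x)]/c(\psi)}$ with $\psi$ the sub-sample size); the function names, the 100 trees and the sub-sample size of 64 are illustrative assumptions, not values from this embodiment. Points whose score is close to 1 are isolated after few splits and are outlier candidates.

```python
import math
import random
from typing import Dict, List, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649


def c_factor(n: int) -> float:
    """Average path length c(n) of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(rows: List[Point], depth: int, max_depth: int, rng: random.Random) -> Dict:
    """Grow one isolation tree with a random feature and random threshold per node."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # a constant feature cannot be split
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    return {
        "feature": feature,
        "split": split,
        "left": build_tree([r for r in rows if r[feature] < split], depth + 1, max_depth, rng),
        "right": build_tree([r for r in rows if r[feature] >= split], depth + 1, max_depth, rng),
    }


def path_length(x: Point, node: Dict, depth: int = 0) -> float:
    """Depth at which x is isolated, plus c(size) for leaves that were not expanded."""
    if "split" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, depth + 1)


def isolation_scores(data: List[Point], n_trees: int = 100, sample_size: int = 64,
                     seed: int = 0) -> List[float]:
    """Anomaly score per point; values near 1 suggest outliers."""
    if len(data) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(psi))  # height limit used in the original method
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(x, t) for t in trees) / n_trees / norm) for x in data]
```

In practice scikit-learn's `sklearn.ensemble.IsolationForest` provides a tested implementation of the same idea; the score threshold for flagging a station area record as abnormal (for example 0.6) would have to be chosen for the actual data.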

Specifically, the characteristic data are standardized by the Z-score method, with the transformation function:

$$x'=\frac{x-\mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the original data.

Z-score standardization transforms the characteristic data into dimensionless values with zero mean and unit standard deviation, so that all variables are on the same order of magnitude.
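The Z-score step can be sketched in a few lines of Python. This is an illustrative sketch: the function names are hypothetical, and it assumes the population standard deviation (`statistics.pstdev`), since the embodiment does not say whether the population or the sample form is used. Applied to `[2.0, 4.0, 6.0]` it returns values of about -1.2247, 0.0 and 1.2247, which shows that standardized values are centred on 0 rather than confined to [0,1].

```python
import statistics
from typing import List, Sequence


def z_score(values: Sequence[float]) -> List[float]:
    """Standardize one feature column: x' = (x - mu) / sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population std; statistics.stdev gives the n-1 form
    if sigma == 0:
        return [0.0] * len(values)  # a constant column carries no information
    return [(v - mu) / sigma for v in values]


def standardize_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Apply z_score to every feature column of a row-major data matrix."""
    columns = [z_score(col) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```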

Selecting key features by feature engineering and constructing the feature index system of the low-voltage station area eliminates redundant features and reduces the data collection burden. It includes the following steps:

(1) Initially select the seven mainstream electrical characteristics that are usually available and best reflect the station area and load characteristics;

(2) Evaluate the importance of each characteristic index with the F-test filtering method and the mutual information method;

(3) Combining MSE, combine different numbers of feature indexes into multiple feature sets, input each feature set into the GS-XGBoost model, and calculate its mean square error. The GS-XGBoost model used here is not the final model; it only serves to compare the mean square errors obtained with the different feature sets;

(4) Select the feature set that minimizes the mean square error as the final key feature index system, and determine the number of key indicators in the final feature index system.
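Steps (2) to (4) amount to a wrapper search over candidate feature sets. A hedged sketch, assuming the candidate sets are formed by taking the top-k features of an importance ranking (the text does not state exactly how the sets are combined) and that `evaluate_mse` is a hypothetical callback that trains the first GS-XGBoost model on the given features and returns its MSE:

```python
from typing import Callable, Dict, List, Sequence, Tuple


def select_feature_set(
    scores: Dict[str, float],
    evaluate_mse: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Return the top-k feature set (over all k) with the smallest MSE."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # most important first
    best_set: List[str] = []
    best_mse = float("inf")
    for k in range(1, len(ranked) + 1):
        candidate = ranked[:k]
        mse = evaluate_mse(candidate)  # e.g. validation MSE of the model on these features
        if mse < best_mse:
            best_set, best_mse = candidate, mse
    return best_set, best_mse
```

With the seven indexes ranked by their combined F-test and MI scores, the loop evaluates seven nested candidate sets; in this embodiment the minimum falls at four features (Figure 3).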

Specifically, the F-test filtering method, also known as the test of homogeneity of variance, is a filtering method used to capture the linear relationship between each feature and the line loss rate; features with a p-value below 0.01 or 0.05 are selected as significantly linearly correlated features. The F-test filtering method assumes that the feature data $X=\{x_1,x_2,\ldots,x_n\}$ and the line loss rate $y=\{y_1,y_2,\ldots,y_n\}$ are two normally distributed data sets; the statistic follows the distribution $F(n-1,n-1)$ and is calculated as follows:

$$F=\frac{s_x^2}{s_y^2}$$

where $s_x^2$ and $s_y^2$ are the corresponding sample variances, calculated as follows:

$$s_x^2=\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2,\qquad s_y^2=\frac{1}{n-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2$$

where $\bar{x}$ and $\bar{y}$ are the corresponding means, calculated as follows:

$$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i,\qquad \bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i$$
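The statistic above can be computed directly. The sketch below implements the variance-ratio formula exactly as written, with `statistics.variance` supplying the $n-1$ divisor; the function name is hypothetical. Turning $F$ into a p-value needs the $F(n-1,n-1)$ distribution, for example `scipy.stats.f.sf(F, n - 1, n - 1)` if SciPy is available. Note that scikit-learn's `f_regression` computes a different, correlation-based F statistic for univariate linear feature scoring, so its values are not interchangeable with this ratio.

```python
import statistics
from typing import Sequence, Tuple


def variance_ratio_f(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int, int]:
    """F = s_x^2 / s_y^2 with (n-1, n-1) degrees of freedom, as in the formula above."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length n >= 2")
    s2_x = statistics.variance(x)  # sample variance, divisor n - 1
    s2_y = statistics.variance(y)
    n = len(x)
    return s2_x / s2_y, n - 1, n - 1
```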

Furthermore, the mutual information method evaluates the correlation between the independent variables and the dependent variable by capturing any relationship, linear or not, between each feature and the dependent variable. MI is non-negative: 0 means the two variables are independent, larger values indicate stronger correlation, and when the values are normalized to [0,1], 1 means the two variables are completely dependent.

The mutual information is calculated as follows:

$$I(X;Y)=\sum_{x\in X}\sum_{y\in Y}P(x,y)\log\frac{P(x,y)}{P(x)P(y)}$$

where $P(x)$ is the probability of feature value $x$ appearing in the whole training set, $P(y)$ is the probability of $y$ appearing in the whole training set, and $P(x,y)$ is their joint probability.
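For discrete (or binned) data, the formula can be evaluated from frequency counts, using empirical frequencies as the probabilities. This sketch uses the natural logarithm and equal-width binning for continuous features; the bin count and the helper names are assumptions for illustration. For continuous features, scikit-learn's `mutual_info_regression`, which uses a nearest-neighbour estimator, is a common alternative.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int = 5) -> List[int]:
    """Discretize a continuous feature into n_bins equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X;Y) = sum P(x,y) log(P(x,y) / (P(x) P(y))), in nats."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

Because raw MI values have no fixed upper bound, dividing each feature's MI by the largest one is one simple way to obtain scores on [0,1].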

The MSE is calculated as follows:

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)}-\hat{y}^{(i)}\right)^2$$

where $y^{(i)}$ is the true value and $\hat{y}^{(i)}$ is the predicted value; the smaller the mean square error, the more accurate the model's prediction.

In this embodiment, the correlation of the characteristic indexes of the low-voltage station area is shown in Figure 2. As Figure 2 shows, the cross-sectional area of the trunk line, feature $x_1$, has an F value of 1, an MI value of 1 and a feature score of 14.19, all the largest, indicating a strongly correlated index. Next, the F value and MI value of the total line length, feature $x_5$, are relatively large, with a relatively stable feature score of 7.38. Meanwhile, the F and MI values of features $x_2$ and $x_4$ are both 0 and their feature scores are the lowest, indicating that the total number of low-voltage meters and the average load rate are only weakly correlated with the line loss rate. The F-test filtering method and the mutual information method therefore give consistent results, so the relevant features can be screened reliably.

The final feature index system of the low-voltage station area is shown in Figure 3. As Figure 3 shows, the mean square error is smallest when the number of features is 4, meaning that prediction performance is best at that point; the optimal number of features is therefore 4.

Classifying the low-voltage station areas specifically includes the following steps:

Let the set of sample points of the station areas be $L=\{(X_1,y_1),(X_2,y_2),\ldots,(X_n,y_n)\}$, where $X_i=(x_{i1},x_{i2},\ldots,x_{im})$ is the feature vector of the i-th sample with $m$ features. Input the low-voltage station area data set, select the number of categories $k$ to be clustered ($1<k\le n$), and select $k$ cluster centers $\{C_1,C_2,\ldots,C_k\}$.

Calculate the standardized Euclidean distance between each sample point and each cluster center, and find the nearest cluster center for each sample point:

$$\mathrm{dis}(X_i,C_j)=\sqrt{\sum_{t=1}^{m}\left(\frac{x_{it}-c_{jt}}{s_t}\right)^2}$$

where $X_i$ is the i-th sample point, $C_j$ is the j-th cluster center ($1\le j\le k$), $x_{it}$ is the t-th feature of the i-th sample point ($1\le t\le m$), $c_{jt}$ is the t-th feature of the j-th cluster center, and $s_t$ is the standard deviation of the t-th feature.

Compare the distance from each sample point to each cluster center in turn, and assign each sample point to the cluster of its nearest center, obtaining $k$ clusters $\{S_1,S_2,\ldots,S_k\}$.
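The assignment rule above, iterated with a centroid update, gives K-Means. The following pure-Python sketch uses the standardized Euclidean distance (each feature difference divided by that feature's standard deviation). Random initial centers, the iteration cap and the function names are assumptions; in practice scikit-learn's `KMeans` would normally be used, and since it uses plain Euclidean distance the features would be standardized first.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def std_euclidean(a: Sequence[float], b: Sequence[float], stds: Sequence[float]) -> float:
    """dis(X_i, C_j) = sqrt(sum_t ((x_it - c_jt) / s_t)^2)."""
    return math.sqrt(sum(((ai - bi) / s) ** 2 for ai, bi, s in zip(a, b, stds)))


def kmeans(points: List[Sequence[float]], k: int, max_iter: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Cluster points into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    m = len(points[0])
    # Per-feature standard deviation s_t; fall back to 1.0 for constant features.
    stds = [statistics.pstdev([p[t] for p in points]) or 1.0 for t in range(m)]
    centers = [list(p) for p in rng.sample(points, k)]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: std_euclidean(p, centers[j], stds)) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [statistics.fmean(col) for col in zip(*members)]
    return labels, centers
```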

In this embodiment, the K-Means algorithm is used to calculate the cluster centers for the characteristic indexes of the characteristic index system; the clustering results are shown in Table 1:

Table 1

[Table 1: clustering results for each type of low-voltage station area; the columns are characteristic indexes, including total line length (km) and distribution transformer capacity (kVA).]

In summary, each type of low-voltage station area has practical significance, which shows that the clustering effect is good. Line aging, conductor cross-section, transformer upgrades and similar factors cause large fluctuations in the line loss rate, so it is normal for the clustering results to change accordingly.

The GS-XGBoost line loss rate prediction model is constructed to predict the line loss rates of each type of low-voltage station area, with grid search used to improve model performance and prediction accuracy. This specifically includes the following steps:

As shown in Figure 4, the GS-XGBoost prediction model is constructed by combining XGBoost with grid search. Taking $X_i$ as the input station area feature vector, the final predicted line loss rate of the low-voltage station area is calculated as follows:

$$F_m(X_i)=\beta_0+\beta_1 f_1(X_i)+\beta_2 f_2(X_i)+\cdots+\beta_m f_m(X_i)$$

where $F_m$ is the final predicted value, $\beta_0$ is the initial prediction, $\beta_m$ is the shrinkage coefficient of the m-th tree, and $f_m(X_i)$ is the prediction of the m-th tree.
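The additive form of the predictor can be read directly as code. In this hedged sketch each tree is represented by a hypothetical callable that maps a feature vector to a score, and `beta0` is taken as the constant initial prediction; the two stumps below are invented for illustration only.

```python
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], float]


def boosted_predict(x: Sequence[float], beta0: float, trees: List[Tree],
                    betas: List[float]) -> float:
    """F_m(x) = beta_0 + sum_k beta_k * f_k(x) over the trees built so far."""
    if len(trees) != len(betas):
        raise ValueError("one shrinkage coefficient per tree is required")
    return beta0 + sum(beta * tree(x) for beta, tree in zip(betas, trees))


# Two hypothetical depth-1 trees (stumps) on a two-feature station area vector.
stumps: List[Tree] = [
    lambda x: 1.0 if x[0] > 0.5 else -1.0,
    lambda x: 0.2 if x[1] < 3.0 else 0.4,
]
prediction = boosted_predict([0.7, 2.0], beta0=5.0, trees=stumps, betas=[0.1, 0.1])
```

Here `prediction` is $5.0+0.1\times1.0+0.1\times0.2=5.12$.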

To prevent over-fitting, a regularization term is added by introducing the complexity function of the decision tree:

$$\Omega(f_m)=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^2$$

where $\gamma$ is the coefficient of the number of leaf nodes, $\lambda$ is the coefficient of the squared L2 norm, $T$ is the total number of leaf nodes of the tree, and $w_j$ is the output score of the j-th leaf node of the tree;

The objective function is constructed as follows:

$$\mathrm{Obj}^{(m)}=\sum_{i=1}^{n}l\left(y_i,\hat{y}_i^{(m-1)}+f_m(X_i)\right)+\Omega(f_m)+C$$

where $l$ is the loss function, $\hat{y}_i^{(m-1)}$ is the model prediction retained from the previous round $m-1$, and $C$ is a constant term.

Expanding the objective function to second order with a Taylor series gives:

$$\mathrm{Obj}^{(m)}\approx\sum_{i=1}^{n}\left[l\left(y_i,\hat{y}_i^{(m-1)}\right)+g_i f_m(X_i)+\frac{1}{2}h_i f_m^2(X_i)\right]+\Omega(f_m)+C$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss function of the m-th round with respect to $\hat{y}_i^{(m-1)}$;

Dropping the constant terms, the simplified objective function is:

$$\mathrm{Obj}^{(m)}=\sum_{j=1}^{T}\left[G_j w_j+\frac{1}{2}\left(H_j+\lambda\right)w_j^2\right]+\gamma T$$

where $G_j$ is the sum of the first derivatives and $H_j$ the sum of the second derivatives of the loss function in the m-th round over the set $I_j$ of samples that fall in leaf $j$:

$$G_j=\sum_{i\in I_j}g_i,\qquad H_j=\sum_{i\in I_j}h_i$$
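Setting the derivative of each leaf's term to zero, $G_j+(H_j+\lambda)w_j=0$, gives the optimal leaf weight $w_j^{*}=-G_j/(H_j+\lambda)$ and, substituted back, the structure score $-\frac{1}{2}\sum_j G_j^2/(H_j+\lambda)+\gamma T$. The sketch below checks this numerically, assuming the squared-error loss $\frac{1}{2}(y-\hat{y})^2$ so that $g_i=\hat{y}_i-y_i$ and $h_i=1$; the toy targets, the leaf assignment and the function names are illustrative.

```python
from typing import List, Sequence, Tuple


def leaf_sums(g: Sequence[float], h: Sequence[float], leaf_of: Sequence[int],
              n_leaves: int) -> Tuple[List[float], List[float]]:
    """G_j and H_j: sums of first / second derivatives over the samples in each leaf."""
    G = [0.0] * n_leaves
    H = [0.0] * n_leaves
    for gi, hi, j in zip(g, h, leaf_of):
        G[j] += gi
        H[j] += hi
    return G, H


def optimal_weights(G: Sequence[float], H: Sequence[float], lam: float) -> List[float]:
    """w_j* = -G_j / (H_j + lambda)."""
    return [-Gj / (Hj + lam) for Gj, Hj in zip(G, H)]


def objective(G: Sequence[float], H: Sequence[float], w: Sequence[float],
              lam: float, gamma: float) -> float:
    """sum_j [G_j w_j + 0.5 (H_j + lambda) w_j^2] + gamma T."""
    return sum(Gj * wj + 0.5 * (Hj + lam) * wj ** 2
               for Gj, Hj, wj in zip(G, H, w)) + gamma * len(G)


def structure_score(G: Sequence[float], H: Sequence[float], lam: float, gamma: float) -> float:
    """-0.5 sum_j G_j^2 / (H_j + lambda) + gamma T."""
    return -0.5 * sum(Gj ** 2 / (Hj + lam) for Gj, Hj in zip(G, H)) + gamma * len(G)


# Squared-error loss 0.5 * (y - y_hat)^2 gives g_i = y_hat_i - y_i and h_i = 1.
y = [1.0, 2.0, 3.0, 4.0]
y_prev = [2.5] * 4  # prediction from round m-1 (here the mean)
g = [yp - yi for yp, yi in zip(y_prev, y)]
h = [1.0] * len(y)
G, H = leaf_sums(g, h, leaf_of=[0, 0, 1, 1], n_leaves=2)
w = optimal_weights(G, H, lam=1.0)  # [-2/3, 2/3]
```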

When building the decision trees, the following steps are executed in each iteration:

(1) Add one tree per iteration;

(2) At the beginning of each iteration, calculate $g_i=\partial_{\hat{y}_i^{(m-1)}}\,l\left(y_i,\hat{y}_i^{(m-1)}\right)$ and $h_i=\partial^2_{\hat{y}_i^{(m-1)}}\,l\left(y_i,\hat{y}_i^{(m-1)}\right)$;

(3) Grow the tree $f_m(X)$ with a greedy algorithm, using the optimal objective value

$$\mathrm{Obj}^{(m)}=-\frac{1}{2}\sum_{j=1}^{T}\frac{G_j^2}{H_j+\lambda}+\gamma T$$

as the score of a candidate tree structure;

(4) Add $f_m(X)$ to the model and update the GS-XGBoost line loss prediction model: $\hat{y}_i^{(m)}=\hat{y}_i^{(m-1)}+\beta_m f_m(X_i)$.

Note that $\beta_m$ is the shrinkage coefficient, i.e. the step size: rather than fully optimizing at each step, it leaves room for later iterations, which helps the model learn better and effectively prevents over-fitting.
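Steps (1) to (4), together with the missing-value handling of preprocessing step (1), can be illustrated with a deliberately small sketch. It assumes depth-1 trees (stumps) rather than the depth-2 trees of this embodiment, the squared-error loss ($g_i=\hat{y}_i-y_i$, $h_i=1$), exact greedy split search with the standard XGBoost split gain $\frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda}+\frac{G_R^2}{H_R+\lambda}-\frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right]-\gamma$, and `None` marking a missing value whose default direction is chosen by trying both sides. All names and hyperparameter values are assumptions; this is not the xgboost library's implementation.

```python
from typing import List, Optional, Sequence, Tuple

Row = Sequence[Optional[float]]  # None marks a missing value
Stump = Tuple[int, float, bool, float, float]  # feature, threshold, missing_left, w_left, w_right


def split_gain(GL: float, HL: float, GR: float, HR: float, lam: float, gamma: float) -> float:
    """Reduction of the structure score when one leaf is split into two."""
    def score(G: float, H: float) -> float:
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def goes_left(x: Row, f: int, thr: float, missing_left: bool) -> bool:
    return missing_left if x[f] is None else x[f] < thr


def best_stump(X: List[Row], g: List[float], h: List[float], lam: float, gamma: float):
    """Exact greedy search over features and thresholds; missing values tried on both sides."""
    best = None  # (gain, feature, threshold, missing_left)
    for f in range(len(X[0])):
        present = sorted((x[f], gi, hi) for x, gi, hi in zip(X, g, h) if x[f] is not None)
        G_miss = sum(gi for x, gi in zip(X, g) if x[f] is None)
        H_miss = sum(hi for x, hi in zip(X, h) if x[f] is None)
        G_tot = sum(p[1] for p in present)
        H_tot = sum(p[2] for p in present)
        GL = HL = 0.0
        for i in range(len(present) - 1):
            GL += present[i][1]
            HL += present[i][2]
            if present[i + 1][0] == present[i][0]:
                continue  # no threshold fits between equal values
            thr = (present[i][0] + present[i + 1][0]) / 2
            GR, HR = G_tot - GL, H_tot - HL
            for miss_left in (True, False):
                if miss_left:
                    gain = split_gain(GL + G_miss, HL + H_miss, GR, HR, lam, gamma)
                else:
                    gain = split_gain(GL, HL, GR + G_miss, HR + H_miss, lam, gamma)
                if best is None or gain > best[0]:
                    best = (gain, f, thr, miss_left)
    return best


def fit_boosted_stumps(X: List[Row], y: List[float], n_rounds: int = 50, eta: float = 0.3,
                       lam: float = 1.0, gamma: float = 0.0) -> Tuple[float, float, List[Stump]]:
    base = sum(y) / len(y)  # constant initial prediction
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):  # (1) one tree per round
        g = [p - yi for p, yi in zip(pred, y)]  # (2) first derivatives of 0.5 (y - p)^2
        h = [1.0] * len(y)  # second derivatives
        best = best_stump(X, g, h, lam, gamma)  # (3) greedy growth
        if best is None or best[0] <= 0:
            break
        _, f, thr, miss_left = best
        left = [goes_left(x, f, thr, miss_left) for x in X]
        GL = sum(gi for gi, is_left in zip(g, left) if is_left)
        HL = sum(hi for hi, is_left in zip(h, left) if is_left)
        wl, wr = -GL / (HL + lam), -(sum(g) - GL) / (sum(h) - HL + lam)
        stumps.append((f, thr, miss_left, wl, wr))
        # (4) add the tree, scaled by the shrinkage coefficient eta
        pred = [p + eta * (wl if is_left else wr) for p, is_left in zip(pred, left)]
    return base, eta, stumps


def predict(model: Tuple[float, float, List[Stump]], x: Row) -> float:
    base, eta, stumps = model
    return base + sum(eta * (wl if goes_left(x, f, thr, ml) else wr)
                      for f, thr, ml, wl, wr in stumps)
```

In a toy run where rows with a missing value belong to the high-loss group, the learned default direction sends `None` to that group, which is the behaviour preprocessing step (1) describes.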

The key feature data of the feature index system are input into the GS-XGBoost line loss rate prediction model, which outputs the line loss rate prediction results.
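The grid search (GS) part exhaustively evaluates every combination of candidate hyperparameters and keeps the one with the lowest validation error. The embodiment does not list the grid it searched, so the parameter names and values used with this sketch are hypothetical, and `evaluate` stands for any routine that trains a model with the given parameters and returns its validation MSE.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search(
    param_grid: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
) -> Tuple[Optional[Dict[str, Any]], float]:
    """Try every combination in param_grid; return the one with the lowest score."""
    names = sorted(param_grid)
    best_params: Optional[Dict[str, Any]] = None
    best_score = float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. cross-validated MSE of the model
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

With the xgboost and scikit-learn packages installed, the same search is commonly written as `GridSearchCV(XGBRegressor(), param_grid, scoring="neg_mean_squared_error", cv=5)` over parameters such as `max_depth`, `learning_rate`, `n_estimators` and `reg_lambda`; the grid itself would still have to be chosen for the station area data.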

The line loss rate prediction results are analyzed and evaluated with three evaluation indexes, MSE, MAE and RMSE, which are used to compare the prediction results.

The mean square error (MSE) is the average of the squared prediction errors; minimizing the sum of squared errors (SSE) is the cost function used in linear regression fitting. The better the prediction, the closer the value is to 0; the worse the prediction, the farther it is from 0. It is calculated as follows:

$$\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)}-\hat{y}^{(i)}\right)^2$$

where $y^{(i)}$ is the true value and $\hat{y}^{(i)}$ is the predicted value. The smaller the mean square error, the more accurate the model's prediction.

The mean absolute error is calculated as follows:

$$\mathrm{MAE}=\frac{1}{n}\sum_{i=1}^{n}\left|y^{(i)}-\hat{y}^{(i)}\right|$$

The root mean square error is calculated as follows:

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)}-\hat{y}^{(i)}\right)^2}$$

where $n$ is the number of samples, $y^{(i)}$ is the actual value and $\hat{y}^{(i)}$ is the predicted value.
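The three evaluation indexes can be computed together as below (a minimal sketch; the function name is hypothetical). Since RMSE is the square root of MSE, the columns of Table 2 can be cross-checked against each other: for GS-XGBoost, $0.033597^2\approx0.001129$, i.e. an RMSE of 3.3597% corresponds to an MSE of about 0.1129%.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MSE, MAE and RMSE as defined above."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    return {"MSE": mse, "MAE": mae, "RMSE": math.sqrt(mse)}
```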

The loss function of the model uses the squared-error form:

$$L(y,\hat{y})=\frac{1}{2}\left(y-\hat{y}\right)^2$$

An extreme gradient boosting (XGBoost) decision tree for line loss rate prediction is established. For convenience of explanation, a regression tree with a maximum depth of 2 is used.

The data from the characteristic index system are input into the GS-XGBoost line loss rate prediction model to obtain its line loss rate prediction curve, which is compared with the curves of the other models in Figures 5 to 7. Prediction accuracy is compared with the XGBoost and RF models. The predicted values of the GS-XGBoost model fit the actual values well. The random forest (RF) model performs noticeably better than the untuned XGBoost model, and the GS-XGBoost model is more accurate than both XGBoost and RF.

The evaluation of the prediction results is shown in Table 2:

Table 2

Model         MSE (%)   RMSE (%)   MAE (%)
RF            0.1278    3.5748     2.8247
XGBoost       0.1747    4.1793     3.0460
GS-XGBoost    0.1129    3.3597     2.5754

Table 2 shows that the GS-XGBoost model achieves the lowest MSE, RMSE and MAE of the three models.

This comparison shows that the GS-XGBoost model has higher prediction performance than both the XGBoost model of the same type and the random forest (RF) model, which itself performs well in line loss rate prediction. This verifies the feasibility and good prediction performance of the GS-XGBoost model for line loss rate prediction.

In this embodiment, an ensemble learning algorithm is applied to line loss rate prediction in low-voltage station areas, and prediction accuracy is significantly improved. The process of feature index construction and feature selection is novel and reasonable. The method provides a scientific and reasonable basis for formulating loss reduction plans, thereby improving line loss management in low-voltage station areas, and it has strong practicability and generalization ability.

The above embodiments only describe preferred modes of the invention and do not limit its scope. Without departing from the design spirit of the invention, various modifications and improvements made by persons of ordinary skill in the art to the technical scheme of the invention shall fall within the protection scope determined by the claims of the invention.

Claims

THE CLAIMS DEFINING THE INVENTION ARE AS FOLLOWS:

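1. A method for predicting line loss rate in low-voltage station area based on extreme gradient lifting decision tree, characterized in that it comprises:

collecting original data of the low-voltage station area, and preprocessing the original data to obtain target data of the low-voltage station area;

selecting key features through feature engineering based on the target data of the low-voltage station area, constructing a low-voltage station area characteristic index system, and classifying the low-voltage station areas based on the low-voltage station area characteristic index system;

establishing a second GS-XGBoost prediction model, predicting the line loss rate of the classified low-voltage station areas through the second GS-XGBoost prediction model, and analyzing and evaluating the prediction results.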

2. The line loss rate prediction method of low-voltage station area based on extreme gradient lifting decision tree according to claim 1, wherein collecting the original data of the low-voltage station area includes obtaining the cross-sectional area of the main line, the total number of low-voltage meters, the power supply quantity, the average load rate, the total length of the line, the distribution transformer capacity and the power factor, which reflect the station area and load characteristics.

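3. The method for predicting line loss rate in low-voltage station area based on extreme gradient lifting decision tree according to claim 1, wherein the preprocessing comprises the following steps: processing missing values of the original data of the low-voltage station area based on a sparse matrix to obtain first data; performing abnormal data detection on the first data to obtain second data; and extracting characteristic data based on the second data and standardizing the characteristic data to obtain the target data of the low-voltage station area.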

4. The method for predicting line loss rate in low-voltage station area based on extreme gradient lifting decision tree according to claim 2, wherein screening key features by feature engineering comprises: evaluating the feature index weights of the original data of the low-voltage station area by the F-test filtering method and the mutual information method, and obtaining feature sets by combining MSE; inputting the feature sets into the first GS-XGBoost prediction model, calculating the mean square error values, and comparing them to select the feature set with the smallest mean square error value as the feature index system of the low-voltage station area.

5. The method for predicting line loss rate in low-voltage station area based on extreme gradient lifting decision tree according to claim 4, wherein constructing the low-voltage station area characteristic index system further comprises determining the number of key indexes of the low-voltage station area characteristic index system.

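6. The method for predicting line loss rate in low-voltage station area based on extreme gradient lifting decision tree according to claim 1, wherein classifying the low-voltage station areas comprises: inputting a low-voltage station area data set into the low-voltage station area characteristic index system to determine the number of categories to be clustered and the cluster centers; calculating the distance from each low-voltage station area sample to the cluster centers to obtain the nearest cluster center; and assigning the sample to the nearest cluster center to complete the classification of the low-voltage station areas.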

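7. The method for predicting line loss rate in low-voltage station area based on extreme gradient lifting decision tree according to claim 1, wherein predicting the line loss rate of the low-voltage station area comprises: constructing the second GS-XGBoost prediction model based on the first GS-XGBoost prediction model and the extreme gradient lifting decision tree, and inputting the low-voltage station area data set into the second GS-XGBoost prediction model to obtain a line loss rate prediction result.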

8. The method for predicting line loss rate in low-voltage station area based on extreme gradient lifting decision tree according to claim 1, wherein the prediction result is analyzed and evaluated by means of the mean square error MSE, the mean absolute error MAE and the root mean square error RMSE; the mean square error MSE is the mean of the squared prediction errors, and minimizing the sum of squared errors is the cost function used in linear regression model fitting.