CN111881961A - Power distribution network fault risk grade prediction method based on data mining

Info

Publication number: CN111881961A
Application number: CN202010690948.XA
Authority: CN
Inventors: 周佳威; 冒烨颖; 杨启明; 王小蕾; 马骏昶; 董晓峰; 顾佳; 方琪; 杭泱
Original assignee: Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Current assignee: Suzhou Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2020-07-17
Filing date: 2020-07-17
Publication date: 2020-11-03

Abstract

The invention relates to a power distribution network fault risk level prediction method based on data mining, comprising the following steps: step 1: acquiring various types of original data related to the power distribution network, including historical data and prediction data; step 2: processing the acquired data to obtain a data table containing the historical data labeled with fault risk levels and the prediction data; step 3: analyzing the characteristic attributes of the historical data in the data table; step 4: training a fault risk level prediction model using the historical data with the extracted characteristic attributes and based on the prediction data, to obtain a trained fault risk level prediction model; and step 5: predicting the fault risk level of the power distribution network using the trained fault risk level prediction model. The method can reliably predict the fault risk level of the power distribution network over a future period, which helps to eliminate hidden dangers in advance, reduce losses, lower the fault rate and improve the reliability of the power grid.

Description

Power distribution network fault risk grade prediction method based on data mining

Technical Field

The invention relates to the technical field of power distribution network fault prediction, in particular to a power distribution network fault risk level prediction method based on data mining.

Background

The power distribution network is the last link in delivering power from the grid to users. It is the part most closely connected to users and affects their normal power use most directly: a distribution network fault causes users to lose power, disrupting normal production and daily life. To avoid the risks of power loss, the causes of distribution network faults must be well understood and their inherent patterns mastered. If faults can also be predicted to some extent, hidden dangers can be eliminated in advance in a targeted manner, sufficient emergency repair materials can be prepared, and repair personnel can be stationed on site, thereby minimizing the losses caused by distribution network faults, lowering the fault rate and improving the reliability of the distribution network.

Disclosure of Invention

The invention aims to provide a method for predicting the fault risk level of a power distribution network more reliably.

In order to achieve the purpose, the invention adopts the technical scheme that:

A power distribution network fault risk level prediction method based on data mining comprises the following steps:

step 1: data acquisition: acquiring various types of original data related to the power distribution network, wherein the original data comprises historical data and prediction data;

step 2: data processing: processing the acquired data to obtain a data table containing the historical data labeled with fault risk levels and the prediction data;

step 3: characteristic attribute analysis: analyzing the characteristic attributes of the historical data in the data table, and extracting the characteristic attributes of the historical data;

step 4: fault risk level prediction model training: establishing a fault risk level prediction model, and training it using the historical data with the extracted characteristic attributes and based on the prediction data, to obtain a trained fault risk level prediction model;

step 5: fault risk level prediction: predicting the fault risk level of the power distribution network using the trained fault risk level prediction model.

The power distribution network fault risk level prediction method based on data mining takes a feeder line in the power distribution network as an analysis object.

In the step 1, the type of the original data includes three types of feeder line self attribute, feeder line operation attribute and feeder line surrounding environment attribute.

The step 2 comprises the following substeps:

substep 2-1: merging the data tables: integrating the original data, using the feeder as the linking key, to obtain an original data table;

substep 2-2: data cleaning: cleaning the abnormal entries in the original data table to obtain a processed data table;

substep 2-3: fault risk level labeling: judging and labeling the risk level of each fault according to the historical data in the processed data table.

In substep 2-3, the number of feeder faults and the fault outage time per unit time are used as the criteria for judging and labeling the risk level of each fault.

In step 3, the ReliefF algorithm is used to analyze the characteristic attributes of the historical data in the data table and to extract them.

The step 3 comprises the following substeps:

substep 3-1: randomly drawing a fault record d from the data table, and finding the nearest fault record H with the same fault risk level as d and the nearest fault record M with a different fault risk level from d;

substep 3-2: for each characteristic attribute, determining whether the attribute helps to distinguish the fault risk level of the fault record d by comparing the characteristic difference between d and H with that between d and M, and adjusting the weight of the attribute accordingly; if the characteristic difference between d and H is smaller than that between d and M, the attribute is beneficial and its weight is increased; otherwise its weight is reduced;

repeating substep 3-1 and substep 3-2 T times to obtain, by iteration, the weight of each characteristic attribute with respect to the fault risk levels.

In the step 4, an AdaBoost algorithm is adopted to train the fault risk level prediction model.

The step 4 comprises the following substeps:

substep 4-1: setting up a series of identical or different basic prediction algorithms, setting the initial weights of the basic prediction algorithms equal, and setting the initial weights of the historical data with extracted characteristic attributes equal;

substep 4-2: selecting the first basic prediction algorithm as the currently selected basic prediction algorithm, and taking the historical data with extracted characteristic attributes and equal initial weights as the current historical data;

substep 4-3: inputting the current historical data into the currently selected basic prediction algorithm to obtain its prediction data, and updating the weight of the currently selected basic prediction algorithm according to these prediction data;

substep 4-4: updating the weight of each piece of historical data according to whether the prediction of the currently selected basic prediction algorithm for it is accurate;

substep 4-5: judging whether all of the basic prediction algorithms have had their weights updated; if so, executing substep 4-6; otherwise, selecting the next basic prediction algorithm as the currently selected basic prediction algorithm, taking the historical data with updated weights as the current historical data, and returning to substep 4-3;

substep 4-6: constructing the trained fault risk level prediction model from the basic prediction algorithms and their updated weights, wherein the output of the trained fault risk level prediction model is the weighted vote of the basic prediction algorithms.

In substep 4-3, the weight of the currently selected basic prediction algorithm is updated according to its prediction data as follows: the prediction error of the currently selected basic prediction algorithm is calculated,

where θ_t(i) is a variable indicating whether the currently selected basic prediction algorithm predicts the i-th record of the current historical data correctly (θ_t(i) = 1 denotes a prediction error and θ_t(i) = 0 a correct classification), m is the number of records in the current historical data, and D_t(i) is the weight of the i-th record before the update for the currently selected basic prediction algorithm; the updated weight of the currently selected basic prediction algorithm is then recalculated.

In substep 4-4, the weight of each piece of historical data is updated by calculating its updated weight,

where L_t(i) is the prediction of the currently selected basic prediction algorithm for the i-th record, y_i is the corresponding labeled value in the historical data, and Z_{t+1} is a normalization factor.

By applying the above technical scheme, the invention has the following advantages over the prior art: the method can reliably predict the fault risk level of the power distribution network over a future period, thereby helping the grid operation and maintenance unit to eliminate hidden dangers in advance in a targeted manner, prepare sufficient emergency repair materials and station repair personnel on site, which minimizes the losses caused by grid faults, lowers the fault rate of the power distribution network and improves its reliability.

Drawings

Fig. 1 is a schematic flow chart of a power distribution network fault risk level prediction method based on data mining.

Fig. 2 is a schematic flow diagram of the ReliefF algorithm in the data mining-based power distribution network fault risk level prediction method of the present invention.

Fig. 3 is a schematic flow diagram of an AdaBoost algorithm in the data mining-based power distribution network fault risk level prediction method of the present invention.

Detailed Description

The invention is further described below with reference to the embodiments shown in the accompanying drawings.

Embodiment 1: as shown in Fig. 1, a power distribution network fault risk level prediction method based on data mining includes the following steps:

Step 1: data acquisition.

The method comprises the steps of obtaining various types of related original data of the power distribution network, wherein data sources comprise a power distribution production management system, a power distribution automation system, a power distribution network geographic information system, an intelligent public distribution transformer monitoring system, a power distribution network intelligent auxiliary management and control system, a marketing business management system, a power utilization information acquisition system, an enterprise resource management system, an external meteorological system and the like in a power distribution network company.

The power distribution network fault risk grade prediction method based on data mining takes a feeder line in a power distribution network as an analysis object, namely a basic research unit. The types of the acquired original data include feeder line self attribute, feeder line operation attribute and feeder line surrounding environment attribute, which are specifically shown in the following table:

The inherent attributes of a feeder are defined as its self attributes, the attributes that change over time as the feeder operates are defined as its operation attributes, and the attributes of the physical environment in which the feeder is located are defined as its surrounding-environment attributes; weather attributes, for example, are surrounding-environment attributes.

By time, the acquired original data is divided into historical data and prediction data. The historical data is subsequently used to train the prediction model, and the prediction data is used to predict power distribution network faults.

The self attributes of a feeder can be regarded as fixed over a period of time, so they have no prediction data. The prediction data for the feeder operation attributes is operation forecast data obtained with a conventional power load forecasting method, such as regression analysis or the elastic coefficient method. The surrounding-environment attributes use weather forecast information as prediction data.

Step 2: data processing.

The acquired data is processed to obtain a data table containing the historical data labeled with fault risk levels and the prediction data.

Step 2 comprises the following substeps:

Substep 2-1: merging the data tables.

The original data from the different data sources is integrated, using the feeder as the linking key, to obtain an original data table.
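The merge in substep 2-1 can be sketched as a join of per-source records on a feeder identifier. This is a minimal illustration only: the field name `feeder_id`, the helper `merge_by_feeder` and the sample values are hypothetical and do not come from the patent, and a real table would probably also key on the time period.

```python
from collections import defaultdict


def merge_by_feeder(*sources):
    """Merge records from several sources into one row per feeder.

    Each source is a list of dicts that all carry a 'feeder_id' key
    (a hypothetical field name). On a column-name clash, later sources
    overwrite earlier ones.
    """
    merged = defaultdict(dict)
    for source in sources:
        for record in source:
            merged[record["feeder_id"]].update(record)
    return [merged[fid] for fid in sorted(merged)]


# Illustrative records only; not data from the source document.
static_attrs = [{"feeder_id": "F001", "length_km": 4.2},
                {"feeder_id": "F002", "length_km": 7.9}]
operation = [{"feeder_id": "F001", "peak_load_kw": 310.0}]
weather = [{"feeder_id": "F001", "max_temp_c": 36.5},
           {"feeder_id": "F002", "max_temp_c": 35.0}]

raw_table = merge_by_feeder(static_attrs, operation, weather)
```

Feeders missing from one source simply lack that source's columns in the merged row, which leaves the gap visible for the cleaning substep.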

Substep 2-2: data cleaning.

The abnormal entries in the original data table are cleaned to obtain a processed data table. Specifically, duplicate values are removed, missing values are filled in manually, and abnormal values are corrected manually.
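Since the text leaves missing and abnormal values to manual handling, an automated helper can reasonably do no more than remove duplicates and flag rows for review. The sketch below assumes rows are dicts with hashable values; the function name `clean_table` and the field names are hypothetical.

```python
def clean_table(rows, required_fields):
    """Drop exact duplicate rows and list the rows needing manual review.

    Returns (deduplicated_rows, review_indices), where review_indices holds
    the positions in deduplicated_rows with a missing (None or absent)
    required field.
    """
    seen = set()
    deduplicated = []
    for row in rows:
        key = tuple(sorted(row.items()))  # order-independent row fingerprint
        if key not in seen:
            seen.add(key)
            deduplicated.append(row)
    review = [i for i, row in enumerate(deduplicated)
              if any(row.get(field) is None for field in required_fields)]
    return deduplicated, review


# Illustrative rows only.
rows = [
    {"feeder_id": "F001", "peak_load_kw": 310.0},
    {"feeder_id": "F001", "peak_load_kw": 310.0},   # exact duplicate
    {"feeder_id": "F002", "peak_load_kw": None},    # needs manual filling
]
cleaned, needs_review = clean_table(rows, ["peak_load_kw"])
```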

Substep 2-3: fault risk level labeling.

The risk level of each fault is judged and labeled according to the historical data in the processed data table. This yields historical data labeled with fault risk levels, and hence the data table containing the labeled historical data and the prediction data. In this substep, the number of feeder faults and the fault outage time per unit time are used as the criteria for judging and labeling the risk level of each fault, as shown in the following table:

Risk level	Monthly number of failures	Monthly cumulative outage time
1	= 1	[0, 60]
2	≥ 2	[60, 240]
3	≥ 4	≥ 240
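The table does not say how the two criteria combine when they disagree, how the shared boundaries 60 and 240 are assigned, or what unit the outage time uses. The sketch below therefore makes three explicit assumptions: the more severe of the two levels wins, a boundary value falls into the higher level, and every record has at least one failure. The function names are hypothetical.

```python
def level_from_count(failures):
    """Risk level implied by the monthly number of failures (assumes >= 1)."""
    if failures < 1:
        raise ValueError("the table only covers records with at least one failure")
    if failures >= 4:
        return 3
    if failures >= 2:
        return 2
    return 1


def level_from_outage(outage_time):
    """Risk level implied by the monthly cumulative outage time.

    Assumption: boundary values 60 and 240 belong to the higher level.
    """
    if outage_time >= 240:
        return 3
    if outage_time >= 60:
        return 2
    return 1


def risk_level(failures, outage_time):
    """Assumed combination rule: take the more severe of the two levels."""
    return max(level_from_count(failures), level_from_outage(outage_time))
```

Under these assumptions a feeder with a single failure but a very long outage is still labeled level 3; a different combination rule would change that result.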

Step 3: characteristic attribute analysis.

The ReliefF algorithm is used to analyze the characteristic attributes of the historical data in the data table and to extract them.

As shown in Fig. 2, step 3 comprises the following substeps:

Substep 3-1: a fault record d is randomly drawn from the training set D formed by the data table. The nearest fault record H with the same fault risk level as d and the nearest fault record M with a different fault risk level from d are then found in the data table; H is called the near-hit and M the near-miss.

Substep 3-2: for each characteristic attribute, whether the attribute helps to distinguish the fault risk level of the fault record d is determined by comparing the characteristic difference between d and H with that between d and M, and the weight of the attribute is adjusted accordingly.

The specific rule is as follows: if, on a given characteristic attribute, the characteristic difference between d and H is smaller than that between d and M, the attribute helps to distinguish the fault risk level of d and its weight should be increased; otherwise the attribute is not helpful and its weight should be reduced.

Iterating this process T times yields the final weight of each characteristic attribute; the larger the weight, the stronger the correlation between the attribute and the fault risk level. Drawing K samples each from the same-class records and the different-class records, rather than a single record, gives the ReliefF algorithm, which handles multi-class problems. Accordingly, repeating substep 3-1 and substep 3-2 T times yields by iteration the weights of the characteristic attributes with respect to the fault risk levels; the average weight of the characteristic attributes (feature vector) can then be calculated, and the optimal fault feature set is output.
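The procedure in substeps 3-1 and 3-2 can be sketched in plain Python as follows. This is a hedged illustration of standard ReliefF, not the patent's implementation: the function name `relieff` is hypothetical, all attributes are assumed numeric, neighbours are ranked by the sum of range-normalized differences, and when a class has fewer than K other records all of them are used.

```python
import random


def relieff(X, y, T, K, seed=0):
    """Return one weight per feature; larger means more relevant to the label."""
    rng = random.Random(seed)
    n, n_feat = len(X), len(X[0])
    lo = [min(row[a] for row in X) for a in range(n_feat)]
    hi = [max(row[a] for row in X) for a in range(n_feat)]

    def diff(a, u, v):
        # Range-normalized absolute difference on attribute a.
        span = hi[a] - lo[a]
        return abs(u[a] - v[a]) / span if span else 0.0

    def dist(u, v):
        return sum(diff(a, u, v) for a in range(n_feat))

    # Share of each class in the training set.
    prior = {c: sum(1 for label in y if label == c) / n for c in set(y)}
    W = [0.0] * n_feat
    for _ in range(T):
        i = rng.randrange(n)            # substep 3-1: draw a record d at random
        d, cd = X[i], y[i]
        nearest = {}
        for c in prior:                 # up to K nearest records of every class
            idx = [j for j in range(n) if j != i and y[j] == c]
            idx.sort(key=lambda j: dist(d, X[j]))
            nearest[c] = idx[:K]
        for a in range(n_feat):         # substep 3-2: adjust each weight
            hits = nearest[cd]
            if hits:
                W[a] -= sum(diff(a, d, X[j]) for j in hits) / (T * len(hits))
            for c, misses in nearest.items():
                if c == cd or not misses:
                    continue
                scale = prior[c] / (1.0 - prior[cd])
                W[a] += scale * sum(diff(a, d, X[j]) for j in misses) / (T * len(misses))
    return W


# Toy data: feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.8, 0.8], [0.9, 0.2], [1.0, 0.6]]
y = [1, 1, 1, 2, 2, 2]
weights = relieff(X, y, T=20, K=2)
```

On the toy data the class-separating feature receives the larger weight; selecting the final feature set then amounts to keeping the attributes whose weights exceed some chosen threshold, which the text does not specify.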

In the above scheme, the characteristic difference can be calculated by the following formula:

where diff(a, X, Y) is the difference (i.e., the characteristic difference) between vector X and vector Y on characteristic attribute a; X and Y correspond to the two fault records being compared, such as fault records d and H or fault records d and M; max(a) is the maximum value of attribute a and min(a) is its minimum value.
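The expression itself is not reproduced in this text. For numeric attributes, the usual ReliefF difference that matches these definitions is the range-normalized absolute difference, given here as an assumption rather than as the patent's exact formula (X[a] and Y[a] denote the values of attribute a in the two records):

```latex
\mathrm{diff}(a, X, Y) = \frac{\lvert X[a] - Y[a] \rvert}{\max(a) - \min(a)}
```

With this form, diff always lies in [0, 1], so attributes measured on different scales contribute comparably to the weight update.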

For the characteristic attribute a in the training set D, the updated weight W_a' is calculated as follows:

where W_a is the weight of characteristic attribute a before the update, K is the number of nearest records taken for the fault record d, T is the number of iterations, H_k denotes the set of same-class records nearest to d, M_k denotes the set of different-class records nearest to d, N(M) is the proportion of records of the class of M among all records, class(d) denotes the class to which record d belongs, and N(class(d)) is the proportion of samples of that class among all samples.
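The update expression is likewise missing here. The standard ReliefF update that uses exactly these quantities is sketched below as an assumption. The class index C and the notation M_k(C), meaning the k-th nearest record of class C, are introduced for illustration; N(C) plays the role of N(M) in the text.

```latex
W_a' = W_a
  - \sum_{k=1}^{K} \frac{\mathrm{diff}(a, d, H_k)}{T K}
  + \sum_{C \neq \mathrm{class}(d)} \frac{N(C)}{1 - N(\mathrm{class}(d))}
    \sum_{k=1}^{K} \frac{\mathrm{diff}(a, d, M_k(C))}{T K}
```

The factor N(C) / (1 - N(class(d))) weights each different class by its share among all records not in the class of d, so the contributions of the near-misses sum on the same scale as those of the near-hits.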

Step 4: fault risk level prediction model training.

A fault risk level prediction model is established and trained with the AdaBoost algorithm, using the historical data with the extracted characteristic attributes and based on the prediction data, to obtain the trained fault risk level prediction model.

As shown in fig. 3, step 4 comprises the following substeps:

Substep 4-1: a series of identical or different basic prediction algorithms is set up, the initial weights of the basic prediction algorithms are set equal, and the initial weights of the historical data with extracted characteristic attributes are set equal. The fault risk level prediction model is built from the series of basic prediction algorithms and their corresponding weights. In this embodiment, the CART decision tree prediction algorithm is chosen.

Substep 4-2: the first basic prediction algorithm is selected as the currently selected basic prediction algorithm, and the historical data with extracted characteristic attributes and equal initial weights is taken as the current historical data.

Substep 4-3: the current historical data is input into the currently selected basic prediction algorithm in the classifier to obtain its prediction data, and the weight of the currently selected basic prediction algorithm is updated according to these prediction data.

In this substep, the weight of the currently selected basic prediction algorithm is updated according to its prediction data as follows:

First, the prediction error θ_t of the currently selected basic prediction algorithm t is calculated as follows:

where θ_t(i) is a variable indicating whether the currently selected basic prediction algorithm predicts the i-th record of the current historical data correctly (θ_t(i) = 1 denotes a prediction error and θ_t(i) = 0 a correct classification), m is the number of records in the current historical data, and D_t(i) is the weight of the i-th record before the update for the currently selected basic prediction algorithm.

Then the updated weight α_t of the currently selected basic prediction algorithm t is calculated as follows:
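Neither expression is reproduced in this text. The standard AdaBoost forms that fit the stated definitions are sketched below as an assumption, keeping the document's symbols; the factor 1/2 in α_t is the common binary-AdaBoost convention and may differ from the patent's formula.

```latex
\theta_t = \sum_{i=1}^{m} D_t(i)\,\theta_t(i),
\qquad
\alpha_t = \frac{1}{2} \ln \frac{1 - \theta_t}{\theta_t}
```

Under this form a basic prediction algorithm with weighted error below 0.5 receives a positive weight, and the weight grows as the error approaches zero.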

Substep 4-4: the weight of each piece of historical data is updated according to whether the prediction of the currently selected basic prediction algorithm for it is accurate.

In substep 4-4, the weight of each piece of historical data is updated by calculating its updated weight D_{t+1} as follows:

where L_t(i) is the prediction of the currently selected basic prediction algorithm for the i-th record, y_i is the corresponding labeled value in the historical data, and Z_{t+1} is a normalization factor chosen so that the updated weights sum to 1.

After each basic prediction algorithm, the weights of mispredicted records are increased and the weights of correctly predicted records are decreased.
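The update expression is not reproduced in this text. A form consistent with the description (weights raised on errors, lowered on correct predictions, then normalized) is the usual AdaBoost rule, written piecewise so that it also applies when there are more than two risk levels. It is offered as an assumption, not as the patent's exact formula.

```latex
D_{t+1}(i) = \frac{D_t(i)}{Z_{t+1}} \times
\begin{cases}
  e^{-\alpha_t}, & L_t(i) = y_i \\
  e^{\alpha_t},  & L_t(i) \neq y_i
\end{cases}
\qquad
Z_{t+1} = \sum_{i=1}^{m} D_t(i)\, e^{\pm \alpha_t}
```

In Z_{t+1} the sign in the exponent follows the same case split, so that the updated weights D_{t+1}(i) sum to 1.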

Substep 4-5: it is judged whether all of the basic prediction algorithms have had their weights updated. If so, substep 4-6 is executed; otherwise, the next basic prediction algorithm is selected as the currently selected one, the historical data with the weights just updated is taken as the current historical data, and the process returns to substep 4-3. In other words, the historical data with the most recently updated weights serves as the input to the next basic prediction algorithm, and the substeps are executed in a loop.
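The loop over substeps 4-2 to 4-5, followed by a weighted vote, can be sketched as below. This is a minimal illustration under stated assumptions: `train_stump` is a hypothetical one-feature threshold classifier standing in for the CART base learner, the algorithm weight uses the common 0.5 · ln((1 − θ)/θ) form, and the toy data is invented for the example.

```python
import math
from collections import defaultdict


def train_stump(X, y, w):
    """Weighted one-feature threshold classifier (stand-in for a CART learner)."""
    labels = sorted(set(y))
    best = None
    for a in range(len(X[0])):
        for thr in sorted({row[a] for row in X}):
            # Weighted label totals on each side of the threshold.
            side = {True: defaultdict(float), False: defaultdict(float)}
            for row, label, wi in zip(X, y, w):
                side[row[a] <= thr][label] += wi
            left = max(labels, key=lambda c: side[True][c])
            right = max(labels, key=lambda c: side[False][c])
            err = sum(wi for row, label, wi in zip(X, y, w)
                      if (left if row[a] <= thr else right) != label)
            if best is None or err < best[0]:
                best = (err, a, thr, left, right)
    _, a, thr, left, right = best
    return lambda row: left if row[a] <= thr else right


def adaboost_train(X, y, trainers):
    m = len(X)
    D = [1.0 / m] * m                        # equal initial record weights
    ensemble = []
    for train in trainers:                   # one pass per basic algorithm
        predict = train(X, y, D)
        wrong = [predict(row) != label for row, label in zip(X, y)]
        theta = sum(d for d, bad in zip(D, wrong) if bad)   # weighted error
        theta = min(max(theta, 1e-10), 1 - 1e-10)           # avoid log(0)
        alpha = 0.5 * math.log((1 - theta) / theta)         # algorithm weight
        ensemble.append((alpha, predict))
        # Raise weights of mispredicted records, lower the rest, normalize.
        D = [d * math.exp(alpha if bad else -alpha) for d, bad in zip(D, wrong)]
        Z = sum(D)
        D = [d / Z for d in D]
    return ensemble


def adaboost_predict(ensemble, row):
    """Weighted vote of the basic prediction algorithms."""
    votes = defaultdict(float)
    for alpha, predict in ensemble:
        votes[predict(row)] += alpha
    return max(votes, key=votes.get)


# Toy data with three risk levels on a single feature.
X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, 2, 2, 3, 3]
ensemble = adaboost_train(X, y, [train_stump] * 3)
predictions = [adaboost_predict(ensemble, row) for row in X]
```

A single stump can output only two of the three levels, so no individual base learner fits this data; the weighted vote of three reweighted stumps does, which is the point of the boosting loop.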

Step 5: fault risk level prediction.

The trained fault risk level prediction model is used to predict the fault risk level of the power distribution network over a future period. The prediction result helps the grid operation and maintenance unit to eliminate hidden dangers in advance in a targeted manner, prepare sufficient emergency repair materials and station repair personnel on site, which minimizes the losses caused by grid faults, lowers the fault rate of the power distribution network and improves the reliability of the power grid.

The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims

1. A power distribution network fault risk level prediction method based on data mining, characterized in that the method comprises the following steps:

2. The data mining-based power distribution network fault risk level prediction method of claim 1, characterized in that: the power distribution network fault risk level prediction method based on data mining takes a feeder line in the power distribution network as an analysis object.

3. The data mining-based power distribution network fault risk level prediction method of claim 2, characterized in that: in the step 1, the type of the original data includes three types of feeder line self attribute, feeder line operation attribute and feeder line surrounding environment attribute.

4. The data mining-based power distribution network fault risk level prediction method of claim 2, characterized in that: the step 2 comprises the following substeps:

5. The data mining-based power distribution network fault risk level prediction method of claim 4, wherein: in the substep 2-3, the number of times of feeder failures and the time of failure power failure in the unit time are used as the judgment criteria of the failure risk level to judge and mark the risk level of each failure.

6. The data mining-based power distribution network fault risk level prediction method of claim 2, characterized in that: in step 3, the ReliefF algorithm is used to analyze the characteristic attributes of the historical data in the data table and to extract them.

7. The data mining-based power distribution network fault risk level prediction method of claim 6, wherein: the step 3 comprises the following substeps:

8. The data mining-based power distribution network fault risk level prediction method of claim 2, characterized in that: in the step 4, an AdaBoost algorithm is adopted to train the fault risk level prediction model.

9. The data mining-based power distribution network fault risk level prediction method of claim 8, wherein: the step 4 comprises the following substeps:

10. The data mining-based power distribution network fault risk level prediction method of claim 9, wherein: in substep 4-3, the weight of the currently selected basic prediction algorithm is updated according to its prediction data by calculating the prediction error of the currently selected basic prediction algorithm

In substep 4-4, the weight of each piece of historical data is updated by calculating the updated weight of each piece of historical data