CN114528949A

CN114528949A - Parameter optimization-based electric energy metering abnormal data identification and compensation method

Info

Publication number: CN114528949A
Application number: CN202210294793.7A
Authority: CN
Inventors: 李伟东; 宋晶晶
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2022-03-24
Filing date: 2022-03-24
Publication date: 2022-05-24

Abstract

The invention provides a parameter optimization-based method for identifying and compensating abnormal electric energy metering data. First, an improved particle swarm optimization density clustering algorithm is trained on the users' electric energy metering load data and used to extract its statistical characteristics, yielding the load data and data characteristics of each user category. Second, anomaly detection is performed on the power data set with an isolation forest (iForest) algorithm; a genetic algorithm (GA) is introduced to construct a new anomaly detection model, GA-iForest, which judges the load data and determines the anomaly type. Finally, the data are compensated and corrected by a trained LSTM deep neural network, so that the metering data of electricity users are compensated and the accuracy of power load metering is improved.

Description

Parameter optimization-based electric energy metering abnormal data identification and compensation method

Technical Field

The invention relates to the technical field of power load prediction, in particular to an electric energy metering abnormal data identification and compensation method based on parameter optimization.

Background Art

With the continuous development of smart grid technology, power data can be acquired in more and more ways, and the volume of power operation data keeps growing. Obtaining real and reliable data from the massive power operation data collected has therefore become very important. The large number of supporting systems in use generates unstructured data, and equipment faults, grid fluctuations, communication faults and similar events produce a large amount of abnormal data, which seriously undermines the accuracy, timeliness and dynamic behavior required of electric energy metering data. Analyzing the characteristics of abnormal data makes it possible to mine the important information behind user data, for example for fault location detection, accurate load prediction and demand-side response; analyzing the causes of anomalies also provides a reference for preventing them and reduces their occurrence. Analyzing, identifying and correcting abnormal data is therefore of great significance.

Disclosure of Invention

To address the above problems, the invention provides a parameter optimization-based method for identifying and compensating abnormal electric energy metering data, which improves the accuracy of the power load prediction used for compensation through cluster analysis, feature extraction and abnormal data identification.

The purpose of the invention can be achieved by the following technical scheme, which comprises the following steps:

acquiring load original data;

preprocessing load original data to obtain a training data set;

clustering the user load data of each category with an improved density clustering algorithm to obtain the clustering results and extract the user characteristics.

The improved particle swarm optimization density clustering algorithm proceeds as follows:

Input: the sample data set, the total number of samples k, the number of clusters m, the particle population size M, the maximum number of iterations MaxIter, the acceleration constants c1 and c2, the inertia weight ω, and the MinPts value.

Output: the partition of the data set into m clusters, the optimal fitness value, and the initial cluster center and Eps value represented by the corresponding particle.

Begin
  Initialization: set the search space of the particle position Z_i to [0.001, k], where k is the average dissimilarity value of the data set;
  set the search space of the particle velocity V_i to [-V_max, +V_max], with V_max = k;
  initialize the population P(0);
  for t = 1 to the maximum number of iterations do
    calculate the fitness of the DBSCAN clustering result of each particle in the swarm P(t);
    if the particle fitness value < the fitness value of P_id
      update P_id;
    end
    if the particle fitness value < the fitness value of P_gd
      update P_gd;
    end
    update the particle velocity and the particle position;
  end for
  output the cluster partition corresponding to the minimum fitness value found in the whole search space;
End

During the initialization of the algorithm, the average dissimilarity D of the data set is defined as the mean of the pairwise dissimilarities s(i, j), where n is the number of samples in the data set and s(i, j) is the dissimilarity between sample i and sample j. The average dissimilarity of the samples approximately describes the data characteristics of the entire data set, and the algorithm uses it as the upper limit of the Eps search range of the particle swarm optimization.
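The role of the average dissimilarity as an upper bound for Eps can be illustrated with a minimal Python sketch. It assumes that s(i, j) is the Euclidean distance and that the mean is taken over all distinct sample pairs; the source fixes neither choice, and the function name `average_dissimilarity` is hypothetical.

```python
from itertools import combinations
from math import dist


def average_dissimilarity(samples):
    """Mean pairwise dissimilarity over all distinct sample pairs.

    Assumption: s(i, j) is the Euclidean distance and the mean runs over
    the n*(n-1)/2 distinct pairs; the source does not fix either choice.
    """
    pairs = list(combinations(samples, 2))
    if not pairs:
        raise ValueError("need at least two samples")
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


# Three toy two-dimensional samples; the result bounds the Eps search range.
samples = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
eps_upper = average_dissimilarity(samples)
```

With such a bound, a particle position would be drawn from [0.001, eps_upper], matching the search space given in the initialization step.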

Features are then extracted and input into the anomaly detection model GA-iForest; the trained model judges the user load data and determines the anomaly type.

The GA-iForest model is constructed as follows:

The optimization of iForest follows the basic steps of a GA, the most important of which are initialization, crossover, mutation and selection. The construction of the isolation forest model optimized by the genetic algorithm is described as follows:

Input: ListN, Time, mp and cp;

ListN is the initial isolation forest population;

Time is the maximum number of iterations of the genetic algorithm optimization;

mp is the mutation probability;

cp is the crossover probability;

Output: the optimal isolation forest individual List.

Step 1, build the iTrees and the forest by recursively partitioning the data set space;

Step 2, population initialization: a single forest is regarded as an individual, and the iTrees it contains are the "gene coding" of that individual. Each forest is a linked list List, where List[i] denotes the corresponding iTree. N individuals are constructed to form the initial population ListN (N = 4), that is, the initial population consists of List1, List2, List3 and List4;

Step 3, for i = 1 to Time:

(a) fitness calculation: calculate the fitness value of each List in the current population;

(b) crossover: the N linked lists in the population are crossed with one another with probability cp by interchanging the front and rear segments of the individual linked lists, that is, List1[1..i] is interchanged with List2[i+1..N], and likewise for List3 and List4; after crossover, the number of individuals N in the population grows from N = 4 to N = 8 (List1, List2, List3, ..., List8);

(c) mutation: each individual List produced by crossover is mutated with probability mp. List[i] is a binary tree (iTree) in the corresponding forest; a tree List[i] is selected at random, and the mutation operation rebuilds the selected iTree;

(d) selection: the forests List are selected according to the fitness value Fitness, where the fitness function consists of precision and difference. The selection follows a random sorting rule: from the new solutions produced by crossover and mutation together with the original individuals of the population, N (N = 4) individuals are selected for the next generation;

(e) termination: if the termination condition is met, output the result; otherwise return to step (a) for the next iteration. The termination condition is that the fitness values of the individuals over several consecutive generations (Times) of the GA optimization are all higher than that of the traditional Isolation Forest model, or that the maximum number of iterations is reached;

Step 4, output the optimal individual List in the current population, that is, the forest iForest with the better fitness value.
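The crossover, mutation and selection operators above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the patented implementation: a forest is a plain Python list, strings stand in for iTrees, crossover swaps the tail segments after a cut index, and selection is simple truncation by fitness used as a stand-in for the random sorting rule. The names `crossover`, `mutate`, `select` and `rebuild_tree` are hypothetical.

```python
import random


def crossover(forest_a, forest_b, cut):
    """Swap the tail segments of two forests (lists of trees) after index `cut`."""
    child_a = forest_a[:cut] + forest_b[cut:]
    child_b = forest_b[:cut] + forest_a[cut:]
    return child_a, child_b


def mutate(forest, mp, rebuild_tree, rng):
    """With probability mp, rebuild one randomly chosen tree of the forest."""
    forest = list(forest)  # copy so the parent individual is left unchanged
    if rng.random() < mp:
        i = rng.randrange(len(forest))
        forest[i] = rebuild_tree()
    return forest


def select(population, fitness, n):
    """Keep the n fittest individuals (truncation selection, an assumption)."""
    return sorted(population, key=fitness, reverse=True)[:n]


rng = random.Random(0)
list1 = ["a1", "a2", "a3", "a4"]  # strings stand in for iTrees
list2 = ["b1", "b2", "b3", "b4"]
child_a, child_b = crossover(list1, list2, cut=2)
mutated = mutate(child_a, mp=1.0, rebuild_tree=lambda: "new", rng=rng)
survivors = select([[1], [1, 2], [1, 2, 3]], fitness=len, n=2)
```

In a fuller version, `rebuild_tree` would grow a fresh iTree from a new subsample, and `fitness` would combine the precision and difference terms named in step (d).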

The step of inputting the training set into an LSTM deep neural network for training to obtain a power load prediction model comprises:

Step 1, establish the LSTM model. The parameters to be optimized, namely the number of neurons L, the learning rate ε and the number of training iterations k, are determined together with their respective optimization ranges.

Table 1. Setting ranges of the LSTM hyperparameters
Parameter name	Lower limit	Upper limit	Minimum velocity	Maximum velocity
Number of neurons L	1	400	-2	2
Learning rate ε	0.001	0.01	-0.001	0.001
Number of training iterations k	200	1000	-2	2

The LSTM time series prediction model mainly comprises the following four parts:

1) Time series data preprocessing. Because the model is trained by learning from data, the data set must be normalized (among other preprocessing) to prevent the training process from diverging and to ensure that the model converges during training.

2) The input dimensions, number of network layers, and output dimensions of the LSTM model are defined.

3) Setting a trained optimizer, model initial parameters and a loss function, and starting training.

4) And obtaining a trained model, and predicting the load data by using the trained model.

Step 2, initialize the PSO parameters, including the initial velocity and position of the particles, the learning weights, the number of iterations and the population size.

Step 3, determine the fitness function of the particles. The MAPE value of the prediction model is used as the fitness function of the particles to find the optimal model parameters.

Step 4, compare the fitness values of the particles, search for the individual optimal position and the global optimal position, and update the optimal fitness value.

Step 5, judge whether the maximum number of iterations has been reached. If so, the optimal parameters obtained are passed to the LSTM model for training and prediction; if not, the iteration continues. The 3 hyperparameters of the LSTM are taken as the parameters to be optimized by the PSO, and the fitness function is set as

$$f(x) = \min(\mathrm{MAPE}) \qquad (2)$$

where the mean absolute percentage error (MAPE) is the average of the percentage errors between the actual and predicted values of the power consumption:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{Y_{actual,i} - Y_{pre,i}}{Y_{actual,i}}\right| \times 100\%$$

In the above formula, $Y_{actual}$ represents the actual value of the load, $Y_{pre}$ represents the predicted value of the load, and n represents the number of predicted load points. The population size is set to 20, the maximum number of iterations to 50, and the learning factors to c1 = 2 and c2 = 2. The setting ranges of the LSTM hyperparameters are shown in Table 1.
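The MAPE used as the particle fitness can be computed as in the short Python sketch below. It assumes the usual definition of MAPE in percent and that no actual load value is zero; the function name `mape` and the sample values are illustrative only.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: every actual value is non-zero, otherwise the
    percentage error is undefined.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Illustrative load values (not taken from the source).
y_actual = [100.0, 200.0, 400.0]
y_pre = [110.0, 180.0, 400.0]
error = mape(y_actual, y_pre)
```

A lower value means a better set of hyperparameters, so the PSO minimizes this quantity.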

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and should therefore not be regarded as limiting its scope; those skilled in the art can derive other related drawings from them without inventive effort.

Fig. 1 is a schematic flowchart of an embodiment of a method for identifying and compensating abnormal data of electric energy metering based on parameter optimization according to the present application.

Fig. 2 shows the internal network structure of the LSTM.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.

Embodiment: Fig. 1 shows a method for identifying and compensating abnormal electric energy metering data based on parameter optimization, which includes the following steps:

step 1, acquiring historical power load data to form a load data set;

step 2, preprocessing the load data set;

step 3, training by adopting an improved particle swarm optimization density clustering algorithm and extracting statistical characteristics of the electric energy metering load data of the user to obtain user load data of each category and user data characteristics of each category;

step 4, performing anomaly detection on the power data set with an isolation forest (iForest) algorithm, introducing a genetic algorithm (GA) to construct a new anomaly detection model GA-iForest, judging the load data and determining the anomaly type;

step 5, compensating and correcting the data through the trained LSTM deep neural network. The main process comprises the following steps:

1. data pre-processing

The maximum-minimum normalization method is adopted, with the formula:

$$y = \frac{(y_{max} - y_{min})(x - x_{min})}{x_{max} - x_{min}} + y_{min}$$

where y is the normalization result;

$y_{max}$ and $y_{min}$ are respectively the maximum value and the minimum value of the normalized variable;

x is the variable to be normalized;

$x_{max}$ and $x_{min}$ are respectively the maximum value and the minimum value of the variable to be normalized.
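The maximum-minimum normalization can be sketched in Python as follows. The default target range [0, 1] and the handling of a constant series are assumptions, since the source specifies neither; the function name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(values, y_min=0.0, y_max=1.0):
    """Linearly map `values` from [x_min, x_max] onto [y_min, y_max]."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Assumption: a constant series is mapped to the lower bound.
        return [y_min for _ in values]
    span = x_max - x_min
    return [(x - x_min) / span * (y_max - y_min) + y_min for x in values]


# Illustrative load readings (not taken from the source).
load = [120.0, 80.0, 100.0, 160.0]
normalized = min_max_normalize(load)
```

The same x_min and x_max would have to be kept to map the LSTM predictions back to the original load scale.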

2. Clustering algorithm

And clustering the obtained training data set through an optimized density clustering algorithm, and obtaining user load data of each category and user data characteristics of each category according to a clustering result.

The improved particle swarm optimization density clustering algorithm proceeds as follows:

Begin
  initialize the population P(0);
  for t = 1 to the maximum number of iterations do
    if the particle fitness value < the fitness value of P_id
      update P_id;
    end
    if the particle fitness value < the fitness value of P_gd
      update P_gd;
    end
    update the particle velocity and the particle position;
  end for
End

The DBSCAN clustering algorithm and the fitness calculation adopted are as follows:

Step 1, initialize the Eps value of the DBSCAN algorithm from the value encoded by the particle; the MinPts value is fixed at 5;

Step 2, select any core object P in the data set that does not yet belong to a cluster and create a new cluster;

Step 3, starting from the core objects in the cluster, repeatedly collect the density-reachable core objects and add them to the cluster until no new core object is added;

Step 4, if no core object remains that does not belong to any cluster, go to step 5; otherwise return to step 2;

Step 5, assign the boundary objects that are density-connected to a core object to the cluster of that core object;

Step 6, calculate the particle fitness value from the DBSCAN clustering result according to the fitness formula,

where n is the number of clusters produced by the DBSCAN algorithm and k is the input expected number of clusters. In line with the cluster expansion behavior of the DBSCAN density clustering algorithm, the fitness function evaluates a clustering result by how closely the number of clusters obtained matches the expected number; when the two are equal, the fitness function value is 0.
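The interplay between the particle swarm and DBSCAN described above can be illustrated with the following self-contained Python sketch. Several points are assumptions rather than details from the source: the fitness is taken to be |n - k|, which is 0 when the cluster count matches the expected number; the dissimilarity is Euclidean; and the inertia weight, acceleration constants, swarm size and iteration count are arbitrary illustrative values. The names `dbscan`, `fitness` and `pso_eps` are hypothetical.

```python
import random
from itertools import combinations
from math import dist


def dbscan(points, eps, min_pts=5):
    """Return one cluster label per point (-1 marks noise)."""
    n = len(points)
    neighbors = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1                 # provisional noise, may become border
            continue
        cluster += 1                       # unassigned core object: new cluster
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster        # border object joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is a core object: keep expanding
    return labels


def fitness(points, eps, expected_k, min_pts=5):
    """Assumed fitness |n - k|: 0 when the cluster count matches expected_k."""
    n_clusters = len({c for c in dbscan(points, eps, min_pts) if c >= 0})
    return abs(n_clusters - expected_k)


def pso_eps(points, expected_k, eps_max, particles=10, max_iter=20,
            w=0.7, c1=2.0, c2=2.0, min_pts=5, seed=0):
    """Search Eps in [0.001, eps_max]; w, c1, c2 are illustrative values."""
    rng = random.Random(seed)
    eps_min, v_max = 0.001, eps_max
    pos = [rng.uniform(eps_min, eps_max) for _ in range(particles)]
    vel = [rng.uniform(-v_max, v_max) for _ in range(particles)]
    p_best, p_fit = list(pos), [float("inf")] * particles
    g_best, g_fit = pos[0], float("inf")
    for _ in range(max_iter):
        for i in range(particles):
            f = fitness(points, pos[i], expected_k, min_pts)
            if f < p_fit[i]:               # update individual best P_id
                p_best[i], p_fit[i] = pos[i], f
            if f < g_fit:                  # update global best P_gd
                g_best, g_fit = pos[i], f
        for i in range(particles):
            v = (w * vel[i] + c1 * rng.random() * (p_best[i] - pos[i])
                 + c2 * rng.random() * (g_best - pos[i]))
            vel[i] = max(-v_max, min(v_max, v))
            pos[i] = max(eps_min, min(eps_max, pos[i] + vel[i]))
    return g_best, g_fit


# Two well-separated one-dimensional groups of six points each.
points = [(float(x),) for x in list(range(6)) + list(range(100, 106))]
pairs = list(combinations(points, 2))
eps_max = sum(dist(a, b) for a, b in pairs) / len(pairs)  # average dissimilarity
best_eps, best_fit = pso_eps(points, expected_k=2, eps_max=eps_max)
```

Because each particle encodes only an Eps value, the swarm is one-dimensional here, and any Eps that yields exactly the expected number of clusters is an optimum of the assumed fitness.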

3. Construction of the GA-iForest model

Input: ListN, Time, mp and cp;

ListN is the initial isolation forest population;

Time is the maximum number of iterations of the genetic algorithm optimization;

mp is the mutation probability;

cp is the crossover probability;

Output: the optimal isolation forest individual List.

Step 1, build the iTrees and the forest by recursively partitioning the data set space;

Step 3, for i = 1 to Time:

(e) termination: if the termination condition is met, output the result; otherwise return to step (a) for the next iteration. The termination condition is that the fitness values of the individuals over several consecutive generations (Times) of the GA optimization are all higher than that of the traditional Isolation Forest model, or that the maximum number of iterations is reached;
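The source does not spell out how a forest scores a sample once its iTrees are built. As background, the sketch below shows the standard isolation forest anomaly score, which turns the mean path length of a sample over the trees into a value in (0, 1); whether the patent uses exactly this score is an assumption, and the function names are hypothetical.

```python
from math import log

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n):
    """Average path length c(n) of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Standard iForest score: near 1 is anomalous, well below 0.5 is normal."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


# With a subsample size of 256, short paths give higher anomaly scores.
short_path_score = anomaly_score(2.0, 256)
long_path_score = anomaly_score(10.0, 256)
```

A sample that is isolated after only a few splits has a short mean path and therefore a score close to 1, which is what marks a load reading as abnormal.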

4. Training power load prediction model

Referring to Fig. 2, the unit cell of the long short-term memory (LSTM) network mainly comprises an input gate, an output gate, a forget gate and a cell state. The three gate structures give the LSTM a selective memory function and control the memory process of unit cell A. Specifically, the gates selectively forget part of the irrelevant, secondary information and preserve the important information, which extends the memory range of the network. At time t, the output gate produces an output h and state control information C. As in an ordinary recurrent neural network, h is the output and represents the prediction result of the model; C is the cell state and is used to control the opening and closing of the gates inside the cell.
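For reference, the gate structure described above is commonly written as the following equations. This is the standard LSTM formulation, given as an assumption because the source does not state the exact variant used; $x_t$ is the input at time t, $W$ and $b$ denote weight matrices and biases, $\sigma$ is the logistic sigmoid and $\odot$ is the element-wise product, none of which are symbols taken from the source.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{(forget gate)} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{(input gate)} \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) && \text{(candidate state)} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(cell state)} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{(output gate)} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(output)}
\end{aligned}
```

The forget gate $f_t$ scales the previous cell state, which is how secondary information is discarded while important information is carried forward.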

1) Time series data preprocessing. Because the model is trained by learning from data, the data set must be normalized (among other preprocessing) to prevent the training process from diverging and to ensure that the model converges during training.

Parameter name	Lower limit	Upper limit	Minimum velocity	Maximum velocity
Number of neurons L	1	400	-2	2
Learning rate ε	0.001	0.01	-0.001	0.001
Number of training iterations k	200	1000	-2	2

Step 5, judge whether the maximum number of iterations has been reached. If so, the optimal parameters obtained are passed to the LSTM model for training and prediction; if not, the iteration continues.

5. The mean absolute percentage error (MAPE) is the average of the percentage errors between the actual and predicted values of the power consumption:

$$\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{Y_{actual,i} - Y_{pre,i}}{Y_{actual,i}}\right| \times 100\%$$

In the above formula, $Y_{actual}$ represents the actual value of the load, $Y_{pre}$ represents the predicted value of the load, and n represents the number of predicted load points. The population size is set to 20, the maximum number of iterations to 50, and the learning factors to c1 = 2 and c2 = 2.
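The hyperparameter search described above can be sketched in Python as follows. The bounds and velocity limits come from the parameter table, and the population size, iteration count and learning factors come from the text; the inertia weight `w`, the seed and the objective `toy_mape` are assumptions. In particular, `toy_mape` is a hypothetical stand-in for training an LSTM and measuring its MAPE, used only so that the sketch runs on its own.

```python
import random

# (lower, upper, v_min, v_max) per hyperparameter, taken from the table.
BOUNDS = {
    "neurons": (1, 400, -2, 2),
    "learning_rate": (0.001, 0.01, -0.001, 0.001),
    "epochs": (200, 1000, -2, 2),
}


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def pso_search(objective, pop_size=20, max_iter=50, c1=2.0, c2=2.0,
               w=0.7, seed=0):
    """Minimize `objective` over BOUNDS; w and seed are illustrative choices."""
    rng = random.Random(seed)
    keys = list(BOUNDS)
    pos = [{k: rng.uniform(BOUNDS[k][0], BOUNDS[k][1]) for k in keys}
           for _ in range(pop_size)]
    vel = [{k: rng.uniform(BOUNDS[k][2], BOUNDS[k][3]) for k in keys}
           for _ in range(pop_size)]
    p_best = [dict(p) for p in pos]
    p_fit = [objective(p) for p in pos]
    g = min(range(pop_size), key=p_fit.__getitem__)
    g_best, g_fit = dict(p_best[g]), p_fit[g]
    for _ in range(max_iter):
        for i in range(pop_size):
            for k in keys:
                lo, hi, v_lo, v_hi = BOUNDS[k]
                v = (w * vel[i][k]
                     + c1 * rng.random() * (p_best[i][k] - pos[i][k])
                     + c2 * rng.random() * (g_best[k] - pos[i][k]))
                vel[i][k] = clamp(v, v_lo, v_hi)
                pos[i][k] = clamp(pos[i][k] + vel[i][k], lo, hi)
            f = objective(pos[i])
            if f < p_fit[i]:               # individual optimal position
                p_best[i], p_fit[i] = dict(pos[i]), f
            if f < g_fit:                  # global optimal position
                g_best, g_fit = dict(pos[i]), f
    return g_best, g_fit


def toy_mape(params):
    """Hypothetical surrogate for the MAPE of a trained LSTM (not a real model)."""
    return (abs(params["neurons"] - 128) / 400
            + abs(params["learning_rate"] - 0.005) / 0.01
            + abs(params["epochs"] - 600) / 1000)


best, best_mape = pso_search(toy_mape)
```

In practice the number of neurons and the number of training iterations would be rounded to integers before an LSTM is built, and `objective` would train the network and return its MAPE on held-out data.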

In summary, the invention first trains an improved particle swarm optimization density clustering algorithm on the users' electric energy metering load data and extracts its statistical characteristics, obtaining the load data and data characteristics of each user category. Second, anomaly detection is performed on the power data set with an isolation forest (iForest) algorithm; a genetic algorithm (GA) is introduced to construct a new anomaly detection model, GA-iForest, which judges the load data and determines the anomaly type. Finally, the data are compensated and corrected by the trained LSTM deep neural network, so that the metering data of electricity users are compensated and the accuracy of power load metering is improved.

The foregoing description details the preferred embodiments of the invention. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. The scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present invention, and therefore, the scope of the present invention shall be defined by the scope of the appended claims.

Claims

1. A method for identifying and compensating abnormal data of electric energy metering based on parameter optimization is characterized by comprising the following steps: acquiring power load original data, preprocessing the data, and normalizing all the data to obtain a training data set; clustering the obtained training data set through an optimized density clustering algorithm, and obtaining user load data of each category and user data characteristics of each category according to a clustering result; extracting characteristics, inputting the characteristics into an anomaly detection model GA-iForest, judging user load data through the trained model, and determining an anomaly type; inputting the training set into an LSTM deep neural network for training to obtain a power load prediction model; and performing compensation correction on the data through a trained LSTM deep neural network.

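2. The method for identifying and compensating abnormal data of electric energy metering based on parameter optimization as claimed in claim 1, wherein the data are normalized by the maximum-minimum normalization method, with the formula:

$$y = \frac{(y_{max} - y_{min})(x - x_{min})}{x_{max} - x_{min}} + y_{min}$$

where y is the normalization result;

$y_{max}$ and $y_{min}$ are respectively the maximum value and the minimum value of the normalized variable;

x is the variable to be normalized;

$x_{max}$ and $x_{min}$ are respectively the maximum value and the minimum value of the variable to be normalized;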

Begin
  initialize the population P(0);
  for t = 1 to the maximum number of iterations do
    if the particle fitness value < the fitness value of P_id
      update P_id;
    end
    if the particle fitness value < the fitness value of P_gd
      update P_gd;
    end
    update the particle velocity and the particle position;
  end for
End

Step 6, calculate the particle fitness value from the DBSCAN clustering result according to the fitness formula.

3. The method for identifying and compensating the abnormal data of the electric energy metering based on the parameter optimization as claimed in claim 1, characterized by comprising the following steps:

Input: ListN, Time, mp and cp;

ListN is the initial isolation forest population;

Time is the maximum number of iterations of the genetic algorithm optimization;

mp is the mutation probability;

cp is the crossover probability;

Output: the optimal isolation forest individual List.

Step 1, build the iTrees and the forest by recursively partitioning the data set space;

Step 3, for i = 1 to Time:

(e) termination: if the termination condition is met, output the result; otherwise return to step (a) for the next iteration. The termination condition is that the fitness values of the individuals over several consecutive generations (Times) of the GA optimization are all higher than that of the traditional Isolation Forest model, or that the maximum number of iterations is reached;

Step 4, output the optimal individual List in the current population, that is, the forest iForest with the better fitness value.

4. The method for identifying and compensating abnormal data of electric energy metering based on parameter optimization according to claim 1, wherein the training set is input into an LSTM deep neural network for training to obtain a power load prediction model, and the method comprises the following steps:

In the mean absolute percentage error (MAPE) formula, $Y_{actual}$ represents the actual value of the load, $Y_{pre}$ represents the predicted value of the load, and n represents the number of predicted load points. The population size is set to 20, the maximum number of iterations to 50, and the learning factors to c1 = 2 and c2 = 2.