CN114528949A - Parameter optimization-based electric energy metering abnormal data identification and compensation method - Google Patents


Publication number
CN114528949A
Authority
CN
China
Legal status
Pending
Application number
CN202210294793.7A
Other languages
Chinese (zh)
Inventor
李伟东
宋晶晶
Current Assignee
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Application filed by Harbin University of Science and Technology
Priority to CN202210294793.7A
Publication of CN114528949A


Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/08 Learning methods
    • G06N3/126 Evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q50/06 Energy or water supply
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention provides a parameter optimization-based method for identifying and compensating abnormal electric energy metering data. First, an improved particle swarm optimization density clustering algorithm is used to extract statistical features of users' electric energy metering load data, yielding the load data and data features of each user category. Secondly, anomaly detection is performed on the power data set with an isolation forest (iForest) based algorithm: a genetic algorithm (GA) is introduced to construct a new anomaly detection model, GA-iForest, which judges the load data and determines the anomaly type. Finally, the data are compensated and corrected by a trained LSTM deep neural network. This achieves data compensation for metered power consumers and improves the accuracy of power load metering.

Description

Parameter optimization-based electric energy metering abnormal data identification and compensation method
Technical Field
The invention relates to the technical field of power load prediction, in particular to an electric energy metering abnormal data identification and compensation method based on parameter optimization.
Background Art
With the continuous development of smart grid technology, there are more and more ways to acquire power data, and the volume of power operation data keeps growing. It has therefore become very important to obtain real and reliable data from the massive amount of collected power operation data. The use of many supporting systems generates unstructured data. Equipment faults, grid fluctuations, communication faults and similar causes also produce large amounts of abnormal data. Together, these seriously affect the accuracy, timeliness and dynamic performance required of electric energy metering data. Analyzing the characteristics of abnormal data makes it possible to fully mine the important information behind user behavior, for applications such as fault location and detection, accurate load prediction and demand-side response. It also makes it possible to analyze the causes of the anomalies, which provides a reference for preventing them and reduces their occurrence. Analyzing, identifying and correcting abnormal data is therefore of great significance.
Disclosure of Invention
To address these problems, the invention provides a parameter optimization-based method for identifying and compensating abnormal electric energy metering data. Through cluster analysis, feature extraction and abnormal data identification, the method improves the accuracy of the power load prediction used for compensation.
The object of the invention is achieved by the following technical solution. The parameter optimization-based method for identifying and compensating abnormal electric energy metering data comprises the following steps:
acquiring the raw load data;
preprocessing the raw load data to obtain a training data set;
clustering the user load data of each category with an improved density clustering algorithm to obtain clustering results and extract user features.
The improved particle swarm optimization (PSO) density clustering algorithm proceeds as follows:
Input: the sample data set, the total number of samples k, the number of clusters m, the particle population size M, the maximum number of iterations MaxIter, the acceleration constants c1 and c2, the inertia weight ω, and the MinPts value.
Output: the partition of the data set into m clusters, the optimal fitness value, and the initial cluster centers and Eps value represented by the corresponding particle.
Begin
    Initialization: set the search space of the particle position Z_i to [0.001, D], where D is the average dissimilarity of the data set;
    set the search space of the particle velocity V_i to [-Vmax, +Vmax], with Vmax = D;
    initialize the population P(0);
    for t = 1 to MaxIter do
        compute the fitness of the DBSCAN clustering result of each particle in the swarm P(t);
        if the particle's fitness value < the fitness value of P_id then
            update P_id;
        end if
        if the particle's fitness value < the fitness value of P_gd then
            update P_gd;
        end if
        update the velocity and position of each particle;
    end for
    output the cluster partition corresponding to the minimum fitness value found in the whole search space;
End
During algorithm initialization, the average dissimilarity D of the data set is defined as follows:
[Formula image: average dissimilarity D of the data set (not reproduced)]
where n is the number of samples in the data set and s(i, j) is the dissimilarity between samples i and j. The average dissimilarity approximately characterizes the data of the whole set. The algorithm uses it as the upper limit of the Eps search range of the particle swarm.
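The source gives D only as a formula image, so its exact normalization is not recoverable here. The sketch below computes D, and hence the upper bound of the Eps search range, under two assumptions: s(i, j) is the Euclidean distance, and D averages s(i, j) over the n(n-1)/2 distinct sample pairs. The function name `average_dissimilarity` is hypothetical.

```python
import numpy as np

def average_dissimilarity(X: np.ndarray) -> float:
    """Mean pairwise dissimilarity D of the rows of X.

    Assumptions (not from the source): s(i, j) is the Euclidean distance and
    D averages over the n(n-1)/2 distinct pairs i < j.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two samples")
    diff = X[:, None, :] - X[None, :, :]        # (n, n, d) pairwise differences
    dist = np.sqrt((diff ** 2).sum(axis=-1))     # (n, n) Euclidean distances
    iu = np.triu_indices(n, k=1)                 # index pairs with i < j
    return float(dist[iu].mean())

# D serves as the upper bound of the Eps search range for the PSO particles
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
eps_upper = average_dissimilarity(X)
```

If s(i, j) is meant to include the zero diagonal terms (dividing by n^2), the bound becomes somewhat smaller. It would still be a usable upper limit for Eps.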
Features are extracted and input into the anomaly detection model GA-iForest. The trained model judges the user load data and determines the anomaly type.
The GA-iForest model is constructed as follows:
The optimization of iForest follows the basic steps of a GA, the most important of which are initialization, crossover, mutation and selection. The GA-optimized isolation forest model is constructed as follows:
Input: ListN, Time, mp and cp, where
ListN is the initial isolation forest population;
Time is the maximum number of GA iterations;
mp is the mutation probability;
cp is the crossover probability.
Output: the optimal isolation forest individual List.
Step 1: build iTrees and forests by recursively partitioning the data set space;
Step 2: initialize the population. A single forest is regarded as an individual, and the iTrees it contains form its "gene coding". Each forest is a linked list in which List[i] is the corresponding iTree. N individuals are constructed to form the initial population ListN (N = 4), namely List1, List2, List3 and List4;
Step 3: for i = 1 to Time:
(a) Fitness: compute the fitness value of each individual List in the population;
(b) Crossover: the N linked lists in the population are crossed pairwise with probability cp by exchanging the front and rear halves of the individual linked lists, i.e. List1[1..i] is exchanged with List2[i+1..N], and likewise for List3 and List4. After crossover, the number of individuals grows from N = 4 to N = 8 (List1, List2, ..., List8);
(c) Mutation: each individual List produced by crossover is mutated with probability mp. Each List[i] is a binary iTree of the forest; a tree List[i] is selected at random, and the mutation rebuilds the selected iTree;
(d) Selection: forests are selected according to their fitness value Fitness, where the fitness function combines accuracy and diversity. Following a stochastic ranking rule, the new solutions produced by crossover and mutation compete with the original individuals of the population, and N (N = 4) individuals are selected for the next generation;
(e) If the termination condition is met, output the result; otherwise, return to (a) for the next iteration. Iteration stops when the fitness values of the individuals over several consecutive generations (Times) are all higher than that of the conventional Isolation Forest model, or when the maximum number of iterations is reached;
Step 4: output the optimal individual List in the current population, i.e. the iForest with the best fitness value.
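Each GA individual above is a forest, that is, a list of iTrees. The sketch below shows how such a list of iTrees might be built and used to score points. It follows the standard isolation forest recipe: split on a random feature at a random value, limit tree height to ceil(log2 ψ), and score a point as s = 2^(-E[h(x)]/c(ψ)). It is not the GA-optimized model itself. The function names, the 100 trees and the subsample size are illustrative choices, not values from the source.

```python
import math
import numpy as np

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_itree(X, height, max_height, rng):
    """Isolate points by splitting on a random feature at a random value."""
    n = X.shape[0]
    if height >= max_height or n <= 1:
        return {"size": n}                       # external node
    q = int(rng.integers(X.shape[1]))            # random feature
    lo, hi = X[:, q].min(), X[:, q].max()
    if lo == hi:                                 # all values equal: cannot split
        return {"size": n}
    p = rng.uniform(lo, hi)                      # random split value
    mask = X[:, q] < p
    return {"feature": q, "split": p,
            "left": build_itree(X[mask], height + 1, max_height, rng),
            "right": build_itree(X[~mask], height + 1, max_height, rng)}

def path_length(x, node, height=0):
    """h(x): edges to the external node plus c(size) for unresolved points."""
    if "size" in node:
        return height + c_factor(node["size"])
    child = node["left"] if x[node["feature"]] < node["split"] else node["right"]
    return path_length(x, child, height + 1)

def build_forest(X, n_trees=100, sample_size=256, seed=0):
    """A forest is a plain list of iTrees, matching the List[i] encoding above."""
    rng = np.random.default_rng(seed)
    psi = min(sample_size, X.shape[0])
    max_height = max(1, math.ceil(math.log2(psi)))
    trees = []
    for _ in range(n_trees):
        idx = rng.choice(X.shape[0], size=psi, replace=False)
        trees.append(build_itree(X[idx], 0, max_height, rng))
    return trees, psi

def anomaly_scores(X, trees, psi):
    """s(x) = 2^(-E[h(x)] / c(psi)); scores near 1 indicate likely anomalies."""
    mean_h = np.array([np.mean([path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_h / c_factor(psi))

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 2)), [[8.0, 8.0]]])  # last row is an outlier
trees, psi = build_forest(X, n_trees=100, sample_size=128, seed=0)
scores = anomaly_scores(X, trees, psi)
```

Because `trees` is an ordinary list, the GA operators in Step 3 can act on it directly. Crossover slices and concatenates two such lists, and mutation replaces one element with a freshly built iTree.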
The step of inputting the training set into the LSTM deep neural network for training to obtain the power load prediction model comprises the following steps:
Step 1: Establish the LSTM model. Determine the parameters to be optimized, namely the number of neurons L1, the learning rate ε and the number of training iterations k, together with their respective optimization ranges.
Table 1 Optimization ranges of the LSTM hyperparameters
| Parameter | Lower limit | Upper limit | Minimum velocity | Maximum velocity |
|---|---|---|---|---|
| Number of neurons L1 | 1 | 400 | -2 | 2 |
| Learning rate ε | 0.001 | 0.01 | -0.001 | 0.001 |
| Number of training iterations k | 200 | 1000 | -2 | 2 |
The LSTM time-series prediction model consists mainly of the following four parts:
1) Time-series data preprocessing. The model is learned from training data, so the data set must be normalized (among other preprocessing). This prevents the training process from diverging and ensures that the model converges during training.
2) Define the input dimension, number of network layers and output dimension of the LSTM model.
3) Set the optimizer, the initial model parameters and the loss function, and start training.
4) Obtain the trained model and use it to predict the load data.
Step 2: Initialize the PSO parameters, including the initial particle velocities and positions, the learning weights, the number of iterations and the population size.
Step 3: Define the particle fitness function. The MAPE of the prediction model is used as the particle fitness in order to find the optimal model parameters.
Step 4: Compare the particle fitness values. Find the individual best and global best positions, and update the best fitness value.
Step 5: Check whether the maximum number of iterations has been reached. If it has, pass the optimal parameters obtained to the LSTM model for training and prediction; otherwise, return to Step 3. The three LSTM hyperparameters are taken as the parameters to be optimized by PSO, and the fitness function is set to
$$f(x) = \min(\mathrm{MAPE}) \qquad (2)$$
where the mean absolute percentage error (MAPE) is the average of the percentage errors between the actual and predicted power consumption values:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{Y_{actual,t} - Y_{pre,t}}{Y_{actual,t}}\right| \times 100\%$$
where Y_actual is the actual load value, Y_pre is the predicted load value, and n is the number of predicted load points. The population size is set to 20, the maximum number of iterations to 50, and the learning factors to c1 = 2 and c2 = 2. The ranges of the LSTM hyperparameters are shown in Table 1.
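The following is a minimal PSO sketch for Steps 2 to 5. It uses the Table 1 bounds and velocity limits, a population of 20, 50 iterations and c1 = c2 = 2. Two parts are assumptions. First, the inertia weight w = 0.7, since the source gives no value for this stage. Second, `surrogate_fitness`, a hypothetical cheap stand-in for "train the LSTM with these hyperparameters and return its validation MAPE". The integer hyperparameters are rounded when a particle is decoded.

```python
import numpy as np

# Table 1: search bounds and velocity limits for (neurons L1, learning rate, training iterations k)
LOWER = np.array([1.0, 0.001, 200.0])
UPPER = np.array([400.0, 0.01, 1000.0])
V_MIN = np.array([-2.0, -0.001, -2.0])
V_MAX = np.array([2.0, 0.001, 2.0])

def decode(position):
    """Particle position -> (neurons, learning rate, training iterations)."""
    return int(round(position[0])), float(position[1]), int(round(position[2]))

def pso_minimize(fitness, n_particles=20, max_iter=50, c1=2.0, c2=2.0, w=0.7, seed=0):
    rng = np.random.default_rng(seed)
    dim = LOWER.size
    pos = rng.uniform(LOWER, UPPER, size=(n_particles, dim))   # Step 2: initial positions
    vel = rng.uniform(V_MIN, V_MAX, size=(n_particles, dim))   # and initial velocities
    pbest = pos.copy()
    pbest_val = np.array([fitness(*decode(p)) for p in pos])  # Step 3: fitness = MAPE
    g = int(pbest_val.argmin())
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    for _ in range(max_iter):                                  # Step 5: iterate up to max_iter
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        vel = np.clip(vel, V_MIN, V_MAX)                       # Table 1 velocity limits
        pos = np.clip(pos + vel, LOWER, UPPER)                 # stay inside the search bounds
        vals = np.array([fitness(*decode(p)) for p in pos])
        improved = vals < pbest_val                            # Step 4: individual bests
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(pbest_val.argmin())                            # Step 4: global best
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    return decode(gbest), gbest_val

def surrogate_fitness(neurons, lr, epochs):
    """Hypothetical stand-in for: train the LSTM, return its validation MAPE (%)."""
    return 100.0 * (abs(neurons - 64) / 64 + abs(lr - 0.005) / 0.005 + abs(epochs - 500) / 500)

best_params, best_mape = pso_minimize(surrogate_fitness)
```

In a real run, `surrogate_fitness` would be replaced by a function that trains the LSTM and returns its MAPE on held-out data. Each fitness evaluation is then a full training run, and this dominates the cost of the 20 x 50 search.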
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic flowchart of an embodiment of a method for identifying and compensating abnormal data of electric energy metering based on parameter optimization according to the present application.
Fig. 2 shows the internal network structure of the LSTM.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments. The present embodiment is implemented on the premise of the technical solution of the present invention, and a detailed implementation manner and a specific operation process are given, but the scope of the present invention is not limited to the following embodiments.
Embodiment: as shown in Fig. 1, the parameter optimization-based method for identifying and compensating abnormal electric energy metering data comprises the following steps:
Step 1: acquire historical power load data to form a load data set;
Step 2: preprocess the load data set;
Step 3: use the improved particle swarm optimization density clustering algorithm to train on and extract statistical features of the users' electric energy metering load data, obtaining the user load data and user data features of each category;
Step 4: perform anomaly detection on the power data set with an isolation forest (iForest) based algorithm, introducing a genetic algorithm (GA) to construct a new anomaly detection model, GA-iForest, which judges the load data and determines the anomaly type;
Step 5: compensate and correct the data with the trained LSTM deep neural network. The main process comprises the following steps:
1. Data preprocessing
Min-max normalization is adopted, with the following formula:
$$y = \frac{(y_{max} - y_{min})(x - x_{min})}{x_{max} - x_{min}} + y_{min}$$
where y is the normalized result;
y_max and y_min are the maximum and minimum of the normalized range, respectively;
x is the variable to be normalized;
x_max and x_min are the maximum and minimum of the variable to be normalized, respectively.
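The following is a minimal sketch of the min-max normalization above. The normalized range [y_min, y_max] defaults to [0, 1]; this is an assumption, because the source does not fix the range. The inverse mapping is included because LSTM outputs produced in normalized units must be mapped back to the original units before they can replace abnormal meter readings. `minmax_scale` and `minmax_inverse` are hypothetical helper names.

```python
import numpy as np

def minmax_scale(x, y_min=0.0, y_max=1.0):
    """y = (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        raise ValueError("a constant series cannot be min-max normalized")
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

def minmax_inverse(y, x_min, x_max, y_min=0.0, y_max=1.0):
    """Map normalized values (e.g. LSTM outputs) back to the original units."""
    y = np.asarray(y, dtype=float)
    return (y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min

load = np.array([120.0, 150.0, 180.0, 240.0])
scaled = minmax_scale(load)
restored = minmax_inverse(scaled, load.min(), load.max())
```

In practice, x_min and x_max would be taken from the training data only, and the same values reused for new data and for the inverse transform.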
2. Clustering algorithm
The training data set is clustered with the optimized density clustering algorithm, and the user load data and user data features of each category are obtained from the clustering result.
The specific steps of the improved particle swarm optimization (PSO) density clustering algorithm are as follows:
Input: the sample data set, the total number of samples k, the number of clusters m, the particle population size M, the maximum number of iterations MaxIter, the acceleration constants c1 and c2, the inertia weight ω, and the MinPts value.
Output: the partition of the data set into m clusters, the optimal fitness value, and the initial cluster centers and Eps value represented by the corresponding particle.
Begin
    Initialization: set the search space of the particle position Z_i to [0.001, D], where D is the average dissimilarity of the data set;
    set the search space of the particle velocity V_i to [-Vmax, +Vmax], with Vmax = D;
    initialize the population P(0);
    for t = 1 to MaxIter do
        compute the fitness of the DBSCAN clustering result of each particle in the swarm P(t);
        if the particle's fitness value < the fitness value of P_id then
            update P_id;
        end if
        if the particle's fitness value < the fitness value of P_gd then
            update P_gd;
        end if
        update the velocity and position of each particle;
    end for
    output the cluster partition corresponding to the minimum fitness value found in the whole search space;
End
During algorithm initialization, the average dissimilarity D of the data set is defined as follows:
[Formula image: average dissimilarity D of the data set (not reproduced)]
where n is the number of samples in the data set and s(i, j) is the dissimilarity between samples i and j. The average dissimilarity approximately characterizes the data of the whole set. The algorithm uses it as the upper limit of the Eps search range of the particle swarm.
The DBSCAN clustering algorithm and the fitness calculation are as follows:
Step 1: initialize the Eps value of the DBSCAN algorithm from the particle's coded value; the MinPts value is fixed at 5;
Step 2: select any core object P in the data set that does not yet belong to a cluster, and create a new cluster;
Step 3: starting from the core objects in the cluster, iteratively add the density-reachable core objects to the cluster until no new core object is added;
Step 4: if no core object remains that does not belong to a cluster, go to Step 5; otherwise, return to Step 2;
Step 5: assign each border object that is density-connected to a core object to the cluster of that core object;
Step 6: compute the particle fitness value from the DBSCAN clustering result with the following formula.
[Formula image: particle fitness in terms of the number of clusters n and the expected number of clusters k (not reproduced)]
where n is the number of clusters produced by the DBSCAN algorithm and k is the expected number of clusters given as input. DBSCAN builds clusters by density expansion, so the fitness function evaluates a clustering result by how closely its number of clusters matches the expected number. When the number of clusters in the result equals the expected number, the fitness value is 0.
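The sketch below connects the PSO pseudocode to the DBSCAN steps above. Each particle is one-dimensional (its Eps value), and MinPts is fixed at 5. The source shows the fitness formula only as an image, so the sketch assumes fitness = |n - k|, which is 0 exactly when DBSCAN yields the expected number of clusters, as the text requires. Several other elements are illustrative assumptions: the minimal DBSCAN implementation, the inertia weight w = 0.7, the swarm size, and the synthetic three-blob data.

```python
import numpy as np

def pairwise_distances(X):
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

def dbscan_cluster_count(X, eps, min_pts=5):
    """Minimal DBSCAN returning the number of clusters (noise is not counted)."""
    neighbors = [np.flatnonzero(row <= eps) for row in pairwise_distances(X)]
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    labels = np.full(len(X), -1)
    n_clusters = 0
    for p in range(len(X)):
        if not core[p] or labels[p] != -1:
            continue                                # Step 2: need an unassigned core object
        labels[p] = n_clusters
        stack = [p]
        while stack:                                # Step 3: expand via density reachability
            q = stack.pop()
            for r in neighbors[q]:
                if labels[r] == -1:
                    labels[r] = n_clusters          # Step 5: border objects join as well
                    if core[r]:
                        stack.append(r)
        n_clusters += 1
    return n_clusters

def fitness(eps, X, k, min_pts=5):
    """Assumed form |n - k|: 0 exactly when DBSCAN finds the expected k clusters."""
    return abs(dbscan_cluster_count(X, eps, min_pts) - k)

def pso_eps(X, k, upper, n_particles=10, max_iter=20, c1=2.0, c2=2.0, w=0.7, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.001, upper, n_particles)    # Z_i in [0.001, D]
    vel = rng.uniform(-upper, upper, n_particles)   # V_i in [-Vmax, +Vmax], Vmax = D
    pbest = pos.copy()
    pbest_val = np.array([fitness(e, X, k) for e in pos])
    g = int(pbest_val.argmin())
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(max_iter):
        if gbest_val == 0:                          # expected cluster count already reached
            break
        r1, r2 = rng.random(n_particles), rng.random(n_particles)
        vel = np.clip(w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos), -upper, upper)
        pos = np.clip(pos + vel, 0.001, upper)
        vals = np.array([fitness(e, X, k) for e in pos])
        better = vals < pbest_val                   # update P_id
        pbest[better] = pos[better]
        pbest_val[better] = vals[better]
        g = int(pbest_val.argmin())                 # update P_gd
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g], pbest_val[g]
    return float(gbest), int(gbest_val)

rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(0.0, 0.3, size=(30, 2)) for c in centers])
D = pairwise_distances(X)[np.triu_indices(len(X), k=1)].mean()   # average dissimilarity
eps, fit = pso_eps(X, k=3, upper=D)
```

This sketch stops as soon as the fitness reaches 0, which is a design choice; the pseudocode instead runs to the maximum number of iterations. The early stop only saves DBSCAN evaluations. It does not change which Eps values count as optimal.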
3. Construction of the GA-iForest model
The optimization of iForest follows the basic steps of a GA, the most important of which are initialization, crossover, mutation and selection. The GA-optimized isolation forest model is constructed as follows:
Input: ListN, Time, mp and cp, where
ListN is the initial isolation forest population;
Time is the maximum number of GA iterations;
mp is the mutation probability;
cp is the crossover probability.
Output: the optimal isolation forest individual List.
Step 1: build iTrees and forests by recursively partitioning the data set space;
Step 2: initialize the population. A single forest is regarded as an individual, and the iTrees it contains form its "gene coding". Each forest is a linked list in which List[i] is the corresponding iTree. N individuals are constructed to form the initial population ListN (N = 4), namely List1, List2, List3 and List4;
Step 3: for i = 1 to Time:
(a) Fitness: compute the fitness value of each individual List in the population;
(b) Crossover: the N linked lists in the population are crossed pairwise with probability cp by exchanging the front and rear halves of the individual linked lists, i.e. List1[1..i] is exchanged with List2[i+1..N], and likewise for List3 and List4. After crossover, the number of individuals grows from N = 4 to N = 8 (List1, List2, ..., List8);
(c) Mutation: each individual List produced by crossover is mutated with probability mp. Each List[i] is a binary iTree of the forest; a tree List[i] is selected at random, and the mutation rebuilds the selected iTree;
(d) Selection: forests are selected according to their fitness value Fitness, where the fitness function combines accuracy and diversity. Following a stochastic ranking rule, the new solutions produced by crossover and mutation compete with the original individuals of the population, and N (N = 4) individuals are selected for the next generation;
(e) If the termination condition is met, output the result; otherwise, return to (a) for the next iteration. Iteration stops when the fitness values of the individuals over several consecutive generations (Times) are all higher than that of the conventional Isolation Forest model, or when the maximum number of iterations is reached;
Step 4: output the optimal individual List in the current population, i.e. the iForest with the best fitness value.
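The following is a compact sketch of Step 3 with N = 4 forests. Pairs of forests exchange their front and rear halves with probability cp. Each resulting forest then has one randomly chosen tree rebuilt with probability mp. Finally, parents and offspring compete for the N places. Two simplifications are assumptions. First, selection here is plain truncation (keep the N fittest) rather than the stochastic ranking rule. Second, `make_tree` and `fitness` are hypothetical callables; in GA-iForest they would build an iTree and score a forest on accuracy and diversity. The demo uses toy stand-ins so that the block runs on its own.

```python
import random

def crossover(a, b):
    """Swap the front half of one forest with the rear half of another."""
    cut = len(a) // 2
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def mutate(forest, make_tree, rng):
    """Rebuild one randomly chosen tree of the forest."""
    child = list(forest)
    child[rng.randrange(len(child))] = make_tree(rng)
    return child

def ga_forest(population, make_tree, fitness, generations=20, cp=0.8, mp=0.1, seed=0):
    rng = random.Random(seed)
    n = len(population)
    for _ in range(generations):
        shuffled = rng.sample(population, n)
        offspring = []
        for a, b in zip(shuffled[::2], shuffled[1::2]):   # (b) pairwise crossover
            if rng.random() < cp:
                offspring.extend(crossover(a, b))
            else:
                offspring.extend([list(a), list(b)])
        offspring = [mutate(f, make_tree, rng) if rng.random() < mp else f
                     for f in offspring]                  # (c) mutation
        # (d) selection: parents and offspring (N = 4 -> 8) compete for N places
        population = sorted(population + offspring, key=fitness, reverse=True)[:n]
    return max(population, key=fitness)                    # Step 4: best forest

def make_tree(rng):
    return rng.random()            # toy stand-in: a "tree" is just a quality score

def fitness(forest):
    return sum(forest) / len(forest)   # toy stand-in for the accuracy/diversity fitness

init_rng = random.Random(1)
population = [[make_tree(init_rng) for _ in range(10)] for _ in range(4)]  # N = 4 forests
best = ga_forest(population, make_tree, fitness)
```

Because the parents remain in the selection pool, the best fitness never decreases from one generation to the next. This is a simple form of elitism, and the stochastic ranking rule in the text would relax it.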
4. Training the power load prediction model
Referring to Fig. 2, the unit cell of the long short-term memory (LSTM) network mainly comprises an input gate, an output gate, a forget gate and a cell state. The three gates give the LSTM a selective memory function that controls the memory process of cell A. Specifically, the gates selectively forget less relevant secondary information and retain important information, which extends the memory range of the network. At time t, the cell produces an output h and the state information C. Like the output of a recurrent neural network, h represents the prediction result of the model. C is the cell state, which is used to control the opening and closing of the gates inside the cell.
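The gate behavior described above corresponds to the standard LSTM update equations. The minimal NumPy sketch below computes one time step. Several details are assumptions for illustration and do not describe the trained model of this method: the stacked weight layout (gate order f, i, o, g), the hidden size of 8, and the random weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step. W: (4H, D), U: (4H, H), b: (4H,), gate order f, i, o, g."""
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[0:H])            # forget gate: how much of C_{t-1} to keep
    i = sigmoid(z[H:2 * H])        # input gate: how much new information to write
    o = sigmoid(z[2 * H:3 * H])    # output gate: how much of the state to expose
    g = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c_t = f * c_prev + i * g       # updated cell state C_t
    h_t = o * np.tanh(c_t)         # output h_t
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 1, 8                                    # one load value per step, 8 hidden units (assumed)
W = rng.normal(0.0, 0.1, (4 * H, D))
U = rng.normal(0.0, 0.1, (4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in [0.2, 0.4, 0.3]:                      # a short normalized load sequence
    h, c = lstm_cell_step(np.array([x]), h, c, W, U, b)
```

The forget gate alone decides how much of the previous cell state survives. With all weights at zero, every gate outputs 0.5, so the cell state is halved at each step.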
Step 1: Establish the LSTM model. Determine the parameters to be optimized, namely the number of neurons L1, the learning rate ε and the number of training iterations k, together with their respective optimization ranges.
Table 1 Optimization ranges of the LSTM hyperparameters
| Parameter | Lower limit | Upper limit | Minimum velocity | Maximum velocity |
|---|---|---|---|---|
| Number of neurons L1 | 1 | 400 | -2 | 2 |
| Learning rate ε | 0.001 | 0.01 | -0.001 | 0.001 |
| Number of training iterations k | 200 | 1000 | -2 | 2 |
The LSTM time-series prediction model consists mainly of the following four parts:
1) Time-series data preprocessing. The model is learned from training data, so the data set must be normalized (among other preprocessing). This prevents the training process from diverging and ensures that the model converges during training.
2) Define the input dimension, number of network layers and output dimension of the LSTM model.
3) Set the optimizer, the initial model parameters and the loss function, and start training.
4) Obtain the trained model and use it to predict the load data.
Step 2: Initialize the PSO parameters, including the initial particle velocities and positions, the learning weights, the number of iterations and the population size.
Step 3: Define the particle fitness function. The MAPE of the prediction model is used as the particle fitness in order to find the optimal model parameters.
Step 4: Compare the particle fitness values. Find the individual best and global best positions, and update the best fitness value.
Step 5: Check whether the maximum number of iterations has been reached. If it has, pass the optimal parameters obtained to the LSTM model for training and prediction; otherwise, return to Step 3.
5. The mean absolute percentage error (MAPE), used as the particle fitness, is the average of the percentage errors between the actual and predicted power consumption values:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{Y_{actual,t} - Y_{pre,t}}{Y_{actual,t}}\right| \times 100\%$$
where Y_actual is the actual load value, Y_pre is the predicted load value, and n is the number of predicted load points. The population size is set to 20, the maximum number of iterations to 50, and the learning factors to c1 = 2 and c2 = 2.
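The function below is a direct transcription of the MAPE formula and returns a percentage. Rejecting zero actual values is a choice made in this sketch. The formula divides by Y_actual, so zero readings (for example during outages) must be excluded or handled separately before MAPE is used as the PSO fitness. The name `mape` is hypothetical.

```python
import numpy as np

def mape(y_actual, y_pre):
    """Mean absolute percentage error in percent; y_actual must be nonzero."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_pre = np.asarray(y_pre, dtype=float)
    if np.any(y_actual == 0):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return float(np.mean(np.abs((y_actual - y_pre) / y_actual)) * 100.0)

# Relative errors 10 %, 5 % and 0 % average to 5 %
error = mape([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
```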
In summary, the invention first uses the improved particle swarm optimization density clustering algorithm to train on and extract statistical features of users' electric energy metering load data, obtaining the load data and data features of each user category. Secondly, anomaly detection is performed on the power data set with an isolation forest (iForest) based algorithm: a genetic algorithm (GA) is introduced to construct a new anomaly detection model, GA-iForest, which judges the load data and determines the anomaly type. Finally, the data are compensated and corrected by a trained LSTM deep neural network. This achieves data compensation for metered power consumers and improves the accuracy of power load metering.
The foregoing description details the preferred embodiments of the invention. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. The scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the present invention, and therefore, the scope of the present invention shall be defined by the scope of the appended claims.

Claims (4)

1. A method for identifying and compensating abnormal electric energy metering data based on parameter optimization, characterized by comprising the following steps: acquiring raw power load data, preprocessing the data, and normalizing all the data to obtain a training data set; clustering the training data set with an optimized density clustering algorithm, and obtaining the user load data and user data characteristics of each category from the clustering result; extracting features and inputting them into the anomaly detection model GA-iForest, judging the user load data with the trained model, and determining the anomaly type; inputting the training set into an LSTM deep neural network for training to obtain a power load prediction model; and compensating and correcting the data with the trained LSTM deep neural network.
2. The method for identifying and compensating abnormal electric energy metering data based on parameter optimization as claimed in claim 1, wherein the data are normalized by the min-max normalization method according to the following formula:
$$y=\frac{(y_{\max}-y_{\min})(x-x_{\min})}{x_{\max}-x_{\min}}+y_{\min}$$
where y is the normalization result;
y_max and y_min are the maximum and minimum values of the normalized variable, respectively;
x is the variable to be normalized;
x_max and x_min are the maximum and minimum values of the variable to be normalized, respectively.
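The normalization formula and its inverse can be sketched in Python as follows. The function names and the default target range [0, 1] are assumptions for illustration; the inverse is what one would use to map normalized LSTM predictions back to load values.

```python
def min_max_normalize(values, y_min=0.0, y_max=1.0):
    """y = (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # A constant series has no spread; map every value to y_min.
        return [y_min for _ in values]
    return [(y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min for x in values]


def min_max_denormalize(values, x_min, x_max, y_min=0.0, y_max=1.0):
    """Inverse mapping, e.g. to turn normalized LSTM outputs back into loads."""
    return [(y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min for y in values]


loads = [120.0, 150.0, 180.0]
print(min_max_normalize(loads))  # [0.0, 0.5, 1.0]
```

x_min and x_max must be kept from the training data so that later predictions are denormalized on the same scale.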
The obtained training data set is clustered by the optimized density clustering algorithm, and the user load data and user data characteristics of each category are obtained from the clustering result.
The improved particle-swarm-optimized density clustering algorithm proceeds as follows:
Input: the sample data set, the total number of samples k, the number of clusters m, the particle population size M, the maximum number of iterations MaxIter, the acceleration constants c1 and c2, the inertia weight ω, and the MinPts value.
Output: the partition of the data set into m clusters, the optimal fitness value, and the initial cluster centers and Eps value represented by the corresponding particle.
Begin
  Initialization: set the search range of each particle position Z_i to [0.001, D], where D is the average dissimilarity of the data set;
  set the search range of each particle velocity V_i to [-Vmax, +Vmax], with Vmax = D;
  initialize the population P(0);
  for t = 1 to MaxIter do
    compute the fitness of the DBSCAN clustering result of each individual in the particle swarm P(t);
    if the particle's fitness value < the fitness value of P_id then
      update P_id
    end if
    if the particle's fitness value < the fitness value of P_gd then
      update P_gd
    end if
    update the velocity and position of each particle;
  end for
  output the cluster partition corresponding to the minimum fitness value found in the whole search space;
End
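The loop above can be sketched as a one-dimensional particle swarm search over Eps, clamping positions to [0.001, D] and velocities to [-Vmax, +Vmax] with Vmax = D, where D is the average dissimilarity. This is a sketch under stated assumptions: ω = 0.7 and c1 = c2 = 2 are placeholder defaults (the claim leaves them as inputs), and the DBSCAN-based fitness is passed in as a callable; `toy_fitness` in the demo is a hypothetical stand-in objective.

```python
import random


def pso_search_eps(fitness, d_avg, swarm_size=20, max_iter=50,
                   w=0.7, c1=2.0, c2=2.0, seed=0):
    """Minimise fitness(eps) over eps in [0.001, d_avg] with a particle swarm.

    Velocities are clamped to [-v_max, +v_max] with v_max = d_avg.
    Lower fitness is better. w, c1, c2 defaults are assumptions.
    """
    lower, upper, v_max = 0.001, d_avg, d_avg
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(swarm_size)]
    vel = [rng.uniform(-v_max, v_max) for _ in range(swarm_size)]
    p_best = pos[:]                          # personal bests (P_id)
    p_fit = [float("inf")] * swarm_size
    g_best, g_fit = pos[0], float("inf")     # global best (P_gd)

    for _ in range(max_iter):
        # Evaluate every particle and update P_id / P_gd.
        for i in range(swarm_size):
            f = fitness(pos[i])
            if f < p_fit[i]:
                p_best[i], p_fit[i] = pos[i], f
            if f < g_fit:
                g_best, g_fit = pos[i], f
        # Standard velocity and position update, followed by clamping.
        for i in range(swarm_size):
            r1, r2 = rng.random(), rng.random()
            v = (w * vel[i]
                 + c1 * r1 * (p_best[i] - pos[i])
                 + c2 * r2 * (g_best - pos[i]))
            vel[i] = max(-v_max, min(v_max, v))
            pos[i] = max(lower, min(upper, pos[i] + vel[i]))
    return g_best, g_fit


def toy_fitness(eps):
    """Hypothetical stand-in for the DBSCAN-based fitness: best at eps = 0.37."""
    return abs(eps - 0.37)


best_eps, best_fit = pso_search_eps(toy_fitness, d_avg=1.0)
print(best_eps, best_fit)
```

In the real algorithm `fitness` would run DBSCAN with the candidate Eps and score the resulting cluster count, which makes each evaluation far more expensive than this stand-in.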
During algorithm initialization, the average dissimilarity D of the data set is defined as follows:
[Formula image: average dissimilarity D, computed from the pairwise dissimilarities s(i, j) of the n samples]
where n is the number of samples in the data set and s(i, j) is the dissimilarity between sample i and sample j. The average dissimilarity roughly characterizes the data of the whole set, so the algorithm uses it as the upper limit of the Eps range searched by the particle swarm.
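Because the formula image is not reproduced here, the following sketch assumes that s(i, j) is Euclidean distance and that D averages it over all n(n - 1)/2 distinct sample pairs; both choices are assumptions, not the patent's stated definition. The function name is illustrative.

```python
import math
from itertools import combinations


def average_dissimilarity(samples, dissimilarity=math.dist):
    """Mean of s(i, j) over all n(n - 1)/2 distinct pairs of samples.

    Assumptions: s(i, j) defaults to Euclidean distance, and each unordered
    pair is counted once.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    total = sum(dissimilarity(a, b) for a, b in combinations(samples, 2))
    return total / (n * (n - 1) / 2)


D = average_dissimilarity([(0.0, 0.0), (3.0, 4.0), (0.0, 4.0)])
print(D)  # 4.0, used as the upper bound of the Eps search range [0.001, D]
```

The pairwise loop is O(n²); for large load data sets a sampled estimate of D would be a natural substitute.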
The DBSCAN clustering algorithm and the fitness calculation are as follows:
Step 1: initialize the Eps value of the DBSCAN algorithm from the particle's encoded value; MinPts is fixed at 5;
Step 2: select any core object P in the data set that does not yet belong to a cluster, and create a new cluster;
Step 3: starting from the core objects in the cluster, repeatedly add all density-reachable core objects to the cluster until no new core object can be added;
Step 4: if every core object already belongs to a cluster, go to Step 5; otherwise return to Step 2;
Step 5: assign each border object that is density-connected to a core object to the cluster of that core object;
Step 6: compute the particle fitness value from the DBSCAN clustering result using the following formula:
[Formula image: particle fitness as a function of the number of clusters found, n, and the expected number of clusters, k]
where n is the number of clusters produced by the DBSCAN clustering and k is the expected number of clusters given as input. Because DBSCAN grows clusters by density expansion, the fitness function evaluates a clustering result by how closely its number of clusters matches the expected number; when the two are equal, the fitness value is 0.
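Steps 1 to 6 can be sketched in plain Python as below. The fitness |n - k| is an assumption consistent with the description (it is zero exactly when DBSCAN returns k clusters); the patent's formula image is not reproduced. Excluding noise points (label -1) from the cluster count is likewise an assumption, and the O(n²) neighbor search is for clarity only.

```python
import math


def dbscan_labels(points, eps, min_pts=5):
    """Plain DBSCAN. Returns one cluster label per point; -1 marks noise."""
    n = len(points)
    # eps-neighbourhood of every point (the point itself included).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Step 2: an unassigned core object starts a new cluster.
        labels[i] = cluster
        frontier = [i]
        # Steps 3 and 5: absorb density-reachable points; only core
        # objects keep expanding, border objects just join the cluster.
        while frontier:
            p = frontier.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    if is_core[q]:
                        frontier.append(q)
        cluster += 1
    return labels


def cluster_count_fitness(labels, expected_k):
    """Assumed fitness |n - k|, where n counts clusters found (noise excluded)."""
    n_clusters = len({lab for lab in labels if lab != -1})
    return abs(n_clusters - expected_k)


# Two tight, well-separated groups of five points each.
blob = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05)]
pts = blob + [(x + 5.0, y + 5.0) for x, y in blob]
print(cluster_count_fitness(dbscan_labels(pts, eps=0.2), expected_k=2))   # 0
print(cluster_count_fitness(dbscan_labels(pts, eps=0.05), expected_k=2))  # 2
```

With eps too small no point has five neighbors, so no cluster forms and the fitness rises; this is the signal the particle swarm uses to steer Eps.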
3. The method for identifying and compensating abnormal electric energy metering data based on parameter optimization as claimed in claim 1, characterized in that:
features are extracted and input into the anomaly detection model GA-iForest, the user load data are judged by the trained model, and the anomaly type is determined.
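For orientation, the base detector that GA-iForest evolves can be sketched as a standard isolation forest (Liu, Ting and Zhou): random axis-aligned splits, tree height capped at ceil(log2 ψ), and anomaly score s = 2^(-E[h(x)]/c(ψ)). This is the textbook algorithm rather than the patent's optimized model; the function names, 100 trees, and sample size 128 are illustrative assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def c_factor(n):
    """Average path length of an unsuccessful BST search among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_itree(data, height, max_height, rng):
    """Isolate points by recursive random splits on random features."""
    if height >= max_height or len(data) <= 1:
        return {"size": len(data)}
    dim = rng.randrange(len(data[0]))
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:  # identical values on this feature cannot be split
        return {"size": len(data)}
    split = rng.uniform(lo, hi)
    return {
        "dim": dim,
        "split": split,
        "left": build_itree([r for r in data if r[dim] < split], height + 1, max_height, rng),
        "right": build_itree([r for r in data if r[dim] >= split], height + 1, max_height, rng),
    }


def path_length(x, node, height=0):
    """Depth at which x lands, plus c(size) for unresolved external nodes."""
    if "size" in node:
        return height + c_factor(node["size"])
    child = node["left"] if x[node["dim"]] < node["split"] else node["right"]
    return path_length(x, child, height + 1)


def build_forest(data, n_trees=100, sample_size=128, seed=0):
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    if psi < 2:
        raise ValueError("need at least two samples")
    max_height = math.ceil(math.log2(psi))
    trees = [build_itree(rng.sample(data, psi), 0, max_height, rng) for _ in range(n_trees)]
    return trees, psi


def anomaly_score(x, trees, psi):
    """Scores close to 1 suggest anomalies; well below 0.5 suggests normal points."""
    mean_h = sum(path_length(x, t) for t in trees) / len(trees)
    return 2.0 ** (-mean_h / c_factor(psi))


# Demo: 200 Gaussian points plus one obvious outlier.
gen = random.Random(1)
points = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(200)] + [(8.0, 8.0)]
forest, psi = build_forest(points)
print(anomaly_score((8.0, 8.0), forest, psi), anomaly_score((0.0, 0.0), forest, psi))
```

In GA-iForest terms, the list `forest` is one individual: each entry plays the role of List[i].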
The GA-iForest model is constructed as follows.
iForest is optimized through the basic GA operations, the most important of which are initialization, crossover, mutation, and selection. The construction of the GA-optimized isolation forest model is described below.
Input: ListN, Time, mp, cp, where
ListN is the initial isolation forest population;
Time is the maximum number of GA iterations;
mp is the mutation probability;
cp is the crossover probability.
Output: the optimal isolation forest individual List.
Step 1: Build the iTrees and the forest by recursively partitioning the data space.
Step 2: Population initialization. A single forest is regarded as an individual, and the iTrees it contains form the individual's "gene coding". Each forest is a linked list List, where List[i] is the corresponding iTree. N individuals are constructed to form the initial population ListN (N = 4), i.e., List1, List2, List3, and List4.
Step 3: For i = 1 to Time:
(a) Fitness: compute the fitness value of each individual List in the population.
(b) Crossover: the N linked lists in the population are crossed pairwise with probability cp by exchanging the front and rear segments of the paired lists, i.e., List1[1..i] is exchanged with List2[i+1..N], and likewise for List3 and List4. After crossover the population grows from N = 4 to N = 8 (List1, List2, ..., List8).
(c) Mutation: each individual List produced by crossover is mutated with probability mp. Since each List[i] is a binary iTree of the corresponding forest, a tree List[i] is selected at random and rebuilt.
(d) Selection: forests are selected according to their fitness value Fitness, where the fitness function combines accuracy and diversity. The new solutions generated by crossover and mutation, together with the original individuals of the population, are ranked using a random ranking rule, and N (N = 4) individuals are selected for the next generation.
(e) If a termination condition is met, output the result; otherwise return to step (a) for the next iteration. Iteration terminates when the fitness values of the individuals over several consecutive generations (Times) of the GA optimization are all higher than that of the conventional Isolation Forest model, or when the maximum number of iterations is reached.
Step 4: Output the optimal individual List in the current population, i.e., the forest iForest with the best fitness value.
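The GA loop in Step 3 can be sketched as follows, assuming an even population size and forests of equal length. Tree construction and the accuracy/diversity fitness are left as hypothetical callables (`rebuild_tree`, `fitness`). Selection is simplified to keeping the best N of parents plus offspring instead of the random ranking rule, and a fixed generation count stands in for the termination test.

```python
import random


def ga_optimize_forest(population, fitness, rebuild_tree, generations=20,
                       cp=0.8, mp=0.1, seed=0):
    """Evolve forests (lists of trees); a higher fitness value is better.

    Crossover joins the front half of one parent with the rear half of the
    other; mutation rebuilds one randomly chosen tree in an offspring;
    selection keeps the best len(population) of parents plus offspring.
    """
    if len(population) % 2:
        raise ValueError("population size must be even for pairwise crossover")
    rng = random.Random(seed)
    n = len(population)
    pop = [list(forest) for forest in population]
    for _ in range(generations):
        offspring = []
        # Pair List1 with List2, List3 with List4, ...
        for a, b in zip(pop[0::2], pop[1::2]):
            if rng.random() < cp:
                cut = len(a) // 2
                offspring += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
            else:
                offspring += [a[:], b[:]]
        # Mutation: rebuild one random tree of an offspring forest.
        for child in offspring:
            if rng.random() < mp:
                child[rng.randrange(len(child))] = rebuild_tree(rng)
        # N parents + N offspring (4 + 4 = 8 in the text), truncated back to N.
        pool = pop + offspring
        pool.sort(key=fitness, reverse=True)
        pop = pool[:n]
    return max(pop, key=fitness)


# Demo with hypothetical "trees": each tree is just a quality score in [0, 1),
# and a forest's fitness is the mean score of its trees.
def mean_quality(forest):
    return sum(forest) / len(forest)


init_rng = random.Random(42)
initial = [[init_rng.random() for _ in range(10)] for _ in range(4)]
best = ga_optimize_forest(initial, mean_quality, lambda rng: rng.random())
print(mean_quality(best), max(mean_quality(f) for f in initial))
```

Because the parents stay in the selection pool, the best fitness can never decrease between generations; a stochastic ranking rule would trade that guarantee for more diversity.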
4. The method for identifying and compensating abnormal electric energy metering data based on parameter optimization according to claim 1, wherein inputting the training set into an LSTM deep neural network for training to obtain a power load prediction model comprises the following steps:
Step 1: Establish the LSTM model. Determine the parameters to be optimized, namely the number of neurons L1, the learning rate ε, and the number of training iterations k, together with their optimization ranges:
Parameter | Lower limit | Upper limit | Minimum velocity | Maximum velocity
Number of neurons L1 | 1 | 400 | -2 | 2
Learning rate ε | 0.001 | 0.01 | -0.001 | 0.001
Number of training iterations k | 200 | 1000 | -2 | 2
The LSTM time-series prediction model mainly comprises the following four parts:
1) Time-series data preprocessing. Because the model is trained by learning, the data set must be normalized (among other preprocessing) to prevent the training process from diverging and to ensure that the model converges.
2) Define the input dimension, the number of network layers, and the output dimension of the LSTM model.
3) Set the optimizer, the initial model parameters, and the loss function, and start training.
4) Obtain the trained model and use it to predict the load data.
Step 2: Initialize the PSO parameters, including the initial velocities and positions of the particles, the learning weights, the number of iterations, and the swarm size.
Step 3: Determine the particle fitness function. The MAPE of the prediction model is used as the particle fitness in the search for the optimal model parameters.
Step 4: Compare the particle fitness values: find each particle's personal best position and the global best position, and update the best fitness value.
Step 5: Judge whether the maximum number of iterations has been reached. If it has, pass the optimal parameters obtained to the LSTM model for training and prediction; otherwise, return to Step 3. The three LSTM hyperparameters are taken as the parameters to be optimized by the PSO, and the fitness function is set as
$$f(x)=\min(\mathrm{MAPE}) \qquad (2)$$
where the mean absolute percentage error (MAPE) is the average of the percentage errors between the actual and predicted power consumption values:
$$\mathrm{MAPE}=\frac{1}{n}\sum_{i=1}^{n}\left|\frac{Y_{\mathrm{actual},i}-Y_{\mathrm{pre},i}}{Y_{\mathrm{actual},i}}\right|\times 100\%$$
In the above formula, Y_actual denotes the actual load value, Y_pre the predicted load value, and n the number of predicted load points. The population size is set to 20, the maximum number of iterations to 50, and the learning factors to c1 = 2 and c2 = 2.
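A sketch of the PSO search over the three LSTM hyperparameters follows, using the ranges and velocity limits from the table above, a population of 20, 50 iterations, and c1 = c2 = 2. The inertia weight w = 0.7, rounding the neuron count and epoch count to integers, and the `evaluate_mape` callable (which would train an LSTM with the given parameters and return its validation MAPE) are assumptions; the demo substitutes a cheap hypothetical objective, `toy_mape`.

```python
import random

# name: (lower, upper, v_min, v_max), taken from the parameter table.
BOUNDS = {
    "neurons": (1, 400, -2, 2),
    "learning_rate": (0.001, 0.01, -0.001, 0.001),
    "epochs": (200, 1000, -2, 2),
}
INTEGER = {"neurons", "epochs"}


def decode(position):
    """Round integer-valued hyperparameters before they are used (assumption)."""
    return {k: round(v) if k in INTEGER else v for k, v in position.items()}


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def pso_lstm_search(evaluate_mape, swarm_size=20, max_iter=50,
                    c1=2.0, c2=2.0, w=0.7, seed=0):
    rng = random.Random(seed)
    pos = [{k: rng.uniform(b[0], b[1]) for k, b in BOUNDS.items()} for _ in range(swarm_size)]
    vel = [{k: rng.uniform(b[2], b[3]) for k, b in BOUNDS.items()} for _ in range(swarm_size)]
    p_best = [dict(p) for p in pos]
    p_fit = [evaluate_mape(decode(p)) for p in pos]
    g = min(range(swarm_size), key=lambda i: p_fit[i])
    g_best, g_fit = dict(p_best[g]), p_fit[g]
    for _ in range(max_iter):
        for i in range(swarm_size):
            # Per-dimension velocity update with the table's velocity limits.
            for k, (lo, hi, vmin, vmax) in BOUNDS.items():
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][k] + c1 * r1 * (p_best[i][k] - pos[i][k])
                     + c2 * r2 * (g_best[k] - pos[i][k]))
                vel[i][k] = clamp(v, vmin, vmax)
                pos[i][k] = clamp(pos[i][k] + vel[i][k], lo, hi)
            f = evaluate_mape(decode(pos[i]))      # fitness = MAPE, lower is better
            if f < p_fit[i]:
                p_best[i], p_fit[i] = dict(pos[i]), f
                if f < g_fit:
                    g_best, g_fit = dict(pos[i]), f
    return decode(g_best), g_fit


def toy_mape(params):
    """Hypothetical stand-in: pretend MAPE is lowest near 64 neurons, lr 0.005, 500 epochs."""
    return (abs(params["neurons"] - 64) / 64
            + abs(params["learning_rate"] - 0.005) / 0.005
            + abs(params["epochs"] - 500) / 500)


best_params, best_fit = pso_lstm_search(toy_mape)
print(best_params, best_fit)
```

With the table's velocity limit of ±2 per iteration, a particle can move at most about 100 neurons or epochs in 50 iterations, so the initial spread of the swarm matters more for those two parameters than for the learning rate.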
CN202210294793.7A 2022-03-24 2022-03-24 Parameter optimization-based electric energy metering abnormal data identification and compensation method Pending CN114528949A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210294793.7A CN114528949A (en) 2022-03-24 2022-03-24 Parameter optimization-based electric energy metering abnormal data identification and compensation method

Publications (1)

Publication Number Publication Date
CN114528949A true CN114528949A (en) 2022-05-24

Family

ID=81626987

Country Status (1)

Country Link
CN (1) CN114528949A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination