CN110674999A - Cell load prediction method based on improved clustering and long-short term memory deep learning - Google Patents
- Publication number
- CN110674999A (application number CN201910948947.8A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/06—Electricity, gas or water supply
Abstract
The application discloses a residential-district load prediction method based on improved clustering and long short-term memory (LSTM) deep learning. Residential districts are first divided into categories by an improved clustering algorithm according to the factors that influence their load. An LSTM prediction model is then built separately for each category, and Dropout is applied to the LSTM to avoid local optima. The resulting predicted load narrows the gap between the declared (reported) capacity of a residential district and its actual load, which supports rational planning of transformer areas. For a newly built district, improved cluster analysis of its attribute values gives its category, and the prediction model of that category forecasts its load; from this forecast the business-expansion capacity is estimated to guide transformer-area construction.
Description
Technical Field
The invention belongs to the field of decision support for business-expansion (new service connection) construction in distribution networks, and particularly relates to a cell load prediction method based on improved clustering and long short-term memory deep learning.
Background
Urbanization continues to accelerate and supply-side structural reform is being vigorously implemented, so electricity consumption in many regions repeatedly reaches record highs. Power resources, however, are unevenly distributed across society, and consumption differs greatly between residential districts. As a result, many residential supply areas are frequently heavily loaded or overloaded, while some other areas are lightly loaded, unloaded or even idle. Rational transformer-area load planning and capacity expansion are therefore critical to urban power-grid planning, and forecasting the distribution load of a transformer area is essential early work for capacity-expansion planning. The declared load of residential districts is an important component of the declared load of an urban area. Sound planning and scientific prediction of residential declared capacity therefore provide a strong guarantee for the safe and stable resource planning of transformer areas. By forecast horizon, distribution load forecasts can be broadly divided into long-term, medium-term and short-term forecasts. Planning the installed capacity of a residential district mainly requires forecasting the district's long-term load over the next several years. This avoids later difficulties in adding capacity, as well as resource waste caused by an unreasonable capacity. However, the power consumption information collection system has been in operation for only a short time, and the actual transformer-area data of many cities are small in volume and incomplete. It is therefore difficult to build a reasonably accurate long-term prediction model directly from large amounts of data.
Meanwhile, the factors affecting the load of residential districts are complex, and each district has its own characteristics and occupancy (move-in) pattern. Traditional single-regression prediction clearly lacks applicability and generality, and it cannot solve load prediction for undeveloped, newly planned areas. It is therefore necessary to build long-term prediction models according to the characteristics of residential districts to guide their early or later business-expansion construction.
Disclosure of Invention
Medium- to long-term load forecasts for the transformer area serving a newly built residential district currently face several practical problems. They consider few influence factors, manual forecasts have large errors, and the later construction planning of transformer areas is consequently unreasonable. To address these problems, the invention provides a cell load prediction method based on improved clustering and long short-term memory deep learning. Residential districts are divided into categories according to the influence factors of different types of districts, and a prediction model is built for each category to obtain a predicted load value, so that transformer areas can be planned rationally.
To achieve this purpose, the invention adopts the following specific scheme:
step 1: acquire the historical load data of each residential district and preprocess it, giving the preprocessed historical load data of each residential district;
step 2: according to preset influence factors, extract the preset historical-load-data attribute set of each residential district, construct a feature vector for each district, and build the residential-district sample set from these vectors;
step 3: cluster the residential-district sample set with an improved clustering algorithm to obtain the number of categories K, the K final cluster centers and the clustered sample set;
step 4: train a long short-term memory (LSTM) deep learning prediction model on the samples of each cluster in the clustered sample set, giving one LSTM prediction model per category;
step 5: take a newly built residential district as test data, extract its attribute set according to the influence factors, and construct its feature vector;
step 6: compute the distances between the feature vector of the test district and the K final cluster centers; the category of the nearest center is the category of the test district;
step 7: predict with the LSTM prediction model of the test district's category to obtain the predicted load of the test district.
The preprocessing in step 1 comprises the following specific steps:
step 1.1: set all abnormal values and missing values in the historical load data of each residential district to null;
step 1.2: fill the null values to obtain the filled historical load data of each residential district, as follows. A null at a single time point within a day is filled by interpolation with that day's average load. If all time points of a day are null, the day is filled with the average of the maximum loads of the previous and the following day. If the data are null for several consecutive days in a month, those days are filled with the average of the daily maximum loads of the days in that month that do have load data;
step 1.3: normalize the filled historical load data of each residential district (min-max normalization) to obtain the normalized historical load data:
$x_i = \dfrac{\tilde{x}_i - x_{\min}}{x_{\max} - x_{\min}}$
where $\tilde{x}_i$ is the filled historical load data of residential district $i$, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the filled historical load data of each residential district, $x_i$ is the normalized historical load data, and $i$ is the residential district number.
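The preprocessing rules of steps 1.1 to 1.3 can be sketched in Python as follows. This is a minimal sketch under stated assumptions. Each day is represented as a list of readings with `None` for missing values, and one month is processed at a time. The abnormal-value test (negative readings) is a hypothetical placeholder, because the text does not define "abnormal". The function names `mark_abnormal`, `fill_month` and `min_max` are illustrative only.

```python
from statistics import mean
from typing import Callable, List, Optional

Day = List[Optional[float]]  # one day of readings; None marks a missing value


def mark_abnormal(day: Day, is_abnormal: Callable[[float], bool] = lambda v: v < 0) -> Day:
    """Step 1.1: turn abnormal readings into None (the criterion is a hypothetical placeholder)."""
    return [None if v is None or is_abnormal(v) else v for v in day]


def fill_month(days: List[Day]) -> List[List[float]]:
    """Step 1.2: apply the three filling rules to one month of daily readings."""
    # Rule 1: isolated gaps inside a day are filled with that day's average load.
    partly = []
    for day in days:
        known = [v for v in day if v is not None]
        if known and len(known) < len(day):
            avg = mean(known)
            day = [avg if v is None else v for v in day]
        partly.append(day)

    empty = [all(v is None for v in day) for day in partly]
    # Mean of the daily maximum loads of the days in this month that have data.
    month_max_mean = mean(max(d) for d, e in zip(partly, empty) if not e)

    filled = []
    for k, day in enumerate(partly):
        if not empty[k]:
            filled.append(list(day))
            continue
        prev_ok = k > 0 and not empty[k - 1]
        next_ok = k + 1 < len(partly) and not empty[k + 1]
        if prev_ok and next_ok:
            # Rule 2: a single empty day takes the mean of its neighbours' maxima.
            value = (max(partly[k - 1]) + max(partly[k + 1])) / 2
        else:
            # Rule 3: consecutive empty days (or no neighbour on one side)
            # take the month's mean daily maximum.
            value = month_max_mean
        filled.append([value] * len(day))
    return filled


def min_max(values: List[float]) -> List[float]:
    """Step 1.3: x_i = (x~_i - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against a flat series
    return [(v - lo) / span for v in values]
```

The sketch treats an empty day at the edge of the month as part of rule 3 because it lacks a neighbour on one side. The text does not cover that case, so other choices are equally defensible.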
The step 2 specifically comprises the following steps:
step 2.1: setting m influence factors;
step 2.2: extract the historical-load-data attribute set $A = \{a_1, a_2, a_3, \dots, a_m\}$ of each preprocessed residential district, where $a_m$ is the m-th influence factor;
step 2.3: construct a feature vector for each residential district, denoted $x_i = \{a_{1i}, a_{2i}, a_{3i}, \dots, a_{mi}\}$, where $i$ is the residential district number and $a_{mi}$ is the value of the m-th influence factor of the i-th residential district;
step 2.4: construct the residential-district sample set $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$, where $n$ is the number of samples.
The influence factors are: building age (years in service), floor area ratio, property management grade, community grade, educational resources, medical resources and green space ratio.
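Steps 2.3 and 2.4 can be sketched as follows. This is a minimal Python sketch in which each district's attributes arrive as a dict keyed by hypothetical factor identifiers. The per-factor min-max scaling at the end is an assumption that the text does not state. It is included because the seven factors have different units and would otherwise distort the distances used in step 3.

```python
from typing import Dict, List

# The seven influence factors named in the text; the identifiers are hypothetical.
FACTORS = ["building_age", "floor_area_ratio", "property_grade", "community_grade",
           "education", "medical", "green_ratio"]


def feature_vector(attrs: Dict[str, float]) -> List[float]:
    """Step 2.3: x_i = {a_1i, ..., a_mi} in a fixed factor order."""
    return [float(attrs[name]) for name in FACTORS]


def sample_set(districts: List[Dict[str, float]]) -> List[List[float]]:
    """Step 2.4: X = {x_1, ..., x_n}, with each factor column min-max scaled.

    The column scaling is an assumption, not a step given in the text.
    """
    X = [feature_vector(d) for d in districts]
    for j in range(len(FACTORS)):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # a constant factor maps to 0.0
        for row in X:
            row[j] = (row[j] - lo) / span
    return X
```

Under this assumption every factor lies in [0, 1], so no single attribute, such as building age in years, dominates the Euclidean distances used for clustering.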
The step 3 specifically comprises the following steps:
step 3.1: rewrite the residential-district sample set $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$ in the sample-space form $X = \{x_{11}, x_{12}, \dots, x_{ij}, \dots, x_{nm}\}$ and randomly assign each residential district to an initial category, where $x_{ij}$ is the value of the j-th influence factor of the i-th residential district, $n$ is the number of samples and $m$ is the number of influence factors;
step 3.2: from the initial categories, determine all cluster centers and compute the inter-class and intra-class distances;
the inter-class distance formula is as follows:
where $D_{\text{between}}$ is the inter-class distance. It is computed for the initial categories from two quantities: the mean of each dimension of the samples within each cluster $C_i$, and $B_{ij}$, the mean of each dimension over all samples;
the intra-class distance formula is as follows:
where $D_{\text{within}}$ is the intra-class distance;
step 3.3: determine the optimal class value from the inter-class and intra-class distances. The optimal class value $P_k$ is the one that minimizes the sum of the inter-class and intra-class distances, namely:
$\min S(X, P_k) = \min(D_{\text{between}} + D_{\text{within}})$
step 3.4: from the optimal class value $P_k$, obtain the initial cluster centers $C_i$, $i = 1, 2, \dots, P_k$;
step 3.5: compute the distance from each sample to each initial cluster center and reassign every sample to the class of its nearest center;
step 3.6: after the reassignment, compute the mean of the samples in each class and take it as the updated cluster center;
step 3.7: with the updated cluster centers, take the optimal class value as the initial classes and repeat steps 3.2 to 3.6 to re-determine the sample classes. Stop iterating when neither the cluster centers nor the class labels change any more. Then output the number of categories K, the K final cluster centers and the clustered sample set, and go to step 4;
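The search-then-iterate structure of step 3 can be illustrated with NumPy. The exact inter-class and intra-class distance formulas are not recoverable from this copy, so this sketch makes two assumptions. It takes $D_{\text{between}}$ as the sum of squared distances from each cluster mean to the overall per-dimension mean $B_j$. It takes $D_{\text{within}}$ as the sum of squared distances from each sample to its cluster mean. The sketch also swaps the random initial categories for a deterministic farthest-point choice of initial centers and simply scans candidate values of k. It illustrates the structure of the method only and is not the patented algorithm itself.

```python
import numpy as np


def kmeans(X: np.ndarray, k: int, max_iter: int = 100):
    """Steps 3.4-3.7: nearest-centre assignment and mean update until labels settle.

    Farthest-point seeding keeps the sketch deterministic; the text does not
    say how the P_k initial centres are chosen.
    """
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    labels = None
    for _ in range(max_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dist.argmin(axis=1)                       # step 3.5
        if labels is not None and np.array_equal(new_labels, labels):
            break                                              # step 3.7: labels unchanged
        labels = new_labels
        centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                            else centers[c] for c in range(k)])  # step 3.6
    return labels, centers


def score(X: np.ndarray, labels: np.ndarray, centers: np.ndarray) -> float:
    """Assumed S(X, P_k) = D_between + D_within (see the lead-in)."""
    overall = X.mean(axis=0)                     # B_j: per-dimension mean of all samples
    d_between = ((centers - overall) ** 2).sum()
    d_within = ((X - centers[labels]) ** 2).sum()
    return float(d_between + d_within)


def improved_clustering(X: np.ndarray, k_range=range(2, 8)):
    """Step 3.3 plus iteration: keep the k whose clustering minimises S."""
    best = None
    for k in k_range:
        if k > len(X):
            break
        labels, centers = kmeans(X, k)
        s = score(X, labels, centers)
        if best is None or s < best[0]:
            best = (s, k, labels, centers)
    _, K, labels, centers = best
    return K, centers, labels
```

With the assumed unweighted $D_{\text{between}}$, adding clusters first lowers $D_{\text{within}}$ sharply. Once natural groups start to be split, each extra center mainly inflates $D_{\text{between}}$, so the sum has an interior minimum.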
In step 4, the LSTM deep learning prediction model of each category is built as follows:
step 4.1: concatenate the output $o_{t-1}$ of the previous time step with the input $x_t$ of the current time step to obtain the forget-gate retention probability $f_t$, a value between 0 and 1:
$f_t = \sigma(W_f \cdot [o_{t-1}, x_t] + b_f)$
where $f_t$ is the forget-gate output, $W_f$ is the linear coefficient (weight) matrix, $b_f$ is the bias, and $\sigma$ is the sigmoid activation function;
step 4.2: compute the input gate and update the cell state:
$i_t = \sigma(W_i \cdot [o_{t-1}, x_t] + b_i)$
$\tilde{C}_t = \tanh(W_c \cdot [o_{t-1}, x_t] + b_c)$
$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$
where $i_t$ is the information-increment weight, $\tilde{C}_t$ is the instantaneous (candidate) state at the current time, $C_t$ is the final state at the current time, and $W_c$ and $b_c$ are the hidden-layer weight and bias of the candidate state;
step 4.3: construct the output layer, whose final output is $o_t$:
$h_t = \sigma(W_o \cdot [o_{t-1}, x_t] + b_o)$
$o_t = h_t * \tanh(C_t)$
where $h_t$ is the output-gate activation, tanh is the activation function, and $b_o$ and $W_o$ are the hidden-layer bias and weight of the current output gate.
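One forward step of the cell in steps 4.1 to 4.3 can be written directly in NumPy. The sketch keeps the text's notation: $o$ is the cell output and $h$ is the output gate, which is the reverse of the more common $h$/$o$ naming. Storing the weights and biases in dicts keyed by gate is an illustrative choice only. Training (backpropagation through time) is not shown.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, o_prev, c_prev, W, b):
    """One LSTM time step in the text's notation (o = cell output, h = output gate).

    W maps gate name -> weight matrix of shape (hidden, hidden + input);
    b maps gate name -> bias vector of shape (hidden,).
    """
    z = np.concatenate([o_prev, x_t])           # [o_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])           # step 4.1: forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])           # step 4.2: input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])       # candidate (instantaneous) state
    c_t = f_t * c_prev + i_t * c_tilde           # final cell state C_t
    h_t = sigmoid(W["o"] @ z + b["o"])           # step 4.3: output gate
    o_t = h_t * np.tanh(c_t)                     # cell output o_t
    return o_t, c_t


def run_lstm(xs, hidden: int, W, b) -> np.ndarray:
    """Feed a sequence through the cell, starting from zero state; return the last output."""
    o = np.zeros(hidden)
    c = np.zeros(hidden)
    for x_t in xs:
        o, c = lstm_step(np.atleast_1d(x_t), o, c, W, b)
    return o
```

For example, with a hidden size of 3 and a scalar monthly load as input, each weight matrix has shape (3, 4).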
In step 4, the LSTM prediction model of each category applies dropout in the hidden layers, randomly discarding hidden-layer outputs. Dropout (random inactivation) is a method for optimizing deep artificial neural networks. During training, part of the hidden-layer weights or outputs are randomly set to zero. This reduces the co-dependency among nodes, regularizes the network and lowers its structural risk;
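Dropout on hidden-layer outputs can be sketched in a few lines of NumPy. The sketch uses the common "inverted dropout" convention: surviving outputs are scaled by $1/(1-p)$ during training, so nothing needs rescaling at prediction time. That scaling convention is an assumption, since the text only says that outputs are randomly zeroed.

```python
import numpy as np


def dropout(o: np.ndarray, rate: float, rng: np.random.Generator,
            training: bool = True) -> np.ndarray:
    """Randomly zero hidden-layer outputs while training (inverted dropout).

    Each unit is kept with probability 1 - rate, and kept units are scaled
    by 1 / (1 - rate) so that the expected output is unchanged.
    """
    if not training or rate == 0.0:
        return o  # prediction time: use all units unchanged
    keep = rng.random(o.shape) >= rate
    return o * keep / (1.0 - rate)
```

Because the mask is redrawn at every training step, no unit can rely on a specific partner being present. This is the reduced co-dependency described above.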
Step 5 specifically comprises:
taking a newly built residential district as test data, extracting the attribute set $A = \{a_1, a_2, a_3, \dots, a_m\}$ of the district under test according to the influence factors, and constructing the feature vector of the district under test, where $a_m$ is the m-th influence factor.
Step 7 specifically comprises: using the LSTM prediction model of the category of the district under test, with the district's feature vector as the input $x_t$, to predict the load of the test district.
The beneficial technical effects are as follows:
(1) The invention fully considers the differences among residential districts and categorizes them using the factors that matter when houses are rented or sold in practice, so the categorization follows the patterns of social development;
(2) The occupancy (move-in) of a residential district generally grows along an S-shaped curve. If the transformer area is built at the outset for the capacity required at 100% occupancy, transformer-area resources are easily wasted and 10 kV resources become scarce. The invention enables rational planning of transformer-area construction for residential districts, prevents transformer areas from standing idle for long periods, and provides a clear plan for later transformer-area construction;
(3) The invention effectively addresses the shortage of data when forecasting the load of a newly built residential district, by using the load data of similar districts to build a reasonable prediction model for it;
(4) At present, the business-expansion capacity of a residential district is generally estimated mainly from human experience at the initial construction stage. This gives large errors and does not consider all relevant factors. The method remedies this and predicts the load automatically from the district's own characteristics;
(5) In the LSTM (Long Short-Term Memory network) learning stage, Dropout is used to counter overfitting effectively and to speed up training. As a result, the model generalizes well, is stable, and fluctuates less overall than other models.
Drawings
FIG. 1 is an overall flowchart of the cell load prediction method based on improved clustering and long short-term memory deep learning according to an embodiment of the present invention;
FIG. 2 shows the structure of the long short-term memory module of the LSTM network;
FIG. 3 is a neural network model using Dropout;
FIG. 4 shows the result of improved cluster partitioning of attribute sets according to an embodiment of the present invention;
FIG. 5 is a histogram comparing the degree of fit of various models according to the present invention;
fig. 6 is a diagram of the predicted effect according to the embodiment of the present invention, in which (a) is a predicted effect diagram of a first-type cell, (b) is a predicted effect diagram of a second-type cell, (c) is a predicted effect diagram of a third-type cell, and (d) is a predicted effect diagram of a fourth-type cell;
fig. 7 shows an LSTM network training error descending curve according to an embodiment of the present invention, (a) is a first-type cell LSTM network training error descending curve, (b) is a second-type cell LSTM network training error descending curve, (c) is a third-type cell LSTM network training error descending curve, and (d) is a fourth-type cell LSTM network training error descending curve.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples only illustrate the technical solutions of the present invention more clearly and do not limit the protection scope of the present application.
The invention provides a cell load prediction method based on improved clustering and long short-term memory deep learning. Residential districts are divided into categories according to the influence factors of different types of districts, and a prediction model is built for each category to obtain a predicted load value, so that transformer areas can be planned rationally.
In this embodiment there are 51 residential district samples and 7 influence factors, i.e., 7 input feature dimensions.
As shown in fig. 1, the method for predicting cell load based on improved clustering and long-short term memory deep learning specifically includes:
step 1: acquiring historical load data of each residential district, and preprocessing the historical load data to obtain the historical load data of each residential district after preprocessing;
The preprocessing in step 1 comprises the following specific steps:
step 1.1: set all abnormal values and missing values in the historical load data of each residential district to null;
step 1.2: fill the null values to obtain the filled historical load data of each residential district, as follows. A null at a single time point within a day is filled by interpolation with that day's average load. If all time points of a day are null, the day is filled with the average of the maximum loads of the previous and the following day. If the data are null for several consecutive days in a month, those days are filled with the average of the daily maximum loads of the days in that month that do have load data;
step 1.3: normalize the filled historical load data of each residential district (min-max normalization) to obtain the normalized historical load data:
$x_i = \dfrac{\tilde{x}_i - x_{\min}}{x_{\max} - x_{\min}}$
where $\tilde{x}_i$ is the filled historical load data of residential district $i$, $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the filled historical load data of each residential district, $x_i$ is the normalized historical load data, and $i$ is the residential district number.
Step 2: according to the set influence factors, extract the preset historical-load-data attribute set of each residential district, construct a feature vector for each district, and build the residential-district sample set from these vectors;
the step 2 specifically comprises the following steps:
step 2.1: setting m influence factors;
step 2.2: extract the historical-load-data attribute set $A = \{a_1, a_2, a_3, \dots, a_m\}$ of each preprocessed residential district, where $a_m$ is the m-th influence factor;
step 2.3: construct a feature vector for each residential district, denoted $x_i = \{a_{1i}, a_{2i}, a_{3i}, \dots, a_{mi}\}$, where $i$ is the residential district number and $a_{mi}$ is the value of the m-th influence factor of the i-th residential district;
step 2.4: construct the residential-district sample set $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$, where $n$ is the number of samples.
The influence factors are: building age (years in service), floor area ratio, property management grade, community grade, educational resources, medical resources and green space ratio.
Step 3: cluster the residential-district sample set with an improved clustering algorithm to obtain the number of categories K, the K final cluster centers and the clustered sample set;
the step 3 specifically comprises the following steps:
step 3.1: rewrite the residential-district sample set $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$ in the sample-space form $X = \{x_{11}, x_{12}, \dots, x_{ij}, \dots, x_{nm}\}$ and randomly assign each residential district to an initial category, where $x_{ij}$ is the value of the j-th influence factor of the i-th residential district, $n$ is the number of samples and $m$ is the number of influence factors;
step 3.2: from the initial categories, determine all cluster centers and compute the inter-class and intra-class distances;
the inter-class distance formula is as follows:
where $D_{\text{between}}$ is the inter-class distance. It is computed for the initial categories from two quantities: the mean of each dimension of the samples within each cluster $C_i$, and $B_{ij}$, the mean of each dimension over all samples;
the intra-class distance formula is as follows:
where $D_{\text{within}}$ is the intra-class distance;
step 3.3: determine the optimal class value from the inter-class and intra-class distances. The optimal class value $P_k$ is the one that minimizes the sum of the inter-class and intra-class distances, namely:
$\min S(X, P_k) = \min(D_{\text{between}} + D_{\text{within}})$
step 3.4: from the optimal class value $P_k$, obtain the initial cluster centers $C_i$, $i = 1, 2, \dots, P_k$;
step 3.5: compute the distance from each sample to each initial cluster center and reassign every sample to the class of its nearest center;
step 3.6: after the reassignment, compute the mean of the samples in each class and take it as the updated cluster center;
step 3.7: with the updated cluster centers, take the optimal class value as the initial classes and repeat steps 3.2 to 3.6 to re-determine the sample classes. Stop iterating when neither the cluster centers nor the class labels change any more. Then output the number of categories K, the K final cluster centers and the clustered sample set, and go to step 4;
The residential districts are clustered according to their attribute sets, as follows:
- the inter-class and intra-class distances are computed from the initial classes, and the optimal class value is determined from them;
- the corresponding initial class centers are re-derived from the optimal class value, and each sample is reassigned to its nearest center;
- the mean of each cluster is taken as the updated center, and the process is iterated with the new centers until the clustering result no longer changes.
As shown in FIG. 4, the 51 residential districts in this embodiment are finally divided into 4 categories. The convergence condition is that no district's category label changes any more, and the result essentially meets the needs of the subsequent prediction.
Step 4: train a long short-term memory deep learning prediction model on the samples of each cluster in the clustered sample set, giving one LSTM prediction model per category;
The Long Short-Term Memory network (LSTM) is a special RNN model that builds an artificial neural network through recursion over time. A conventional RNN loses information over long intervals; the LSTM mitigates this loss and so learns long-term dependencies. It is therefore suited to processing and predicting important events separated by relatively long intervals and delays in a time series. Like all RNNs, the LSTM has repeating modules. Unlike a standard RNN, however, its repeating module is more complex: four network layers interact in a special way, as shown in FIG. 2.
FIG. 2 shows the difference from a standard recurrent neural network. Where a standard RNN uses a single sigmoid as its activation function, the LSTM controls the flow of information between neurons through structures called "gates", which selectively let information into the cell. The LSTM has three gates: the forget gate, the input gate and the output gate. Together they control the cell state, which enables long-term learning and information screening.
The LSTM deep learning prediction model of each category in step 4 is built as follows:
step 4.1: the forget gate decides which information to discard. It concatenates the output $o_{t-1}$ of the previous time step with the current input $x_t$ and outputs the retention probability $f_t$, a value between 0 and 1:
$f_t = \sigma(W_f \cdot [o_{t-1}, x_t] + b_f)$
where $f_t$ is the forget-gate output, $W_f$ is the linear coefficient (weight) matrix, $b_f$ is the bias, and $\sigma$ is the sigmoid activation function;
step 4.2: compute the input gate and update the cell state:
$i_t = \sigma(W_i \cdot [o_{t-1}, x_t] + b_i)$
$\tilde{C}_t = \tanh(W_c \cdot [o_{t-1}, x_t] + b_c)$
$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$
where $i_t$ is the information-increment weight, $\tilde{C}_t$ is the instantaneous (candidate) state at the current time, $C_t$ is the final state at the current time, and $W_c$ and $b_c$ are the hidden-layer weight and bias of the candidate state;
After the input information from the upper-layer neuron has been selected, the "input gate" determines the content to update. $i_t$ gives the update probability, and the tanh layer combines the new input into the candidate state $\tilde{C}_t$.
The neuron state is then updated with the forget probability $f_t$ and the input probability $i_t$, giving the new neuron content $C_t$.
step 4.3: construct the output layer, whose final output is $o_t$:
$h_t = \sigma(W_o \cdot [o_{t-1}, x_t] + b_o)$
$o_t = h_t * \tanh(C_t)$
where $h_t$ is the output-gate activation, tanh is the activation function, and $b_o$ and $W_o$ are the hidden-layer bias and weight of the current output gate.
Finally, the output gate determines the neuron's output. A sigmoid layer gives the output-gate value $h_t$, the tanh function is applied to the cell state $C_t$, and their product $o_t$ is the final output of the LSTM.
In step 4, the LSTM prediction model of each category applies dropout in the hidden layers, randomly discarding hidden-layer outputs;
The LSTM prediction stage uses the following settings, reached by repeated tuning:
- three hidden layers, processed with Dropout, with 10, 20 and 40 neurons respectively;
- an input-layer step size equal to the first 60% of the district's time since construction;
- 'SIG' (sigmoid) as the kernel function of the LSTM.
Following the input and output definitions for transformer-area load prediction given above, this embodiment builds an LSTM prediction model with one input layer, three hidden layers and one output layer. The input-layer step size of 60% balances two concerns: a sequence that is too short carries incomplete information, while very long sequence inputs degrade model performance. Dropout in the hidden layers speeds up training of the whole model and effectively avoids overfitting in later training. Dropout prevents overfitting by deactivating neurons with equal probability: it randomly discards hidden-layer outputs while the output-gate weights continue to be updated. This keeps the model from over-relying on particular local features, makes it more robust and improves LSTM performance, as illustrated in FIG. 3;
Step 5: take a newly built residential district as test data, extract its attribute set according to the influence factors, and construct its feature vector. Step 5 specifically comprises:
taking a newly built residential district as test data, extracting the attribute set $A = \{a_1, a_2, a_3, \dots, a_m\}$ of the district under test according to the influence factors, and constructing the feature vector of the district under test, where $a_m$ is the m-th influence factor.
Step 6: compute the distances between the feature vector of the test district and the K final cluster centers; the category with the smallest distance is the category of the test district;
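Steps 6 and 7 amount to a nearest-center lookup followed by a dispatch to that category's model. In this small Python sketch, Euclidean distance is assumed, because the text does not name the metric. The `models` mapping and its `(feature_vector, history)` call signature are hypothetical stand-ins for the trained per-category LSTM predictors.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple


def assign_category(x: Sequence[float], centers: List[Sequence[float]]) -> int:
    """Step 6: index of the nearest final cluster centre (Euclidean distance assumed)."""
    return min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))


def predict_new_district(x: Sequence[float], centers: List[Sequence[float]],
                         models: Dict[int, Callable], history: List[float]) -> Tuple[int, float]:
    """Step 7: run the model trained for the district's category.

    `models` maps category -> trained predictor; the (x, history) interface
    is hypothetical.
    """
    k = assign_category(x, centers)
    return k, models[k](x, history)
```

Keeping one model per category means a new district borrows the load behaviour of districts with similar attributes, which is how the method works around the new district's lack of history.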
In the embodiment, the invention mainly targets medium-term load prediction for distribution areas, so as to determine a reasonable capacity for business expansion (new service connections) and improve distribution-area utilization. The method therefore works in units of months: predicting by year would provide too few load values, which lowers prediction accuracy and widens the gap between predicted and actual values.
The load of a given distribution area is sampled every 15 minutes. The maximum reading of each day is taken as that day's load value, so the daily load values of a month can be written as $m = \{d_1, d_2, \dots, d_t\}$, where $t$ is the number of days in the month (28, 29, 30 or 31), and $\max(m)$ is taken as the load value of the month.
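A minimal standard-library sketch of this aggregation, assuming the 15-minute readings arrive as `(timestamp, load)` pairs; the function name `monthly_peak_loads` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple


def monthly_peak_loads(readings: Iterable[Tuple[datetime, float]]) -> Dict[Tuple[int, int], float]:
    """Reduce 15-minute readings to one load value per (year, month)."""
    # Daily load value: the maximum of that day's readings.
    daily_max: Dict = {}
    for ts, load in readings:
        day = ts.date()
        daily_max[day] = max(load, daily_max.get(day, load))
    # Group the daily values of each month: m = {d_1, ..., d_t}.
    monthly = defaultdict(list)
    for day, d in daily_max.items():
        monthly[(day.year, day.month)].append(d)
    # Monthly load value: max(m).
    return {ym: max(ds) for ym, ds in sorted(monthly.items())}
```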
During model training, because each residential district was built at a different time, the monthly data of the first w years of each district are taken as the sample input and the load of the first month of year w+1 as the output, and the window is then shifted forward one month at a time. During validation, a district that already has load data is treated as a newly built district and fed directly into the trained model; each predicted monthly load value is appended to the history for sliding prediction, until the load value of the target year and month is obtained.
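The training windows and the sliding (recursive) prediction can be sketched as below. This is an illustration under assumptions: `window` would be 12 * w months, `model` stands for any trained per-category predictor that maps a window of monthly loads to the next month's load, and both function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def make_windows(series: Sequence[float], window: int) -> List[Tuple[List[float], float]]:
    """Pair each run of `window` consecutive monthly loads with the following month."""
    return [(list(series[i:i + window]), series[i + window])
            for i in range(len(series) - window)]


def recursive_forecast(history: Sequence[float], window: int, steps: int,
                       model: Callable[[List[float]], float]) -> List[float]:
    """Predict `steps` months ahead, feeding each prediction back as history."""
    buf = list(history)
    preds: List[float] = []
    for _ in range(steps):
        y = model(buf[-window:])  # predict the next month from the latest window
        preds.append(y)
        buf.append(y)             # slide: the prediction becomes part of the history
    return preds
```

Because each prediction is fed back in, errors can accumulate as the target month moves further away, which is one reason the text stresses a model that stays stable over longer horizons.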
Description of the experiment:
First, each of the 51 residential districts is represented as a vector built from attributes such as service life and volume (floor area) ratio, forming the attribute set used by the model, and the improved clustering algorithm then partitions the districts into categories. The clustering result is shown in Fig. 4: the 51 districts are automatically grouped into four categories according to their attribute values, with clearly separable clusters, and a separate prediction model is then built for each of the four categories.
The districts in each category are split into a training set and a test set: in each category, the district with the shortest time since construction is held out as test data, and the remaining districts are used as training data to build that category's LSTM prediction model. The test districts, whose categories are known, are treated as new districts without historical load data and are fed directly into the corresponding trained model to predict the maximum load of the target year and month. For comparison, multivariate ARMA and SVM models are also fitted to each category's load data, and their results are compared with those of the ICA-LSTM model. The SVM uses the 'RBF' kernel; the LSTM uses the 'SIG' (sigmoid) activation function and three layers with 10, 20 and 40 neurons respectively. The models are evaluated with the mean absolute error (MAE), the root mean square error (RMSE) and the goodness of fit R-squared, calculated as follows:

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\lvert y_i - \hat{y}_i\rvert, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}, \qquad R^2 = 1 - \frac{\sum_{i=1}^{N}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$

where $y_i$ is the actual load, $\hat{y}_i$ is the predicted load, $\bar{y}$ is the mean of the actual loads and $N$ is the number of predicted points.
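A minimal sketch of the evaluation indexes in plain Python, using the standard definitions; since Table 1 reports MAPE, a percentage MAPE is included as well. The function name `evaluate` is hypothetical, and the sketch assumes no actual load is zero (for MAPE) and the actual loads are not all equal (for R-squared).

```python
import math
from typing import Dict, Sequence


def evaluate(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, RMSE, MAPE (%) and R-squared for one set of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "R2": r2}
```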
The results are shown in Table 1 and Fig. 5:
TABLE 1 MAPE, RMSE values for the three models
Table 1 shows that, of the three models, the LSTM has lower MAPE and RMSE values than the other two. Fig. 5 shows the goodness of fit between the predicted and true load curves of each category of residential district for the three models: the LSTM achieves the highest goodness of fit, and its error and goodness of fit remain stable for every category, which indicates good generalization. To show the LSTM's predictions more directly, the three models' predicted values for the four categories are plotted against the original load series in Fig. 6.
It is seen from Fig. 5 that the LSTM predicts significantly better than the other two models for all four categories of residential districts and tracks the real load curve most closely, while the multivariate ARMA model fits worst, predicts poorly and adapts poorly. Fig. 7 shows the training-loss curves of the four LSTM prediction models, with the number of training epochs on the horizontal axis and the loss of each epoch on the vertical axis. For every category, the LSTM loss has fallen to 0.01 by about epoch 50, and it declines smoothly throughout training without large oscillations, indicating good convergence. This further illustrates the LSTM's good performance in longer-horizon prediction.
Conclusion:
The invention starts from the attributes of the load-prediction objects themselves and builds a category-specific LSTM prediction model for each type of residential district. Beyond the time-development factor, it also incorporates the districts' own characteristics and internal and external factors into the prediction model, which avoids relying on a single model for all districts and improves applicability. Validation with real load data from residential districts in a certain city shows that the model effectively improves the medium- and long-term prediction accuracy of district power load, generalizes well and has practical value, providing an important theoretical basis for building distribution networks in residential transformer areas. The method thus yields the future load development pattern of newly built districts and guides later transformer-area construction planning.
The present applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings, but it should be understood by those skilled in the art that the above embodiments are merely preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not for limiting the scope of the present invention, and on the contrary, any improvement or modification made based on the spirit of the present invention should fall within the scope of the present invention.
Claims (9)
1. A cell load prediction method based on improved clustering and long-short term memory deep learning is characterized in that:
residential districts are divided into categories according to the influencing factors of the different types of residential districts, a corresponding prediction model is established for each category to obtain predicted load values, and the transformer areas are thereby planned rationally.
2. The cell load prediction method based on improved clustering and long-short term memory deep learning according to claim 1, wherein the load prediction method comprises the following steps:
step 1: acquiring historical load data of each residential district, and preprocessing the historical load data to obtain the historical load data of each residential district after preprocessing;
step 2: extracting a preset historical load data attribute set of each residential district according to a preset influence factor, constructing a feature vector of each residential district, and further constructing a residential district sample set;
step 3: clustering the residential district sample set with an improved clustering algorithm to obtain the number of categories K, the K final cluster centers, and the clustered sample set;
step 4: building a long short-term memory (LSTM) deep-learning prediction model for each cluster in the clustered sample set, obtaining one LSTM deep-learning prediction model per category;
step 5: acquiring a newly built residential district as test data, extracting the attribute set of the residential district under test according to the influencing factors, and constructing its feature vector;
step 6: calculating the distance between the feature vector of the residential district under test and each of the K final cluster centers, the category of the nearest center being the category of the residential district under test;
step 7: predicting with the LSTM deep-learning prediction model of that category to obtain the predicted load of the residential district under test.
3. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 2, wherein the preprocessing in the step 1 comprises:
step 1.1: assigning null values to all abnormal values and missing values in the historical load data of each residential district;
step 1.2: filling the null value to obtain the historical load data of each residential district after filling;
step 1.3: normalizing the historical load data of each residential district after filling to obtain the normalized historical load data of each residential district, wherein the formula is as follows:
$$x_i = \frac{\tilde{x}_i - x_{\min}}{x_{\max} - x_{\min}}$$
where $\tilde{x}_i$ is the filled historical load data of the $i$-th residential district, $x_{\min}$ and $x_{\max}$ are the minimum and maximum of the filled historical load data of that residential district, $x_i$ is the normalized historical load data, and $i$ is the residential district number.
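A minimal sketch of step 1.3, applied to one district's filled load series. The inverse mapping `denormalize` is not described in the claims; it is included on the assumption that model outputs must be converted back to load units. Both function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> Tuple[List[float], float, float]:
    """Scale one district's filled load series to [0, 1]; also return x_min and x_max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        raise ValueError("a constant series cannot be min-max normalized")
    scaled = [(v - x_min) / (x_max - x_min) for v in values]
    return scaled, x_min, x_max


def denormalize(scaled: Sequence[float], x_min: float, x_max: float) -> List[float]:
    """Map normalized values (e.g. model outputs) back to load units."""
    return [s * (x_max - x_min) + x_min for s in scaled]
```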
4. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 3, wherein in step 1.2 the null values are filled as follows: a null value at a single moment within a day is filled with that day's average load; if the data for every moment of a day are null, the day is filled with the average of the maximum loads of the previous and next days; and if the data for several consecutive days in a month are null, they are filled with the average of the maximum loads of the days in that month that have load data.
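The three filling rules can be sketched for one month of data as follows. This is an illustration, not the patented implementation: it assumes each day is a list of 15-minute readings with `None` for nulls, and it treats an empty day at the edge of the month (where a neighbouring day lies in another month) with the monthly rule, which the claim does not specify. `fill_month` is a hypothetical name.

```python
from statistics import mean
from typing import List, Optional


def fill_month(days: List[List[Optional[float]]]) -> List[List[float]]:
    """Fill null readings for one month; `days` holds each day's 15-minute readings."""
    # Rule 1: a day with only some nulls -> fill them with that day's average load.
    partial: List[Optional[List[float]]] = []
    for readings in days:
        present = [r for r in readings if r is not None]
        if present:
            avg = mean(present)
            partial.append([avg if r is None else r for r in readings])
        else:
            partial.append(None)  # whole day missing; handled below

    # Daily maximum loads of the days in this month that have data.
    maxima = [max(d) for d in partial if d is not None]
    if not maxima:
        raise ValueError("no load data in this month")

    result: List[List[float]] = []
    for i, day in enumerate(partial):
        if day is not None:
            result.append(day)
            continue
        prev_ok = i > 0 and partial[i - 1] is not None
        next_ok = i + 1 < len(partial) and partial[i + 1] is not None
        if prev_ok and next_ok:
            # Rule 2: isolated empty day -> mean of the neighbouring days' maximum loads.
            value = (max(partial[i - 1]) + max(partial[i + 1])) / 2
        else:
            # Rule 3: run of empty days (or an empty day at the month edge, assumed)
            # -> mean of the daily maximum loads of the days that have data.
            value = mean(maxima)
        result.append([value] * len(days[i]))
    return result
```

Filling a whole empty day with a constant is enough here because only the daily maximum is used downstream.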
5. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 2, wherein the step 2 is specifically as follows:
step 2.1: setting m influence factors;
step 2.2: extracting the attribute set $A = \{a_1, a_2, a_3, \dots, a_m\}$ of the preprocessed historical load data of each residential district, where $a_m$ is the $m$-th influencing factor;
step 2.3: constructing a feature vector for each residential district, denoted $x_i = \{a_{1i}, a_{2i}, a_{3i}, \dots, a_{mi}\}$, where $i$ is the residential district number and $a_{mi}$ is the value of the $m$-th influencing factor of the $i$-th residential district;
step 2.4: constructing the residential district sample set $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$, where $n$ is the number of samples.
6. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 2 or claim 5, wherein the influencing factors comprise: service life, volume (floor area) ratio, property grade, district grade, educational resources, medical resources and green space ratio.
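A sketch of steps 2.3 and 2.4 for the seven factors listed above. The field names in `FACTORS` are hypothetical, grades are assumed to be already coded as numbers, and the per-factor rescaling to [0, 1] is an assumption not stated in the claims, added so that no single factor dominates the Euclidean distances used by the clustering step.

```python
from typing import Dict, List, Sequence

# Hypothetical field names for the seven influencing factors of claim 6.
FACTORS = ("service_life", "floor_area_ratio", "property_grade", "district_grade",
           "education", "medical", "green_ratio")


def build_sample_set(districts: Sequence[Dict[str, float]]) -> List[List[float]]:
    """Return X = {x_1, ..., x_n}: one m-dimensional feature vector per district."""
    raw = [[float(d[f]) for f in FACTORS] for d in districts]
    # Assumption: rescale each factor column to [0, 1]; constant columns become 0.
    columns = []
    for col in zip(*raw):
        lo, hi = min(col), max(col)
        columns.append([0.0 if hi == lo else (v - lo) / (hi - lo) for v in col])
    # Transpose back to one row (feature vector) per district.
    return [list(row) for row in zip(*columns)]
```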
7. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 2, wherein the step 3 is specifically as follows:
step 3.1: rewriting the residential district sample set $X = \{x_1, x_2, \dots, x_i, \dots, x_n\}$ in the sample-space form $X = \{x_{11}, x_{12}, \dots, x_{ij}, \dots, x_{nm}\}$ and randomly assigning initial categories to all residential districts, where $x_{ij}$ is the value of the $j$-th influencing factor of the $i$-th residential district, $n$ is the number of samples and $m$ is the number of influencing factors;
step 3.2: according to the initial category, defining all clustering centers and obtaining inter-category distances and intra-category distances;
the inter-class distance formula is as follows:
where $D_{\text{between}}$ is the inter-class distance, computed from the mean of each dimension of the samples in each initial cluster $C_i$ and from $B_{ij}$, the mean of each dimension over all samples, under the initial categories;
the intra-class distance formula is as follows:
where $D_{\text{within}}$ is the intra-class distance;
step 3.3: determining the optimal number of classes from the inter-class and intra-class distances: the optimal class value $P_k$ is the one that minimizes the sum of the inter-class and intra-class distances, namely:
$$\min S(X, P_k) = \min\,(D_{\text{between}} + D_{\text{within}})$$
step 3.4: obtaining the initial cluster centers $C_i$, $i = 1, 2, \dots, P_k$, according to the optimal class value $P_k$;
step 3.5: calculating the distance from each sample to each initial cluster center and reassigning every sample in the sample space to the class of its nearest center;
step 3.6: calculating the mean of the samples in each class after reassignment and taking it as the updated cluster center;
step 3.7: with the updated cluster centers, and taking the optimal class value as the initial class, repeating steps 3.2 to 3.6 to re-determine the class of each sample until neither the cluster centers nor the sample classes change; then stopping the iteration, outputting the final number of residential district classes K, the K final cluster centers and the clustered sample set, and proceeding to step 4.
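Steps 3.5 to 3.7 are the classic K-means iteration. A minimal NumPy sketch, taking the number of classes `k` (that is, $P_k$ from step 3.3) as given and drawing the initial centers as `k` random samples; that initialization is an assumption, since step 3.4 does not say how the centers are placed. The function name `kmeans` and its parameters are hypothetical.

```python
import numpy as np


def kmeans(X, k: int, seed: int = 0, max_iter: int = 100):
    """Return (centers, labels) after iterating until nothing changes."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed initialization: k distinct samples chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 3.5: Euclidean distance of every sample to every center; take the nearest.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        # Step 3.6: class means become the new centers (keep the old one if a class empties).
        new_centers = np.array([X[new_labels == j].mean(axis=0) if np.any(new_labels == j)
                                else centers[j] for j in range(k)])
        # Step 3.7: stop once both the labels and the centers are unchanged.
        if np.array_equal(new_labels, labels) and np.allclose(new_centers, centers):
            break
        labels, centers = new_labels, new_centers
    return centers, labels
```

Selecting `k` itself would wrap this in a loop over candidate values scored by the criterion of step 3.3.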
8. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 1, wherein the long-short term memory deep learning prediction model of each category in the step 4 is specifically:
step 4.1: concatenating the previous output $o_{t-1}$ with the current input $x_t$ to obtain the forget-gate output $f_t$, a value between 0 and 1 that sets how much of the stored state is retained:
$$f_t = \sigma(W_f \cdot [o_{t-1}, x_t] + b_f)$$
where $f_t$ is the forget-gate output, $W_f$ is the forget-gate weight (linear coefficient) matrix, $b_f$ is the forget-gate bias, and $\sigma$ is the sigmoid activation function;
step 4.2: constructing the hidden layer from the input layer:
$$i_t = \sigma(W_i \cdot [o_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_c \cdot [o_{t-1}, x_t] + b_c)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
where $i_t$ is the input-gate (information increment) weight, $\tilde{C}_t$ is the instantaneous (candidate) state at the current time, $C_t$ is the final state at the current time, $W_i$ and $b_i$ are the input-gate weight and bias, and $W_c$ and $b_c$ are the hidden-layer weight and bias of the candidate state;
step 4.3: constructing the output layer, whose final output is $o_t$:
$$h_t = \sigma(W_o \cdot [o_{t-1}, x_t] + b_o)$$
$$o_t = h_t * \tanh(C_t)$$
where $h_t$ is the output-gate value, tanh is the hyperbolic tangent activation function, $b_o$ is the output-gate hidden-layer bias and $W_o$ is the output-gate hidden-layer weight.
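As a sketch, assuming NumPy and randomly initialized (untrained) weights, the gate equations of steps 4.1 to 4.3 can be written as one forward step per time point. The dictionary keys (`W_f`, `b_i`, ...) mirror the symbols above; `init_params` and `run_lstm` are hypothetical helpers, and training by backpropagation is not shown.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, o_prev, c_prev, p):
    """One time step in the claim's notation (o = output, h = output gate)."""
    z = np.concatenate([o_prev, x_t])                # [o_{t-1}, x_t]
    f_t = sigmoid(p["W_f"] @ z + p["b_f"])           # forget gate
    i_t = sigmoid(p["W_i"] @ z + p["b_i"])           # input gate
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])       # candidate (instantaneous) state
    c_t = f_t * c_prev + i_t * c_tilde               # final state C_t
    h_t = sigmoid(p["W_o"] @ z + p["b_o"])           # output gate
    return h_t * np.tanh(c_t), c_t                   # o_t, C_t


def init_params(n_in: int, n_hidden: int, seed: int = 0) -> dict:
    """Small random weights and zero biases for the four gates."""
    rng = np.random.default_rng(seed)
    p = {}
    for g in ("f", "i", "c", "o"):
        p[f"W_{g}"] = rng.normal(0.0, 0.1, size=(n_hidden, n_hidden + n_in))
        p[f"b_{g}"] = np.zeros(n_hidden)
    return p


def run_lstm(sequence, p) -> np.ndarray:
    """Run one LSTM layer over a sequence; returns the output o_t at every step."""
    n_hidden = p["b_f"].shape[0]
    o, c = np.zeros(n_hidden), np.zeros(n_hidden)
    outputs = []
    for x_t in sequence:
        o, c = lstm_step(np.atleast_1d(x_t), o, c, p)
        outputs.append(o)
    return np.array(outputs)
```

To mirror the embodiment's three layers of 10, 20 and 40 neurons, each layer's output sequence is fed into the next, for example `run_lstm(run_lstm(run_lstm(seq, p10), p20), p40)` with input sizes 1, 10 and 20.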
9. The method as claimed in claim 4, wherein the long-short term memory deep-learning prediction model in step 4 applies dropout in its hidden layers, randomly discarding hidden-layer outputs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910948947.8A CN110674999A (en) | 2019-10-08 | 2019-10-08 | Cell load prediction method based on improved clustering and long-short term memory deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110674999A true CN110674999A (en) | 2020-01-10 |
Family ID: 69080761
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910948947.8A Pending CN110674999A (en) | 2019-10-08 | 2019-10-08 | Cell load prediction method based on improved clustering and long-short term memory deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110674999A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103218400A (en) * | 2013-03-15 | 2013-07-24 | 北京工业大学 | Method for dividing network community user groups based on link and text contents |
CN107506872A (en) * | 2017-09-14 | 2017-12-22 | 国网福建省电力有限公司 | A kind of residential block part throttle characteristics and the Categorical research method of model prediction |
CN107591803A (en) * | 2017-09-21 | 2018-01-16 | 国网上海市电力公司 | A kind of electric load behavior prediction method based on demand response |
CN108830487A (en) * | 2018-06-21 | 2018-11-16 | 王芊霖 | Methods of electric load forecasting based on long neural network in short-term |
Non-Patent Citations (1)
Title |
---|
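Qiu Bin (裘斌): "Research on short-term power load forecasting based on data preprocessing and deep belief network", China Master's Theses Full-text Database, Engineering Science and Technology II * |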
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111310583A (en) * | 2020-01-19 | 2020-06-19 | 中国科学院重庆绿色智能技术研究院 | Vehicle abnormal behavior identification method based on improved long-term and short-term memory network |
CN111310583B (en) * | 2020-01-19 | 2023-02-10 | 中国科学院重庆绿色智能技术研究院 | Vehicle abnormal behavior identification method based on improved long-term and short-term memory network |
CN111461400A (en) * | 2020-02-28 | 2020-07-28 | 国网浙江省电力有限公司 | Load data completion method based on Kmeans and T-LSTM |
CN111461400B (en) * | 2020-02-28 | 2023-06-23 | 国网浙江省电力有限公司 | Kmeans and T-LSTM-based load data completion method |
CN111832796B (en) * | 2020-02-29 | 2022-12-23 | 上海电力大学 | Fine classification and prediction method and system for residential electricity load mode |
CN111832796A (en) * | 2020-02-29 | 2020-10-27 | 上海电力大学 | Fine classification and prediction method and system for residential electricity load mode |
CN111442476A (en) * | 2020-03-06 | 2020-07-24 | 财拓云计算(上海)有限公司 | Method for realizing energy-saving temperature control of data center by using deep migration learning |
CN111652444B (en) * | 2020-06-05 | 2023-07-21 | 南京机电职业技术学院 | K-means and LSTM-based daily guest volume prediction method |
CN111652444A (en) * | 2020-06-05 | 2020-09-11 | 南京机电职业技术学院 | K-means and LSTM-based daily passenger volume prediction method |
CN112183846A (en) * | 2020-09-25 | 2021-01-05 | 合肥工业大学 | TVF-EMD-MCQRNN load probability prediction method based on fuzzy C-means clustering |
CN112183846B (en) * | 2020-09-25 | 2022-04-19 | 合肥工业大学 | TVF-EMD-MCQRNN load probability prediction method based on fuzzy C-means clustering |
CN112308345A (en) * | 2020-11-30 | 2021-02-02 | 中国联合网络通信集团有限公司 | Communication network load prediction method, device and server |
CN112365098A (en) * | 2020-12-07 | 2021-02-12 | 国网冀北电力有限公司承德供电公司 | Power load prediction method, device, equipment and storage medium |
CN112580260A (en) * | 2020-12-22 | 2021-03-30 | 广州杰赛科技股份有限公司 | Method and device for predicting water flow of pipe network and computer readable storage medium |
CN114760637A (en) * | 2020-12-25 | 2022-07-15 | 中国联合网络通信集团有限公司 | Cell capacity expansion method and device |
CN114760637B (en) * | 2020-12-25 | 2023-06-06 | 中国联合网络通信集团有限公司 | Cell capacity expansion method and device |
CN113011630A (en) * | 2021-01-25 | 2021-06-22 | 国网浙江省电力有限公司杭州供电公司 | Method for short-term prediction of space load in zone time of big data power distribution network |
CN113011630B (en) * | 2021-01-25 | 2024-01-23 | 国网浙江省电力有限公司杭州供电公司 | Short-term prediction method for space-time load of big data distribution network area |
CN113255764A (en) * | 2021-05-21 | 2021-08-13 | 池测(上海)数据科技有限公司 | Method, system and device for detecting electrochemical energy storage system fault by using machine learning |
CN113449793A (en) * | 2021-06-28 | 2021-09-28 | 国网北京市电力公司 | Method and device for determining power utilization state |
CN113570004A (en) * | 2021-09-24 | 2021-10-29 | 西南交通大学 | Riding hot spot area prediction method, device, equipment and readable storage medium |
CN114722891A (en) * | 2022-02-11 | 2022-07-08 | 杭州致成电子科技有限公司 | Transformer area state classification method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20200110 |