CN110674999A

CN110674999A - Cell load prediction method based on improved clustering and long-short term memory deep learning

Info

Publication number: CN110674999A
Application number: CN201910948947.8A
Authority: CN
Inventors: 田杨阳; 张小斐; 王楠; 郭志民; 耿俊成; 袁少光; 万迪名; 李铭岩; 刘芳冰; 陶亚光; 王倩; 牛霜霞; 毛万登; 时洪飞; 肖寒
Original assignee: Henan Power Co Weihui Power Supply Co; Electric Power Research Institute of State Grid Henan Electric Power Co Ltd
Current assignee: Henan Power Co Weihui Power Supply Co; Electric Power Research Institute of State Grid Henan Electric Power Co Ltd
Priority date: 2019-10-08
Filing date: 2019-10-08
Publication date: 2020-01-10

Abstract

The application discloses a residential load prediction method based on improved clustering and long-short term memory (LSTM) deep learning. Residential districts are divided into categories by an improved clustering algorithm according to their influence factors, and a separate prediction model is established for each category using the LSTM algorithm. Dropout processing is applied to the LSTM so that local optima are avoided. The resulting predicted load value reduces the gap between the capacity reported for a residential district and its actual load, enabling reasonable planning of the transformer area. For a newly built cell, improved clustering analysis of its attribute values yields its category, and the prediction model of that category is used for load prediction, so that the business expansion capacity can be predicted and the construction of the transformer area guided.

Description

Cell load prediction method based on improved clustering and long-short term memory deep learning

Technical Field

The invention belongs to the field of auxiliary construction for distribution network business expansion, and particularly relates to a cell load prediction method based on improved clustering and long-short term memory deep learning.

Background

With the continuing acceleration of urbanization and the vigorous implementation of supply-side structural reform, power consumption in every region repeatedly reaches new highs. However, power resources are unevenly distributed and power consumption differs greatly between cells, so that most residential transformer areas are frequently heavily loaded or overloaded while others are lightly loaded, unloaded or even idle. Reasonable load planning and capacity expansion for transformer areas are therefore critical to urban power grid planning, and predicting the distribution load of a transformer area is important preliminary work for capacity expansion planning. The reported load of residential districts is an important component of the reported load of urban transformer areas, and reasonable planning and scientific prediction of the reported capacity of residential districts strongly support the safety and stability of transformer-area resource planning. Depending on the length of the forecast cycle, distribution load forecasts can be broadly divided into long-term, medium-term and short-term forecasts. Planning the installed capacity of a residential district mainly requires predicting the long-term load of its transformer area over the coming years, so as to avoid difficult later capacity increases or wasted resources caused by an unreasonable capacity. However, the electric power acquisition system has been online only for a short time, and the actual transformer-area data of many cities are small in volume and contain missing values, so it is difficult to build a reasonably accurate long-term prediction model directly from a large amount of data.

Meanwhile, the factors that influence residential load prediction are complex. Each residential district has its own characteristics and occupancy pattern, so traditional single regression analysis lacks applicability and generality and cannot solve load prediction for a newly planned area that has not yet been developed. It is therefore necessary to establish a long-term prediction model based on the characteristics of the residential district to guide its early or later business expansion construction.

Disclosure of Invention

To solve the practical problems that few influence factors are considered when medium- and long-term transformer-area load prediction is carried out for a newly built cell, that manual prediction has a large error, and that later construction planning of the transformer area is consequently unreasonable, the invention provides a cell load prediction method based on improved clustering and long-short term memory deep learning: residential districts are divided into categories according to their influence factors, and a corresponding prediction model is established for each category to obtain a predicted load value, thereby realizing reasonable planning of the transformer area.

In order to achieve the purpose, the invention adopts the following specific scheme:

Step 1: acquiring historical load data of each residential district, and preprocessing it to obtain the preprocessed historical load data of each residential district;

Step 2: extracting a preset attribute set of the historical load data of each residential district according to preset influence factors, constructing a feature vector of each residential district, and further constructing a residential district sample set;

Step 3: clustering and dividing the residential district sample set by using an improved clustering algorithm to obtain the number of categories K into which the samples are classified, the K final clustering centers, and the clustered sample set;

Step 4: training a long-short term memory deep learning prediction model on each cluster of the clustered sample set to obtain a long-short term memory deep learning prediction model of each category;

Step 5: acquiring a newly-built residential district as test data, extracting an attribute set of the tested residential district according to the influence factors, and constructing a feature vector of the tested residential district;

Step 6: calculating the distance between the feature vector of the tested residential district and each of the K final clustering centers, the category with the smallest distance being the category of the tested residential district;

Step 7: predicting with the long-short term memory deep learning prediction model of the category of the tested residential district to obtain the predicted load of the tested cell.
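Steps 5 to 7 amount to a nearest-center lookup followed by a call to the model of the matching category. The following Python sketch illustrates this under stated assumptions: Euclidean distance is assumed (the text does not name the metric), the names `nearest_center` and `predict_new_district` are hypothetical, and each per-category model is represented by an arbitrary callable rather than a trained LSTM.

```python
import math

def nearest_center(x, centers):
    """Index of the cluster center closest to feature vector x (step 6).

    Euclidean distance is an assumption; the source does not specify it.
    """
    distances = [math.dist(x, c) for c in centers]
    return distances.index(min(distances))

def predict_new_district(x, centers, models):
    """Classify a new district, then predict with its category's model (step 7).

    `models` is a list of callables, one per category (hypothetical stand-ins).
    """
    category = nearest_center(x, centers)
    return category, models[category](x)
```

In practice the callables would wrap the trained per-category prediction models; the lookup itself does not depend on how they were trained.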

The preprocessing in step 1 comprises the following specific steps:

step 1.1: assigning null values to all abnormal values and missing values in the historical load data of each residential district;

step 1.2: filling the null values to obtain the filled historical load data of each residential district, specifically: a null value occurring at a certain moment in a day is filled by interpolation using the daily average load; if the data at all moments of a day are null, the day is filled with the average of the maximum loads of the preceding and following days; and if the data are null for several consecutive days in a month, they are filled with the average of the daily maximum loads of the days in that month for which load data exist;

step 1.3: normalizing the filled historical load data of each residential district to obtain the normalized historical load data of each residential district, with the formula:

x_i* = (x_i - x_min) / (x_max - x_min)

wherein x_i is the filled historical load data of each residential district, x_min is the minimum value of the filled historical load data, x_max is the maximum value of the filled historical load data, x_i* is the normalized historical load data of each residential district, and i is the residential district number.
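The filling and normalization rules of steps 1.2 and 1.3 can be sketched in Python as follows. This is one possible reading, not the patented implementation: nulls are represented by `None`, "interpolation using the daily average load" is interpreted as substituting the mean of that day's known samples, min-max scaling is assumed for the normalization, and the function names are hypothetical. The sketch also assumes that at least one day of the month has data and that the values to be normalized are not all equal.

```python
def fill_month(days):
    """Fill nulls (None) in one month of load data, given as a list of days,
    each day being a list of samples."""
    filled = []
    for day in days:
        known = [v for v in day if v is not None]
        if known:
            avg = sum(known) / len(known)            # daily average load
            filled.append([avg if v is None else v for v in day])
        else:
            filled.append(None)                      # whole day missing
    daily_max = [max(d) if d is not None else None for d in filled]
    present = [m for m in daily_max if m is not None]
    month_avg_max = sum(present) / len(present)      # mean of daily maxima
    for i, d in enumerate(filled):
        if d is not None:
            continue
        prev_m = daily_max[i - 1] if i > 0 else None
        next_m = daily_max[i + 1] if i + 1 < len(days) else None
        if prev_m is not None and next_m is not None:
            value = (prev_m + next_m) / 2            # isolated empty day
        else:
            value = month_avg_max                    # run of empty days
        filled[i] = [value] * len(days[i])
    return filled

def min_max_normalize(values):
    """Scale values to [0, 1] (assumed form of the normalization)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```

An empty day at the start or end of the month has only one neighbour, so the sketch falls back to the monthly rule there; the source does not address this edge case.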

The step 2 specifically comprises the following steps:

step 2.1: setting m influence factors;

step 2.2: extracting the attribute set A = {a_1, a_2, a_3, ..., a_m} of the preprocessed historical load data of each residential district, wherein a_m is the mth influence factor;

step 2.3: constructing a feature vector for each residential district, denoted x_i = {a_1i, a_2i, a_3i, ..., a_mi}, wherein i is the residential district number and a_mi is the value of the mth influence factor of the ith residential district;

step 2.4: constructing the residential district sample set, expressed as X = {x_1, x_2, ..., x_i, ..., x_n}, wherein n is the number of samples.

The influence factors comprise: service life, volume ratio, property grade, district grade, educational resources, medical resources and green space area ratio.
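As an illustration of steps 2.1 to 2.4, the sketch below builds the feature vectors x_i and the sample set X from per-district attribute records. The dictionary keys are hypothetical names for the seven influence factors listed above, and it is assumed that graded factors such as property grade have already been encoded as numbers; the source does not say how they are encoded.

```python
# Hypothetical key names for the seven influence factors (m = 7).
FACTORS = [
    "service_life", "volume_ratio", "property_grade", "district_grade",
    "educational_resources", "medical_resources", "green_space_ratio",
]

def feature_vector(district):
    """x_i = {a_1i, ..., a_mi}: factor values in a fixed order."""
    return [float(district[name]) for name in FACTORS]

def sample_set(districts):
    """X = {x_1, ..., x_n}: one feature vector per residential district."""
    return [feature_vector(d) for d in districts]
```

Keeping the factor order fixed matters because the distance calculations in the clustering step compare vectors dimension by dimension.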

The step 3 specifically comprises the following steps:

step 3.1: rewriting the residential district sample set X = {x_1, x_2, ..., x_i, ..., x_n} in the sample space form X = {x_11, x_12, ..., x_ij, ..., x_nm} and randomly assigning an initial category to each residential district, wherein x_ij is the value of the jth influence factor of the ith residential district, n is the number of samples, and m is the number of influence factors;

step 3.2: according to the initial categories, defining all clustering centers and obtaining the inter-class distance and the intra-class distance;

the inter-class distance D_inter is computed from the mean of each dimension of the samples within cluster C_i, the mean b_ij of each dimension over the whole sample set, and the initial categories;

the intra-class distance is denoted D_intra;

step 3.3: determining the optimal class value according to the inter-class distance and the intra-class distance, the optimal class value P_k being the one that minimizes the sum of the inter-class distance and the intra-class distance, namely:

min S(X, P_k) = min(D_inter + D_intra)

step 3.4: according to the optimal class value P_k, obtaining the initial clustering centers C_i (i = 1, 2, …, P_k);

step 3.5: calculating the distance from each sample to the initial clustering centers, and re-dividing the categories of the sample space according to the minimum distance;

step 3.6: calculating the mean of the samples in each category after the re-division, and taking the mean as the updated clustering center;

step 3.7: according to the updated clustering centers, taking the optimal class value as the initial category, repeating steps 3.2 to 3.6 and re-determining the sample categories until the clustering centers and the sample categories no longer change; iteration then stops, the number of categories K, the K final clustering centers and the clustered sample set are output, and the method proceeds to step 4;
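The assign-and-update loop of steps 3.5 to 3.7 can be sketched as below. This is only a partial illustration under assumptions: Euclidean distance is assumed, the initial centers are taken as given, and the selection of the optimal class value P_k in steps 3.2 to 3.4 is not reproduced because it depends on the exact inter-class and intra-class distance formulas. The name `refine_clusters` is hypothetical.

```python
import math

def mean_vector(points):
    """Per-dimension mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def refine_clusters(samples, centers, max_iter=100):
    """Repeat nearest-center assignment (3.5) and mean update (3.6)
    until labels and centers stop changing (3.7)."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
            for x in samples
        ]
        new_centers = []
        for k, old in enumerate(centers):
            members = [x for x, lab in zip(samples, new_labels) if lab == k]
            # Keep the old center if a cluster ends up empty (an assumption).
            new_centers.append(mean_vector(members) if members else list(old))
        if new_labels == labels and new_centers == centers:
            break
        labels, centers = new_labels, new_centers
    return labels, centers
```

The stopping rule mirrors the convergence condition in the text: iteration ends once neither the category labels nor the clustering centers change between passes.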

The long-short term memory deep learning prediction model of each category in step 4 is constructed as follows:

step 4.1: the output o_{t-1} of the previous time step is concatenated with the input x_t of the current time step to obtain the forget gate output f_t, a value between 0 and 1:

f_t = σ(W_f·[o_{t-1}, x_t] + b_f)

wherein f_t is the forget gate output, W_f is the linear coefficient (weight), b_f is the bias, and σ is the sigmoid activation function;

step 4.2: building the three hidden layers i_t, C̃_t and C_t:

i_t = σ(W_i·[o_{t-1}, x_t] + b_i)

C̃_t = tanh(W_c·[o_{t-1}, x_t] + b_c)

C_t = f_t * C_{t-1} + i_t * C̃_t

wherein i_t is the information increment weight, C̃_t is the instantaneous state at the current time, C_t is the final state at the current time, and W_i, b_i, W_c and b_c are the corresponding hidden layer weights and biases;

step 4.3: constructing the output layer and producing the final output o_t:

h_t = σ(W_o·[o_{t-1}, x_t] + b_o)

o_t = h_t * tanh(C_t)

wherein h_t is the output gate activation, tanh is the activation function, b_o is the bias of the current output gate hidden layer, and W_o is the weight of the current output gate hidden layer.
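To make the gate computations of steps 4.1 to 4.3 concrete, the sketch below performs one forward step of a single LSTM cell in plain Python. It assumes the standard LSTM formulation with separate parameters W_i, b_i for the input gate and W_c, b_c for the candidate state; the parameter dictionary layout and the function names are hypothetical, and training is not shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """W·v + b for a weight matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, o_prev, c_prev, p):
    """One LSTM step; returns (o_t, C_t). `p` holds W_f, b_f, W_i, b_i,
    W_c, b_c, W_o, b_o (hypothetical layout)."""
    v = o_prev + x_t                                    # [o_{t-1}, x_t]
    f = [sigmoid(z) for z in affine(p["W_f"], v, p["b_f"])]       # forget gate
    i = [sigmoid(z) for z in affine(p["W_i"], v, p["b_i"])]       # input gate
    c_tilde = [math.tanh(z) for z in affine(p["W_c"], v, p["b_c"])]
    c = [ft * cp + it * ct
         for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]        # C_t
    h = [sigmoid(z) for z in affine(p["W_o"], v, p["b_o"])]       # output gate
    o = [ht * math.tanh(ct) for ht, ct in zip(h, c)]              # o_t
    return o, c
```

Each weight matrix has one row per hidden unit and one column per element of the concatenated vector [o_{t-1}, x_t], so a cell with 2 hidden units and 3 inputs uses 2 x 5 matrices.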

In the step 4, the long-short term memory deep learning prediction model of each category uses a dropout technology at the hidden layer to randomly discard the hidden layer output; random inactivation (dropout) is a method for optimizing an artificial neural network with a deep structure, and in the learning process, partial weight or output of a hidden layer is randomly zeroed, so that interdependency (co-dependency) among nodes is reduced, regularization (regularization) of the neural network is realized, and the structural risk of the neural network is reduced;
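The random zeroing of hidden layer outputs described above can be sketched as follows. The sketch uses the common "inverted dropout" convention, in which surviving outputs are rescaled by 1/(1 - rate) during training so that no rescaling is needed at prediction time; this convention and the drop rate are assumptions, since the source states neither. The function name is hypothetical.

```python
import random

def dropout(outputs, rate, rng, training=True):
    """Randomly zero hidden-layer outputs with probability `rate` (0 <= rate < 1).

    Survivors are scaled by 1 / (1 - rate) (inverted dropout, an assumption).
    Outside training the outputs pass through unchanged.
    """
    if not training or rate == 0.0:
        return list(outputs)
    keep = 1.0 - rate
    return [o / keep if rng.random() < keep else 0.0 for o in outputs]
```

For example, `dropout(hidden_outputs, 0.5, random.Random(0))` returns a list in which each entry is either zero or twice its original value.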

the step 5 specifically comprises the following steps:

acquiring a newly-built residential district as test data, extracting the attribute set A = {a_1, a_2, a_3, ..., a_m} of the residential district to be tested according to the influence factors, and constructing the feature vector of the residential district to be tested from these values, wherein a_m is the mth influence factor.

The step 7 specifically comprises the following steps: the feature vector of the tested residential district is fed as the input x_t to the long-short term memory deep learning prediction model of the category of the tested residential district, and the predicted load of the tested cell is obtained.

The beneficial technical effects are as follows:

(1) the invention fully considers the differences between residential districts and divides them into categories using the factors that matter when houses are rented and sold in real life, and thus conforms to the law of social development;

(2) the occupancy of a residential district generally grows along an S-shaped curve over time; if the transformer area is built at the outset for the capacity required at a 100% occupancy rate, transformer-area resources are easily wasted and 10 kV resources become scarce. The invention enables reasonable planning of transformer-area construction for a residential district, avoids long-term idling of the transformer area, and allows later transformer-area construction to be planned in advance;

(3) the invention effectively solves the lack of data when load prediction is carried out for a newly built residential district: a reasonable prediction model is established for the new district using the load data of similar districts;

(4) at present, the business expansion capacity of a residential district is generally estimated at the initial construction stage mainly by human experience, with large errors and incomplete consideration of factors; the method remedies this problem and can predict the load automatically from the characteristics of the residential district itself;

(5) in the LSTM (Long Short-Term Memory network) learning stage, the Dropout technique is used to counter overfitting of the LSTM and to speed up training, so that the model generalizes well, is stable, and fluctuates less overall than other models.

Drawings

FIG. 1 is a flowchart of the whole cell load prediction method based on improved clustering and deep learning of long-short term memory according to an embodiment of the present invention;

FIG. 2 is a long-short term memory module structure of the LSTM network;

FIG. 3 is a neural network model using Dropout;

FIG. 4 shows the result of improved cluster partitioning of attribute sets according to an embodiment of the present invention;

FIG. 5 is a histogram comparing the degree of fit of various models according to the present invention;

fig. 6 is a diagram of the predicted effect according to the embodiment of the present invention, in which (a) is a predicted effect diagram of a first-type cell, (b) is a predicted effect diagram of a second-type cell, (c) is a predicted effect diagram of a third-type cell, and (d) is a predicted effect diagram of a fourth-type cell;

fig. 7 shows an LSTM network training error descending curve according to an embodiment of the present invention, (a) is a first-type cell LSTM network training error descending curve, (b) is a second-type cell LSTM network training error descending curve, (c) is a third-type cell LSTM network training error descending curve, and (d) is a fourth-type cell LSTM network training error descending curve.

Detailed Description

The present application is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present application is not limited thereby.

The invention provides a cell load prediction method based on improved clustering and long-short term memory deep learning: residential districts are divided into categories according to their influence factors, and a corresponding prediction model is established for each category to obtain a predicted load value, thereby realizing reasonable planning of the transformer area.

In the embodiment, there are 51 residential cell samples in total and 7 influence factors, which corresponds to 7 input feature dimensions.

As shown in fig. 1, the method for predicting cell load based on improved clustering and long-short term memory deep learning specifically includes:

Steps 1 to 3 proceed as set out above: the historical load data are preprocessed (steps 1.1 to 1.3), the attribute sets, feature vectors and sample set are constructed (steps 2.1 to 2.4), and the sample set is partitioned by the improved clustering algorithm (steps 3.1 to 3.7).

The residential cells are clustered according to their attribute sets. The inter-class distance and intra-class distance are obtained from the initial categories and used to determine the optimal class value; the corresponding initial clustering centers are then obtained from the optimal class value, the sample categories are re-determined by the nearest-distance principle, each cluster mean is taken as the updated clustering center, and the calculation is iterated with the new clustering centers until the clustering result no longer changes. As shown in fig. 4, in the embodiment the 51 residential cells are finally classified into 4 categories; the convergence condition is that the category label of every residential cell no longer changes, and the result substantially meets the requirements of the subsequent prediction.

The Long Short-Term Memory network (LSTM) is a special RNN model that builds an artificial neural network using recursion over time. By alleviating the loss of information across long intervals in conventional RNNs, it learns long-term dependencies and is suitable for processing and predicting important events separated by relatively long intervals and delays in a time series. Like all RNNs, LSTM has a chain of repeating modules, but unlike a standard RNN the repeating module has a more complex structure in which four network layers interact in a special way; the structure of the module is shown in fig. 2.

As can be seen from fig. 2, whereas a standard recurrent neural network uses a single sigmoid as the activation function, LSTM controls the transfer of information between neurons through structures called "gates", which selectively let information pass into the cell. LSTM has three gates: the forget gate, the input gate and the output gate. The cell state is controlled by these gates so as to achieve long-term learning and information screening.

step 4.1: the forget gate decides which information to delete. It concatenates the output o_{t-1} of the previous time step with the input x_t of the current time step to obtain the forget gate output f_t, a value between 0 and 1:

f_t = σ(W_f·[o_{t-1}, x_t] + b_f)

step 4.2: building the three hidden layers i_t, C̃_t and C_t:

i_t = σ(W_i·[o_{t-1}, x_t] + b_i)

C̃_t = tanh(W_c·[o_{t-1}, x_t] + b_c)

C_t = f_t * C_{t-1} + i_t * C̃_t

wherein i_t is the information increment weight. After the input information from the upper-layer neurons has been selected, the "input gate" determines what to update: the formula for i_t gives the update probability, the tanh layer combines the new input to produce the new candidate quantity C̃_t, and the neuron state is then updated from the forget probability f_t and the input probability i_t to obtain the new neuron content C_t;

step 4.3: constructing the output layer and producing the final output o_t:

h_t = σ(W_o·[o_{t-1}, x_t] + b_o)

o_t = h_t * tanh(C_t)

wherein h_t is the output gate activation and tanh is the activation function; b_o is the bias of the current output gate hidden layer and W_o is the weight of the current output gate hidden layer.

Finally, the output gate determines the output content of the neuron: a sigmoid layer gives the output probability h_t, the tanh function is applied to the cell state, and the formula for o_t yields the final output of the LSTM.

In the step 4, the long-short term memory deep learning prediction model of each category uses a dropout technology at the hidden layer to randomly discard the hidden layer output;

In the prediction stage, the LSTM hidden part is set to three layers and processed with the Dropout technique. Through repeated adjustment, the step size of the input layer is set to the first 60% of the time since the cell was built. The kernel function of the LSTM is 'SIG', and the three network layers have 10, 20 and 40 neurons in sequence.

In accordance with the definition of the inputs and outputs of transformer-area load prediction given above, the embodiment constructs an LSTM prediction model comprising an input layer, three hidden layers and an output layer. The input step size of the first 60% of the cell's establishment time reflects two considerations: a time series that is too short does not contain complete enough information, while a very long input sequence degrades model performance. Dropout is used in the hidden layers to speed up training of the whole model and to avoid overfitting in the later stages. Dropout is a technique for preventing model overfitting: while the updated weights of the output gates are kept, hidden layer outputs are randomly discarded so that neurons are deactivated with equal probability, which keeps the model from depending too heavily on particular local features, makes it more robust, and improves LSTM performance; the effect is shown in fig. 3;

Step 5: acquiring a newly-built residential district as test data, extracting the attribute set of the tested residential district according to the influence factors, and constructing the feature vector of the tested residential district.

Step 7: the feature vector of the tested residential district is fed as the input x_t to the long-short term memory deep learning prediction model of the category of the tested residential district, and the predicted load of the tested cell is obtained.

In the embodiment, the invention mainly targets medium-term prediction of transformer-area load in order to obtain a reasonable business expansion capacity and improve transformer-area utilization. The month is therefore used as the basic unit: with yearly values too few load values are available, which lowers prediction accuracy and widens the gap between predicted and actual values.

The load of the transformer areas of a certain city is acquired at 15-minute intervals. The maximum load of each day is taken as the load value of that day, so the loads of a month can be expressed as m = {d_1, d_2, ..., d_t}, wherein t is 28, 30 or 31, and max(m) is taken as the load value of the month.
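The aggregation from 15-minute samples to a monthly load value is a maximum of daily maxima. A minimal Python sketch, with hypothetical function names and each day given as a list of its 15-minute samples:

```python
def daily_load(samples):
    """Load value of a day: the maximum of its 15-minute samples."""
    return max(samples)

def monthly_load(days):
    """Load value of a month: max(m), where m = {d_1, ..., d_t}."""
    m = [daily_load(day) for day in days]
    return max(m)
```

With complete data each day holds 96 samples, but the functions work for any non-empty lists.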

During model training, because the residential areas were built at different times, the monthly data of the first w years of each residential area are taken as the sample input and the load of the first month of year w+1 as the output, and the window is then shifted forward in sequence. During model verification, a cell with existing load data is treated as a newly built cell and fed directly into the trained model; each predicted monthly load value is then added to the historical data for sliding prediction until the load value of the target year and month is obtained.
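The training-pair construction and the sliding prediction described above can be sketched as follows. The sketch assumes a window of 12·w monthly values shifted one month at a time, and represents the trained model by an arbitrary callable `predict_next`; both function names are hypothetical.

```python
def make_training_pairs(monthly, w):
    """(input, target) pairs: 12*w monthly values -> the following month."""
    window = 12 * w
    return [(monthly[i:i + window], monthly[i + window])
            for i in range(len(monthly) - window)]

def rolling_forecast(history, w, steps, predict_next):
    """Predict `steps` months ahead, feeding each prediction back as history."""
    window = 12 * w
    series = list(history)
    forecasts = []
    for _ in range(steps):
        y = predict_next(series[-window:])   # model sees the latest window
        forecasts.append(y)
        series.append(y)                     # slide forward by one month
    return forecasts
```

Because each forecast is appended to the series before the next call, prediction errors can accumulate over long horizons; the sketch does not attempt to correct for that.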

Description of the experiment:

First, vectorized representations of the 51 cells are obtained from attributes such as service life and volume ratio, yielding the cell attribute sets of the model. The cells are then partitioned by the improved clustering algorithm; the clustering result is shown in fig. 4, from which it can be seen that the 51 cells are automatically grouped into four clearly separable categories according to their attribute values. Four prediction models are then established, one for each category of cells.

The cells of each category are divided into a training set and a test set: the load of the one cell with the shortest building time in each category is selected as test data, and the remaining cells serve as training data for building the LSTM prediction model of that category. The cells of known category in the test set are treated as new cells without historical load data and fed directly into the corresponding trained model to predict the maximum load value of the target year and month. For comparison, multivariate ARMA and SVM models are also fitted to the load data of each category of cells, and their results are compared with those of the ICA-LSTM model. The kernel function of the SVM is 'RBF', the kernel function of the LSTM is 'SIG', and the LSTM network has three layers with 10, 20 and 40 neurons in sequence. To compare the models, the invention uses the mean absolute error (MAE), the root mean square error (RMSE) and the degree of fit (R-Squared) as prediction evaluation indexes, calculated as follows:

MAE = (1/n) Σ_{i=1..n} |y_i - ŷ_i|

RMSE = sqrt((1/n) Σ_{i=1..n} (y_i - ŷ_i)²)

R² = 1 - Σ_{i=1..n} (y_i - ŷ_i)² / Σ_{i=1..n} (y_i - ȳ)²

wherein y_i is the actual load, ŷ_i is the predicted load, ȳ is the mean of the actual loads, and n is the number of predicted points.
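The three evaluation indexes named above have standard definitions, which the following Python sketch computes for a list of actual loads and a list of predicted loads. The function names are hypothetical, and `r_squared` assumes the actual loads are not all equal.

```python
import math

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def r_squared(actual, predicted):
    """Coefficient of determination (degree of fit)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot
```

Lower MAE and RMSE and an R-Squared closer to 1 indicate a better fit between the predicted and actual load curves.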

the results are shown in table (1) and fig. 5:

TABLE 1 MAPE, RMSE values for the three models

Table 1 shows that, of the three models, LSTM has lower MAPE and RMSE values than the other two. Fig. 5 shows the degree of fit between the predicted curve and the true curve of each category of cells for the three models: LSTM has the highest degree of fit, and its error and degree of fit remain relatively stable for every category of cells, which indicates good generalization performance. To show the prediction effect of LSTM more intuitively, the predictions of the three models for the four categories of cells are plotted against the original load over time, as shown in FIG. 6.

It is seen from fig. 5 that the prediction effect of LSTM in all four categories of cells is significantly better than that of the other two models and closest to the real variation curve, whereas multivariate ARMA has the worst degree of fit, poor prediction performance and poor model adaptation. Fig. 7 shows the training loss curves of the four LSTM prediction models, with the number of training iterations on the horizontal axis and the training loss on the vertical axis. For every category of cells the LSTM loss has fallen to 0.01 by about 50 training iterations; the decline is steady throughout training, with no large oscillations, and the convergence effect is excellent. This further illustrates the good performance of LSTM for longer-period predictions.

And (4) conclusion:

Starting from the attributes of the load prediction sample objects themselves, the invention establishes an LSTM prediction model for each category of residential district. Besides the time development factor, the prediction model incorporates the characteristics of the residential district and its internal and external factors, which avoids reliance on a single prediction model and ensures higher applicability. The model was verified with real load data from the residential districts of a certain city. The results show that the model effectively improves the medium- and long-term prediction accuracy of district power load, generalizes well and has practical significance, and provides an important theoretical basis for distribution network construction in residential transformer areas. The future load development pattern of a newly built cell is thereby obtained, providing guidance for later transformer-area construction planning.

The applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings. Those skilled in the art should understand, however, that the above embodiments are merely preferred embodiments of the present invention, and that the detailed description is intended only to help the reader better understand the spirit of the present invention, not to limit its scope of protection. On the contrary, any improvement or modification made based on the spirit of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A cell load prediction method based on improved clustering and long-short term memory deep learning, characterized in that:

residential cells are divided into categories according to the influence factors of the different types of residential cells, and a corresponding prediction model is established for each category of residential cells to obtain a predicted load value, thereby realizing reasonable planning of the transformer area.

2. The cell load prediction method based on improved clustering and long-short term memory deep learning according to claim 1, wherein the load prediction method comprises the following steps:

3. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 2, wherein the preprocessing in the step 1 comprises:

step 1.2: filling the null value to obtain the historical load data of each residential district after filling;


4. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 3, wherein in step 1.2 the null values are filled specifically as follows: a null value occurring at a certain moment of a day is filled by interpolation with the daily average load; if the data at all moments of a day are null, they are filled with the average of the maximum loads of the preceding and following days; and if the data of several consecutive days in a month are null, they are filled with the average of the maximum loads of the days in that month for which load data exist.
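The three filling rules of claim 4 can be illustrated with a minimal Python sketch. The function name `fill_month`, the list-of-lists data layout and the use of `None` for a null value are assumptions made for illustration and do not come from the application. The sketch further assumes that an empty day which lacks a data-bearing day on either side (including the first or last day of the month) is handled by the consecutive-days rule.

```python
from statistics import mean

def fill_month(month):
    """Fill null readings in one month of load data.

    month: list of days; each day is a list of load readings, None = null.
    (Hypothetical layout chosen for this sketch.)
    """
    # Daily maximum over the observed readings; None for a fully empty day.
    day_max = [max((v for v in day if v is not None), default=None)
               for day in month]
    observed_max = [m for m in day_max if m is not None]
    if not observed_max:
        raise ValueError("no load data in this month")
    # Rule 3 value: mean of the daily maxima of all days that have data.
    month_fill = mean(observed_max)

    filled = []
    for d, day in enumerate(month):
        present = [v for v in day if v is not None]
        if present:
            # Rule 1: isolated nulls get that day's average load.
            day_avg = mean(present)
            filled.append([day_avg if v is None else v for v in day])
            continue
        prev_max = day_max[d - 1] if d > 0 else None
        next_max = day_max[d + 1] if d + 1 < len(month) else None
        if prev_max is not None and next_max is not None:
            # Rule 2: a single empty day gets the mean of the neighbours' maxima.
            value = (prev_max + next_max) / 2
        else:
            # Rule 3 (assumed to cover month boundaries as well).
            value = month_fill
        filled.append([value] * len(day))
    return filled

example = [
    [1.0, None, 3.0],      # one missing reading
    [None, None, None],    # single empty day
    [2.0, 4.0, 6.0],
    [None, None, None],    # first of two consecutive empty days
    [None, None, None],    # second of two consecutive empty days
    [5.0, 5.0, 8.0],
]
result = fill_month(example)
```

The daily maxima are taken from the original readings only, so values filled under one rule never feed into another rule.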

5. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 2, wherein the step 2 is specifically as follows:

step 2.1: setting m influence factors;

step 2.2: extracting the attribute set $A = \{a_1, a_2, a_3, \ldots, a_m\}$ of the preprocessed historical load data of each residential district, where $a_m$ is the m-th influencing factor;

step 2.3: constructing a feature vector for each residential district, denoted $x_i = \{a_{1i}, a_{2i}, a_{3i}, \ldots, a_{mi}\}$, where $i$ is the residential district number and $a_{mi}$ is the value of the m-th influencing factor of the i-th residential district;

6. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 2 or claim 5, wherein the influencing factors comprise: service life, volume ratio, property grade, district grade, educational resources, medical resources and green space area ratio.
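The construction of the feature vector from the seven influencing factors can be sketched in Python as follows. The field names in `FACTORS`, the numeric encoding of the grades and the sample values are all hypothetical placeholders chosen for illustration; none of them are data from the application.

```python
# Hypothetical field names for the seven influencing factors of claim 6.
FACTORS = [
    "service_life",
    "volume_ratio",
    "property_grade",
    "district_grade",
    "educational_resources",
    "medical_resources",
    "green_space_ratio",
]

def feature_vector(cell):
    """Return x_i = [a_1i, ..., a_mi] for one residential cell record."""
    missing = [name for name in FACTORS if name not in cell]
    if missing:
        raise KeyError(f"missing influencing factors: {missing}")
    # Fixed factor order so that every cell's vector is comparable.
    return [float(cell[name]) for name in FACTORS]

# Arbitrary placeholder record; grades are assumed to be encoded as numbers.
cells = {
    "cell_A": {
        "service_life": 12, "volume_ratio": 2.5, "property_grade": 3,
        "district_grade": 2, "educational_resources": 4,
        "medical_resources": 1, "green_space_ratio": 0.35,
    },
}
vectors = {cell_id: feature_vector(rec) for cell_id, rec in cells.items()}
```

Keeping the factor order in one list means the clustering step always compares like with like across cells.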

7. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 2, wherein the step 3 is specifically as follows:

the inter-class distance formula is as follows:

is the mean of each dimension of the samples within the initial clustering center $C_i$; $b_{ij}$ is the mean of each dimension of the whole sample set;

is an initial category;

the intra-class distance formula is as follows:

where $D_{\text{intra}}$ is the intra-class distance;

$\min S(X, P_k) = \min(D_{\text{inter}} + D_{\text{intra}})$

step 3.4: obtaining the initial clustering centers $C_i$ ($i = 1, 2, \ldots, P_k$) according to the optimal class value $P_k$;

step 3.7: taking the optimal class value as the initial class and, according to the updated clustering centers, repeating steps 3.2-3.6 to re-determine the class of each sample until the clustering centers and the class values of the samples no longer change; then stopping the iteration, outputting the final number of classes K of the residential area samples, the K final clustering centers and the clustered sample sets, and proceeding to step 4.
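The stopping rule of step 3.7 (iterate until neither the clustering centers nor the class of any sample changes) might be sketched as follows. The inter-class and intra-class distance formulas of the earlier steps are not reproduced in this text, so the sketch substitutes plain Euclidean distance and a mean-based center update as assumptions; it illustrates only the iteration and its termination, not the improved class-number selection. All function names are hypothetical.

```python
import math

def assign(samples, centers):
    """Label each sample with the index of its nearest center (Euclidean)."""
    return [min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
            for x in samples]

def update(samples, labels, centers):
    """Recompute each center as the per-dimension mean of its members."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [x for x, lab in zip(samples, labels) if lab == k]
        if not members:
            new_centers.append(list(old))  # keep the center of an empty class
            continue
        new_centers.append([sum(col) / len(members) for col in zip(*members)])
    return new_centers

def iterate_until_stable(samples, centers, max_iter=100):
    """Repeat assignment and update until centers and labels stop changing."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(samples, centers)
        new_centers = update(samples, new_labels, centers)
        if new_labels == labels and new_centers == centers:
            break  # both the classes and the centers are unchanged
        labels, centers = new_labels, new_centers
    return centers, labels

samples = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
final_centers, final_labels = iterate_until_stable(
    samples, [[0.0, 0.0], [10.0, 10.0]])
```

The `max_iter` cap is a safety limit added for the sketch; the claim itself only states the convergence condition.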

8. The method for predicting cell load based on improved clustering and long-short term memory deep learning according to claim 1, wherein the long-short term memory deep learning prediction model of each category in the step 4 is specifically:

$f_t = \sigma(W_f \cdot [o_{t-1}, x_t] + b_f)$

step 4.2: according to the input layer, constructing a hidden layer:

$i_t = \sigma(W_i \cdot [o_{t-1}, x_t] + b_i)$

where $i_t$ is the information increment weight,

step 4.3: constructing an output layer with a final output of $o_t$:

$h_t = \sigma(W_o \cdot [o_{t-1}, x_t] + b_o)$

$o_t = h_t * \tanh(C_t)$

where $h_t$ is the loss calculation function, tanh is the activation function, $b_o$ is the bias of the current output-gate hidden layer, and $W_o$ is the weight of the current output-gate hidden layer.
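One time step of the gate equations in claim 8 might be sketched in plain Python as follows. The update of the cell state $C_t$ is not reproduced in the claim text here, so the sketch assumes the standard LSTM form $C_t = f_t * C_{t-1} + i_t * \tanh(W_c \cdot [o_{t-1}, x_t] + b_c)$. The separate input-gate parameters `W_i`, `b_i`, the candidate parameters `W_c`, `b_c`, the function names and the dictionary layout are likewise assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Return W·v + b for a weight matrix W (list of rows) and bias list b."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]

def lstm_step(x_t, o_prev, c_prev, p):
    """Compute (o_t, C_t) from the input x_t, previous output and cell state."""
    v = o_prev + x_t  # list concatenation, i.e. [o_{t-1}, x_t]
    f_t = [sigmoid(z) for z in affine(p["W_f"], v, p["b_f"])]      # forget gate
    i_t = [sigmoid(z) for z in affine(p["W_i"], v, p["b_i"])]      # input gate
    # Assumed standard candidate state and cell-state update.
    c_hat = [math.tanh(z) for z in affine(p["W_c"], v, p["b_c"])]
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [sigmoid(z) for z in affine(p["W_o"], v, p["b_o"])]      # output gate
    o_t = [h * math.tanh(c) for h, c in zip(h_t, c_t)]
    return o_t, c_t

# Hidden size 1, input size 1; all-zero parameters make every gate equal 0.5.
zero_params = {name: [[0.0, 0.0]] for name in ("W_f", "W_i", "W_c", "W_o")}
zero_params.update({name: [0.0] for name in ("b_f", "b_i", "b_c", "b_o")})
o_t, c_t = lstm_step([1.0], [0.0], [2.0], zero_params)
```

With all parameters zero the forget gate halves the previous cell state, which gives a quick sanity check of the wiring before any trained weights are used.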

9. The method as claimed in claim 4, wherein the long-short term memory deep learning prediction model in step 4 applies the dropout technique in its hidden layer to randomly discard hidden-layer outputs.
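The random discarding of hidden-layer outputs described in claim 9 can be illustrated with a short Python sketch. The function name `dropout`, the `rate` parameter and the rescaling of the kept outputs by `1 / (1 - rate)` (the common "inverted dropout" convention) are assumptions; the claim only states that outputs are discarded at random.

```python
import random

def dropout(outputs, rate, rng, training=True):
    """Randomly zero hidden-layer outputs with probability `rate`."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(outputs)  # no discarding at prediction time
    keep = 1.0 - rate
    # Kept outputs are rescaled so the expected value is unchanged (assumed).
    return [o / keep if rng.random() < keep else 0.0 for o in outputs]

rng = random.Random(0)  # seeded generator so the sketch is reproducible
hidden = [0.2, -0.5, 0.9, 0.1]
dropped = dropout(hidden, rate=0.5, rng=rng)
```

Passing `training=False` returns the outputs unchanged, reflecting the usual practice of applying dropout only while the model is being trained.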