CN114862032B

CN114862032B - XGBoost-LSTM-based power grid load prediction method and device

Info

Publication number: CN114862032B
Application number: CN202210544195.0A
Authority: CN
Inventors: 王栋; 汤向华; 吴迪; 侯丽钢; 江志辉; 丁雨婷; 罗辛; 毛艳芳; 范乐松; 沈鑫; 王�华; 吕晓祥
Original assignee: Nantong Power Supply Co Of State Grid Jiangsu Electric Power Co
Current assignee: Nantong Power Supply Co Of State Grid Jiangsu Electric Power Co
Priority date: 2022-05-19
Filing date: 2022-05-19
Publication date: 2023-07-28
Anticipated expiration: 2042-05-19
Also published as: CN114862032A

Abstract

The invention discloses an XGBoost-LSTM-based power grid load prediction method and device. The collected initial data set and historical load are cleaned and noise data are removed, so that the data are more effective and concise. The preprocessed feature variables are ranked by principal component analysis, and the feature set that significantly influences the load is screened out, improving the accuracy and efficiency of the prediction model. XGBoost is introduced to predict the seasonal and random components and LSTM to predict the trend component, so that the constructed inputs comprehensively characterize the influence of complex factors on the power load, and the accuracy, speed and generalization capability of power load prediction are significantly improved.

Description

XGBoost-LSTM-based power grid load prediction method and device

Technical Field

The invention relates to the technical field of artificial intelligence prediction, in particular to a power grid load prediction method and device based on XGBoost-LSTM.

Background

Power load prediction is an important part of power management, and the load prediction data it provides are extremely important for the control, operation and planning of the power system. Accurate prediction of power load data plays an important role in determining the operating mode of the power system, as well as its optimal dispatch, inter-regional power transmission schemes and load scheduling schemes. In addition, the accuracy of power load prediction directly affects the safety, reliability, economy and power quality of power system operation, and bears on the production planning and dispatching of the power system. Over the last decade, increased market competition, aging infrastructure and the need to integrate renewable energy have made load forecasting both more important and more difficult. Domestic scholars have long conducted extensive and in-depth research on power system load prediction theory and have proposed many effective methods, such as regression analysis, time series methods, neural network methods and wavelet analysis. For a given prediction problem, several prediction methods may be established; different methods provide different prediction information and different prediction accuracy.

The above prediction methods do not consider that fluctuations in the power grid load are produced by many driving factors, such as weather, seasonal conditions, holidays, working periods and economic fluctuations, and that the data generated by these factors have fine-grained temporal patterns; as a result, their prediction accuracy and robustness for power grid load data are poor. Modeling the variation of the power grid load and its influencing factors to achieve high-precision load prediction is therefore a key technical problem to be solved in power management.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a power grid load prediction method and device based on XGBoost-LSTM, so as to solve the problems.

In order to achieve the above object, the present invention is achieved by the following technical scheme.

The XGBoost-LSTM-based power grid load prediction method is characterized by comprising the following steps of:

S1, acquiring original power grid load data, cleaning the data and normalizing the sequence;

S2, decomposing the original time series by using Census X12, and extracting a trend-cycle sequence, a seasonal sequence and an irregular sequence;

S3, carrying out principal component analysis on meteorological factors (temperature, humidity and air pressure), economic factors (primary industry, secondary industry, tertiary industry, residents and the like) and other random factors, and extracting strongly related factors;

S4, constructing a unified objective function for the prediction models;

S5, predicting the trend sequence of the load for the month to be predicted, based on an LSTM network, by using the load trend curve before that month and the load influence factors of the current month;

S6, predicting the seasonal sequence and the irregular sequence of the load for the month to be predicted, based on the XGBoost model, by using the main influence factors of the current month;

S7, multiplying the predicted trend sequence, seasonal sequence and irregular sequence to output the power grid load result for the current month;

S8, correcting errors of the irregular sequence by taking into account meteorological factors, economic factors or emergency events.
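
As a minimal sketch of steps S7 and S8, assuming the three component forecasts are already available, the recombination is a plain product. The function name `recompose` and the `irregular_adjustment` factor are hypothetical; the text does not specify how the S8 correction is computed, so it is modeled here as a simple multiplier.

```python
def recompose(trend, seasonal, irregular, irregular_adjustment=1.0):
    """Combine component forecasts multiplicatively (step S7).

    `irregular_adjustment` is a hypothetical multiplier standing in for the
    S8 correction of the irregular component; 1.0 means no correction.
    """
    corrected_irregular = irregular * irregular_adjustment
    return trend * seasonal * corrected_irregular


# Illustrative values only, not taken from the embodiment.
forecast = recompose(trend=100.0, seasonal=1.1, irregular=0.9)
adjusted = recompose(trend=100.0, seasonal=1.1, irregular=0.9,
                     irregular_adjustment=1.02)
```

Because the model is multiplicative, seasonal and irregular values near 1 act as percentage adjustments around the trend level.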

Further, the data cleaning of the original data in step S1 includes the following: processing and replacing identified abnormal values by using a horizontal processing method; performing a linear transformation on numerical data by using min-max normalization so that the values fall in the [0,1] interval; and one-hot encoding categorical data such as holidays and months.
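
The normalization and encoding parts of step S1 can be sketched as follows. This is an illustrative sketch in plain Python; the helper names `min_max_scale` and `one_hot` are assumptions, the sample values are illustrative, and the outlier replacement step is omitted because the text does not define the horizontal processing method in detail.

```python
def min_max_scale(values):
    """Linearly map numeric values onto the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant series has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels, categories):
    """Encode each label as a 0/1 vector over a fixed category order."""
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1
        encoded.append(row)
    return encoded


loads = [29283.0, 17672.0, 43673.0]      # illustrative numeric values
scaled = min_max_scale(loads)
months = one_hot([1, 2, 11], categories=list(range(1, 13)))
```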

Further, the non-stationary power grid load time series in step S2 is affected by uncertain factors such as regional economic development, industrial structure adjustment and seasonality, and can be decomposed into: a trend-cycle element $TC_t$; a seasonal element $S_t$, reflecting the periodic variation that the regional power load time series shows in the same season of different years due to the regional climate and the like; and an irregular element $I_t$, representing the random variation, noise and the like that the quarterly or monthly regional power load time series shows due to regional abnormal events, natural disasters and the like, whose changes follow no regular pattern. A multiplicative seasonal adjustment method, namely $Y_t = TC_t \times S_t \times I_t$, is adopted to decompose the load sequence into these elements.
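
To illustrate the multiplicative model, the sketch below uses a classical ratio-to-moving-average decomposition. This is a simplified stand-in and not the Census X12 procedure itself, which applies iterated filters and outlier handling. The function names are assumptions, the period must be even, and the series must cover at least two full periods.

```python
def centered_moving_average(y, period=12):
    """Centered moving average for an even period; None where it is undefined."""
    assert period % 2 == 0, "this sketch only handles even periods"
    half = period // 2
    out = [None] * len(y)
    for t in range(half, len(y) - half):
        window = y[t - half:t + half + 1]
        # The two end points get half weight so the window spans one full period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        out[t] = total / period
    return out


def multiplicative_decompose(y, period=12):
    """Split y into trend-cycle, seasonal and irregular parts with Y = TC * S * I."""
    tc = centered_moving_average(y, period)
    # Dividing the series by its trend isolates the product S * I.
    ratios = [[] for _ in range(period)]
    for t, trend in enumerate(tc):
        if trend is not None:
            ratios[t % period].append(y[t] / trend)
    raw = [sum(r) / len(r) for r in ratios]
    mean_raw = sum(raw) / period
    seasonal_index = [v / mean_raw for v in raw]   # normalize to average 1
    s = [seasonal_index[t % period] for t in range(len(y))]
    irr = [None if tc[t] is None else y[t] / (tc[t] * s[t])
           for t in range(len(y))]
    return tc, s, irr


# Synthetic series: flat trend of 100 times a repeating seasonal pattern.
pattern = [0.8, 0.9, 1.0, 1.1, 1.2, 1.0, 0.8, 0.9, 1.0, 1.1, 1.2, 1.0]
y = [100.0 * pattern[t % 12] for t in range(36)]
tc, s, irr = multiplicative_decompose(y)
```

The trend-cycle and irregular values are undefined for the first and last half period, which is a known limitation of centered moving averages.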

Further, the meteorological influencing factors selected in step S3 include air pressure, temperature and humidity; the economic influencing factors include primary industry, secondary industry, tertiary industry, urban residential electricity, rural residential electricity, agriculture, forestry, animal husbandry, fishery, industry, manufacturing, construction, the computer information industry, real estate, business, public utilities and management, and the accommodation industry; other factors include the number of holidays in the month and the month of the year. Principal component analysis is performed on these 21 factors as follows. First, the original data matrix is standardized so that the indexes are comparable. The calculation formula is:

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

where $X$ represents the original matrix, $X^{*}$ represents the normalized matrix, $X_{\max} = \max(X)$ and $X_{\min} = \min(X)$.

Next, a correlation coefficient matrix $R$ is established, and the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ of $R$ and the corresponding unit eigenvectors $e_1, e_2, \dots, e_p$ are found. The number of principal components is then determined: the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ whose cumulative contribution rate reaches more than 60% are taken as corresponding to the 1st, 2nd, ..., m-th ($m \le p$) principal components. The expression of the principal components is:

$$F_i = e_{i1} X_1 + e_{i2} X_2 + \dots + e_{ip} X_p, \quad i = 1, 2, \dots, m$$

where $e_{ip}$ is the p-th component of the unit eigenvector corresponding to the i-th eigenvalue of the original power load matrix, and $X_p$ is the p-th initial input variable.
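
The selection rule based on the cumulative contribution rate can be sketched as below, assuming the eigenvalues and unit eigenvectors of the correlation matrix have already been computed elsewhere. The function names and the example eigenvalues are illustrative assumptions; the 0.60 threshold mirrors the figure given in the text.

```python
def select_components(eigenvalues, threshold=0.60):
    """Return m, the smallest number of leading components whose cumulative
    contribution rate reaches `threshold`."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for m, lam in enumerate(ordered, start=1):
        cumulative += lam / total
        if cumulative >= threshold:
            return m
    return len(ordered)


def project(row, eigenvectors):
    """Principal component scores: each score is the dot product of one
    unit eigenvector with the standardized input row."""
    return [sum(e_j * x_j for e_j, x_j in zip(e, row)) for e in eigenvectors]


# Illustrative eigenvalues: contribution rates 0.5, 0.3, 0.125, 0.075.
m = select_components([2.0, 1.2, 0.5, 0.3])
scores = project([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
```

With these example eigenvalues the first component alone contributes 50%, so two components are needed to pass the 60% threshold.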

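Further, in step S4, the prediction effect is evaluated by the mean absolute percentage error (MAPE); a smaller MAPE value indicates higher prediction accuracy of the model:

$$\mathrm{MAPE} = \frac{1}{T} \sum_{t} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\%$$

where $T$ is the number of periods in the prediction sample, $t$ runs over those periods, $y_t$ is the actual load and $\hat{y}_t$ is the predicted load.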
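
A minimal sketch of the MAPE metric as it is commonly defined, with the function name `mape` and the sample values chosen for illustration:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Actual values must be non-zero, since each error is divided by them.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Two periods, each off by 10% of the actual value.
error = mape([100.0, 200.0], [110.0, 180.0])
```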

Further, the input process for constructing the trend sequence prediction model in step S5 is as follows:

S5-1. Dividing the data set: the power load time series data set is divided into two parts; 80% of the data set is used as the training set for training the model, and the remaining 20% is used as the validation set.

S5-2. LSTM network input: the input of the multivariate time series LSTM prediction model is a set of time series X. Let the input of the multivariate time series LSTM prediction model be:

where the first N time series are the principal components with a high cumulative contribution rate selected in step S3, and L is the length of the time series.

The expansion of $X_j^{[1]}$ is:

and so on; substituting the expansions gives:

When the time series are input into the LSTM prediction model, the data of the n time steps preceding the current time are input, so the input $x_t$ of the LSTM prediction model at time t is:

where $x_t$ is a time series, $t \in [n+1, L]$, and the model outputs the predicted value at time t.
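
The 80/20 split of S5-1 and the windowing of S5-2 can be sketched as follows. This is an illustrative sketch: the function names are assumptions, the split is assumed to be chronological (the text does not say how the 80% is chosen), and indices are 0-based, so the text's range of t from n+1 to L corresponds to t from n to L-1 here.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into training and validation parts."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def sliding_windows(features, targets, n):
    """Pair the n time steps preceding each time t with the target at t.

    `features` is a list of per-time-step feature vectors and `targets`
    is a list of the same length.
    """
    samples = []
    for t in range(n, len(features)):
        samples.append((features[t - n:t], targets[t]))
    return samples


series = [[float(k), 10.0 * k] for k in range(10)]   # 10 steps, 2 features
targets = [float(k) for k in range(10)]
train_x, valid_x = chronological_split(series)
samples = sliding_windows(series, targets, n=3)
```

A series of length L yields L - n samples, because the first n steps have no complete history.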

S5-3. LSTM model structure: the LSTM prediction model includes several LSTM units. Each LSTM unit has an input layer, a hidden layer and an output layer; inside the hidden layer there is a gate structure consisting of a forget gate, an input gate and an output gate. The inputs of each LSTM unit are the input $x_t$ at time t, the LSTM cell state $C_{t-1}$ at time t-1 and the hidden layer state $h_{t-1}$ at time t-1; the outputs are the LSTM cell state $C_t$ at time t and the hidden layer state $h_t$ at time t.

The forget gate calculates the degree to which the LSTM cell state $C_{t-1}$ at time t-1 is forgotten at time t:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where $f_t$ is the probability value with which the LSTM cell state $C_{t-1}$ at time t-1 is retained at time t; $\sigma$ denotes the sigmoid function; $[h_{t-1}, x_t]$ denotes the concatenation of $h_{t-1}$ and $x_t$ into one vector; and $W_f$ and $b_f$ are the weight and bias of the forget gate, obtained through training.

The input gate calculates the degree to which the intermediate (candidate) LSTM cell state $\tilde{C}_t$ at time t is written to the memory cell:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

where $i_t$ is the probability value with which the intermediate LSTM cell state $\tilde{C}_t$ is retained at time t, and $W_i$ and $b_i$ are the weight and bias of the input gate, obtained through training.

The output gate calculates the output at time t, which depends on the state of the memory cell at time t:

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

where $o_t$ is the probability value used to filter the LSTM cell state $C_t$, and $W_o$ and $b_o$ are the weight and bias of the output gate, obtained through training.

The outputs of the LSTM unit at time t are the hidden layer state $h_t$ and the cell state $C_t$:

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

$$h_t = o_t * \tanh(C_t)$$

where $W_c$ and $b_c$ are the weight and bias of the intermediate LSTM cell state, obtained through training; $*$ is the Hadamard product; and tanh is the hyperbolic tangent activation function.
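
The gate equations can be traced with a single forward step of one LSTM unit. The sketch below assumes a scalar hidden state and cell state for readability, so the Hadamard product reduces to ordinary multiplication; the `params` layout and function names are illustrative assumptions, and the candidate-state and update formulas follow the standard LSTM formulation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, v):
    return sum(a * b for a, b in zip(w, v))


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM unit with scalar hidden and cell state.

    `params` maps each of "f", "i", "o", "c" to a (weights, bias) pair,
    where the weights act on the concatenation [h_prev, x_t...].
    """
    z = [h_prev] + list(x_t)                       # [h_{t-1}, x_t]
    f_t = sigmoid(dot(params["f"][0], z) + params["f"][1])   # forget gate
    i_t = sigmoid(dot(params["i"][0], z) + params["i"][1])   # input gate
    o_t = sigmoid(dot(params["o"][0], z) + params["o"][1])   # output gate
    c_tilde = math.tanh(dot(params["c"][0], z) + params["c"][1])
    c_t = f_t * c_prev + i_t * c_tilde             # new cell state
    h_t = o_t * math.tanh(c_t)                     # new hidden state
    return h_t, c_t


# With all-zero weights and biases every gate evaluates to 0.5.
zero_params = {k: ([0.0, 0.0, 0.0], 0.0) for k in ("f", "i", "o", "c")}
h_t, c_t = lstm_step([1.0, 2.0], h_prev=0.3, c_prev=0.8, params=zero_params)
```

In the all-zero case the candidate state is 0, so the cell state is simply half of its previous value, which makes the forget gate's role easy to see.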

Further, in step S6, an XGBoost prediction model is constructed, the extracted important feature vectors (temperature, holidays and month) are used as input for prediction training, and the load values of the seasonal component and the random component are finally output.

The XGBoost model algorithm established in S6, which takes complex influencing factors into account, is based on the following principle.

CART trees are used as base learners. The model is trained by constructing a plurality of weak learners and reducing the loss along the direction of steepest gradient descent in each learning iteration, so that each new learner corrects the residuals of all previous weak learners; the final prediction result is obtained by weighted summation, namely:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$

Assuming a given sample set has n samples and m features, then:

$$D = \{(x_i, y_i)\} \quad (|D| = n,\ x_i \in R^m,\ y_i \in R)$$

where $x_i$ is the i-th sample and $y_i$ is the label of the i-th sample. The space F of CART trees is:

$$F = \{ f(x) = \omega_{q(x)} \} \quad (q: R^m \to T,\ \omega \in R^T)$$

where q represents the structure of each tree, which maps a sample to the corresponding leaf node; T is the number of leaf nodes of the tree; and each $f_k$ corresponds to an independent tree structure q and leaf node weights $\omega$. The predicted value of XGBoost is the sum of the values of the corresponding leaf nodes of each tree. The following objective function is minimized:

$$L = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \quad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$

where $\hat{y}_i$ is the predicted value of the model, $y_i$ is the label of the i-th sample, $f_k$ is the k-th tree model, T is the number of leaf nodes of each tree, $\omega$ is the set of scores of the leaf nodes of each tree, and $\gamma$ and $\lambda$ are coefficients that need to be tuned in practical applications.

The first term of the above equation is the loss error; the second term is a regularization term that controls the complexity of the trees and prevents overfitting. The parameters being optimized in the objective function are models (functions), which cannot be optimized in Euclidean space by traditional optimization methods; however, the model can be trained in an additive manner. At round t, $f_t$ is added to the model and the following objective function is minimized:

$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

During training, each new round adds the new function f that reduces the objective function the most. Taking a second-order approximation of the loss and dropping constant terms, the objective function becomes:

$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$

where $g_i = \partial_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ and $h_i = \partial^2_{\hat{y}^{(t-1)}} l(y_i, \hat{y}^{(t-1)})$ are the first and second derivatives of the loss.

After solving, the iteration over tree models is converted into an iteration over the leaf nodes of the tree, and the optimal leaf node scores are solved for. With $I_j$ denoting the set of samples assigned to leaf j, the final objective function is:

$$\tilde{L}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
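
To make the leaf-level iteration concrete, the sketch below computes optimal leaf scores and the objective value for a fixed tree structure, following the standard XGBoost formulation. It assumes the squared-error loss $l = \frac{1}{2}(\hat{y} - y)^2$, for which the gradient is $\hat{y} - y$ and the second derivative is 1; the function names and sample values are illustrative and are not the XGBoost library API.

```python
def grad_hess_squared_error(y_true, y_pred):
    """First and second derivatives of 1/2 * (y_pred - y_true)^2."""
    g = [p - y for y, p in zip(y_true, y_pred)]
    h = [1.0] * len(y_true)
    return g, h


def leaf_weight(g, h, lam):
    """Optimal score of one leaf: -G / (H + lambda)."""
    return -sum(g) / (sum(h) + lam)


def structure_score(leaves, lam, gamma):
    """Objective for a fixed structure: -1/2 * sum G^2 / (H + lambda) + gamma * T."""
    total = 0.0
    for g, h in leaves:
        G, H = sum(g), sum(h)
        total += G * G / (H + lam)
    return -0.5 * total + gamma * len(leaves)


def split_gain(left, right, lam, gamma):
    """Reduction in the objective when one leaf is split into two."""
    parent = (left[0] + right[0], left[1] + right[1])
    return (structure_score([parent], lam, gamma)
            - structure_score([left, right], lam, gamma))


g, h = grad_hess_squared_error([1.0, 1.0, 3.0, 3.0], [2.0, 2.0, 2.0, 2.0])
left, right = (g[:2], h[:2]), (g[2:], h[2:])
w_left = leaf_weight(*left, lam=1.0)
gain = split_gain(left, right, lam=1.0, gamma=0.0)
```

A split is worthwhile only when its gain is positive, so a larger `gamma` prunes more aggressively and a larger `lam` shrinks the leaf scores toward zero.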

Further, in step S8, since uncertain events occur every month and may affect the power grid load, the magnitude of the random component can be adjusted according to these events, so as to improve the accuracy and generalization capability of model prediction.

The invention also provides an XGBoost-LSTM-based power grid load prediction device, as shown in FIG. 2, which comprises a data acquisition module, a data storage module, a preprocessing module and an execution module. The data acquisition module is connected with the data storage module; it acquires the original power grid load data and transmits them to the data storage module for storage. The data storage module is connected with the preprocessing module, which preprocesses the data set and decomposes the load data sequence. The preprocessing module is connected with the execution module, which executes the load prediction instruction and stores the predicted data in the data storage module. In operation, the data acquisition module acquires the original load data of the power grid, the preprocessing module preprocesses the original load data, the execution module executes the power grid load prediction instruction, and the data storage module stores the original load data acquired by the data acquisition module and the load data predicted by the model of the execution module. The device can be applied directly to any newly acquired power grid load data set and, for load data sets of different sizes, can predict the power load while taking different complex factors into account, thereby solving the problem of high-precision load prediction in a power grid environment.

Further, the data storage module comprises a load data storage unit and a prediction data storage unit, and the load data storage unit is connected with the data acquisition module and used for storing the power grid load data of the data acquisition module. The predicted data storage unit is connected with the execution module and is used for storing predicted load data.

Further, the preprocessing module comprises a sequence decomposition unit and a data preprocessing unit; the sequence decomposition unit decomposes the load time series into a trend component, a random component and a seasonal component.

Further, the data preprocessing unit is used for preprocessing the received power grid load data set, including normalization, outlier processing and training set and verification set division of the data set.

Further, the execution module comprises a model 1 prediction unit, a model 2 prediction unit and a prediction data output unit.

The model 1 prediction unit covers the establishment of an error function, principal component analysis of important features, the establishment of an LSTM model and the training of the LSTM model.

The model 2 prediction unit covers the establishment of an error function, feature screening, the establishment of an XGBoost model and the training of the XGBoost model.

The prediction data output unit multiplies the data predicted by model 1 and model 2, corrects errors, and outputs the predicted data to be stored in the data storage module.
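
The module wiring described above can be sketched as a small set of classes. This is only an illustrative skeleton: the class and method names are assumptions, the preprocessing module is omitted for brevity, and the two models are plain callables standing in for the trained LSTM and XGBoost models.

```python
class DataStorageModule:
    """Holds the load data storage unit and the prediction data storage unit."""

    def __init__(self):
        self.load_data = []      # load data storage unit
        self.predictions = []    # prediction data storage unit


class DataAcquisitionModule:
    def __init__(self, storage):
        self.storage = storage

    def acquire(self, raw_loads):
        # Pass newly collected load values on to storage.
        self.storage.load_data.extend(raw_loads)


class ExecutionModule:
    def __init__(self, storage, trend_model, seasonal_irregular_model):
        self.storage = storage
        self.trend_model = trend_model                            # model 1
        self.seasonal_irregular_model = seasonal_irregular_model  # model 2

    def predict(self, features):
        trend = self.trend_model(features)
        seasonal, irregular = self.seasonal_irregular_model(features)
        prediction = trend * seasonal * irregular
        self.storage.predictions.append(prediction)
        return prediction


storage = DataStorageModule()
DataAcquisitionModule(storage).acquire([10.0, 12.0, 11.0])
executor = ExecutionModule(
    storage,
    trend_model=lambda f: 2.0,                       # placeholder for the LSTM
    seasonal_irregular_model=lambda f: (1.5, 1.0),   # placeholder for XGBoost
)
result = executor.predict(features=[0.1, 0.2])
```

Keeping storage as a shared object mirrors the description, in which both the acquisition module and the execution module write to the data storage module.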

Compared with the prior art, the beneficial effects of the invention are as follows. The XGBoost-LSTM-based power grid load prediction method and device clean the collected initial data set and historical load and remove noise data, so that the data are more effective and concise;

considering too many influencing factors makes the analysis overly complex and the computation heavy; to solve this problem, the preprocessed feature variables are ranked by principal component analysis and the feature set that significantly influences the load is screened out, which reduces the input dimension of the model and simplifies the calculation;

because the power grid load is influenced by various nonlinear and uncertain factors, the regional statistical load, as a time series, has obvious non-stationary characteristics such as trend, cyclicity and seasonality, which makes high-precision load prediction difficult. XGBoost is introduced to predict the seasonal and random components and LSTM to predict the trend component, which overcomes the inability of traditional combined models to estimate the trend values of several samples at the beginning and end of the series, and offers greater advantages for series whose properties change over time; the constructed inputs comprehensively characterize the influence of complex factors on the power load, and improve the accuracy and generalization capability of power load prediction.

Drawings

FIG. 1 is a flow chart of the XGBoost-LSTM-based power grid load prediction method of the present invention;

FIG. 2 is a schematic structural diagram of the XGBoost-LSTM-based power grid load prediction device of the present invention;

FIG. 3 compares the method of the present invention with an LSTM univariate time series prediction model, a Census X12 autoregressive integrated moving average (X12-ARIMA) model and an exponential smoothing (ETS) model, each of which takes only the monthly power load time series as input.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments.

As shown in fig. 1, the power grid load prediction method based on XGBoost-LSTM of the present invention includes the following steps:

The XGBoost-LSTM-based power grid load prediction method comprises: collecting the monthly power load data set and the related meteorological and economic data sets of Haimen, Nantong, together with the prediction and training instructions; detecting abnormal values in the collected monthly power load data and standardizing the data; and carrying out principal component analysis on the collected meteorological and economic factors that affect changes in the power grid load. Because the monthly power load is periodic, it is decomposed into trend, seasonal and irregular sequences. The seasonal and irregular sequences are predicted by the XGBoost model using the features extracted by principal component analysis, and the trend sequence is predicted by the LSTM model using the dimension-reduced features. Finally, the predicted seasonal, irregular and trend sequences are aggregated multiplicatively to obtain the load prediction result. The prediction result is also corrected in light of emergencies and the circumstances of the current month.

S1, the data cleaning of the original data includes the following: processing and replacing identified abnormal values by using a horizontal processing method; performing a linear transformation on numerical data by using min-max normalization so that the values fall in the [0,1] interval; and one-hot encoding categorical data such as holidays and months.

S2, decomposing the original monthly load time series by using Census X12, and extracting a trend-cycle sequence, a seasonal sequence and an irregular sequence. The non-stationary power grid load time series is affected by regional economic development, industrial structure adjustment and seasonal uncertainty factors, and can be decomposed into: a trend-cycle element $TC_t$; a seasonal element $S_t$, reflecting the periodic variation that the regional power load time series shows in the same season of different years due to the regional climate; and an irregular element $I_t$, representing the random variation or noise that the quarterly or monthly regional power load time series shows due to regional abnormal events and natural disasters, whose changes follow no regular pattern. A multiplicative seasonal adjustment method, namely $Y_t = TC_t \times S_t \times I_t$, is adopted to decompose the load sequence into these elements.

S3, carrying out principal component analysis on the monthly meteorological factors (temperature, humidity and air pressure), economic factors (primary industry, secondary industry, tertiary industry, residents and the like) and other random factors of Nantong, and extracting strongly related factors. The original data matrix is first standardized so that the indexes are comparable. The calculation formula is:

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

A correlation coefficient matrix $R$ is then established, and the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ of $R$ and the corresponding unit eigenvectors $e_1, e_2, \dots, e_p$ are found. The number of principal components is then determined: the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ whose cumulative contribution rate reaches more than 60% are taken as corresponding to the 1st, 2nd, ..., m-th ($m \le p$) principal components.

The principal component expression is:

$$F_i = e_{i1} X_1 + e_{i2} X_2 + \dots + e_{ip} X_p, \quad i = 1, 2, \dots, m$$

S4, a unified objective function is constructed for the prediction models. The prediction effect is evaluated by MAPE (Mean Absolute Percentage Error); the smaller the MAPE value, the higher the prediction accuracy of the model:

$$\mathrm{MAPE} = \frac{1}{T} \sum_{t} \left| \frac{y_t - \hat{y}_t}{y_t} \right| \times 100\%$$

where $T$ is the number of periods in the prediction sample, $t$ runs over those periods, $y_t$ is the actual load and $\hat{y}_t$ is the predicted load.

S5, predicting the trend sequence of the load for the month to be predicted, based on an LSTM (Long Short-Term Memory) network, by using the load trend curve before that month and the load influence factors of the current month;

as a preferable mode of the present embodiment, step S5 includes the following steps:

S5-1. Dividing the data set: the monthly power load time series data set is divided into two parts; 80% of the data set is used as the training set for training the model, and the remaining 20% is used as the validation set.

S5-2. LSTM model input: the input of the multivariate time series LSTM prediction model is a set of time series X. Let the input of the multivariate time series LSTM prediction model be:

The expansion of $X_j^{[1]}$ is:

and so on; substituting the expansions gives:

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

S6, constructing an XGBoost (a general-purpose gradient boosting decision tree algorithm) prediction model, taking the extracted important feature vectors (temperature, holidays and month) as input for prediction training, and finally outputting the load values of the seasonal component and the random component.

The XGBoost model algorithm established in S6, which takes complex influencing factors into account, is based on the following principle:

A CART tree (decision tree) is adopted as the base learner. The model is trained by constructing a plurality of weak learners and reducing the loss along the direction of steepest gradient descent in each learning iteration, so that each new learner corrects the residuals of all previous weak learners; the final prediction result is obtained by weighted summation, namely:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$

Assuming a given sample set has n samples and m features, then:

$$D = \{(x_i, y_i)\} \quad (|D| = n,\ x_i \in R^m,\ y_i \in R)$$

$$F = \{ f(x) = \omega_{q(x)} \} \quad (q: R^m \to T,\ \omega \in R^T)$$

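At round t, the objective function is approximated to second order as:

$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$

where $g_i$ and $h_i$ are the first and second derivatives of the loss $l(y_i, \hat{y}^{(t-1)})$ with respect to $\hat{y}^{(t-1)}$.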

S8, correcting errors of the irregular sequence by taking into account emergencies relating to weather and economic factors in Nantong. Since uncertain events occur every month and may affect the power grid load, the magnitude of the random component can be adjusted according to these events, improving the accuracy and generalization capability of model prediction.

With reference to FIG. 2, the following test data are obtained in the application scenario below:

Application scenario: predicting the total power supply data published by the Nantong Haimen power company for December 2021.

1. Acquire from the server the power grid load data collected in real time for the whole of Haimen, Nantong, together with monthly data such as temperature, humidity, air pressure, rainfall, primary industry, secondary industry, tertiary industry and residential electricity consumption:

(2013-01-01,1,9,3.798870056,29283,…,2013

2013-02-01,2,11,5.453960396,17672,…,2013

…

2021-11-01,11,8,12.703347280334725,43673,…,2021)

2. Preprocess the collected data:

(-1.562e+00,-4.542e+-01,-1.564e+-02,…,-1.546e+00

…

1.772e-01,-9.850e-02,-1.046e+00,…,-1.523e+00)

3. Input the acquired data into the XGBoost model and output the seasonal component and irregular component for December:

(0.985003008，1.07103526)

4. Input the acquired data into the LSTM model and output the trend component for December:

(45372.00819)

5. Multiply the prediction results of the three components, and obtain the prediction error according to the MAPE formula:

Prediction result: 48108.99357; prediction error: 0.14%

The invention also provides an XGBoost-LSTM-based power grid load prediction device, which comprises a data acquisition module, a data storage module, a preprocessing module and an execution module. The data acquisition module is connected with the data storage module; it acquires the original power grid load data and transmits them to the data storage module for storage. The data storage module is connected with the preprocessing module, which preprocesses the data set and decomposes the load data sequence. The preprocessing module is connected with the execution module, which executes the load prediction instruction and stores the predicted data in the data storage module.

The original load data of the Haimen, Nantong power grid are obtained by the data acquisition module and preprocessed by the preprocessing module, which also decomposes the load sequence; the execution module executes the power grid load prediction instruction; and the data storage module stores the original load data obtained by the data acquisition module and the load data predicted by the model of the execution module.

In this embodiment, the data storage module includes a load data storage unit and a prediction data storage unit, where the load data storage unit is connected to the data acquisition module and is used to store the grid load data of the data acquisition module. The predicted data storage unit is connected with the execution module and is used for storing predicted load data.

The preprocessing module in this embodiment includes a sequence decomposition unit and a data preprocessing unit; the sequence decomposition unit decomposes the load time series into a trend component, a random component and a seasonal component.

The data preprocessing unit preprocesses the received monthly load data set of the Haimen, Nantong power grid, including normalization, outlier processing and division of the data set into a training set and a validation set.

As a preferable mode of the present embodiment, the execution module includes a model 1 prediction unit, a model 2 prediction unit, and a prediction data output unit.

Table 3 compares the present method with an LSTM univariate time series prediction model, a Census X12 autoregressive integrated moving average (X12-ARIMA) model and an exponential smoothing (ETS) model, each of which takes only the monthly power load time series as input. The test set is the monthly power load data of Haimen for December 2021, and MAPE is used as the comparison index; the smaller the MAPE value, the higher the prediction accuracy of the model. After the embodiment of the invention is applied, the MAPE of the prediction model on the load data set is significantly reduced, showing better prediction accuracy.

It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.

It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or described herein.

The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A power grid load prediction method based on XGBoost-LSTM, characterized in that the method comprises the following steps:

s3, carrying out principal component analysis on meteorological factors, economic factors and other random factors, and extracting strongly related factors, wherein the meteorological factors comprise temperature, humidity and air pressure, and the economic factors comprise the primary industry, the secondary industry, the tertiary industry and residents;

s4, uniformly constructing an objective function of the prediction model;

s8, combining meteorological factors and economic factors or emergency events, carrying out error correction on the irregular sequence;

wherein the data cleaning of the original data in step S1 comprises: processing and replacing the identified abnormal values by using a horizontal processing method; and applying a linear transformation to the numerical data by using the min-max normalization method so that its values fall within the interval [0,1];

in step S2, the non-stationary power grid load time series is influenced by uncertain factors such as the economic development, industrial structure adjustment and seasonality of the area, and can be decomposed into: a trend element $TC_t$; a seasonal element $S_t$, reflecting the periodic variation that the regional power load time series shows in the same season of different months owing to regional climate and the like; and an irregular element $I_t$, reflecting the random changes or noise that the regional quarterly or monthly power load time series shows under the influence of regional abnormal events, natural disasters and the like, these changes following no discernible pattern; the element decomposition of the load sequence is performed by the multiplicative seasonal adjustment method, namely $Y_t = TC_t \times S_t \times I_t$;

the meteorological influencing factors selected in step S3 comprise air pressure, temperature and humidity; the economic influencing factors comprise the primary industry, the secondary industry, the tertiary industry, urban residential electricity, rural residential electricity, agriculture, forestry, animal husbandry, fishery, industry, manufacturing, construction, the computer information industry, real estate and business, public utilities and management, and the accommodation industry; the other factors comprise the number of holiday days in the month and the month of the year; principal component analysis is performed on these 21 factors, and its calculation process is as follows: the original data matrix is standardized so that the indexes are comparable, using the formula:

$$X^{*} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$

wherein $X$ represents the original matrix, $X^{*}$ represents the standardized matrix, $X_{\max} = \max(X)$ and $X_{\min} = \min(X)$;
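The min-max transformation used in steps S1 and S3 can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `min_max_normalize`, the handling of a constant series and the sample temperatures are hypothetical, and the transformation is applied here to a single series, whereas the method applies it to the data matrix.

```python
def min_max_normalize(values):
    """Linearly map a numeric series onto the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant series carries no information and maps to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical monthly mean temperatures, for illustration only.
temperatures = [10.0, 15.0, 20.0, 30.0]
scaled = min_max_normalize(temperatures)
print(scaled)
```

After the transformation, factors measured in different units (for example temperature and air pressure) lie on a common scale, which is what makes the indexes comparable.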

a correlation coefficient matrix $R$ is established, and the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p > 0$ of $R$ and the corresponding unit eigenvectors $e_1, e_2, \dots, e_p$ are solved; the number of principal components is determined by taking the 1st, 2nd, …, $m$-th principal components corresponding to the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_m$ whose cumulative contribution rate reaches 60% or more, with $m \le p$; the expression of the $i$-th principal component ($i = 1, 2, \dots, m$) is:

$$e_{i1} x_1 + e_{i2} x_2 + \dots + e_{ip} x_p = [e_{i1}\ e_{i2}\ \dots\ e_{ip}]\,[x_1\ x_2\ \dots\ x_p]^T$$

wherein $e_{i1}, e_{i2}, \dots, e_{ip}$ are the elements of the $p$-dimensional eigenvector corresponding to the $i$-th eigenvalue of the original power load matrix, and $[x_1\ x_2\ \dots\ x_p]^T$ is the $p$-dimensional vector of initial input variables;

in step S4, the prediction effect is evaluated by using the mean absolute percentage error (MAPE); the smaller the MAPE value, the higher the prediction accuracy of the model:

$$\mathrm{MAPE} = \frac{1}{k} \sum_{t=T+1}^{T+k} \left| \frac{\hat{y}_t - y_t}{y_t} \right| \times 100\%$$

wherein $T$ is the prediction sample period, $t = T+1, T+2, \dots, T+k$, and $y_t$ and $\hat{y}_t$ are the actual and predicted load values at time $t$;

step S5 comprises: S5-1, data set division; S5-2, LSTM model input; and S5-3, LSTM model structure;

s5-1, data set division: the power load time series data set is divided into two parts, 80% of the data set being used as a training set for training the model and the remaining 20% being used as a validation set;

$$x = [X_j^{[1]}, X_j^{[2]}, X_j^{[3]}, \dots, X_j^{[N+1]}], \quad j = 1, 2, \dots, L$$

wherein the first $N$ time series are the principal components with a high cumulative contribution rate selected in S3, and $L$ is the length of the time series;

wherein $X_j^{[1]}$ expands to:

$$X_j^{[1]} = [X_1^{[1]}, X_2^{[1]}, X_3^{[1]}, \dots, X_L^{[1]}]^T$$

$X_j^{[2]}, X_j^{[3]}, \dots, X_j^{[N-1]}$ expand in the same way; substituting these expansions gives:

$$x = \begin{bmatrix} X_1^{[1]} & X_1^{[2]} & \cdots & X_1^{[N+1]} \\ X_2^{[1]} & X_2^{[2]} & \cdots & X_2^{[N+1]} \\ \vdots & \vdots & \ddots & \vdots \\ X_L^{[1]} & X_L^{[2]} & \cdots & X_L^{[N+1]} \end{bmatrix}$$

wherein $x_t$ is a time series, $t \in [n+1, L]$, and the output is the predicted value at time $t$;

s5-3, LSTM model structure: the LSTM prediction model includes several LSTM units; each LSTM unit has an input layer, a hidden layer and an output layer; the hidden layer further contains a gate structure consisting of a forget gate, an input gate and an output gate; the inputs of each LSTM unit are the input $x_t$ at time $t$, the LSTM unit state $C_{t-1}$ at time $t-1$ and the hidden layer state $h_{t-1}$ at time $t-1$, and its outputs are the LSTM unit state $C_t$ at time $t$ and the hidden layer state $h_t$ at time $t$; the forget gate calculates the degree to which the LSTM unit state $C_{t-1}$ at time $t-1$ is forgotten at time $t$;

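$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c)$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

$$h_t = o_t * \tanh(C_t)$$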

In which W is _c And b _c Respectively representing the weights and the biases of the middle LSTM units, and obtaining the weights and the biases by training; * For Hadamard product, tanh is hyperbolic tangent activation function, in step S6, the extracted important feature vector is obtained by constructing XGBoost prediction model: and the temperature, holidays and months are used as inputs for predictive training, and finally load values of seasonal components and random components are output, and the principle of the XGBoost model algorithm which is established in the S6 and considers complex influence factors is as follows:

assuming a given sample set has n samples and m features, then:

$$D = \{(x_i, y_i)\} \quad (|D| = n,\ x_i \in \mathbb{R}^m,\ y_i \in \mathbb{R})$$

$$F = \{f(x) = \omega_{q(x)}\} \quad (q: \mathbb{R}^m \to T,\ \omega \in \mathbb{R}^T)$$

where $q$ represents the structure of each tree, which maps a sample to the corresponding leaf node; each tree $f$ is determined by its structure $q$ and its leaf node weights $\omega$; the predicted value of XGBoost is therefore the sum of the values of the corresponding leaf nodes of each tree, and the following objective function is minimized:

$$\mathrm{Obj} = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \qquad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^2$$

wherein $\hat{y}_i$ represents the predicted value of the model; $y_i$ represents the class label of the $i$-th sample; $f_k$ represents the $k$-th tree model; $T$ represents the number of leaf nodes of each tree; $\omega$ represents the set of leaf node scores of each tree; and $\gamma$ and $\lambda$ are coefficients that need to be tuned in practical application; the first term of the above formula is the loss error, and the second term is the regularization term, which controls the complexity of the tree and prevents overfitting; the parameter optimized by the objective function is a model (a function), which cannot be optimized in Euclidean space by conventional optimization methods, so the model is trained in an additive manner: in the $t$-th round, $f_t$ is added to the model so as to minimize the objective function, the procedure being as follows:

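$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$$

$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$

wherein $\hat{y}_i^{(t)}$ is the predicted value for the $i$-th sample after the $t$-th round, $l$ is the loss function, and $\Omega(f_t)$ is the regularization term of the tree added in the $t$-th round.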
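The additive tree ensemble and its regularized objective can be illustrated with a minimal sketch. Several choices here are assumptions made for illustration: the squared-error loss (the text does not name a loss function), the representation of a tree as a pair of a leaf-index function `q` and a list of leaf weights, and all numeric values. The sketch does not use the XGBoost library and does not show how trees are grown.

```python
def regularization(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * sum of squared leaf weights."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def predict(trees, x):
    """Ensemble prediction: sum of the leaf weight each tree assigns to x."""
    return sum(weights[q(x)] for q, weights in trees)


def objective(trees, samples, gamma, lam):
    """Loss term (assumed squared error) plus the regularization of every tree."""
    loss = sum((y - predict(trees, x)) ** 2 for x, y in samples)
    penalty = sum(regularization(weights, gamma, lam) for _, weights in trees)
    return loss + penalty


# Two hypothetical one-split trees on a single feature (e.g. temperature).
tree_1 = (lambda x: 0 if x < 20.0 else 1, [1.0, 3.0])
tree_2 = (lambda x: 0 if x < 25.0 else 1, [0.5, 1.0])

# Hypothetical (feature, target) samples, for illustration only.
samples = [(10.0, 1.5), (30.0, 4.0)]

obj_one = objective([tree_1], samples, gamma=0.1, lam=0.1)
obj_two = objective([tree_1, tree_2], samples, gamma=0.1, lam=0.1)

# Adding the second tree lowers the objective despite the extra penalty.
print(obj_two < obj_one)
```

In this toy case the second tree removes the residual error of the first, so the drop in the loss term outweighs the additional regularization; with larger values of `gamma` or `lam` the same tree might not be worth adding.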

2. The XGBoost-LSTM-based power grid load prediction method of claim 1, wherein: in step S8, because uncertain events that may affect the power grid load occur every month, the magnitude of the random component is adjusted according to those events, so as to improve the accuracy and generalization capability of model prediction.
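The effect of adjusting the random component can be sketched as follows. This is an illustration under stated assumptions: the component forecasts are assumed to be recombined multiplicatively, mirroring $Y_t = TC_t S_t I_t$ in claim 1, and the per-month `event_factors` are a hypothetical way of expressing the adjustment; the text does not specify how the adjustment is quantified.

```python
def recombine(trend, seasonal, irregular, event_factors=None):
    """Recombine component forecasts as TC_t * S_t * (I_t * event factor)."""
    n = len(trend)
    if event_factors is None:
        event_factors = [1.0] * n  # no known events: leave I_t unchanged
    if not (len(seasonal) == len(irregular) == len(event_factors) == n):
        raise ValueError("all component series must have the same length")
    return [
        tc * s * (i * k)
        for tc, s, i, k in zip(trend, seasonal, irregular, event_factors)
    ]


# Hypothetical two-month forecasts, for illustration only.
trend = [100.0, 102.0]      # e.g. from the LSTM model
seasonal = [1.1, 0.9]       # e.g. from the XGBoost model
irregular = [1.0, 1.0]      # e.g. from the XGBoost model

baseline = recombine(trend, seasonal, irregular)
# Assume an event in the second month is expected to raise load by 5%.
adjusted = recombine(trend, seasonal, irregular, event_factors=[1.0, 1.05])
print(baseline, adjusted)
```

Only the month with a known event changes; the trend and seasonal forecasts are left as predicted.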

3. An XGBoost-LSTM-based power grid load prediction device for performing the XGBoost-LSTM-based power grid load prediction method according to any one of claims 1-2, characterized in that: the device comprises a data acquisition module, a data storage module, a preprocessing module and an execution module; the data acquisition module is connected with the data storage module, and is used for acquiring original power grid load data and transmitting the acquired data to the data storage module for storage; the data storage module is connected with the preprocessing module, which is used for preprocessing the data set and decomposing the load data sequence; the preprocessing module is connected with the execution module, which is used for executing a load prediction instruction; and the predicted data is stored in the data storage module.

4. The XGBoost-LSTM-based power grid load prediction device according to claim 3, characterized in that: the data storage module comprises a prediction data storage unit and a load data storage unit; the prediction data storage unit is connected with the execution module and is used for storing the predicted data; the load data storage unit is connected with the data acquisition module and stores the power grid load data transmitted by the data acquisition module; the data receiving module is used for obtaining the original load data of the power grid; the preprocessing module is used for preprocessing the original load data and decomposing the load sequence; the execution module is used for executing an instruction for predicting the power grid load; and the data storage module is used for storing the original power grid load data obtained by the data receiving module and the load data predicted by the model of the execution module.
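The module connections recited in claims 3 and 4 can be sketched as a small set of classes. All class, method and attribute names are hypothetical, the preprocessing shown (dropping missing readings) stands in for the cleaning and decomposition steps, and the predictor passed to the execution module is a placeholder rather than the XGBoost-LSTM model.

```python
class DataStorageModule:
    """Holds the load data storage unit and the prediction data storage unit."""

    def __init__(self):
        self.load_data = []        # load data storage unit
        self.predicted_data = []   # prediction data storage unit


class DataAcquisitionModule:
    """Acquires raw grid load data and passes it to storage."""

    def __init__(self, storage):
        self.storage = storage

    def acquire(self, raw_loads):
        self.storage.load_data.extend(raw_loads)


class PreprocessingModule:
    """Reads stored load data and prepares it for prediction."""

    def __init__(self, storage):
        self.storage = storage

    def preprocess(self):
        # Placeholder for cleaning and decomposition: drop missing readings.
        return [v for v in self.storage.load_data if v is not None]


class ExecutionModule:
    """Runs the prediction and writes the result back to storage."""

    def __init__(self, storage, preprocessing, predictor):
        self.storage = storage
        self.preprocessing = preprocessing
        self.predictor = predictor

    def run(self):
        series = self.preprocessing.preprocess()
        forecast = self.predictor(series)
        self.storage.predicted_data.append(forecast)
        return forecast


storage = DataStorageModule()
acquisition = DataAcquisitionModule(storage)
preprocessing = PreprocessingModule(storage)
# Placeholder predictor (repeat the last value), not the claimed model.
execution = ExecutionModule(storage, preprocessing, lambda series: series[-1])

acquisition.acquire([100.0, None, 120.0])
forecast = execution.run()
print(forecast, storage.predicted_data)
```

The data flow follows the claimed connections: acquisition writes to storage, preprocessing reads from storage, and the execution module consumes the preprocessed series and stores its prediction.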