CN113762387B

CN113762387B - Multi-element load prediction method for data center station based on hybrid model prediction

Info

Publication number: CN113762387B
Application number: CN202111048836.5A
Authority: CN
Inventors: 李华; 丁吉; 杨东升; 张化光; 周博文; 李广地; 金硕巍; 罗艳红; 王迎春; 闫士杰; 杨波; 陈乐�
Original assignee: 东北大学
Priority date: 2021-09-08
Filing date: 2021-09-08
Publication date: 2024-02-02
Anticipated expiration: 2041-09-08
Also published as: CN113762387A

Abstract

The invention provides a multi-element load prediction method for data center stations based on hybrid model prediction, and relates to the technical field of automatic control. The multi-element data of the data center station are divided into three scenes (spring/autumn, summer, and winter), and multi-element load prediction is carried out on the data in each scene. The grey relational analysis (GRA) method is used for feature analysis and normalization of the multi-element load data, and the processed data are input into the QPSO-BP neural network for prediction. On the algorithm side, the QPSO-BP neural network and the XGBoost model predict in parallel, so that deep learning and machine learning techniques are applied to load prediction at the same time; the two learning approaches are effectively combined, the advantages of both models are fully exploited, and a model with higher stability and generalization capability is obtained. The hybrid prediction model can actively enrich the features of input data that have only a single dimension, avoid the influence on calculation accuracy of data errors caused by human factors during data acquisition, and achieve high-accuracy load prediction under special conditions such as large load fluctuations.

Description

Multi-element load prediction method for data center station based on hybrid model prediction

Technical Field

The invention relates to the technical field of automatic control, in particular to a data center station multi-element load prediction method based on hybrid model prediction.

Background

In recent years, with the rapid development of internet technology, the scale and number of data center stations have expanded rapidly. Data centers are reported to account for 1% of the total electricity consumption in China, so the load of data center stations has become a considerable electric load. Given the power system's requirements for fast, accurate scheduling and for safe and stable operation, accurate load prediction for data center stations has become important.

The load of a data center station falls mainly into two types: the load of the servers that process data, and the load of the storage, lighting, cooling, and power distribution that keep the servers working normally. Because power consumption in a data center station is complex, the load is influenced by many factors and its changes show no obvious regularity. Traditional load prediction methods often select a single factor and analyze its mapping to the load, ignoring the influence of other factors and the linkage among the influencing factors. As a result, the analysis of load characteristics is not accurate enough, which affects load prediction and the formulation of electricity consumption plans and leads to low accuracy. In addition, traditional prediction models such as time series models, neural network models, and artificial intelligence optimization models each have advantages and disadvantages. Time series models have simple assumptions and calculations and strong adaptability, but extrapolate poorly and have a small prediction range. Neural network models fit well and can process nonlinear data, but are unstable and depend on data characteristics. Artificial intelligence optimization models can be combined with other methods to improve prediction accuracy, but easily fall into local optima. Traditional prediction algorithms also have typical limitations: the error is insensitive to weight changes, the error gradient changes little, adjustment time is long, many iterations are needed, convergence is slow, and the output layer of the neural network very easily falls into a local minimum. These limitations cause deficiencies in prediction accuracy and stability, and all of them pose challenges for accurate load prediction of data center stations.

Traditional prediction methods do not fully mine the massive volume of dormant historical operating data. The load is often predicted in a single scene, which ignores load differences at the time level, and the prediction accuracy is also affected by the various factors present in the system. Traditional prediction models need long adjustment times and many iterations, converge slowly, and have an output layer that very easily falls into a local minimum, which causes deficiencies in prediction accuracy and stability. Together, these factors make the system prediction inaccurate.

Disclosure of Invention

Aiming at the problems existing in the prior art, the invention provides a data center station multi-element load prediction method based on mixed model prediction.

In order to solve the technical problems, the invention adopts the following technical scheme:

a data center station multi-element load prediction method based on mixed model prediction comprises the following steps:

Step 1: data collection and data preprocessing. Historical data of the data center station within a preset time are acquired, a training set is constructed, and the data are preprocessed, wherein the historical data comprise cold load, heat load, electric load, light intensity, wind speed, humidity, air pressure, and date.

Step 1.1: acquiring historical data of the data center station within a preset time, and dividing the load of the data center station into three scenes (spring/autumn, summer, and winter) with the K-means clustering algorithm, so that prediction is carried out per scene;

Step 1.2: three meteorological characteristic factors (solar radiation, temperature, and air humidity) are selected from the historical data and ordered in the training set as solar radiation, temperature, air humidity; the cold, heat, and electric loads of the data center station and the environmental factors are then collected to form the training set X as follows:

$$X = \begin{bmatrix} x_e(1) & x_e(2) & \cdots & x_e(m) \\ x_h(1) & x_h(2) & \cdots & x_h(m) \\ x_c(1) & x_c(2) & \cdots & x_c(m) \\ x_R(1) & x_R(2) & \cdots & x_R(m) \\ x_T(1) & x_T(2) & \cdots & x_T(m) \\ x_M(1) & x_M(2) & \cdots & x_M(m) \end{bmatrix}$$

where $X$ is the training set; $x_e$ is the electric load of the data center station and $x_e(i)$ is the $i$-th electric load in the electric load sequence; $x_h$ is the heat load and $x_h(i)$ is the $i$-th heat load in the heat load sequence; $x_c$ is the cold load and $x_c(i)$ is the $i$-th cold load in the cold load sequence; $x_R$ is the solar radiation and $x_R(i)$ is the $i$-th value in the radiation sequence; $x_T$ is the temperature and $x_T(i)$ is the $i$-th value in the temperature sequence; $x_M$ is the air humidity and $x_M(i)$ is the $i$-th value in the humidity sequence; and $m$ is the number of entries in each sequence.

Step 1.3: ranking the importance of the load prediction feature data by random forest (RF) out-of-bag estimation and performing feature selection;

the importance is calculated as follows:

$$\mathrm{MDA} = \frac{1}{Q}\sum_{q=1}^{Q}\left(\mathrm{errOOB}'_q - \mathrm{errOOB}_q\right)$$

where MDA is the mean decrease in accuracy used as the importance index; $Q$ is the number of base learners; $\mathrm{errOOB}_q$ is the out-of-bag error of the $q$-th base learner; and $\mathrm{errOOB}'_q$ is the out-of-bag error of the $q$-th base learner after noise is added;

step 1.4: calculating the correlation between the data load and the characteristic factors;

analyzing, for the three scenes of spring/autumn, summer, and winter, the correlation among the cold, heat, and electric loads of the data center station and between these loads and the meteorological influence factors; establishing, for each scene, the matrix formed by the loads and the environmental influence factors; and calculating the strength, magnitude, and order of the relationship between the loads and the environmental influence factors in each scene to obtain the correlation coefficient and the degree of association;

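in spring and autumn, the sequences of the cold, heat, and electric loads and the environmental influence factors form the following matrix:

$$X_1 = \begin{bmatrix} x_c(1) & \cdots & x_c(m) \\ x_h(1) & \cdots & x_h(m) \\ x_e(1) & \cdots & x_e(m) \\ x_R(1) & \cdots & x_R(m) \\ x_T(1) & \cdots & x_T(m) \\ x_M(1) & \cdots & x_M(m) \end{bmatrix}$$

where $X_1$ is the matrix formed by the spring/autumn cold, heat, and electric loads and the environmental influence factors.

In summer, the sequences of the cold load, the electric load, and the environmental influence factors form the following matrix:

$$X_2 = \begin{bmatrix} x_c(1) & \cdots & x_c(m) \\ x_e(1) & \cdots & x_e(m) \\ x_R(1) & \cdots & x_R(m) \\ x_T(1) & \cdots & x_T(m) \\ x_M(1) & \cdots & x_M(m) \end{bmatrix}$$

where $X_2$ is the matrix formed by the summer cold and electric loads and the environmental influence factors.

In winter, the sequences of the heat load, the electric load, and the environmental influence factors form the following matrix:

$$X_3 = \begin{bmatrix} x_h(1) & \cdots & x_h(m) \\ x_e(1) & \cdots & x_e(m) \\ x_R(1) & \cdots & x_R(m) \\ x_T(1) & \cdots & x_T(m) \\ x_M(1) & \cdots & x_M(m) \end{bmatrix}$$

where $X_3$ is the matrix formed by the winter heat and electric loads and the environmental influence factors.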

The original data need to be normalized, using the following formula:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the selected original data; $x_{\max}$ is the maximum value of the sample data; $x_{\min}$ is the minimum value of the sample data; and $x'$ is the normalized value;

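the correlation coefficient $\xi_j(k)$ and the degree of association $\gamma_j$ are calculated as follows:

$$\xi_j(k) = \frac{\min_j \min_k \left|x_0(k) - x_j(k)\right| + \rho \max_j \max_k \left|x_0(k) - x_j(k)\right|}{\left|x_0(k) - x_j(k)\right| + \rho \max_j \max_k \left|x_0(k) - x_j(k)\right|}$$

$$\gamma_j = \frac{1}{m}\sum_{k=1}^{m}\xi_j(k)$$

where $\xi_j(k)$ is the correlation coefficient of data category $j$ at the $k$-th point; $\gamma_j$ is the degree of association of data category $j$; $x_0(k)$ is the $k$-th value of the normalized meteorological factor sequence; $x_j(k)$ is the $k$-th value of the normalized load sequence; $m$ is the number of entries in each sequence; $\rho$ is the resolution coefficient; and $j$ denotes the type of normalized data;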

Step 2: constructing a BP neural network model optimized by the quantum particle swarm optimization (QPSO) algorithm;

Step 2.1: a BP neural network is used to construct the model for calculating the predicted electric load of the data center station, with the number of hidden-layer neurons given by the following formula:

$$l = \sqrt{n + m} + a$$

where $l$ is the number of hidden-layer neurons in the model; $n$ is the number of input-layer neurons; $m$ is the number of sequence entries; and $a$ is a constant between 1 and 10;

Step 2.2: optimizing the neural network model with the quantum particle swarm optimization (QPSO) algorithm;

Step 2.2.1: the mean of the particles' historical best positions is calculated as shown in the following formula:

$$m_{best} = \frac{1}{S}\sum_{i=1}^{S} Q_{local,i}$$

where $m_{best}$ is the mean historical best position of the particles; $S$ is the size of the particle swarm; and $Q_{local,i}$ is the historical best position of the $i$-th particle over the iterations;

Step 2.2.2: updating the particle position as shown in the following formula:

where $Q_i$ is the updated position of the $i$-th particle; $\alpha_1$ and $\alpha_2$ are random numbers in $[0,1]$; and $Q_{global}$ is the global best particle position;

Step 2.2.3: the reciprocal of the sum of squared errors between the calculated and actual electric loads is used as the individual fitness, and the fitness function is constructed as shown in the following formula:

where $E_i$ is the fitness of the $i$-th population; $y(i)$ is the actual electric load of the data center station represented by the $i$-th population; $s(i)$ is the predicted electric load of the data center station represented by the $i$-th population; and $N$ is the number of populations.

After the fitness function is introduced, the particle position update function is as follows:

where $x_i$ is the position of the $i$-th particle; $\mu$ is a uniform random number on $[0,1]$; $\chi$ is updated continuously as the number of iterations increases so that the particle position remains optimal; $N_{max}$ is the maximum number of QPSO iterations; and $N_{min}$ is the minimum number of QPSO iterations;

step 3, constructing an XGBoost prediction model;

step 3.1: establishing a regularized learning objective function;

for the training set X in step 1, the predicted value is obtained with an additive function, and the regularized objective is built on it:

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$$

$$L = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

where $L$ is the regularized objective function that the model minimizes; $l(y_i, \hat{y}_i)$ is the difference between the predicted value $\hat{y}_i$ of the $i$-th target and the actual value $y_i$, i.e. the loss function; $n$ is the sample size; $K$ is the number of sample features; and $\Omega(f_k)$ is the complexity penalty of the tree corresponding to the variable $f_k$ computed in the $k$-th iteration.

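Step 3.2: optimizing with the gradient tree boosting algorithm;

wherein the second-order approximation of the objective function to be optimized is:

$$L^{(t)} \simeq \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t)$$

$$g_i = \partial_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right), \qquad h_i = \partial^2_{\hat{y}^{(t-1)}}\, l\left(y_i, \hat{y}_i^{(t-1)}\right)$$

where $\hat{y}_i^{(t)}$ is the $i$-th predicted value in the $t$-th iteration (so $\hat{y}_i^{(t-1)}$ is the prediction from the previous iteration); $g_i$ is the first-order gradient of the loss function; $h_i$ is the second-order gradient of the loss function; $f_t(x_i)$ is the variable computed in the $t$-th iteration; and $\partial$ is the gradient symbol;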

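Step 3.3: the impurity score of the decision tree is evaluated as shown in the following formulas:

$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$

$$L^{(t)}(q) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$

where $w_j^{*}$ is the optimal weight of leaf $j$; $L^{(t)}(q)$ is the optimal value of the objective for the tree structure $q$; $I_j$ is the instance set of leaf $j$ in the gradient tree; $\gamma$ and $\lambda$ are user-defined parameters of the XGBoost algorithm, where $\gamma$ is the first-order regularization penalty term (applied to the number of leaves) and $\lambda$ is the second-order regularization penalty term (applied to the leaf weights); and $T$ is the total number of leaf nodes in the gradient tree;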

step 4, combining the QPSO-BP neural network model and the XGBoost prediction model to construct a hybrid prediction model, and calculating the weight of the hybrid prediction model;

the weights of the two models' outputs are calculated: the mean absolute percentage error reciprocal weighting (MAPE-RW) algorithm is combined with the error index to set the initial weights of the fusion model, and the optimal weights are then searched starting from these initial values to finally form the optimal load prediction model;

the MAPE-RW algorithm is shown as follows:

$$\omega_a = \frac{\sigma_{MAPE,b}}{\sigma_{MAPE,a} + \sigma_{MAPE,b}}$$

where $\omega_a$ is the weight of prediction model $a$; and $\sigma_{MAPE,a}$ and $\sigma_{MAPE,b}$ are the MAPE values of prediction models $a$ and $b$, respectively.

The hybrid prediction model weight is calculated as follows:

$$f_{s,x} = w_{\mathrm{QPSO\text{-}BP}} \cdot f_{\mathrm{XGBoost},s,x} + w_{\mathrm{XGBoost}} \cdot f_{\mathrm{QPSO\text{-}BP},s,x} \tag{22}$$

where $f_{s,x}$ is the hybrid model's predicted value of the class-$x$ load in scene $s$; $w_{\mathrm{QPSO\text{-}BP}}$ and $w_{\mathrm{XGBoost}}$ are the weights of the QPSO-BP neural network and the XGBoost model, respectively; $f_{\mathrm{QPSO\text{-}BP},s,x}$ and $f_{\mathrm{XGBoost},s,x}$ are the predicted values of the class-$x$ load in scene $s$ from the QPSO-BP neural network and the XGBoost model, respectively; the scene index $s$ takes the values 1, 2, and 3, denoting the spring/autumn, summer, and winter scenes, respectively; and the class-$x$ loads comprise the cold load, the heat load, and the electric load;

Step 5: inputting the data preprocessed in step 1 into the hybrid prediction model for calculation, completing the scene-by-scene multi-element load prediction of the data center station.

The beneficial effects of the invention are as follows:

The invention provides a scene-based multi-element load prediction method for data center stations, based on parallel prediction by a hybrid model, which has the following beneficial effects:

(1) The multi-element data of the data center station are divided into three scenes (spring/autumn, summer, and winter) and multi-element load prediction is carried out on the data in each scene, which improves prediction accuracy while reducing prediction time;

(2) In feature factor processing, multiple feature factors are considered, and grey relational degrees are used to describe the strength, magnitude, and order of the relationships among the factors, which reduces the prediction error;

(3) The GRA method is used for feature analysis and normalization of the multi-element load data, and the processed data are input into the QPSO-BP neural network for prediction, which significantly reduces the time the QPSO-BP neural network needs to learn the data and achieves more efficient data mining;

(4) The traditional BP neural network is replaced by the QPSO-BP neural network; the neural network calculation model optimized by the QPSO algorithm has better nonlinear fitting capability and higher calculation accuracy than a single BP neural network model;

(5) On the algorithm side, the QPSO-BP neural network and the XGBoost model predict in parallel, so that deep learning and machine learning techniques are applied to load prediction at the same time; the two learning approaches are effectively combined, the advantages of both models are fully exploited, and a model with higher stability and generalization capability is obtained. The hybrid prediction model can actively enrich the features of input data that have only a single dimension, which makes network learning more efficient, avoids the influence on calculation accuracy of data errors caused by human factors during data acquisition, and achieves high-accuracy load prediction under special conditions such as large load fluctuations;

(6) The weights of the fusion model are set with the MAPE-RW algorithm and the search for the optimal weights is completed, which reduces the error of the fusion model.

Drawings

FIG. 1 is an overall flow chart of multi-element load prediction for a data center station in an embodiment of the invention;

FIG. 2 is a graph showing the result of the spring and autumn correlation analysis of data in a data center according to an embodiment of the present invention;

FIG. 3 is a graph showing the result of summer correlation analysis of data in a data center according to an embodiment of the present invention;

FIG. 4 is a graph showing the result of the winter correlation analysis of data in a data center according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a prediction model of a BP neural network according to an embodiment of the present invention;

FIG. 6 is a flowchart of a QPSO-BP neural network algorithm in an embodiment of the present invention;

FIG. 7 is a flowchart of the XGBoost algorithm according to an embodiment of the present invention;

FIG. 8 is a comparison of the electrical load hybrid prediction model in spring and autumn with the prediction results of other methods in the embodiment of the invention;

FIG. 9 is a comparison of the prediction results of the summer cold load hybrid prediction model and other methods according to the embodiment of the present invention;

FIG. 10 is a comparison of the prediction results of the winter heat load hybrid prediction model and other methods according to the embodiment of the present invention.

Detailed Description

The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.

A data center station multi-element load prediction method based on mixed model prediction is shown in fig. 1, and comprises the following steps:

Step 1.1: historical data of the data center station within a preset time are acquired, and the K-means clustering algorithm is used to divide the load of the data center station into three scenes (spring/autumn, summer, and winter) so that prediction is carried out per scene, which further improves prediction accuracy.
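The scene division in step 1.1 can be illustrated with a minimal K-means sketch. The patent does not state which features are clustered or how the centroids are initialized, so the daily (cold load, heat load) pairs, the initial centroids, and the mapping of cluster indices to seasons below are all hypothetical choices made for illustration only.

```python
import math

def kmeans(points, centroids, n_iter=20):
    """Plain Lloyd's k-means; `centroids` holds the initial cluster centers."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda k: math.dist(p, centroids[k]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical daily (cold load, heat load) pairs in arbitrary units.
days = [(9.0, 0.5), (8.5, 0.8), (4.0, 4.2), (4.5, 3.8), (0.6, 9.1), (0.9, 8.7)]

# Assumed reading of the clusters: 0 = summer, 1 = spring/autumn, 2 = winter.
labels, centers = kmeans(days, centroids=[(8, 1), (4, 4), (1, 9)])
print(labels)
```

In practice each day's records would be routed to the prediction model trained for the scene its cluster represents.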

Step 1.2: to predict the cold, heat, and electric loads of the data center station, the three meteorological characteristic factors that most strongly influence load prediction are selected from the historical data and ordered in the training set as solar radiation, temperature, and air humidity; the cold, heat, and electric loads of the data center station and the environmental factors are then collected to form the training set X as follows:

the importance is calculated as follows:

$$\mathrm{MDA} = \frac{1}{Q}\sum_{q=1}^{Q}\left(\mathrm{errOOB}'_q - \mathrm{errOOB}_q\right)$$

where $Q$ is the number of base learners; $\mathrm{errOOB}_q$ is the out-of-bag error of the $q$-th base learner; and $\mathrm{errOOB}'_q$ is the out-of-bag error of the $q$-th base learner after noise is added. In load prediction for a data center station, the historical data used for prediction may include feature data that are related to the prediction but redundant. Random forest (RF) out-of-bag estimation is therefore used to rank the importance of the load prediction feature data and to perform feature selection; the larger the mean decrease in accuracy (MDA), the greater the influence of the corresponding feature on the prediction result and the higher its importance.
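As a small illustration of the importance ranking, the sketch below averages the increase in out-of-bag error over the base learners and sorts the features by that value. The error values and feature names are made up for the example; in a real run they would come from a trained random forest.

```python
def mean_decrease_accuracy(err_oob, err_oob_noisy):
    """Average, over Q base learners, of (noisy OOB error - clean OOB error)."""
    if not err_oob or len(err_oob) != len(err_oob_noisy):
        raise ValueError("need one clean and one noisy error per base learner")
    q = len(err_oob)
    return sum(noisy - clean for clean, noisy in zip(err_oob, err_oob_noisy)) / q

# Hypothetical out-of-bag errors for Q = 3 base learners, before and after
# noise is added to the named feature.
features = {
    "temperature": ([0.10, 0.12, 0.11], [0.30, 0.28, 0.32]),
    "wind_speed": ([0.10, 0.12, 0.11], [0.11, 0.12, 0.12]),
}

scores = {name: mean_decrease_accuracy(*errs) for name, errs in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking, scores)
```

Features at the bottom of the ranking are the candidates to drop during feature selection.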

The correlation is analyzed to further improve prediction accuracy. In spring and autumn, cooling and heating are supplied at the same time and the electric load is strongly coupled with the cold load, so the correlation between the electric and cold loads of the data center station and the environmental influence factors is analyzed. Summer is mainly the cooling season, in which the electric load and the cold load are strongly coupled, so the correlation between the electric and cold loads of the data center station and the environmental influence factors is analyzed. Winter is mainly the heating season, in which the electric load and the heat load are strongly coupled, so the correlation between the electric and heat loads of the data center station and the environmental influence factors is analyzed.

Because the cold, heat, and electric loads and the environmental factors show a certain periodicity from year to year, when little sample data is available the sample data are reused so that the trained network model is more accurate; here the data are cycled 2 times. Because the load values of the data center station are large, the original data need to be normalized to remove the influence of dimension and to speed up network convergence, using the following formula:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ is the selected original data; $x_{\max}$ is the maximum value of the sample data; $x_{\min}$ is the minimum value of the sample data; and $x'$ is the normalized value;
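The normalization can be sketched in a few lines. The load values are hypothetical, and mapping a constant sequence to zeros is an assumption added here to avoid division by zero; the patent does not address that case.

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention for a constant sequence (not specified in the text).
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical electric load samples in kW.
electric_load = [200.0, 350.0, 500.0]
print(min_max_normalize(electric_load))
```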

The GRA method is used to judge, from the geometric similarity of the curves, the strength, magnitude, and order of the relationship between the cold, heat, and electric loads of the data center station and the environmental influence factors in the three scenes of spring/autumn, summer, and winter, i.e. the degree of association. The correlation coefficient $\xi_j(k)$ and the degree of association $\gamma_j$ are calculated as follows:

$$\xi_j(k) = \frac{\min_j \min_k \left|x_0(k) - x_j(k)\right| + \rho \max_j \max_k \left|x_0(k) - x_j(k)\right|}{\left|x_0(k) - x_j(k)\right| + \rho \max_j \max_k \left|x_0(k) - x_j(k)\right|}$$

$$\gamma_j = \frac{1}{m}\sum_{k=1}^{m}\xi_j(k)$$

where $\xi_j(k)$ is the correlation coefficient of data category $j$ at the $k$-th point; $\gamma_j$ is the degree of association of data category $j$; $x_0(k)$ is the $k$-th value of the normalized meteorological factor sequence; $x_j(k)$ is the $k$-th value of the normalized load sequence; $m$ is the number of entries in each sequence; $\rho$ is the resolution coefficient, typically 0.5; and $j$ denotes the type of normalized data;
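A minimal sketch of the grey relational calculation follows, assuming the standard GRA form in which the minimum and maximum absolute differences are taken over all compared sequences and all points. The normalized sequences and their names are hypothetical.

```python
def grey_relational_degrees(reference, sequences, rho=0.5):
    """Return the degree of association of each sequence with the reference.

    reference: normalized reference sequence x_0
    sequences: dict mapping a name to a normalized comparison sequence x_j
    rho:       resolution coefficient (0.5 is the usual choice)
    """
    deltas = {name: [abs(r - v) for r, v in zip(reference, seq)]
              for name, seq in sequences.items()}
    all_deltas = [d for ds in deltas.values() for d in ds]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0:
        # Every sequence coincides with the reference.
        return {name: 1.0 for name in sequences}
    degrees = {}
    for name, ds in deltas.items():
        xi = [(d_min + rho * d_max) / (d + rho * d_max) for d in ds]
        degrees[name] = sum(xi) / len(xi)
    return degrees

# Hypothetical normalized sequences: temperature as reference, two loads.
temperature = [0.0, 0.5, 1.0]
loads = {"cold_load": [0.1, 0.5, 0.9], "heat_load": [1.0, 0.5, 0.0]}
degrees = grey_relational_degrees(temperature, loads)
print(degrees)
```

A higher degree means the load curve follows the reference curve more closely, which is how the ordering of influence factors is obtained.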

The traditional BP neural network is replaced by a BP neural network model optimized with the quantum particle swarm optimization algorithm (QPSO-BP); the neural network calculation model optimized by the QPSO algorithm has better nonlinear fitting capability and higher calculation accuracy than a single BP neural network model.

Step 2.1: constructing a predicted electrical load calculation model of the data center station by adopting a BP neural network;

Determining the neural network structure mainly involves determining the network type, the number of layers of the BP neural network, the number of neurons in each layer, the form of the activation function, and so on. This patent uses a BP neural network to construct the model for calculating the predicted electric load of the data center station. An S-shaped (sigmoid) function is selected as the function between the middle (hidden) layer and the output layer; the number of input-layer neurons is determined by the dimension of the sample data; the output layer is the electric load of the data center station. There is currently no unified theoretical method for determining the number of hidden-layer neurons, so it is determined by an empirical formula. FIG. 5 is a schematic diagram of the BP neural network prediction model of this patent, and the formula is as follows:

$$l = \sqrt{n + m} + a$$

where $l$ is the number of hidden-layer neurons in the model; $n$ is the number of input-layer neurons; $m$ is the number of sequence entries; and $a$ is a constant between 1 and 10;
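The empirical formula yields a range of candidate sizes rather than a single value, because a is free between 1 and 10. The sketch below enumerates those candidates; rounding to the nearest integer and treating a as an integer from 1 to 10 inclusive are assumptions, as are the example values of n and m.

```python
import math

def hidden_neuron_candidates(n_inputs, m, a_values=range(1, 11)):
    """Candidate hidden-layer sizes from l = sqrt(n + m) + a."""
    base = math.sqrt(n_inputs + m)
    return [round(base + a) for a in a_values]

# Hypothetical example: n = 3 input neurons and m = 1.
candidates = hidden_neuron_candidates(3, 1)
print(candidates)
```

Each candidate would normally be tried in turn, keeping the size that gives the lowest validation error.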

the BP neural network in this patent operates as follows:

(a) Open the Neural Net Fitting module from the applications (Apps) section of the MATLAB toolbar;

(b) On the Select Data page, choose the standard sample matrix file to import under Inputs and the target output matrix file to import under Targets;

(c) On the Validation and Test Data page, set the training data (Training) to 70%, the validation data (Validation) to 15%, and the test data (Testing) to 15%;

(d) On the Network Architecture page, set the number of hidden-layer neurons to 4.

The BP neural network learns and converges slowly, is easily trapped in local minima, and has weak nonlinear fitting capability and low calculation accuracy. Quantum particle swarm optimization (QPSO) is a global optimization algorithm; QPSO is used to optimize the weights and thresholds of the BP neural network and obtain the optimal individual. The electric load of the data center station is then predicted with the optimal weights and thresholds, which prevents the BP neural network from falling into a local optimum and yields a more accurate load prediction.

The position updating operation of the quantum particle swarm algorithm in the patent is as follows:

$$m_{best} = \frac{1}{S}\sum_{i=1}^{S} Q_{local,i}$$

where $m_{best}$ is the mean historical best position of the particles; $S$ is the size of the particle swarm; and $Q_{local,i}$ is the historical best position of the $i$-th particle over the iterations;

step 2.2.2: updating the particle position as shown in the following formula:

In this patent, the particle position update formula is extended from the traditional single random variable to two random variables, which speeds up the convergence of the algorithm and increases its randomness while reducing risk.

Fitness represents how good an individual of the population is in a genetic algorithm, and the fitness function clearly reflects the iterative evolution of each particle; QPSO proceeds in the direction of increasing fitness, and the expression is as follows:

step 3, constructing an XGBoost prediction model, as shown in FIG. 7;

XGBoost is essentially a tree-based Boosting algorithm, i.e. a serial ensemble learning algorithm. Its base learners are weak prediction models that are strongly correlated with one another, and the ensemble is formed by stacking the base learners one after another in series.

Step 3.1: establishing a regularized learning objective function;

$$L = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)$$

where $L$ is the regularized objective function that the model minimizes; $l(y_i, \hat{y}_i)$ is the difference between the predicted value $\hat{y}_i$ of the $i$-th target and the actual value $y_i$, i.e. the loss function; $n$ is the sample size; $K$ is the number of sample features (in this patent the sample features are the environmental factors of the data center station, so the number of sample features is 3); and $\Omega(f_k)$ is the complexity penalty of the tree corresponding to the variable $f_k$ computed in the $k$-th iteration.

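Step 3.2: optimizing with the gradient tree boosting algorithm;

$$L^{(t)} \simeq \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega(f_t)$$

where $\hat{y}_i^{(t)}$ is the $i$-th predicted value in the $t$-th iteration (so $\hat{y}_i^{(t-1)}$ is the prediction from the previous iteration); $g_i$ is the first-order gradient of the loss function; $h_i$ is the second-order gradient of the loss function; $f_t(x_i)$ is the variable computed in the $t$-th iteration; and $\partial$ is the gradient symbol;

$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}, \qquad L^{(t)}(q) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$

where $w_j^{*}$ is the optimal weight of leaf $j$; $L^{(t)}(q)$ is the optimal value of the objective for the tree structure $q$; $I_j$ is the instance set of leaf $j$ in the gradient tree; $\gamma$ and $\lambda$ are user-defined parameters of the XGBoost algorithm, where $\gamma$ is the first-order regularization penalty term (applied to the number of leaves) and $\lambda$ is the second-order regularization penalty term (applied to the leaf weights); and $T$ is the total number of leaf nodes in the gradient tree;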
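To make the leaf weight and tree score concrete, the sketch below evaluates them for a tiny hand-made split. It assumes a squared-error loss of the form 0.5*(y - y_hat)^2, for which the first-order gradient is y_hat - y and the second-order gradient is 1; the patent does not name its loss function, and the sample values and parameter settings are hypothetical.

```python
def leaf_weight(g, h, lam):
    """Optimal weight of one leaf: -sum(g) / (sum(h) + lambda)."""
    return -sum(g) / (sum(h) + lam)

def structure_score(leaves, lam, gamma):
    """Objective value of a tree structure; lower is better.

    leaves: list of (g_list, h_list) pairs, one pair per leaf.
    """
    total = sum(sum(g) ** 2 / (sum(h) + lam) for g, h in leaves)
    return -0.5 * total + gamma * len(leaves)

# Assumed squared-error loss: g_i = y_hat_i - y_i and h_i = 1.
y = [1.0, 1.0, 5.0, 5.0]
y_hat = [3.0, 3.0, 3.0, 3.0]
g = [p - t for p, t in zip(y_hat, y)]
h = [1.0] * len(y)

left, right = (g[:2], h[:2]), (g[2:], h[2:])
w_left = leaf_weight(*left, lam=1.0)
w_right = leaf_weight(*right, lam=1.0)
split_score = structure_score([left, right], lam=1.0, gamma=0.1)
single_leaf_score = structure_score([(g, h)], lam=1.0, gamma=0.1)
print(w_left, w_right, split_score, single_leaf_score)
```

The split lowers the score relative to a single leaf, which is the criterion by which a candidate tree structure is accepted.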

Because the QPSO-BP neural network model and the XGBoost prediction model have different learning mechanisms and emphasize errors differently, the prediction results of the two models deviate from each other to some extent. To make the prediction results of the hybrid prediction model more accurate, the weights of the two models' outputs need to be calculated.

the MAPE-RW algorithm is shown as follows:

The hybrid prediction model weight is calculated as follows:

$$f_{s,x} = w_{\mathrm{QPSO\text{-}BP}} \cdot f_{\mathrm{XGBoost},s,x} + w_{\mathrm{XGBoost}} \cdot f_{\mathrm{QPSO\text{-}BP},s,x} \tag{22}$$
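The weighting step can be sketched as follows, assuming the usual reciprocal-MAPE rule for two models, in which a model's weight is the other model's MAPE divided by the sum of both. The sketch applies each weight to its own model's prediction, so the more accurate model contributes more; that pairing is an assumption about the intent of the fusion, since equation (22) as printed pairs each weight with the other model's output. The load values are hypothetical, and the subsequent search around the initial weights is not shown.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (actual values nonzero)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def mape_rw_weights(mape_a, mape_b):
    """Reciprocal-MAPE weights: the model with the lower error weighs more."""
    total = mape_a + mape_b
    if total == 0:
        return 0.5, 0.5  # both models are exact, so split evenly
    return mape_b / total, mape_a / total

def fuse(pred_a, pred_b, w_a, w_b):
    """Weighted combination of two prediction sequences."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]

# Hypothetical validation data for one scene and one load class.
actual = [100.0, 200.0]
pred_qpso_bp = [110.0, 190.0]
pred_xgboost = [105.0, 210.0]

w_qpso_bp, w_xgboost = mape_rw_weights(mape(actual, pred_qpso_bp),
                                       mape(actual, pred_xgboost))
hybrid = fuse(pred_qpso_bp, pred_xgboost, w_qpso_bp, w_xgboost)
print(w_qpso_bp, w_xgboost, hybrid)
```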

The overall QPSO-BP algorithm consists of two parts, the QPSO part and the BP neural network part. The prediction results output by the hybrid prediction model have smaller errors and a better fit than the electric load prediction results of a single QPSO-BP neural network.

Step 5.1: the QPSO-BP neural network model calculation, shown in FIG. 6, comprises a QPSO algorithm and a BP neural network algorithm;

the QPSO algorithm comprises the following steps:

step a1: inputting data and preprocessing the data;

step a2: randomly creating an initial population, and randomly assigning initial values to the positions and the speeds of the particles;

step a3: calculating the fitness of each particle according to the fitness function, comparing the fitness of each particle, and recording the particle position with the optimal fitness and the corresponding fitness value;

step a4: updating the position of the particle, comparing the current position of the particle with the previous optimal position, and updating the current position into the optimal position if the current position is better than the previous optimal position;

Step a5: stopping when the number of iterations reaches the upper limit and outputting the QPSO optimization result; if the stopping condition has not been reached, returning to step a3 and continuing.
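Steps a2 to a5 can be sketched as a small QPSO loop. This is an illustrative variant and not the patent's exact update rule: the attractor built from two random numbers, the coefficient chi decreasing linearly from 1.0 to 0.5, and the small constant that guards the reciprocal are all assumptions. To stay self-contained, the toy problem fits the two parameters of a straight line rather than the weights of a BP network, with fitness equal to the reciprocal of the sum of squared errors.

```python
import math
import random

def qpso_maximize(fitness, dim, n_particles=30, n_iter=300,
                  lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    # Step a2: random initial positions (this variant tracks positions only).
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    # Step a3: evaluate fitness and record personal and global bests.
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for t in range(n_iter):  # Step a5: stop at the iteration limit.
        chi = 1.0 - 0.5 * t / n_iter  # assumed linear schedule
        mbest = [sum(p[d] for p in pbest) / n_particles for d in range(dim)]
        for i in range(n_particles):
            # Step a4: move the particle around its local attractor.
            for d in range(dim):
                a1, a2 = 1.0 - rng.random(), 1.0 - rng.random()  # in (0, 1]
                attractor = (a1 * pbest[i][d] + a2 * gbest[d]) / (a1 + a2)
                mu = 1.0 - rng.random()  # in (0, 1], keeps log finite
                step = chi * abs(mbest[d] - pos[i][d]) * math.log(1.0 / mu)
                pos[i][d] = attractor + step if rng.random() < 0.5 else attractor - step
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

# Toy data generated by y = 2x + 1 (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def inverse_sse(params):
    """Fitness: reciprocal of the sum of squared errors (guarded against 0)."""
    w, b = params
    sse = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))
    return 1.0 / (sse + 1e-12)

best, best_fit = qpso_maximize(inverse_sse, dim=2)
print(best)
```

For the QPSO-BP model, the parameter vector would instead hold all network weights and thresholds, and the fitness would be evaluated by running the network on the training set.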

The BP neural network algorithm comprises the following steps:

step b1: determining a BP neural network topology structure;

Step b2: initializing the lengths of the BP neural network weight and threshold vectors;

step b3: optimizing an optimal weight and a threshold value through a QPSO algorithm;

step b4: calculating an error;

step b5: updating the weight threshold value;

Step b6: outputting the result if the output condition is met; otherwise returning to step b4 and continuing.

Step 5.2: the XGBoost model algorithm is shown in FIG. 7, and comprises the following steps:

step c1: inputting a training data set;

step c2: setting a target loss function;

Step c3: determining the regression tree structure and calculating the independent tree structure q and the leaf weights w;

Step c4: starting the XGBoost iterative computation; if the number of iterations exceeds the set number, outputting the predicted value; otherwise returning to step c3 and continuing to iterate.
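Steps c1 to c4 can be illustrated with a deliberately small boosting loop that grows one-split trees (stumps) on a single feature, choosing each split by the regularized gain and setting leaf weights from the gradient sums. It is a teaching sketch and not the XGBoost library: the squared-error loss, the learning rate `eta`, the default parameter values, and the toy data are assumptions.

```python
def best_split(x, g, h, lam, gamma):
    """Step c3: pick the threshold with the largest regularized gain."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    g_tot, h_tot = sum(g), sum(h)
    best_gain, best_thr = 0.0, None
    g_left = h_left = 0.0
    for a, b in zip(order, order[1:]):
        g_left += g[a]
        h_left += h[a]
        if x[a] == x[b]:
            continue  # cannot split between equal feature values
        g_right, h_right = g_tot - g_left, h_tot - h_left
        gain = 0.5 * (g_left ** 2 / (h_left + lam)
                      + g_right ** 2 / (h_right + lam)
                      - g_tot ** 2 / (h_tot + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thr = gain, (x[a] + x[b]) / 2
    return best_thr

def fit_boosted_stumps(x, y, n_rounds=10, eta=0.3, lam=1.0, gamma=0.0):
    base = sum(y) / len(y)          # initial prediction: mean of the targets
    pred = [base] * len(y)
    trees = []
    for _ in range(n_rounds):       # step c4: iterate up to the set number
        # Step c2: squared-error loss gives g = pred - y and h = 1.
        g = [p - t for p, t in zip(pred, y)]
        h = [1.0] * len(y)
        thr = best_split(x, g, h, lam, gamma)
        if thr is None:
            break                   # no split improves the objective
        left = [i for i in range(len(x)) if x[i] < thr]
        right = [i for i in range(len(x)) if x[i] >= thr]
        w_l = -sum(g[i] for i in left) / (sum(h[i] for i in left) + lam)
        w_r = -sum(g[i] for i in right) / (sum(h[i] for i in right) + lam)
        trees.append((thr, eta * w_l, eta * w_r))
        pred = [p + (eta * w_l if xi < thr else eta * w_r)
                for p, xi in zip(pred, x)]
    return base, trees

def predict(model, xi):
    base, trees = model
    return base + sum(w_l if xi < thr else w_r for thr, w_l, w_r in trees)

# Step c1: toy training set (hypothetical), e.g. temperature vs. cold load.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = fit_boosted_stumps(x_train, y_train)
print([round(predict(model, xi), 3) for xi in x_train])
```

With several rounds the predictions move step by step toward the two plateaus in the data; a full implementation would grow deeper trees over all features.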

As can be seen from FIG. 2, in spring and autumn the correlation coefficient between the electrical load and the heat load is 0.43 and that between the electrical load and the cold load is 0.47, so the coupling between the cold load and the electrical load is weak; as can be seen from FIG. 3, in summer the correlation coefficient between the electrical load and the cold load is as high as 0.87, a strong coupling indicating that a large part of the data center station's electricity is used for refrigeration in summer; as can be seen from FIG. 4, in winter the correlation coefficient between the electrical load and the heat load is as high as 0.67, reflecting that heating accounts for a large proportion of the data center station's power consumption; the influence of weather factors on the cold, heat and electrical loads can also be seen directly from FIGS. 2-4.

The superiority of the hybrid prediction model proposed in this patent is demonstrated by the simulation analysis shown in FIGS. 8, 9 and 10. Compared with the BP neural network and the QPSO-BP model, the prediction output curve of the hybrid prediction model follows the actual load curve more closely, and the QPSO-BP model in turn has higher prediction accuracy than the BP neural network. According to the simulation of the three load types (cold, heat and electricity) in FIGS. 8-10, the prediction accuracy of the hybrid prediction model exceeds 99.7%, that of the QPSO-BP model is about 98.12%, and that of the BP neural network is about 96.52%. The data center station multi-element load prediction method based on hybrid model prediction proposed in this patent is therefore both superior and feasible.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced with equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions, which are defined by the scope of the appended claims.

Claims

1. A data center station multi-element load prediction method based on mixed model prediction is characterized by comprising the following steps:

step 1: data collection and data preprocessing; acquiring historical data of a data center station within preset time, constructing a training set, and preprocessing the data, wherein the historical data comprises cold load, heat load, electric load, light intensity, wind speed, humidity, air pressure and date;

wherein $X$ is the training set; $x_e$ is the electrical load of the data center, and $x_e(i)$ is the i-th electrical load in the electrical load sequence; $x_h$ is the heat load, and $x_h(i)$ is the i-th heat load in the heat load sequence; $x_c$ is the cold load, and $x_c(i)$ is the i-th cold load in the cold load sequence; $x_R$ is the amount of solar radiation, and $x_R(i)$ is the i-th solar radiation amount in the radiation sequence; $x_T$ is the temperature, and $x_T(i)$ is the i-th temperature in the temperature sequence; $x_M$ is the air humidity, and $x_M(i)$ is the i-th humidity value in the humidity sequence; $m$ is the number of elements in each sequence;

step 1.3: carrying out importance ranking on the load prediction feature data by adopting random forest out-of-bag estimation and carrying out feature selection;

the importance is calculated as follows:

wherein $Q$ is the number of base learners; $\mathrm{errOOB}_q$ is the out-of-bag error of the q-th base learner; $\mathrm{errOOB}'_q$ is the out-of-bag error of the q-th base learner after noise is added; the load prediction feature data are ranked by importance using random forest out-of-bag estimation, and feature selection is performed accordingly;
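The importance formula itself is not reproduced in this text. A common definition that matches the symbols above, and which is assumed here, is the mean increase in out-of-bag error caused by adding noise to one feature, averaged over the Q base learners. The sketch below computes that quantity and ranks features by it; the function names and the example error values are hypothetical.

```python
def oob_importance(err_oob, err_oob_noisy):
    """Mean increase in out-of-bag error over the Q base learners."""
    if len(err_oob) != len(err_oob_noisy) or not err_oob:
        raise ValueError("need one error pair per base learner")
    q = len(err_oob)
    return sum(noisy - clean
               for clean, noisy in zip(err_oob, err_oob_noisy)) / q


def rank_features(errors_by_feature):
    """Sort feature names by decreasing importance.

    errors_by_feature maps a feature name to (err_oob, err_oob_noisy),
    where err_oob_noisy was measured after perturbing that feature only.
    """
    scores = {name: oob_importance(clean, noisy)
              for name, (clean, noisy) in errors_by_feature.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

A feature whose perturbation barely changes the out-of-bag error gets a score near zero and is a candidate for removal during feature selection.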

wherein $X_1$ is the matrix formed by the spring and autumn cold, heat and electrical loads and the environmental influence factors;

wherein $X_2$ is the matrix formed by the summer cold and electrical loads and the environmental influence factors;

wherein $X_3$ is the matrix formed by the winter heat and electrical loads and the environmental influence factors;

the normalization processing is carried out on the original data, and the formula is as follows:

wherein $x$ is the selected original data; $x_{\max}$ is the maximum value of the sample data; $x_{\min}$ is the minimum value of the sample data; $x'$ is the value after normalization;
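The symbols above correspond to ordinary min-max scaling, x' = (x - x_min) / (x_max - x_min), which is assumed here because the formula itself is not reproduced in this text. A minimal sketch, with a hypothetical function name and an added guard for constant sequences:

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1] using its own minimum and maximum."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # A constant sequence carries no scale information; map it to zeros.
        return [0.0 for _ in values]
    return [(x - x_min) / (x_max - x_min) for x in values]
```

Each load and weather sequence would be scaled separately, so that quantities with different units contribute on a comparable scale to the prediction models.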

step 3, constructing an XGBoost prediction model;

step 4, combining the QPSO-BP neural network model and the XGBoost prediction model to construct a hybrid prediction model, and calculating the weights of the hybrid prediction model; the weights of the two models' output results are calculated by using the mean absolute percentage error reciprocal weight (MAPE-RW) algorithm together with the error indices to set the initial weights of the fusion model, and then searching for the optimal weights starting from these initial values, finally forming the optimal load prediction model;

the MAPE-RW algorithm in step 4 is shown as follows:

wherein $\omega_a$ is the weight of prediction model a; $\sigma_{\mathrm{MAPE},a}$ and $\sigma_{\mathrm{MAPE},b}$ are the MAPE values of prediction models a and b, respectively;

the hybrid prediction model weight is calculated as follows:

$$f_{s,x} = w_{\text{QPSO-BP}} \cdot f_{\text{QPSO-BP},s,x} + w_{\text{XGBoost}} \cdot f_{\text{XGBoost},s,x} \qquad (22)$$

wherein $f_{s,x}$ is the hybrid model's predicted value of the x-th class load in output scene s; $w_{\text{QPSO-BP}}$ and $w_{\text{XGBoost}}$ are the weights of the QPSO-BP neural network and the XGBoost model, respectively; $f_{\text{QPSO-BP},s,x}$ and $f_{\text{XGBoost},s,x}$ are the predicted values of the QPSO-BP neural network and the XGBoost model, respectively, for the x-th class load in output scene s;
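The weighting and fusion in step 4 can be sketched as follows. The MAPE-RW formula is not reproduced in this text, so the code assumes the usual reciprocal weighting, in which each model's weight is proportional to the inverse of its MAPE and the two weights sum to one. These weights are only the initial values; the subsequent search for optimal weights is not shown. In the fusion, each weight multiplies the prediction of its own model, as the symbol definitions indicate. All function names are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (actual values nonzero)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mape_rw_weights(mape_a, mape_b):
    """Reciprocal-MAPE weights: the model with the smaller error weighs more."""
    inv_a, inv_b = 1.0 / mape_a, 1.0 / mape_b
    w_a = inv_a / (inv_a + inv_b)
    return w_a, 1.0 - w_a


def fuse(w_qpso_bp, w_xgboost, f_qpso_bp, f_xgboost):
    """Weighted sum of the two models' predictions, element by element."""
    return [w_qpso_bp * a + w_xgboost * b
            for a, b in zip(f_qpso_bp, f_xgboost)]
```

For example, MAPE values of 0.02 and 0.06 give initial weights of 0.75 and 0.25, so the more accurate model dominates the fused prediction without discarding the other.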

2. The method for predicting multiple loads of a data center station based on mixed model prediction according to claim 1, wherein the step 2 comprises the following steps:

step 2.2: optimizing the neural network model using the quantum particle swarm optimization (QPSO) algorithm.

3. The method for predicting multiple loads of a data center station based on mixed model prediction according to claim 2, wherein the step 2.2 comprises the following steps:

step 2.2.2: updating the particle position as shown in the following formula:

wherein $E_i$ is the fitness of the i-th population; $y(i)$ is the actual electrical load of the data center station represented by the i-th population; $s(i)$ is the predicted electrical load of the data center station represented by the i-th population; $n$ is the number of populations;

wherein $x_i$ is the position of the i-th particle; $\mu$ is a uniform random number on [0,1]; $\chi$ is continuously updated as the number of iterations increases, so that the particle position remains optimal; $N_{\max}$ is the maximum number of QPSO iterations; $N_{\min}$ is the minimum number of QPSO iterations.

4. The method for predicting multiple loads of a data center station based on mixed model prediction according to claim 1, wherein the step 3 comprises the following steps:

step 3.1: establishing a regularized learning objective function;

wherein $L$ is the regularized objective function of the model to be minimized; $l(\hat{y}_i, y_i)$ is the difference between the predicted value $\hat{y}_i$ of the i-th target and the actual value $y_i$, i.e. the loss function; $n$ is the sample size; $K$ is the number of sample features; $\Omega(f_k)$ is the complexity penalty function of the tree corresponding to the variable $f_k$ computed in the k-th iteration;

step 3.2: optimizing by using a gradient tree enhancement algorithm;

wherein $w_j^{*}$ is the optimal weight of leaf j; $\tilde{L}^{(t)}(q)$ is the optimal objective value for tree structure q; $I_j$ is the instance set of leaf j in the gradient tree; $\gamma$ and $\lambda$ are user-defined parameters of the XGBoost algorithm, where $\gamma$ is the first-order regularization penalty term and $\lambda$ is the second-order regularization penalty term; $T$ is the total number of leaf nodes in the gradient tree.
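The formulas for the optimal leaf weight and the structure score are not reproduced in this text. The sketch below assumes the forms published for XGBoost: for leaf j with gradient sum G_j and second-order gradient sum H_j over its instance set, the optimal weight is -G_j / (H_j + lambda), and the score of a tree structure is -1/2 times the sum of G_j^2 / (H_j + lambda) over all leaves, plus gamma times the number of leaves T. The function names are hypothetical.

```python
def leaf_weight(grads, hessians, lam):
    """Optimal weight of one leaf from the gradients of its instances."""
    return -sum(grads) / (sum(hessians) + lam)


def structure_score(leaves, lam, gamma):
    """Objective value of a tree structure; lower is better.

    leaves is a list of (grads, hessians) pairs, one pair per leaf.
    """
    total = 0.0
    for grads, hessians in leaves:
        g, h = sum(grads), sum(hessians)
        total += g * g / (h + lam)
    return -0.5 * total + gamma * len(leaves)
```

A larger `lam` shrinks every leaf weight toward zero, while a larger `gamma` makes each additional leaf more expensive, so both parameters limit tree complexity.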

5. The method for predicting the multiple loads of the data center station based on the mixed model prediction according to claim 1, wherein the output scene s takes the values 1, 2 and 3, which represent the spring-and-autumn, summer and winter scenes, respectively; the class x loads include cold loads, heat loads and electrical loads.