CN113762387B - Multi-element load prediction method for data center station based on hybrid model prediction - Google Patents

Multi-element load prediction method for data center station based on hybrid model prediction

Info

Publication number
CN113762387B
CN113762387B CN202111048836.5A CN202111048836A CN113762387B CN 113762387 B CN113762387 B CN 113762387B CN 202111048836 A CN202111048836 A CN 202111048836A CN 113762387 B CN113762387 B CN 113762387B
Authority
CN
China
Prior art keywords
load
data
prediction
model
data center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111048836.5A
Other languages
Chinese (zh)
Other versions
CN113762387A (en)
Inventor
李华
丁吉
杨东升
张化光
周博文
李广地
金硕巍
罗艳红
王迎春
闫士杰
杨波
陈乐�
Original Assignee
东北大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学
Priority to CN202111048836.5A
Publication of CN113762387A
Application granted
Publication of CN113762387B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Abstract

The invention provides a multi-element load prediction method for data center stations based on hybrid model prediction, and relates to the technical field of automatic control. The multi-element data of the data center station are divided into three scenes (spring and autumn, summer, and winter), and multi-element load prediction is carried out on the data in each scene. The grey relational analysis (GRA) method is used for feature analysis and normalization of the multi-element load data, and the processed data are input into the QPSO-BP neural network for prediction. On the algorithm side, the QPSO-BP neural network and the XGBoost model predict in parallel, so that deep learning and machine learning techniques are applied to load prediction at the same time; the two learning approaches are combined effectively, the advantages of both models are fully exploited, and a model with higher stability and generalization capability is obtained. The hybrid prediction model can actively enrich the features of input data that has only a single dimension, avoid the influence on calculation accuracy of data errors caused by human factors during data acquisition, and achieve high-accuracy load prediction under special conditions such as large load fluctuations.

Description

Multi-element load prediction method for data center station based on hybrid model prediction
Technical Field
The invention relates to the technical field of automatic control, in particular to a data center station multi-element load prediction method based on hybrid model prediction.
Background
In recent years, with the rapid development of internet technology, the scale and number of data center stations have expanded rapidly. Data center electricity consumption in China is reported to account for 1% of the country's total electricity consumption, which makes data center stations a considerable electric load. Given the power system's requirements for fast, accurate scheduling and for safe, stable operation, accurate load prediction for data center stations has become important.
The load of a data center station falls into two main types: the load of the servers that process data, and the load of the storage, lighting, cooling and power distribution equipment that keeps the servers working normally. Because the power consumption of a data center station is complex, the load is influenced by many factors and its variation shows no obvious regularity. Traditional load prediction methods often select a single factor and analyze only its mapping to the load, ignoring the influence of other factors and the interactions among all influencing factors. As a result, the load characteristics are not analyzed accurately enough, which degrades load prediction and the formulation of electricity consumption plans. In addition, traditional prediction models such as time-series models, neural network models and artificial intelligence optimization models each have advantages and disadvantages. Time-series models rest on simple assumptions, are simple to compute and adapt well, but they extrapolate poorly and have a small prediction range. Neural network models fit well and can process nonlinear data, but they are unstable and depend on the characteristics of the data. Artificial intelligence optimization models can be combined with other methods to improve prediction accuracy, but they easily fall into local optima. Traditional prediction algorithms also share typical limitations: the error is insensitive to changes in the weights, the error gradient changes little, tuning takes a long time, many iterations are needed, convergence is slow, and the neural network output layer very easily falls into a local minimum. These limitations reduce prediction accuracy and stability, and all of them make accurate load prediction for data center stations challenging.
Traditional prediction methods do not fully mine the large volume of dormant historical operating data. They usually predict the load in a single scene and ignore load differences across time periods, and the prediction accuracy is affected by the many factors present in the system. Traditional prediction models also take a long time to tune, need many iterations and converge slowly, and the neural network output layer very easily falls into a local minimum; together these shortcomings limit prediction accuracy and stability and make the system prediction inaccurate.
Disclosure of Invention
Aiming at the problems existing in the prior art, the invention provides a data center station multi-element load prediction method based on hybrid model prediction.
In order to solve the technical problems, the invention adopts the following technical scheme:
A data center station multi-element load prediction method based on hybrid model prediction comprises the following steps:
Step 1: data collection and data preprocessing. Historical data of the data center station within a preset time are acquired, a training set is constructed, and the data are preprocessed; the historical data comprise cold load, heat load, electric load, light intensity, wind speed, humidity, air pressure and date.
Step 1.1: acquire historical data of the data center station within the preset time, and use the K-means clustering algorithm to divide the load of the data center station into three scenes (spring and autumn, summer, and winter) for scene-by-scene prediction;
Step 1.2: three meteorological characteristic factors, namely solar radiation amount, temperature and air humidity, are selected from the historical data and arranged in that order in the training set; the cold, heat and electric loads of the data center station and the environmental factors are then assembled to form the training set X as follows:
where X is the training set; $x_e$ is the electric load of the data center station, and $x_e(i)$ is the i-th electric load in the electric load sequence; $x_h$ is the heat load, and $x_h(i)$ is the i-th heat load in the heat load sequence; $x_c$ is the cold load, and $x_c(i)$ is the i-th cold load in the cold load sequence; $x_R$ is the solar radiation amount, and $x_R(i)$ is the i-th solar radiation amount in the radiation sequence; $x_T$ is the temperature, and $x_T(i)$ is the i-th temperature in the temperature sequence; $x_M$ is the air humidity, and $x_M(i)$ is the i-th humidity in the humidity sequence; m is the number of elements in each sequence.
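The matrix itself is not reproduced in this text. A layout consistent with the symbol definitions above, assuming one column per quantity and one row per sample (both the orientation and the column order are assumptions), would be:

$$
X = \begin{bmatrix}
x_e(1) & x_h(1) & x_c(1) & x_R(1) & x_T(1) & x_M(1) \\
x_e(2) & x_h(2) & x_c(2) & x_R(2) & x_T(2) & x_M(2) \\
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
x_e(m) & x_h(m) & x_c(m) & x_R(m) & x_T(m) & x_M(m)
\end{bmatrix}
$$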
Step 1.3: rank the importance of the load prediction feature data using Random Forest (RF) out-of-bag estimation and perform feature selection;
The importance is calculated as follows:
where Q is the number of base learners; $errOOB_q$ is the out-of-bag error of the q-th base learner; $errOOB'_q$ is the out-of-bag error of the q-th base learner after noise is added;
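The importance formula is not reproduced in this text. A form consistent with the definitions above is the mean decrease in accuracy (MDA) averaged over the Q base learners. The sign convention below is an assumption, chosen so that a larger drop in the index marks a more important feature:

$$
\mathrm{MDA} = \frac{1}{Q}\sum_{q=1}^{Q}\left(errOOB_q - errOOB'_q\right)
$$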
Step 1.4: calculate the correlation between the loads and the characteristic factors;
For each of the three scenes (spring and autumn, summer, and winter), the correlation between the cold, heat and electric loads of the data center station and the meteorological influence factors is analyzed: a matrix of the loads and the environmental influence factors is established for each scene, and the strength, magnitude and order of the relationship between the loads and the environmental influence factors are calculated to obtain the correlation coefficient and the correlation degree;
In spring and autumn, the sequences of the cold, heat and electric loads and the environmental influence factors form the following matrix:
where $X_1$ is the matrix formed by the spring and autumn cold, heat and electric loads and the environmental influence factors.
In summer, the sequences of the cold load, the electric load and the environmental influence factors form the following matrix:
where $X_2$ is the matrix formed by the summer cold and electric loads and the environmental influence factors.
In winter, the sequences of the heat load, the electric load and the environmental influence factors form the following matrix:
where $X_3$ is the matrix formed by the winter heat and electric loads and the environmental influence factors.
The original data need to be normalized, using the following formula:
where x is the selected original data; $x_{max}$ is the maximum value of the sample data; $x_{min}$ is the minimum value of the sample data; x' is the normalized value;
The correlation coefficient $\xi_j$ and the correlation degree $\gamma_j$ are calculated as follows:
where $\xi_j$ is the correlation coefficient of data category j, and $\xi_j(k)$ is its value at the k-th point; $\gamma_j$ is the correlation degree of data category j; $x_0(k)$ is the k-th value of the normalized meteorological factor sequence; $x_j(k)$ is the k-th value of the normalized load sequence; ρ is the resolution coefficient; j denotes the category of the normalized data;
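The normalization and correlation formulas are not reproduced in this text. Assuming standard min-max normalization and the usual grey relational analysis definitions, which match the symbols defined above, they would read:

$$
x' = \frac{x - x_{min}}{x_{max} - x_{min}}
$$

$$
\xi_j(k) = \frac{\min_j \min_k \left|x_0(k) - x_j(k)\right| + \rho \max_j \max_k \left|x_0(k) - x_j(k)\right|}{\left|x_0(k) - x_j(k)\right| + \rho \max_j \max_k \left|x_0(k) - x_j(k)\right|}
$$

$$
\gamma_j = \frac{1}{m}\sum_{k=1}^{m} \xi_j(k)
$$

Here m is taken to be the sequence length, and taking the minimum and maximum over all categories j and all points k is part of the assumed standard form.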
Step 2: construct a BP neural network model optimized by the quantum particle swarm optimization algorithm (QPSO);
Step 2.1: a BP neural network is used to construct the model for calculating the predicted electric load of the data center station, and the formula is as follows:
where l is the number of hidden-layer neurons in the model; n is the number of input-layer neurons; m is the number of sequence quantities; and a is a constant between 1 and 10;
Step 2.2: optimize the neural network model using the quantum particle swarm optimization algorithm (QPSO);
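The formula is not reproduced in this text. The symbols l, n, m and a match a widely used empirical rule for sizing a hidden layer, so the following form is assumed. In that rule m usually denotes the number of output-layer neurons, whereas the text above describes m as the number of sequence quantities:

$$
l = \sqrt{n + m} + a
$$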
Step 2.2.1: calculate the average of the particles' historical optimal positions, as shown in the following formula:
where $m_{best}$ is the average particle historical optimal position; S is the size of the particle swarm; $Q_{local,i}$ is the position of the i-th particle in the particle iteration;
Step 2.2.2: update the particle position as shown in the following formula:
where $Q_i$ is the updated position of the i-th particle; $\alpha_1$ and $\alpha_2$ are random numbers in [0,1]; $Q_{global}$ is the global optimal particle position;
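The formula for step 2.2.1 is not reproduced in this text. Given the definitions above, the average position would be the arithmetic mean over the swarm, as in the standard QPSO definition (an assumption):

$$
m_{best} = \frac{1}{S}\sum_{i=1}^{S} Q_{local,i}
$$

The position update of step 2.2.2 is specific to this patent and is not reconstructed here.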
Step 2.2.3: the reciprocal of the sum of squared errors between the calculated and actual electric loads is used as the individual fitness function, as shown in the following formula:
where $E_i$ is the fitness of the i-th population; y(i) is the actual electric load represented by the i-th population of the data center station; s(i) is the predicted electric load represented by the i-th population of the data center station; n is the number of populations.
After the fitness function is introduced, the particle position update function is as follows:
where $x_i$ is the position of the i-th particle; μ is a uniform random number on [0,1]; χ is updated continuously as the number of iterations increases, keeping the particle position optimal; $N_{max}$ is the maximum number of QPSO iterations; $N_{min}$ is the minimum number of QPSO iterations;
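The fitness formula is not reproduced in this text, but the wording (the reciprocal of the sum of squared errors) pins it down fairly closely. A minimal Python sketch under that reading follows; the function name and the handling of a zero error are assumptions made for illustration.

```python
def fitness(actual, predicted):
    """Reciprocal of the sum of squared errors between actual and predicted loads."""
    sse = sum((y - s) ** 2 for y, s in zip(actual, predicted))
    # A perfect fit has zero error; returning infinity here is an assumption.
    return float("inf") if sse == 0 else 1.0 / sse


# Hypothetical actual and predicted electric loads (arbitrary units).
score = fitness([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

A larger value means a smaller prediction error, so the swarm searches for the particle with the highest fitness.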
Step 3: construct the XGBoost prediction model;
Step 3.1: establish a regularized learning objective function;
For the training set X from step 1, the predicted value is obtained with an additive function model:
where L is the regularized objective function that the model minimizes; $l(\hat{y}_i, y_i)$ is the loss function, i.e. the difference between the predicted value $\hat{y}_i$ of the i-th target and the actual value $y_i$; n is the sample size; K is the number of sample features; $\Omega(f_k)$ is the complexity penalty function of the tree corresponding to the variable $f_k$ of the k-th iteration.
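The two formulas of this step are not reproduced in this text. Assuming the standard XGBoost formulation, which uses the same symbols, the additive prediction and the regularized objective would read as follows, with K counting the additive terms $f_k$ in the sums:

$$
\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)
$$

$$
L = \sum_{i=1}^{n} l(\hat{y}_i, y_i) + \sum_{k=1}^{K} \Omega(f_k)
$$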
Step 3.2: optimize using the gradient tree boosting algorithm;
The second-order approximation of the objective function to be optimized is:
where $\hat{y}_i^{(t)}$ is the i-th predicted value in the t-th iteration; $g_i$ is the first-order gradient of the loss function; $h_i$ is the second-order gradient of the loss function; $f_t(x_i)$ is the variable of the t-th iteration; $\partial$ is the gradient symbol;
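The approximation is not reproduced in this text. Assuming the standard second-order Taylor expansion used in gradient tree boosting, with $\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + f_t(x_i)$, it would read:

$$
L^{(t)} \simeq \sum_{i=1}^{n}\left[\, l\left(\hat{y}_i^{(t-1)}, y_i\right) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t)
$$

$$
g_i = \partial_{\hat{y}_i^{(t-1)}}\, l\left(\hat{y}_i^{(t-1)}, y_i\right), \qquad
h_i = \partial^{2}_{\hat{y}_i^{(t-1)}}\, l\left(\hat{y}_i^{(t-1)}, y_i\right)
$$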
Step 3.3: evaluate the impurity score of the decision tree, as shown in the following formulas:
where $w_j^*$ is the optimal weight of leaf j; $L^{(t)}(q)$ is the optimal value of the objective for tree structure q; $I_j$ is the instance set of leaf j in the gradient tree; γ and λ are user-defined parameters of the XGBoost algorithm, where γ is the regularization penalty term of the step function and λ is the regularization penalty term of the second-order gradient function; T is the total number of leaf nodes in the gradient tree;
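The formulas of step 3.3 are not reproduced in this text. In the standard XGBoost derivation the optimal leaf weight is $w_j^* = -\sum_{i \in I_j} g_i / (\sum_{i \in I_j} h_i + \lambda)$ and the structure score is $-\tfrac{1}{2}\sum_{j=1}^{T} (\sum_{i \in I_j} g_i)^2 / (\sum_{i \in I_j} h_i + \lambda) + \gamma T$. The sketch below assumes that form; the function names and the sample gradients are hypothetical.

```python
def leaf_weight(grads, hess, lam):
    """Optimal weight of one leaf from the gradients of the samples it holds."""
    return -sum(grads) / (sum(hess) + lam)


def structure_score(leaves, lam, gamma):
    """Objective value of a fixed tree structure; lower is better.

    leaves: one (grads, hess) pair per leaf.
    """
    total = 0.0
    for grads, hess in leaves:
        g, h = sum(grads), sum(hess)
        total += g * g / (h + lam)
    # gamma is charged once per leaf, so larger trees pay a larger penalty.
    return -0.5 * total + gamma * len(leaves)


# Hypothetical two-leaf tree under squared-error loss (g = prediction - target, h = 1).
leaves = [([-2.0, -2.0], [1.0, 1.0]), ([3.0], [1.0])]
weights = [leaf_weight(g, h, lam=1.0) for g, h in leaves]
score = structure_score(leaves, lam=1.0, gamma=0.5)
```

Comparing this score before and after a candidate split is how a split's gain is usually judged in this formulation.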
Step 4: combine the QPSO-BP neural network model and the XGBoost prediction model into a hybrid prediction model, and calculate the weights of the hybrid prediction model;
The weights of the two models' outputs are calculated as follows: the mean absolute percentage error reciprocal weight (MAPE-RW) algorithm, combined with the error index, sets the initial weights of the fusion model, and the optimal weights are then searched starting from these initial values, finally forming the optimal load prediction model;
The MAPE-RW algorithm is shown as follows:
where $\omega_a$ is the weight of prediction model a; $\sigma_{MAPE,a}$ and $\sigma_{MAPE,b}$ are the MAPE values of prediction models a and b, respectively.
The output of the hybrid prediction model is calculated as follows:
$$f_{s,x} = w_{\mathrm{QPSO\text{-}BP}} \cdot f_{\mathrm{QPSO\text{-}BP},s,x} + w_{\mathrm{XGBoost}} \cdot f_{\mathrm{XGBoost},s,x} \qquad (22)$$
where $f_{s,x}$ is the predicted value of the x-th class of load in scene s output by the hybrid model; $w_{\mathrm{QPSO\text{-}BP}}$ and $w_{\mathrm{XGBoost}}$ are the weights of the QPSO-BP neural network and the XGBoost model, respectively; $f_{\mathrm{QPSO\text{-}BP},s,x}$ and $f_{\mathrm{XGBoost},s,x}$ are the predicted values of the QPSO-BP neural network and the XGBoost model for the x-th class of load in scene s, respectively; the scene index s takes the values 1, 2 and 3, which represent the three scenes of spring and autumn, summer, and winter, respectively; the load class x covers cold load, heat load and electric load;
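The MAPE-RW formula is not reproduced in this text. Reading "reciprocal weight" as weighting each model by the inverse of its MAPE gives $\omega_a = \sigma_{MAPE,b} / (\sigma_{MAPE,a} + \sigma_{MAPE,b})$ for two models. The sketch below assumes that form and covers only the initial weights; the subsequent search for optimal weights is not shown. All names and numbers are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mape_rw_weights(mape_a, mape_b):
    """Reciprocal-MAPE weights for two models; the lower-error model gets more weight."""
    total = mape_a + mape_b
    return mape_b / total, mape_a / total


def fuse(pred_a, pred_b, w_a, w_b):
    """Weighted sum of the two models' predictions, point by point."""
    return [w_a * a + w_b * b for a, b in zip(pred_a, pred_b)]


# Hypothetical loads for one scene and one load class (arbitrary units).
actual = [100.0, 200.0]
qpso_bp = [110.0, 190.0]
xgboost = [105.0, 210.0]

w_qpso_bp, w_xgboost = mape_rw_weights(mape(actual, qpso_bp), mape(actual, xgboost))
fused = fuse(qpso_bp, xgboost, w_qpso_bp, w_xgboost)
```

In this example the XGBoost stand-in has the lower MAPE, so it receives the larger weight (about 0.6 against 0.4).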
Step 5: feed the data preprocessed in step 1 into the hybrid prediction model for calculation, completing the scene-by-scene multi-element load prediction of the data center station.
The beneficial effects of the invention are as follows:
The invention provides a scene-by-scene multi-element load prediction method for data center stations based on parallel prediction with a hybrid model, which has the following beneficial effects:
(1) The multi-element data of the data center station are divided into three scenes (spring and autumn, summer, and winter), and multi-element load prediction is carried out on the data in each scene, which improves prediction accuracy while reducing prediction time;
(2) In feature factor processing, multiple feature factors are considered, and the grey correlation degree is used to describe the strength, magnitude and order of the relationships among the factors, which reduces the prediction error;
(3) The GRA method is used for feature analysis and normalization of the multi-element load data, and the processed data are input into the QPSO-BP neural network for prediction, which significantly reduces the time the QPSO-BP neural network needs to learn the data and makes data mining more efficient;
(4) The traditional BP neural network is replaced by the QPSO-BP neural network; the neural network calculation model optimized by the QPSO algorithm has better nonlinear fitting capability and higher calculation accuracy than a single BP neural network model;
(5) On the algorithm side, the QPSO-BP neural network and the XGBoost model predict in parallel, so that deep learning and machine learning techniques are applied to load prediction at the same time; the two learning approaches are combined effectively, the advantages of both models are fully exploited, and a model with higher stability and generalization capability is obtained. The hybrid prediction model can actively enrich the features of input data that has only a single dimension, which makes network learning more efficient; it can avoid the influence on calculation accuracy of data errors caused by human factors during data acquisition, and it can achieve high-accuracy load prediction under special conditions such as large load fluctuations;
(6) The weights of the fusion model are set with the MAPE-RW algorithm and the optimal weights are then searched, which reduces the error of the fusion model.
Drawings
FIG. 1 is an overall flow chart of multi-element load prediction for a data center station in an embodiment of the invention;
FIG. 2 shows the result of the spring and autumn correlation analysis of data center station data in an embodiment of the invention;
FIG. 3 shows the result of the summer correlation analysis of data center station data in an embodiment of the invention;
FIG. 4 shows the result of the winter correlation analysis of data center station data in an embodiment of the invention;
FIG. 5 is a schematic diagram of the BP neural network prediction model in an embodiment of the invention;
FIG. 6 is a flowchart of the QPSO-BP neural network algorithm in an embodiment of the invention;
FIG. 7 is a flowchart of the XGBoost algorithm in an embodiment of the invention;
FIG. 8 compares the prediction results of the spring and autumn electric load hybrid prediction model with those of other methods in an embodiment of the invention;
FIG. 9 compares the prediction results of the summer cold load hybrid prediction model with those of other methods in an embodiment of the invention;
FIG. 10 compares the prediction results of the winter heat load hybrid prediction model with those of other methods in an embodiment of the invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
A data center station multi-element load prediction method based on hybrid model prediction, as shown in FIG. 1, comprises the following steps:
Step 1: data collection and data preprocessing. Historical data of the data center station within a preset time are acquired, a training set is constructed, and the data are preprocessed; the historical data comprise cold load, heat load, electric load, light intensity, wind speed, humidity, air pressure and date.
Step 1.1: acquire historical data of the data center station within the preset time, and use the K-means clustering algorithm to divide the load of the data center station into three scenes (spring and autumn, summer, and winter) for scene-by-scene prediction, which further improves prediction accuracy.
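The scene division can be illustrated with a plain K-means loop. This is a minimal sketch, not the patent's implementation: the choice of features (daily cold and heat load), the sample values and the deterministic initialization are all assumptions made for illustration.

```python
import math


def kmeans(points, k, iters=50):
    """Plain K-means on a list of equal-length numeric tuples."""
    # Deterministic initialization (an assumption): evenly spaced samples.
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical daily (cold load, heat load) pairs in arbitrary units.
days = [(9.0, 0.5), (8.5, 0.8), (9.2, 0.3),   # cooling dominated
        (4.0, 4.2), (4.5, 3.8), (3.9, 4.1),   # mixed
        (0.4, 8.9), (0.7, 9.3), (0.2, 8.7)]   # heating dominated
labels, centroids = kmeans(days, k=3)
```

With k = 3 the three groups play the role of the summer, spring-and-autumn and winter scenes; on real data the features would need to be normalized first.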
Step 1.2: in order to predict the cold, heat and electric loads of the data center station, the three meteorological characteristic factors that most strongly influence load prediction, namely solar radiation amount, temperature and air humidity, are selected from the historical data and arranged in that order in the training set; the cold, heat and electric loads of the data center station and the environmental factors are then assembled to form the training set X as follows:
where X is the training set; $x_e$ is the electric load of the data center station, and $x_e(i)$ is the i-th electric load in the electric load sequence; $x_h$ is the heat load, and $x_h(i)$ is the i-th heat load in the heat load sequence; $x_c$ is the cold load, and $x_c(i)$ is the i-th cold load in the cold load sequence; $x_R$ is the solar radiation amount, and $x_R(i)$ is the i-th solar radiation amount in the radiation sequence; $x_T$ is the temperature, and $x_T(i)$ is the i-th temperature in the temperature sequence; $x_M$ is the air humidity, and $x_M(i)$ is the i-th humidity in the humidity sequence; m is the number of elements in each sequence.
Step 1.3: rank the importance of the load prediction feature data using Random Forest (RF) out-of-bag estimation and perform feature selection;
The importance is calculated as follows:
where Q is the number of base learners; $errOOB_q$ is the out-of-bag error of the q-th base learner; $errOOB'_q$ is the out-of-bag error of the q-th base learner after noise is added. In load prediction for a data center station, the historical data used for prediction may include feature data that are related to the prediction but redundant. Random Forest (RF) out-of-bag estimation is therefore used to rank the importance of the load prediction feature data and to perform feature selection: the more the MDA index decreases, the greater the influence of the corresponding feature on the prediction result and the higher its importance.
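The ranking rule can be sketched in a few lines of Python. The sketch assumes the index is the mean over the Q base learners of $errOOB_q - errOOB'_q$, so that a more negative value means a larger decrease and a more important feature; this sign convention is inferred from the statement above, and the error values and feature names are hypothetical.

```python
def mda_index(err_oob, err_oob_noised):
    """Mean over base learners of (OOB error) minus (OOB error after adding noise)."""
    if not err_oob or len(err_oob) != len(err_oob_noised):
        raise ValueError("need one error pair per base learner")
    q = len(err_oob)
    return sum(e - e_noised for e, e_noised in zip(err_oob, err_oob_noised)) / q


def rank_features(err_oob, noised_by_feature):
    """Sort features so that the largest decrease in the index comes first."""
    scores = {name: mda_index(err_oob, errs)
              for name, errs in noised_by_feature.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical out-of-bag errors of Q = 4 base learners.
base_errors = [0.10, 0.12, 0.11, 0.09]
noised_errors = {
    "temperature": [0.30, 0.28, 0.33, 0.29],  # noise hurts a lot
    "wind_speed": [0.11, 0.12, 0.12, 0.10],   # noise barely matters
}
ranking = rank_features(base_errors, noised_errors)
```

Features near the end of the ranking change the out-of-bag error very little when perturbed and are candidates for removal.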
Step 1.4: calculate the correlation between the loads and the characteristic factors;
For each of the three scenes (spring and autumn, summer, and winter), the correlation between the cold, heat and electric loads of the data center station and the meteorological influence factors is analyzed: a matrix of the loads and the environmental influence factors is established for each scene, and the strength, magnitude and order of the relationship between the loads and the environmental influence factors are calculated to obtain the correlation coefficient and the correlation degree;
The correlation is analyzed to further improve prediction accuracy. In spring and autumn, cooling and heating exist at the same time and the electric load and the cold load are strongly coupled, so the correlation between the electric and cold loads of the data center station and the environmental influence factors is analyzed. Summer is mainly the cooling season and the electric load and the cold load are strongly coupled, so the correlation between the electric and cold loads of the data center station and the environmental influence factors is analyzed. Winter is mainly the heating season and the electric load and the heat load are strongly coupled, so the correlation between the electric and heat loads of the data center station and the environmental influence factors is analyzed.
In spring and autumn, the sequences of the cold, heat and electric loads and the environmental influence factors form the following matrix:
where $X_1$ is the matrix formed by the spring and autumn cold, heat and electric loads and the environmental influence factors.
In summer, the sequences of the cold load, the electric load and the environmental influence factors form the following matrix:
where $X_2$ is the matrix formed by the summer cold and electric loads and the environmental influence factors.
In winter, the sequences of the heat load, the electric load and the environmental influence factors form the following matrix:
where $X_3$ is the matrix formed by the winter heat and electric loads and the environmental influence factors.
Because the cold, heat and electric loads and the environmental factors show a certain periodicity from year to year, when sample data are scarce the sample data are reused so that the trained network model is more accurate; here the data are cycled 2 times. Because the load values of the data center station are large, the original data need to be normalized in order to remove the influence of dimension and speed up network convergence, using the following formula:
where x is the selected original data; $x_{max}$ is the maximum value of the sample data; $x_{min}$ is the minimum value of the sample data; x' is the normalized value;
The GRA method is used to judge the strength, magnitude and order of the relationship between the cold, heat and electric loads of the data center station and the environmental influence factors in the three scenes (spring and autumn, summer, and winter). The strength of the relationship is judged from how similar the geometric shapes of the curves are, which is the correlation degree. The correlation coefficient $\xi_j$ and the correlation degree $\gamma_j$ are calculated as follows:
where $\xi_j$ is the correlation coefficient of data category j, and $\xi_j(k)$ is its value at the k-th point; $\gamma_j$ is the correlation degree of data category j; $x_0(k)$ is the k-th value of the normalized meteorological factor sequence; $x_j(k)$ is the k-th value of the normalized load sequence; ρ is the resolution coefficient, typically 0.5; j denotes the category of the normalized data;
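As a small illustration of the normalization step, the sketch below assumes standard min-max scaling, $x' = (x - x_{min}) / (x_{max} - x_{min})$, which matches the symbols defined above. The function name, the sample values and the handling of a constant sequence are assumptions.

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant sequence has no spread; mapping it to zeros is an assumption.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical electric loads in kW.
loads_kw = [1200.0, 1500.0, 1800.0, 1350.0]
normalized = min_max_normalize(loads_kw)
```

Each load and each meteorological sequence would be normalized separately, so that quantities with different units become comparable.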
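The grey relational calculation can be sketched as follows, assuming the usual GRA definitions: the coefficient at point k is $(\Delta_{min} + \rho\Delta_{max}) / (\Delta_j(k) + \rho\Delta_{max})$ with $\Delta_j(k) = |x_0(k) - x_j(k)|$, and the degree is the mean of the coefficients. The function name and sample sequences are hypothetical.

```python
def grey_relational_degrees(reference, candidates, rho=0.5):
    """Return one grey relational degree per candidate sequence.

    reference: normalized reference sequence x_0.
    candidates: list of normalized sequences x_j with the same length.
    """
    # Absolute differences between the reference and every candidate.
    deltas = [[abs(r - c) for r, c in zip(reference, seq)] for seq in candidates]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:
        # Every candidate coincides with the reference.
        return [1.0 for _ in candidates]
    degrees = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


# Hypothetical normalized sequences: a weather factor and two load curves.
weather = [0.0, 0.5, 1.0]
load_following = [0.0, 0.5, 1.0]   # same shape as the weather factor
load_opposing = [1.0, 0.5, 0.0]    # mirrored shape
degrees = grey_relational_degrees(weather, [load_following, load_opposing])
```

A degree close to 1 means the two curves have nearly the same shape; ranking the degrees gives the order of the relationships.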
Step 2: construct a BP neural network model optimized by the quantum particle swarm optimization algorithm (QPSO);
The traditional BP neural network is replaced by the QPSO-BP neural network model; the neural network calculation model optimized by the QPSO algorithm has better nonlinear fitting capability and higher calculation accuracy than a single BP neural network model.
Step 2.1: construct the model for calculating the predicted electric load of the data center station using a BP neural network;
Determining the neural network structure mainly involves determining the network type, the number of layers of the BP neural network, the number of neurons in each layer, the form of the activation function, and so on. This patent uses a BP neural network to construct the predicted electric load calculation model of the data center station. A sigmoid (S-shaped) function is selected as the function between the middle (hidden) layer and the output layer; the number of input-layer neurons is determined by the dimension of the sample data; the output layer is the electric load of the data center station. At present there is no unified theoretical method for determining the size of the middle layer, so it is determined with an empirical formula. FIG. 5 is a schematic diagram of the BP neural network prediction model of this patent, and the formula is as follows:
where l is the number of hidden-layer neurons in the model; n is the number of input-layer neurons; a is a constant between 1 and 10;
The BP neural network in this patent is set up as follows:
(a) Open the Neural Net Fitting app from the Apps tab in Matlab;
(b) On the Select Data page, choose the standardized sample matrix file to import under Inputs and the target output matrix file to import under Targets;
(c) On the Validation and Test Data page, set Training to 70%, Validation to 15%, and Testing to 15%;
(d) On the Network Architecture page, set the number of hidden neurons to 4.
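The 70% / 15% / 15% division in step (c) can be reproduced outside Matlab. The sketch below assumes a random division of sample indices; the function name `split_indices` and the fixed seed are hypothetical and serve only to make the example repeatable.

```python
import random


def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Randomly divide sample indices into training, validation and test sets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(n_samples * train)
    n_val = round(n_samples * val)
    # Whatever remains after training and validation becomes the test set.
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])


train_idx, val_idx, test_idx = split_indices(200)
```

Giving the test set the remainder guarantees that every sample lands in exactly one subset even when the percentages do not divide the sample count evenly.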
Step 2.2: optimize the neural network model using the quantum particle swarm optimization (QPSO) algorithm;
A BP neural network on its own converges slowly, is easily trapped in local minima, and has weak nonlinear fitting capability and low calculation accuracy. Quantum particle swarm optimization (QPSO) is a global optimization algorithm; it is used to optimize the weights and thresholds of the BP neural network to obtain the optimal individual. The electrical load of the data center station is then predicted using the optimal weights and thresholds, which prevents the BP neural network from falling into a local optimum and yields a more accurate load prediction.
The position update of the quantum particle swarm algorithm in this patent proceeds as follows:
Step 2.2.1: calculate the mean of the particles' historical best positions, as shown in the following formula:
where $m_{best}$ is the mean of the particles' historical best positions; $S$ is the size of the particle swarm; and $Q_{local,i}$ is the position of the i-th particle in the particle iteration;
Step 2.2.2: update the particle position, as shown in the following formula:
In this patent, the particle position update formula uses two random variables instead of the single random variable of the conventional formula, which speeds up convergence and increases the randomness of the algorithm while reducing risk.
where $Q_i$ is the updated position of the i-th particle; $\alpha_1$ and $\alpha_2$ are random numbers in [0, 1]; and $Q_{global}$ is the global best particle position;
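Steps 2.2.1 and 2.2.2 can be sketched in Python as follows. The mean best position follows directly from the definitions above. The two-random-variable update is an assumed form, a random convex combination of the particle's own best position and the global best, because the patent's formula is not reproduced in this text. The function names `mean_best` and `attractor` are hypothetical.

```python
import random


def mean_best(local_best):
    """Mean of the particles' historical best positions, per dimension."""
    s = len(local_best)
    dim = len(local_best[0])
    return [sum(p[d] for p in local_best) / s for d in range(dim)]


def attractor(q_local_i, q_global, rng):
    """Assumed two-random-variable update for particle i.

    Returns a random convex combination of the particle's best position
    and the global best position; this form is an assumption.
    """
    # Draw alpha_1 and alpha_2 from (0, 1] so their sum is never zero.
    a1 = 1.0 - rng.random()
    a2 = 1.0 - rng.random()
    return [(a1 * ql + a2 * qg) / (a1 + a2)
            for ql, qg in zip(q_local_i, q_global)]


m_best = mean_best([[0.0, 2.0], [2.0, 4.0]])
q_i = attractor([0.0, 0.0], [1.0, 1.0], random.Random(1))
```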
Step 2.2.3: construct the fitness function, taking the reciprocal of the sum of squared errors between the calculated and actual electrical load as the individual fitness;
In a genetic algorithm, fitness represents the quality of the individuals in a population. The fitness function clearly reflects the iterative evolution of each particle, and QPSO proceeds in the direction of increasing fitness. The expression is as follows:
where $E_i$ is the fitness of the i-th population; $y(i)$ is the actual electrical load of the data center station represented by the i-th population; $s(i)$ is the predicted electrical load of the data center station represented by the i-th population; and $n$ is the number of populations.
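A minimal sketch of this fitness, written directly from the description (the reciprocal of the sum of squared errors), is shown below. The small constant `eps` is an added assumption that avoids division by zero when the prediction is exact; it is not part of the patent.

```python
def fitness(actual, predicted, eps=1e-12):
    """Reciprocal of the sum of squared errors between y(i) and s(i)."""
    sse = sum((y - s) ** 2 for y, s in zip(actual, predicted))
    # eps is an assumed safeguard for the case of a perfect prediction.
    return 1.0 / (sse + eps)


close = fitness([1.0, 2.0, 3.0], [1.0, 2.0, 3.1])
far = fitness([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Because the error appears in the denominator, a particle whose weights give a smaller prediction error has a larger fitness, which is why the search moves toward increasing fitness.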
After introducing the fitness function, the particle position update function becomes:
where $x_i$ is the position of the i-th particle; $\mu$ is a uniform random number on [0, 1]; $\chi$ is updated continuously as the number of iterations increases so that the particle position remains optimal; $N_{max}$ is the maximum number of QPSO iterations; and $N_{min}$ is the minimum number of QPSO iterations;
Step 3: construct the XGBoost prediction model, as shown in FIG. 7;
XGBoost is essentially a tree-based, serial Boosting ensemble learning algorithm. Its base learners are weak prediction models with strong correlation to one another, and the ensemble is formed by continuously stacking base learners in series.
Step 3.1: establish the regularized learning objective function;
For the training set X in step 1, the predicted value is obtained using an additive function equation:
where $L$ is the regularized objective function minimized by the model; $l(\hat{y}_i, y_i)$ is the difference between the predicted value $\hat{y}_i$ of the i-th target and the actual value $y_i$, i.e. the loss function; $n$ is the sample size; $K$ is the number of sample features (in this patent, the sample features are the environmental factors of the data center station, and the number of sample features is 3); and $\Omega(f_k)$ is the complexity penalty function of the tree corresponding to the variable $f_k$ computed in the k-th iteration.
Step 3.2: optimize using the gradient tree boosting algorithm;
The second-order approximation of the objective function is optimized as:
where $\hat{y}_i^{(t)}$ is the i-th predicted value in the t-th iteration; $g_i$ is the first-order gradient of the loss function; $h_i$ is the second-order gradient of the loss function; $f_t(x_i)$ is the variable computed in the t-th iteration; and $\partial$ is the gradient symbol;
Step 3.3: evaluate the impurity score of the decision tree, as shown in the following formula:
where $w_j^*$ is the optimal weight of leaf j; $L^{(t)}(q)$ is the optimal value of the objective for tree structure q; $I_j$ is the instance set of leaf j in the gradient tree; $\gamma$ and $\lambda$ are user-defined parameters of the XGBoost algorithm, where $\gamma$ is the regularization penalty term of the step function and $\lambda$ is the regularization penalty term of the second-order gradient function; and $T$ is the total number of leaf nodes in the gradient tree;
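The quantities in step 3.3 can be illustrated with the closed-form expressions commonly given for XGBoost: the optimal leaf weight is $-G_j / (H_j + \lambda)$ and the structure score is $-\frac{1}{2}\sum_j G_j^2 / (H_j + \lambda) + \gamma T$, where $G_j$ and $H_j$ are the sums of $g_i$ and $h_i$ over the instances in leaf j. That the patent's formulas coincide with these is an assumption, and the function names below are hypothetical.

```python
def leaf_weight(g, h, lam):
    """Optimal weight of one leaf from its first and second order gradients."""
    G, H = sum(g), sum(h)
    return -G / (H + lam)


def structure_score(leaves, lam, gamma):
    """Score of a tree structure; lower is better.

    leaves: list of (g_list, h_list) pairs, one pair per leaf.
    """
    total = 0.0
    for g, h in leaves:
        G, H = sum(g), sum(h)
        total += G * G / (H + lam)
    # gamma adds a fixed cost for every leaf in the tree.
    return -0.5 * total + gamma * len(leaves)


w = leaf_weight([1.0, 3.0], [1.0, 1.0], lam=2.0)
score = structure_score([([1.0, 3.0], [1.0, 1.0])], lam=2.0, gamma=0.5)
```

Comparing the structure score before and after a candidate split is how a split is judged worthwhile in this formulation.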
Step 4: combine the QPSO-BP neural network model and the XGBoost prediction model to construct a hybrid prediction model, and calculate the weights of the hybrid prediction model;
Because the QPSO-BP neural network model and the XGBoost prediction model have different learning mechanisms and emphasize errors differently, the prediction results of the two models deviate from each other to some extent. To make the prediction of the hybrid model more accurate, the weights of the two models' outputs need to be calculated.
To calculate the weights of the two models' outputs, the mean absolute percentage error reciprocal weight (MAPE-RW) algorithm is combined with the error index to set initial values for the fusion model weights; the optimal weights are then searched for starting from these initial values, finally forming the optimal load prediction model;
The MAPE-RW algorithm is as follows:
where $\omega_a$ is the weight of prediction model a; and $\sigma_{MAPE,a}$ and $\sigma_{MAPE,b}$ are the MAPE values of prediction models a and b, respectively.
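A minimal sketch of reciprocal-MAPE weighting follows. It assumes the usual reading of the method's name: each model's weight is the reciprocal of its MAPE, normalized so that the two weights sum to 1. The patent's exact formula is not reproduced in this text, and the function names are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction (not percent)."""
    return sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)


def reciprocal_weights(mape_a, mape_b):
    """Initial weights for models a and b from the reciprocals of their MAPE."""
    inv_a, inv_b = 1.0 / mape_a, 1.0 / mape_b
    total = inv_a + inv_b
    return inv_a / total, inv_b / total


err = mape([100.0, 200.0], [110.0, 190.0])
w_a, w_b = reciprocal_weights(0.02, 0.06)
```

The model with the smaller error receives the larger weight, and these values serve only as the starting point for the subsequent weight search.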
The output of the weighted hybrid prediction model is calculated as follows:
$f_{s,x} = w_{QPSO-BP} \cdot f_{QPSO-BP,s,x} + w_{XGBoost} \cdot f_{XGBoost,s,x}$ (22)
where $f_{s,x}$ is the value predicted by the hybrid model for the x-th class of load in scene s; $w_{QPSO-BP}$ and $w_{XGBoost}$ are the weights of the QPSO-BP neural network and the XGBoost model, respectively; $f_{QPSO-BP,s,x}$ and $f_{XGBoost,s,x}$ are the values predicted by the QPSO-BP neural network and the XGBoost model, respectively, for the x-th class of load in scene s; the scene index s takes the values 1, 2, and 3, representing the three scenes of spring and autumn, summer, and winter, respectively; the class-x loads include the cold load, the heat load, and the electrical load;
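The weighted combination can be sketched per scene and load class as below. The sketch assumes that each weight multiplies the prediction of the model it is named after; the dictionary keys `(s, x)`, the load labels, and the function name `combine` are hypothetical.

```python
def combine(w_qpso_bp, w_xgboost, f_qpso_bp, f_xgboost):
    """Weighted sum of the two models' predictions for every (scene, load) key."""
    return {key: w_qpso_bp * f_qpso_bp[key] + w_xgboost * f_xgboost[key]
            for key in f_qpso_bp}


# Illustrative values only: scene 1 (spring and autumn), electrical load.
hybrid = combine(
    0.75, 0.25,
    {(1, "electric"): 100.0},
    {(1, "electric"): 110.0},
)
```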
Step 5: feed the data preprocessed in step 1 into the hybrid prediction model for calculation, completing the scene-by-scene multi-element load prediction of the data center station.
The overall QPSO-BP algorithm is divided into two parts: the QPSO part and the BP neural network part. The prediction output by the hybrid prediction model has a smaller error and a better fit than the electrical load prediction of the QPSO-BP neural network alone.
Step 5.1: the QPSO-BP neural network model calculation, shown in FIG. 5, comprises a QPSO algorithm and a BP neural network algorithm;
The QPSO algorithm comprises the following steps:
Step a1: input the data and preprocess it;
Step a2: randomly create an initial population, assigning random initial values to the positions and velocities of the particles;
Step a3: calculate the fitness of each particle according to the fitness function, compare the fitness values, and record the particle position with the best fitness together with its fitness value;
Step a4: update the position of each particle and compare its current position with its previous best position; if the current position is better, it becomes the new best position;
Step a5: when the number of iterations reaches the upper limit, stop and output the QPSO optimization result; otherwise, return to step a3 and continue.
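Steps a2 to a5 can be sketched as a compact loop. This is a generic QPSO written under several assumptions, not the patent's exact update: the attractor is a random convex combination of the personal and global best positions, the contraction-expansion coefficient falls linearly from 1.0 to 0.5, and positions are clipped to fixed bounds. Standard QPSO keeps no velocity, so the sketch tracks positions only. The function name `qpso` and all default parameter values are hypothetical.

```python
import math
import random


def qpso(fitness, dim, swarm_size=20, n_iter=100, bounds=(-5.0, 5.0), seed=0):
    """Maximize `fitness` with a generic QPSO loop (steps a2 to a5)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Step a2: random initial positions.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(swarm_size)]
    # Step a3: evaluate the swarm and record the best position found.
    pbest = [p[:] for p in x]
    pbest_fit = [fitness(p) for p in x]
    g = max(range(swarm_size), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for it in range(n_iter):
        # Assumed schedule: coefficient falls linearly from 1.0 to 0.5.
        chi = 1.0 - 0.5 * it / max(n_iter - 1, 1)
        mbest = [sum(p[d] for p in pbest) / swarm_size for d in range(dim)]
        for i in range(swarm_size):
            for d in range(dim):
                # Two random variables in (0, 1] form the attractor.
                a1, a2 = 1.0 - rng.random(), 1.0 - rng.random()
                attract = (a1 * pbest[i][d] + a2 * gbest[d]) / (a1 + a2)
                mu = 1.0 - rng.random()  # in (0, 1], keeps the log finite
                step = chi * abs(mbest[d] - x[i][d]) * math.log(1.0 / mu)
                moved = attract + step if rng.random() < 0.5 else attract - step
                x[i][d] = min(max(moved, lo), hi)
            # Step a4: keep the new position only if it is better.
            f = fitness(x[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = x[i][:], f
    # Step a5: the iteration limit has been reached.
    return gbest, gbest_fit


target = [1.0, -2.0]


def example_fitness(p):
    """Stand-in for the network error: reciprocal of squared distance to target."""
    return 1.0 / (1e-12 + sum((v - t) ** 2 for v, t in zip(p, target)))


best, best_fit = qpso(example_fitness, dim=2, n_iter=200)
```

In the QPSO-BP setting, each particle position would hold the flattened weights and thresholds of the BP network, and the fitness would be the reciprocal of the network's sum of squared prediction errors.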
The BP neural network algorithm comprises the following steps:
Step b1: determine the BP neural network topology;
Step b2: initialize the BP neural network weights and threshold length;
Step b3: obtain the optimal weights and thresholds through the QPSO algorithm;
Step b4: calculate the error;
Step b5: update the weights and thresholds;
Step b6: if the output condition is met, output the result; otherwise, return to step b4 and continue.
Step 5.2: the XGBoost model algorithm, shown in FIG. 6, comprises the following steps:
Step c1: input the training data set;
Step c2: set the target loss function;
Step c3: determine the regression tree structure, computing the independent tree structure q and the leaf weights w;
Step c4: start the XGBoost iterative computation; if the number of iterations exceeds the set number, output the predicted value; otherwise, return to step c3 and continue iterating.
As can be seen from FIG. 2, in spring and autumn the correlation coefficient between the electrical load and the heat load is 0.43 and that between the electrical load and the cold load is 0.47, so the coupling between the cold load and the electrical load is weak; as can be seen from FIG. 3, in summer the correlation coefficient between the electrical load and the cold load is as high as 0.87, a strong coupling, indicating that a large part of the data center station's electricity is used for refrigeration in summer; as can be seen from FIG. 4, in winter the correlation coefficient between the electrical load and the heat load is as high as 0.67, reflecting that heating accounts for a large share of the data center station's electricity consumption; the influence of weather factors on the cold, heat, and electrical loads can also be seen directly from FIGS. 2-4.
The superiority of the hybrid prediction model proposed in this patent is demonstrated by the simulation analysis shown in FIGS. 8, 9, and 10. Compared with the BP neural network and the QPSO-BP model, the prediction curve of the hybrid prediction model fits the actual load curve more closely, and the QPSO-BP model is in turn more accurate than the BP neural network. According to the simulation of the three load types (cold, heat, and electricity) in FIGS. 8-10, the prediction accuracy of the hybrid prediction model is above 99.7%, that of the QPSO-BP model is about 98.12%, and that of the BP neural network is about 96.52%. The multi-element load prediction method for the data center station based on hybrid model prediction proposed in this patent is therefore superior and feasible.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may be modified, or some or all of their technical features may be replaced with equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope defined by the appended claims.

Claims (5)

1. A multi-element load prediction method for a data center station based on hybrid model prediction, characterized by comprising the following steps:
Step 1: data collection and data preprocessing; acquire historical data of the data center station within a preset time, construct a training set, and preprocess the data, wherein the historical data comprises cold load, heat load, electrical load, light intensity, wind speed, humidity, air pressure, and date;
Step 1.1: acquire historical data of the data center station within the preset time, and use the K-means clustering algorithm to divide the load of the data center station into three scenes (spring and autumn, summer, and winter) for scene-by-scene prediction;
Step 1.2: select three meteorological feature factors from the historical data, namely solar radiation, temperature, and air humidity, and arrange them in the training set in that order; then collect the cold, heat, and electrical loads and the environmental factors of the data center station to form the training set X as follows:
where X is the training set; $x_e$ is the electrical load of the data center station and $x_e(i)$ is the i-th electrical load in the electrical load sequence; $x_h$ is the heat load and $x_h(i)$ is the i-th heat load in the heat load sequence; $x_c$ is the cold load and $x_c(i)$ is the i-th cold load in the cold load sequence; $x_R$ is the solar radiation and $x_R(i)$ is the i-th solar radiation value in the radiation sequence; $x_T$ is the temperature and $x_T(i)$ is the i-th temperature value in the temperature sequence; $x_M$ is the air humidity and $x_M(i)$ is the i-th humidity value in the humidity sequence; and m is the number of values in a sequence;
Step 1.3: rank the load prediction feature data by importance using random forest out-of-bag estimation, and perform feature selection;
The importance is calculated as follows:
where Q is the number of base learners; $errOOB_q$ is the out-of-bag error of the q-th base learner; and $errOOB'_q$ is the out-of-bag error of the q-th base learner after noise is added;
Step 1.4: calculate the correlation between the data load and the feature factors;
Analyze, for the three scenes of spring and autumn, summer, and winter, the correlation among the cold, heat, and electrical loads of the data center station and between these multi-element loads and the meteorological influence factors; establish the matrices formed by the cold, heat, and electrical loads and the environmental influence factors in the three scenes; and calculate the strength, magnitude, and order of the relationships between the loads and the environmental influence factors of the data center station in the three scenes to obtain the correlation coefficients and degrees of association;
In spring and autumn, the sequences of the cold, heat, and electrical loads and the environmental influence factors form the following matrix:
where $X_1$ is the matrix formed by the spring-and-autumn cold, heat, and electrical loads and the environmental influence factors;
In summer, the sequences of the cold load, the electrical load, and the environmental influence factors form the following matrix:
where $X_2$ is the matrix formed by the summer cold and electrical loads and the environmental influence factors;
In winter, the data sequences of the heat load, the electrical load, and the environmental influence factors form the following matrix:
where $X_3$ is the matrix formed by the winter heat and electrical loads and the environmental influence factors;
The raw data are normalized using the following formula:
where x is the selected raw data; $x_{max}$ is the maximum value of the sample data; $x_{min}$ is the minimum value of the sample data; and x' is the normalized value;
The correlation coefficient $\xi_j$ and the degree of association $\gamma_j$ are calculated as follows:
where $\xi_j$ is the correlation coefficient of data category j, and $\xi_j(k)$ is its k-th value; $\gamma_j$ is the degree of association of data category j; $x_0(k)$ is the k-th value of the normalized weather-factor sequence; $x_j(k)$ is the k-th value of the normalized load sequence; $\rho$ is the resolution coefficient; and j denotes the type of normalized data;
Step 2: construct a BP neural network model optimized by the quantum particle swarm optimization (QPSO) algorithm;
Step 3: construct the XGBoost prediction model;
Step 4: combine the QPSO-BP neural network model and the XGBoost prediction model to construct a hybrid prediction model, and calculate the weights of the hybrid prediction model; to calculate the weights of the two models' outputs, the mean absolute percentage error reciprocal weight (MAPE-RW) algorithm is combined with the error index to set initial values for the fusion model weights, and the optimal weights are then searched for starting from these initial values, finally forming the optimal load prediction model;
The MAPE-RW algorithm in step 4 is as follows:
where $\omega_a$ is the weight of prediction model a; and $\sigma_{MAPE,a}$ and $\sigma_{MAPE,b}$ are the MAPE values of prediction models a and b, respectively;
The output of the weighted hybrid prediction model is calculated as follows:
$f_{s,x} = w_{QPSO-BP} \cdot f_{QPSO-BP,s,x} + w_{XGBoost} \cdot f_{XGBoost,s,x}$ (22)
where $f_{s,x}$ is the value predicted by the hybrid model for the x-th class of load in output scene s; $w_{QPSO-BP}$ and $w_{XGBoost}$ are the weights of the QPSO-BP neural network and the XGBoost model, respectively; and $f_{QPSO-BP,s,x}$ and $f_{XGBoost,s,x}$ are the values predicted by the QPSO-BP neural network and the XGBoost model, respectively, for the x-th class of load in output scene s;
Step 5: feed the data preprocessed in step 1 into the hybrid prediction model for calculation, completing the scene-by-scene multi-element load prediction of the data center station.
2. The multi-element load prediction method for a data center station based on hybrid model prediction according to claim 1, wherein step 2 comprises the following steps:
Step 2.1: construct the electrical load prediction model of the data center station using a BP neural network, with the following formula:
where l is the number of hidden-layer neurons in the model; n is the number of input-layer neurons; m is the number of values in a sequence; and a is a constant between 1 and 10;
Step 2.2: optimize the neural network model using the quantum particle swarm optimization (QPSO) algorithm.
3. The multi-element load prediction method for a data center station based on hybrid model prediction according to claim 2, wherein step 2.2 comprises the following steps:
Step 2.2.1: calculate the mean of the particles' historical best positions, as shown in the following formula:
where $m_{best}$ is the mean of the particles' historical best positions; $S$ is the size of the particle swarm; and $Q_{local,i}$ is the position of the i-th particle in the particle iteration;
Step 2.2.2: update the particle position, as shown in the following formula:
where $Q_i$ is the updated position of the i-th particle; $\alpha_1$ and $\alpha_2$ are random numbers in [0, 1]; and $Q_{global}$ is the global best particle position;
Step 2.2.3: construct the fitness function, taking the reciprocal of the sum of squared errors between the calculated and actual electrical load as the individual fitness, as shown in the following formula:
where $E_i$ is the fitness of the i-th population; $y(i)$ is the actual electrical load of the data center station represented by the i-th population; $s(i)$ is the predicted electrical load of the data center station represented by the i-th population; and $n$ is the number of populations;
After introducing the fitness function, the particle position update function becomes:
where $x_i$ is the position of the i-th particle; $\mu$ is a uniform random number on [0, 1]; $\chi$ is updated continuously as the number of iterations increases so that the particle position remains optimal; $N_{max}$ is the maximum number of QPSO iterations; and $N_{min}$ is the minimum number of QPSO iterations.
4. The multi-element load prediction method for a data center station based on hybrid model prediction according to claim 1, wherein step 3 comprises the following steps:
Step 3.1: establish the regularized learning objective function;
For the training set X in step 1, the predicted value is obtained using an additive function equation:
where $L$ is the regularized objective function minimized by the model; $l(\hat{y}_i, y_i)$ is the difference between the predicted value $\hat{y}_i$ of the i-th target and the actual value $y_i$, i.e. the loss function; $n$ is the sample size; $K$ is the number of sample features; and $\Omega(f_k)$ is the complexity penalty function of the tree corresponding to the variable $f_k$ computed in the k-th iteration;
Step 3.2: optimize using the gradient tree boosting algorithm;
The second-order approximation of the objective function is optimized as:
where $\hat{y}_i^{(t)}$ is the i-th predicted value in the t-th iteration; $g_i$ is the first-order gradient of the loss function; $h_i$ is the second-order gradient of the loss function; $f_t(x_i)$ is the variable computed in the t-th iteration; and $\partial$ is the gradient symbol;
Step 3.3: evaluate the impurity score of the decision tree, as shown in the following formula:
where $w_j^*$ is the optimal weight of leaf j; $L^{(t)}(q)$ is the optimal value of the objective for tree structure q; $I_j$ is the instance set of leaf j in the gradient tree; $\gamma$ and $\lambda$ are user-defined parameters of the XGBoost algorithm, where $\gamma$ is the regularization penalty term of the step function and $\lambda$ is the regularization penalty term of the second-order gradient function; and $T$ is the total number of leaf nodes in the gradient tree.
5. The multi-element load prediction method for a data center station based on hybrid model prediction according to claim 1, wherein the output scene index s takes the values 1, 2, and 3, representing the three scenes of spring and autumn, summer, and winter, respectively; and the class-x loads include the cold load, the heat load, and the electrical load.
CN202111048836.5A 2021-09-08 2021-09-08 Multi-element load prediction method for data center station based on hybrid model prediction Active CN113762387B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111048836.5A CN113762387B (en) 2021-09-08 2021-09-08 Multi-element load prediction method for data center station based on hybrid model prediction

Publications (2)

Publication Number Publication Date
CN113762387A CN113762387A (en) 2021-12-07
CN113762387B true CN113762387B (en) 2024-02-02

Family

ID=78793770

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111048836.5A Active CN113762387B (en) 2021-09-08 2021-09-08 Multi-element load prediction method for data center station based on hybrid model prediction

Country Status (1)

Country Link
CN (1) CN113762387B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115660226B (en) * 2022-12-13 2023-04-25 国网冀北电力有限公司 Power load prediction model construction method and digital twin-based construction device
CN116050666B (en) * 2023-03-20 2023-07-18 中国电建集团江西省电力建设有限公司 Photovoltaic power generation power prediction method for irradiation characteristic clustering
CN116227741A (en) * 2023-05-05 2023-06-06 深圳市万物云科技有限公司 Water chilling unit energy saving method and device based on self-adaptive algorithm and related medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015172560A1 (en) * 2014-05-16 2015-11-19 华南理工大学 Central air conditioner cooling load prediction method based on bp neural network
CN110728401A (en) * 2019-10-10 2020-01-24 郑州轻工业学院 Short-term power load prediction method of neural network based on squirrel and weed hybrid algorithm
CN111260116A (en) * 2020-01-10 2020-06-09 河南理工大学 Time-interval refined short-term load prediction method based on BOA-SVR and fuzzy clustering
WO2020135510A1 (en) * 2018-12-29 2020-07-02 中兴通讯股份有限公司 Burst load prediction method and device, storage medium and electronic device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Chen Li等.Power Load Forecasting Based on the Combined Model of LSTM and XGBoost.《prai'19 proceedings of the 2019 the international conference on pattern recognition and artificial intelligence》.2019,第46-51页. *
基于一种混合算法模型的短期电力负荷预测;尹新等;《计算机仿真》;第27卷(第10期);第255-258页 *

Also Published As

Publication number Publication date
CN113762387A (en) 2021-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant