CN115860260A - Resident air conditioner load prediction model considering frequency domain data characteristic decomposition - Google Patents

Publication number
CN115860260A
Authority
CN
China
Prior art keywords
model
algorithm
air conditioner
feature
decomposition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211700889.5A
Other languages
Chinese (zh)
Inventor
王世谦
韩丁
李秋燕
田春筝
白宏坤
王圆圆
宋大为
华远鹏
卜飞飞
王涵
贾一博
刘洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
State Grid Corp of China SGCC
Economic and Technological Research Institute of State Grid Henan Electric Power Co Ltd
Original Assignee
Sichuan University
State Grid Corp of China SGCC
Economic and Technological Research Institute of State Grid Henan Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University, State Grid Corp of China SGCC, and Economic and Technological Research Institute of State Grid Henan Electric Power Co Ltd
Priority to CN202211700889.5A
Publication of CN115860260A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a residential air conditioner load prediction model considering frequency domain data characteristic decomposition, which comprises the following steps. Step 1: the complete ensemble empirical mode decomposition with adaptive noise algorithm, namely the CEEMDAN algorithm, converts the original air conditioner load data into components with different fluctuation periods. Step 2: time-series permutation entropy, namely the PE algorithm, is introduced into time-series air conditioner load prediction, and the modal subcomponents obtained from the frequency-domain decomposition are merged and reconstructed. Step 3: a data information optimization and extraction algorithm based on CatBoost feature selection is adopted. Step 4: taking into account the influence of high-dimensional external features on the air conditioner load pattern, an air conditioner load prediction model based on the XGBoost algorithm is constructed. The invention thus decomposes the features of air conditioner load data in the frequency domain and provides a time-series load permutation entropy algorithm, a data information optimization and extraction algorithm, and an air conditioner load prediction model.

Description

Resident air conditioner load prediction model considering frequency domain data characteristic decomposition
Technical Field
The invention belongs to the technical field of air conditioner load prediction, and particularly relates to a residential air conditioner load prediction model considering frequency domain data characteristic decomposition.
Background
In recent years, with the rapid development of China's economy, residents' living standards have continued to improve, residential electricity consumption has gradually increased, and the peak load of the power grid has repeatedly exceeded historical records. The surge in air conditioner load caused by extreme high temperatures is a main reason for these record-high electricity loads.

Among load prediction methods, those represented by the extreme learning machine (ELM), deep belief network (DBN), support vector machine (SVM), back-propagation (BP) neural network, and random forest (RF) are widely applied to short-term load prediction. These methods convert the long-sequence time-dependence problem into a static modeling problem and, through repeated training, establish a nonlinear mapping between load inputs and outputs in the current time period. However, the air conditioner load time series is nonlinear, non-stationary, highly random, and strongly fluctuating, so prediction accuracy is low and model performance is poor. Therefore, taking frequency-domain data features into account, the air conditioner load sequence is decomposed into several subcomponents by an empirical mode decomposition method, which effectively separates the intrinsic modes, partitions the signal in the frequency domain, and makes the load prediction model more robust.

For data information optimization and extraction, short-term load prediction is affected by many complicated factors, and a large number of redundant and irrelevant features exist. Feature selection can effectively reduce the feature dimensionality and improve algorithm efficiency. However, the existing filter-type feature selection methods based on minimum redundancy and maximum relevance evaluate only individual features and do not consider the quality of the feature set: a statistical index is calculated independently for each variable, the relative importance of the variables is judged from that index, and the relatively unimportant ones are eliminated. In addition, filter-type methods first perform feature selection on the data set and then train the learner, so the selection process is independent of the subsequent learner; as a result, the model generalizes poorly and overfits easily.

For the air conditioner load prediction model itself, existing methods often adopt linear regression or neural network models. Linear regression has difficulty modeling nonlinear data or polynomial relationships between features, cannot capture feature interactions in the data set, and expresses highly complex data poorly, so its prediction accuracy is low. Neural networks have complex model parameters, which leads to difficult hyper-parameter tuning, gradient explosion, overfitting, and similar problems. Moreover, the local-search optimization methods commonly used in neural networks must find the global extremum of a complex nonlinear function, so the algorithm may fall into a local extremum and training may fail. Finally, there is no unified and complete theoretical guidance for choosing the network structure, which is generally chosen by experience, and this limits the prediction accuracy of the air conditioner load.

It is therefore necessary to provide a residential air conditioner load prediction model considering frequency domain data characteristic decomposition, which decomposes the features of air conditioner load data in the frequency domain and provides a time-series load permutation entropy algorithm, a data information optimization and extraction algorithm, and an air conditioner load prediction model.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a residential air conditioner load prediction model considering frequency domain data characteristic decomposition, which is used for carrying out data characteristic decomposition on air conditioner load data in a frequency domain, providing a time sequence load permutation entropy algorithm, providing a data information optimization extraction algorithm and providing an air conditioner load prediction model.
The purpose of the invention is realized by the following steps: a residential air conditioner load prediction model considering frequency domain data characteristic decomposition comprises the following steps:
step 1: adopting the complete ensemble empirical mode decomposition with adaptive noise algorithm, namely the CEEMDAN algorithm, to convert the original air conditioner load data into components with different fluctuation periods;
step 2: introducing time-series permutation entropy, namely the PE algorithm, into time-series air conditioner load prediction, and merging and reconstructing the modal subcomponents obtained from the frequency-domain decomposition;
step 3: in order to reduce overfitting and enhance the generalization ability of the model, adopting a data information optimization and extraction algorithm based on CatBoost feature selection;
step 4: taking into account the influence of high-dimensional external features on the air conditioner load pattern, constructing an air conditioner load prediction model based on the XGBoost algorithm.
The complete ensemble empirical mode decomposition with adaptive noise algorithm in step 1 is specifically as follows: the traditional EMD algorithm is improved by adding Gaussian white noise. Several groups of adaptive Gaussian white noise are added, and the results are averaged to obtain the intrinsic mode function (IMF) components. The unique way in which the residual is calculated makes the decomposition complete, mitigates the mode aliasing inherent in the existing EMD, and greatly reduces the reconstruction error, so that the reconstructed signal is almost identical to the original signal. Define $L(t)$ as the original load sequence; $E_i(\cdot)$ as the operator that yields the $i$-th component of an EMD decomposition; $w_i(t)$ as a group of Gaussian white noise whose length is consistent with that of the original load $L(t)$; $\varepsilon_i$ as the white-noise amplitude coefficient of the $i$-th stage; and $\bar{I}_k(t)$ as the $k$-th component of the CEEMDAN decomposition.
The specific flow of the fully integrated empirical mode decomposition algorithm of the adaptive noise in the step 1 comprises the following steps:
step 1.1: generating $M$ groups of Gaussian noise $\{w_1(t), w_2(t), \ldots, w_M(t)\}$ to obtain the noise-superimposed load curves $\{L(t)+\varepsilon_0 w_1(t),\ L(t)+\varepsilon_0 w_2(t),\ \ldots,\ L(t)+\varepsilon_0 w_M(t)\}$; obtaining the first IMF components $\{I_{1,1}, I_{2,1}, \ldots, I_{M,1}\}$ by EMD, and averaging them to obtain the first CEEMDAN component, namely:
$$\bar{I}_1(t) = \frac{1}{M}\sum_{i=1}^{M} I_{i,1}(t) \tag{1}$$
step 1.2: computing the residual sequence $r_1(t)$ of stage 1, namely:
$$r_1(t) = L(t) - \bar{I}_1(t) \tag{2}$$
step 1.3: decomposing the $M$ sequences $\{r_1(t)+\varepsilon_1 E_1(w_i(t)),\ i = 1, 2, \ldots, M\}$ by EMD, each decomposition stopping as soon as its first IMF component is obtained; the second component $\bar{I}_2(t)$ is the average of these $M$ IMF components, namely:
$$\bar{I}_2(t) = \frac{1}{M}\sum_{i=1}^{M} E_1\big(r_1(t)+\varepsilon_1 E_1(w_i(t))\big) \tag{3}$$
step 1.4: for the $k$-th stage, obtaining the remaining residuals and components by equations (4) and (5):
$$r_k(t) = r_{k-1}(t) - \bar{I}_k(t) \tag{4}$$
$$\bar{I}_{k+1}(t) = \frac{1}{M}\sum_{i=1}^{M} E_1\big(r_k(t)+\varepsilon_k E_k(w_i(t))\big) \tag{5}$$
step 1.5: repeating step 1.4 until the number of extrema $n$ of the residual sequence $r_k(t)$ is less than a threshold, typically set to 2; the CEEMDAN decomposition is then complete, and $L(t)$ has been decomposed into a series of modal components $\{\bar{I}_1(t), \bar{I}_2(t), \ldots, \bar{I}_K(t)\}$ and a residual $R(t)$, namely:
$$L(t) = \sum_{k=1}^{K} \bar{I}_k(t) + R(t) \tag{6}$$
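The residual bookkeeping of steps 1.1 to 1.5 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: `first_mode` is a hypothetical moving-average stand-in for real EMD sifting, a single amplitude `eps` replaces the per-stage coefficients, and `max_modes` is an added safety cap because the stand-in need not reach the extrema threshold.

```python
import math
import random

def first_mode(x, window=3):
    """Stand-in for E_1(.): the detail left after removing a moving average.
    This is NOT real EMD sifting; it only keeps the sketch dependency-free."""
    half = window // 2
    out = []
    for t in range(len(x)):
        lo, hi = max(0, t - half), min(len(x), t + half + 1)
        out.append(x[t] - sum(x[lo:hi]) / (hi - lo))
    return out

def kth_mode(x, k):
    """Stand-in for E_k(.): peel off k-1 modes, then return the next one."""
    rest = list(x)
    for _ in range(k - 1):
        mode = first_mode(rest)
        rest = [a - b for a, b in zip(rest, mode)]
    return first_mode(rest)

def count_extrema(x):
    """Number of interior local maxima and minima."""
    return sum(1 for i in range(1, len(x) - 1)
               if (x[i] - x[i - 1]) * (x[i + 1] - x[i]) < 0)

def ceemdan_sketch(load, M=20, eps=0.2, max_modes=6, seed=0):
    """Return (components, residual) following the flow of steps 1.1-1.5."""
    rng = random.Random(seed)
    noises = [[rng.gauss(0.0, 1.0) for _ in load] for _ in range(M)]

    def ensemble_mean(trials):
        return [sum(col) / M for col in zip(*trials)]

    # Step 1.1: first component = mean first mode of the M noisy copies.
    comp = ensemble_mean([first_mode([l + eps * w for l, w in zip(load, wi)])
                          for wi in noises])
    components = [comp]
    residual = [l - c for l, c in zip(load, comp)]  # step 1.2
    k = 1
    # Steps 1.3-1.5: stop once the residual has fewer than 2 extrema
    # (or the assumed max_modes cap is hit).
    while count_extrema(residual) >= 2 and len(components) < max_modes:
        comp = ensemble_mean([
            first_mode([r + eps * e for r, e in zip(residual, kth_mode(wi, k))])
            for wi in noises])
        components.append(comp)
        residual = [r - c for r, c in zip(residual, comp)]
        k += 1
    return components, residual

# Synthetic load curve (illustrative data only).
load = [10 + 3 * math.sin(0.3 * t) + math.sin(2.0 * t) for t in range(96)]
components, residual = ceemdan_sketch(load)
# Completeness: components plus residual rebuild the original sequence.
rebuilt = [sum(vals) + r for vals, r in zip(zip(*components), residual)]
```

Because every residual is defined as the previous residual minus the newest component, `rebuilt` matches `load` up to floating-point rounding regardless of which mode extractor is plugged in; that is the completeness property the text attributes to CEEMDAN.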
The time-series permutation entropy algorithm in step 2 is specifically as follows: the complexity of a time series is measured by its permutation entropy, which serves as the basis for merging and recombining the subcomponents. When there are many subcomponents and the permutation entropies of some of them differ only slightly, their load fluctuation patterns are similar, so those subcomponents can be merged and recombined, saving computing resources and time in the subsequent prediction work.
The specific flow of the time-series permutation entropy algorithm in step 2 comprises the following steps:
step 2.1: for the one-dimensional time-series load $x_{Load} = \{x_1, x_2, \ldots, x_N\}$, reconstructing its phase space into a two-dimensional matrix $X$, namely:
$$X = \begin{bmatrix} x_1 & x_{1+\tau} & \cdots & x_{1+(L-1)\tau} \\ x_2 & x_{2+\tau} & \cdots & x_{2+(L-1)\tau} \\ \vdots & \vdots & & \vdots \\ x_K & x_{K+\tau} & \cdots & x_{K+(L-1)\tau} \end{bmatrix}, \quad K = N-(L-1)\tau \tag{7}$$
in the formula: $L$ is the embedding dimension, which determines the number of samples in each row vector; $\tau$ is the sampling interval (delay);
step 2.2: sorting the elements of each reconstructed row vector $X_i$ of $X$ in descending order to obtain a set of matrix element coordinate indexes $\{(i,j_1),(i,j_2),\ldots,(i,j_L)\}$ satisfying
$$X(i,j_1) \ge X(i,j_2) \ge \cdots \ge X(i,j_L)$$
when element values are equal, the element with the larger column index is ranked first;
step 2.3: for any row vector $X_i$, defining the corresponding load fluctuation pattern $S_i = \{j_1, j_2, \ldots, j_L\}$, so that there are $L!$ possible fluctuation patterns in total; counting the probabilities $\{P_1, P_2, \ldots, P_C\}$ of all fluctuation patterns that occur in $X$, the permutation entropy $H(L)$ of the time-series load $x_{Load} = \{x_1, x_2, \ldots, x_N\}$ is defined as:
$$H(L) = -\sum_{c=1}^{C} P_c \ln P_c \tag{8}$$
$$H = \frac{H(L)}{\ln(L!)} \tag{9}$$
the permutation entropy is normalized to between 0 and 1 by formula (9): the closer $H$ is to 1, the richer the fluctuation patterns; the closer $H$ is to 0, the more monotonous the fluctuation patterns.
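Steps 2.1 to 2.3 can be sketched as follows. The normalization by $\ln(L!)$ is the standard form of permutation entropy and is assumed here to be what formula (9) denotes; `merge_by_entropy` and its threshold `tol` are hypothetical illustrations of the merging rule, which the text describes only qualitatively.

```python
import math
from collections import Counter

def permutation_entropy(x, L=3, tau=1):
    """Normalized permutation entropy in [0, 1]; requires L >= 2."""
    K = len(x) - (L - 1) * tau  # number of rows in the reconstructed matrix
    if K <= 0:
        raise ValueError("series too short for this L and tau")
    counts = Counter()
    for i in range(K):
        row = [x[i + j * tau] for j in range(L)]
        # Descending order of value; on ties the larger column index ranks first.
        pattern = tuple(sorted(range(L), key=lambda j: (-row[j], -j)))
        counts[pattern] += 1
    h = -sum((c / K) * math.log(c / K) for c in counts.values())
    return h / math.log(math.factorial(L))

def merge_by_entropy(components, tol=0.1, L=3, tau=1):
    """Hypothetical merge rule: sort components by entropy and add together
    neighbours whose entropies differ by less than tol (an assumed threshold)."""
    scored = sorted(((permutation_entropy(c, L, tau), c) for c in components),
                    key=lambda pair: pair[0], reverse=True)
    groups, last_h = [], None
    for h, comp in scored:
        if groups and abs(h - last_h) < tol:
            groups[-1] = [a + b for a, b in zip(groups[-1], comp)]
        else:
            groups.append(list(comp))
        last_h = h
    return groups

example = [4, 7, 9, 10, 6, 11, 3]
h = permutation_entropy(example, L=3, tau=1)  # about 0.589
groups = merge_by_entropy([example, list(range(7)), list(range(7, 0, -1))])
```

In this toy run the two monotone series both have entropy 0 and are merged into one subcomponent, while the irregular series stays separate, so three inputs become two prediction targets.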
The specific flow of the data information optimization extraction algorithm selected based on the Catboost characteristics in the step 3 comprises the following steps:
step 3.1: compared with traditional gradient boosting decision tree algorithms, CatBoost handles categorical features such as time and weather better. The traditional Greedy TBS (greedy target-based statistics) uses the mean label value of a category as the criterion for node splitting; when the training and test data sets have different distributions, conditional shift easily occurs. CatBoost adds a prior distribution, which effectively reduces the influence of noise and low-frequency category data on the distribution, as shown in formula (10):
$$\hat{x}_{\sigma_p,k} = \frac{\sum_{j=1}^{p-1}\left[x_{\sigma_j,k} = x_{\sigma_p,k}\right] Y_{\sigma_j} + a \cdot P}{\sum_{j=1}^{p-1}\left[x_{\sigma_j,k} = x_{\sigma_p,k}\right] + a} \tag{10}$$
in the formula: $\sigma = \{\sigma_1, \sigma_2, \ldots, \sigma_n\}$ is a permutation (ordering) of the training samples; $[\cdot] = 1$ when $x_{\sigma_j,k} = x_{\sigma_p,k}$, otherwise $[\cdot] = 0$; $Y_{\sigma_j}$ is the category label value; $P$ is the prior; $a$ is the weight coefficient; $\hat{x}_{\sigma_p,k}$ is the average label value on the training set; $x_{\sigma_j,k}$ is the categorical feature value of the training set;
step 3.2: CatBoost can evaluate feature importance during training, and several feature selection strategies can be built on it. PVC (prediction values change) is the average change in the CatBoost model's predicted value per unit change in the feature value; the more important a feature is to the model, the larger its PVC, as shown in formulas (11) and (12). LFC (loss function change) compares the CatBoost model's loss function with and without the feature, reflecting how much the feature accelerates model convergence, as shown in formula (13):
$$PVC = \sum_{\text{trees},\ \text{leaves}} \left[ W_l (V_l - \bar{V})^2 + W_r (V_r - \bar{V})^2 \right] \tag{11}$$
$$\bar{V} = \frac{W_l V_l + W_r V_r}{W_l + W_r} \tag{12}$$
in the formula: $W_l$, $V_l$, $W_r$, $V_r$ are the weight and target value of the left leaf and the weight and target value of the right leaf, respectively;
$$LFC = L(X) - L(X_i) \tag{13}$$
in the formula: $X$ is the input set $\{x_1, x_2, \ldots, x_N\}$ with $N$ feature components; $X_i$ is the input set $\{x_1, x_2, \ldots, x_{i-1}, x_{i+1}, \ldots, x_N\}$ with $N-1$ feature components; $L(\cdot)$ is the loss function value of the model for the given input features. The evaluation index $I$, obtained as a weighted combination of PVC and LFC, combines the advantages of both in different application scenarios and expresses feature importance comprehensively, as shown in formula (14):
$$I = a \cdot PVC + b \cdot LFC \tag{14}$$
in the formula: $a$ and $b$ are weight coefficients; adjusting their sizes strengthens the influence of the PVC or the LFC index, so that the method adapts to different application scenarios;
step 3.3: completing the feature selection process by recursive feature elimination: the optimal feature subset is searched with a greedy strategy, and the least important feature is removed each time the model is rebuilt.
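The two ingredients of steps 3.1 and 3.2 can be illustrated with a short sketch. It assumes the samples are already listed in the ordering used by the target statistic, so each sample is encoded only from the labels that precede it, and it uses the leaf-pair form of PVC from the CatBoost documentation; all function names and sample values are hypothetical.

```python
def ordered_target_statistics(categories, labels, prior, a=1.0):
    """Encode each categorical value from earlier samples only.
    encoded = (sum of earlier labels in the same category + a * prior)
              / (count of earlier samples in the same category + a)"""
    sums, counts, encoded = {}, {}, []
    for cat, y in zip(categories, labels):
        s, c = sums.get(cat, 0.0), counts.get(cat, 0)
        encoded.append((s + a * prior) / (c + a))
        sums[cat], counts[cat] = s + y, c + 1
    return encoded

def pvc_of_split(w_left, v_left, w_right, v_right):
    """Contribution of one split to PVC: weighted squared deviation of the
    two leaf values from their weighted mean."""
    mean = (w_left * v_left + w_right * v_right) / (w_left + w_right)
    return w_left * (v_left - mean) ** 2 + w_right * (v_right - mean) ** 2

def combined_importance(pvc, lfc, a=0.5, b=0.5):
    """Weighted index I = a * PVC + b * LFC; a and b are user-chosen."""
    return a * pvc + b * lfc

# Illustrative data: weather category and a 0/1 "high load" label.
weather = ["sunny", "rain", "sunny", "sunny"]
high_load = [1.0, 0.0, 1.0, 0.0]
encoded = ordered_target_statistics(weather, high_load, prior=0.5, a=1.0)
pvc = pvc_of_split(w_left=1.0, v_left=0.0, w_right=1.0, v_right=2.0)
index = combined_importance(pvc, lfc=1.0)
```

The first sample of each category receives exactly the prior, which is how the prior damps rare categories; later samples drift toward the running label mean of their category.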
The recursive feature elimination method in step 3.3 specifically includes the following steps:
step 3.31: initializing parameters: the input load data and the associated influence features $X = \{x_1, x_2, \ldots, x_N\}$ are the independent variables, and the predicted data $Y = \{y_1, y_2, \ldots, y_M\}$ are the dependent variables;
step 3.32: generating the CatBoost model: the first stage generates a regression tree with a greedy algorithm, calculating the MSE of the different features $x_i$ in the feature set $X$ and selecting the feature $x_i$ with the minimum MSE to construct the optimal tree model; the second stage is gradient boosting, in which new regression trees are built successively along the gradient descent direction of the current regression tree, and finally the regression trees are integrated to obtain the CatBoost gradient boosting regression tree model;
step 3.33: removing features: the importance measures $\{I_1, I_2, \ldots, I_N\}$ of all features $\{x_1, x_2, \ldots, x_N\}$ are calculated by formula (14) and sorted in descending order, $I_{k_1} \ge I_{k_2} \ge \cdots \ge I_{k_N}$, giving the sorted result $\{x_{k_1}, x_{k_2}, \ldots, x_{k_N}\}$; the prediction accuracy $p_N$ of the current model on the test set and the feature combination $\{x_{k_1}, x_{k_2}, \ldots, x_{k_N}\}$ are recorded; the least important feature $x_{k_N}$ is rejected; it is then judged whether the number of remaining features $\{x_{k_1}, x_{k_2}, \ldots, x_{k_{N-1}}\}$ equals 1: if not, they are input into a new CatBoost model for training and steps 3.32 to 3.33 are repeated; if so, the flow proceeds to step 3.34;
step 3.34: selecting the optimal feature subset: the load prediction accuracies $p_N, p_{N-1}, \ldots, p_1$ for the different numbers of features are sorted in descending order; if the highest accuracy $p_j = \max\{p_N, p_{N-1}, \ldots, p_1\}$ is obtained with $j$ input features, the corresponding optimal feature input is $\{x_{k_1}, x_{k_2}, \ldots, x_{k_j}\}$.
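The elimination loop of steps 3.31 to 3.34 can be sketched independently of any particular learner. In this illustration `fit_and_score` is a hypothetical callable standing in for "train a CatBoost model, then return its test accuracy and the per-feature index $I$"; the toy scorer and its numbers are invented purely to make the loop runnable, and the sketch also scores the final single-feature model so that $p_1$ is available.

```python
def recursive_feature_elimination(features, fit_and_score):
    """Drop the least important feature each round and return the
    (accuracy, feature list) pair with the best recorded accuracy.

    fit_and_score(subset) -> (accuracy, {feature: importance}) is supplied
    by the caller; it is an assumed interface, not a library API."""
    remaining = list(features)
    history = []
    while True:
        accuracy, importance = fit_and_score(remaining)
        ranked = sorted(remaining, key=lambda f: importance[f], reverse=True)
        history.append((accuracy, ranked))
        if len(ranked) == 1:
            break
        remaining = ranked[:-1]  # reject the least important feature
    return max(history, key=lambda item: item[0])

# Toy stand-in for model training (values are illustrative only).
USEFUL = {"temperature": 0.6, "hour": 0.3}
NOISE = {"humidity": 0.05, "wind": 0.02}

def toy_fit_and_score(subset):
    importance = {f: USEFUL.get(f, NOISE.get(f, 0.0)) for f in subset}
    accuracy = (sum(USEFUL.get(f, 0.0) for f in subset)
                - 0.03 * sum(1 for f in subset if f in NOISE))
    return accuracy, importance

best_accuracy, best_features = recursive_feature_elimination(
    ["temperature", "hour", "humidity", "wind"], toy_fit_and_score)
```

With the toy scorer the recorded accuracies rise while noise features are removed and fall once a useful feature is dropped, so the loop returns the two-feature subset.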
The air conditioner load prediction model based on the XGboost algorithm in the step 4 specifically comprises the following steps:
step 4.1: XGBoost is an ensemble learning algorithm that uses tree models as base learners; its principle is to improve overall model performance by building multiple base learners and integrating their training results. Building on the traditional gradient boosting decision tree (GBDT), XGBoost applies a second-order Taylor expansion to the loss function and adds a regularization term to the model, which effectively reduces model complexity while accelerating convergence. Given an air conditioner load data set with $n$ samples to be predicted and $M$ features, $D = \{(x_i, y_i) : i = 1, 2, \ldots, n,\ x_i \in \mathbb{R}^M,\ y_i \in \mathbb{R}\}$, the ensemble prediction model with $K$ classification and regression trees (CART) is:
$$y_i^* = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F \tag{15}$$
in the formula: $y_i^*$ is the air conditioner load prediction of the XGBoost model; $K$ is the number of decision trees; $f_k$ is the $k$-th CART decision tree, each $f_k$ having its own independent tree structure and leaf nodes with different weights; $F$ is the space of decision trees, specifically:
$$F = \{ f(x) = w_{q(x)} \}, \quad q: \mathbb{R}^M \to \{1, \ldots, T\},\ w \in \mathbb{R}^T \tag{16}$$
in the formula: $q$ denotes an independent decision tree structure, under which every sample is mapped to its corresponding leaf node, and $w_{q(x)}$ further maps the information contained in that leaf node to a specific numerical value;
step 4.2: calculating the deviation between the model prediction and the real air conditioner load by defining an objective function, and training the XGBoost model with the goal of minimizing it, wherein the objective function is defined as:
$$Obj = \sum_{i=1}^{n} l(y_i, y_i^*) + \sum_{k=1}^{K} \Omega(f_k) \tag{17}$$
in the formula: $l(\cdot)$ is the loss function, for which the mean square error is adopted; the regularization term $\Omega(\cdot)$ is defined as:
$$\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \tag{18}$$
in the formula: $T$ is the number of leaf nodes, and the complexity of the tree structure can be set by changing it; $w_j$ is the weight of the $j$-th leaf node, and keeping the weights small effectively prevents overfitting; $\gamma$ and $\lambda$ are penalty coefficients, and the relative importance of the two penalty terms can be set by changing their values;
step 4.3: based on the forward stagewise algorithm, the objective function is minimized by optimizing the newly added CART decision tree $f_t$. After removing the constant terms and applying the second-order Taylor expansion, the objective function $Obj^{(t)}$ at step $t$ is:
$$Obj^{(t)} \approx \sum_{j=1}^{T}\left[\Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2\right] + \gamma T \tag{19}$$
in the formula: $I_j$ is the set of indices of all samples mapped to the $j$-th leaf node by the CART decision tree; $g_i = \partial\, l(y_i, y_i^{*(t-1)}) / \partial\, y_i^{*(t-1)}$ and $h_i = \partial^2 l(y_i, y_i^{*(t-1)}) / \partial\, (y_i^{*(t-1)})^2$ are the first and second derivatives of the loss function. Equation (19) is a quadratic function of $w_j$, so the optimal leaf node weight under a given CART decision tree is obtained as:
$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \tag{20}$$
Substituting equation (20) into equation (19) gives the optimal objective function for that CART decision tree:
$$Obj^* = -\frac{1}{2}\sum_{j=1}^{T}\frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T \tag{21}$$
Combining formula (15), the XGBoost model is trained with formula (21) as the objective function; the final training result is the air conditioner load time-series data set produced by the air conditioner load prediction model, recorded as:
$$Y^* = \{y_1^*, y_2^*, \ldots, y_n^*\} \tag{22}$$
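One boosting step for a fixed tree structure can be worked through numerically. The sketch assumes the squared-error loss $l = (y - y^*)^2$, so that $g_i = 2(y_i^* - y_i)$ and $h_i = 2$, and it takes the leaf assignment $q(x_i)$ as given rather than learning it; the function names and data are hypothetical.

```python
def mse_grad_hess(y_true, y_pred):
    """First and second derivatives of (y - y_hat)^2 with respect to y_hat."""
    g = [2.0 * (p - y) for y, p in zip(y_true, y_pred)]
    h = [2.0] * len(y_true)
    return g, h

def fit_leaf_weights(leaf_of, g, h, lam, gamma):
    """leaf_of[i] is the leaf index q(x_i); every leaf is assumed non-empty.
    Returns the optimal leaf weights and the resulting objective value."""
    T = max(leaf_of) + 1
    G, H = [0.0] * T, [0.0] * T
    for j, gi, hi in zip(leaf_of, g, h):
        G[j] += gi
        H[j] += hi
    weights = [-G[j] / (H[j] + lam) for j in range(T)]
    objective = -0.5 * sum(G[j] ** 2 / (H[j] + lam) for j in range(T)) + gamma * T
    return weights, objective

# Four load samples, current prediction 0, two leaves (illustrative data).
y_true = [3.0, 5.0, 10.0, 12.0]
y_pred = [0.0, 0.0, 0.0, 0.0]
leaf_of = [0, 0, 1, 1]
g, h = mse_grad_hess(y_true, y_pred)
weights, objective = fit_leaf_weights(leaf_of, g, h, lam=1.0, gamma=0.0)
# Additive update: the new tree's leaf value is added to the old prediction.
updated = [p + weights[j] for p, j in zip(y_pred, leaf_of)]
```

Here the first leaf has gradient sum -16 and Hessian sum 4, giving weight 16 / 5 = 3.2, slightly below the leaf mean of 4 because $\lambda$ shrinks the weight toward zero; the second leaf gives 44 / 5 = 8.8.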
The invention has the following beneficial effects. In use, the residential air conditioner load prediction model considering frequency domain data characteristic decomposition has these advantages:
1. To address the low prediction accuracy and poor model performance caused by the strong fluctuation and randomness of the original air conditioner load data, the complete ensemble empirical mode decomposition with adaptive noise algorithm is used to decompose the features of the air conditioner load data in the frequency domain. The original load data are converted into components with different fluctuation periods, which effectively mitigates the strong fluctuation, strong randomness, and poor predictability of the original data and improves the accuracy of load prediction.
2. To address the reconstruction and merging of the subcomponents after sequence decomposition, a time-series load permutation entropy algorithm is provided. The decomposed subcomponents are merged and recombined using time-series complexity as the measure, which saves computing resources in the load prediction process and significantly improves the efficiency and performance of the prediction algorithm.
3. To address the difficulty of handling the distribution characteristics of high-dimensional multi-source data, a data information optimization and extraction algorithm based on CatBoost feature selection is provided. Adding a prior distribution effectively reduces the influence of noise and low-frequency category data on the distribution, and categorical features such as time and weather are handled better; reducing the number and dimensionality of the features gives the model stronger generalization ability and less overfitting.
4. To address the poor robustness of predicting the nonlinear time-series characteristics of the air conditioner load, an XGBoost-based air conditioner load prediction model is provided. Its internal boosted tree model handles missing values automatically, which enhances robustness; in addition, XGBoost supports column sampling, which reduces overfitting and improves computational efficiency, thereby achieving accurate prediction of the air conditioner load.
In summary, the invention decomposes the features of air conditioner load data in the frequency domain and provides a time-series load permutation entropy algorithm, a data information optimization and extraction algorithm, and an air conditioner load prediction model.
Drawings
FIG. 1 is a general method roadmap for the present invention.
FIG. 2 is a flow chart of the feature selection of the Catboost model according to the present invention.
FIG. 3 is a flow chart of the XGboost model training process of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Example 1
As shown in fig. 1 to 3, a residential air conditioning load prediction model considering frequency domain data characteristic decomposition comprises the following steps:
step 1: adopting the complete ensemble empirical mode decomposition with adaptive noise algorithm, namely the CEEMDAN algorithm, to convert the original air conditioner load data into components with different fluctuation periods, which avoids the loss of prediction accuracy caused by directly predicting strongly fluctuating air conditioner load data;
step 2: introducing time-series permutation entropy, namely the PE algorithm, into time-series air conditioner load prediction, and merging and reconstructing the modal subcomponents obtained from the frequency-domain decomposition, which reduces the time complexity of the subsequent load prediction while maintaining the prediction accuracy of the air conditioner load;
step 3: adopting a data information optimization and extraction algorithm based on CatBoost feature selection; adding a prior distribution effectively reduces the influence of noise and low-frequency category data on the distribution, and reducing the number and dimensionality of the features gives the model stronger generalization ability and less overfitting;
step 4: taking into account the influence of high-dimensional external features on the air conditioner load pattern, constructing an air conditioner load prediction model based on the XGBoost algorithm; its internal boosted tree model handles missing values automatically, which enhances the robustness of the model.
Example 2
As shown in fig. 1 to 3, a residential air conditioning load prediction model considering frequency domain data characteristic decomposition comprises the following steps:
step 1: adopting the complete ensemble empirical mode decomposition with adaptive noise algorithm, namely the CEEMDAN algorithm, to convert the original air conditioner load data into components with different fluctuation periods;
step 2: introducing time-series permutation entropy, namely the PE algorithm, into time-series air conditioner load prediction, and merging and reconstructing the modal subcomponents obtained from the frequency-domain decomposition;
step 3: in order to reduce overfitting and enhance the generalization ability of the model, adopting a data information optimization and extraction algorithm based on CatBoost feature selection;
step 4: taking into account the influence of high-dimensional external features on the air conditioner load pattern, constructing an air conditioner load prediction model based on the XGBoost algorithm.
In this embodiment, as shown in FIG. 1, residential air conditioner load data are extracted from residential electricity consumption data; the original air conditioner load sequence is decomposed by the CEEMDAN sequence decomposition algorithm into intrinsic mode function (IMF) components and a residual; the subcomponents are merged according to their time-series permutation entropy to balance the efficiency and performance of the prediction model; the optimal input feature subset for each subcomponent is selected by a CatBoost-based recursive feature elimination algorithm; the result is input into an XGBoost model to train the air conditioner load prediction model; and the prediction results of all subcomponents are summed to obtain an accurate and effective air conditioner load prediction.
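The data flow of this embodiment can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `forecast_load` and the four callables `decompose`, `merge`, `select_features` and `fit_predict` are hypothetical placeholders standing in for CEEMDAN, permutation-entropy merging, CatBoost feature selection and XGBoost training respectively.

```python
def forecast_load(load, features, decompose, merge, select_features, fit_predict):
    """Decompose, merge, select features per subcomponent, predict, then sum.

    All four callables are hypothetical placeholders supplied by the caller:
      decompose(load)               -> list of component series (IMFs + residual)
      merge(components)             -> list of merged component series
      select_features(features, c)  -> feature subset chosen for component c
      fit_predict(subset, c)        -> predicted series for component c
    """
    components = merge(decompose(load))
    predictions = []
    for component in components:
        subset = select_features(features, component)
        predictions.append(fit_predict(subset, component))
    # The final load forecast is the point-wise sum of the component forecasts.
    return [sum(values) for values in zip(*predictions)]
```

Keeping the four stages as interchangeable callables mirrors the way the embodiment treats each subcomponent independently before summing the results.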
The complete ensemble empirical mode decomposition with adaptive noise in step 1 is specifically as follows. The traditional empirical mode decomposition algorithm, namely the EMD algorithm, is commonly used in signal processing to analyse signals that are highly volatile, strongly random, non-stationary and non-linear, and can extract regular periodic fluctuation components, the IMFs, from any signal. An extracted IMF must satisfy two conditions simultaneously: (1) the number of extrema $n_1$ and the number of zero crossings $n_2$ of the IMF satisfy $|n_1-n_2|\le 1$; (2) the upper envelope $L_{\max}(t)$ and the lower envelope $L_{\min}(t)$, formed by smoothly connecting the local maxima and the local minima respectively, are symmetric about the time axis. CEEMDAN improves on traditional EMD by adding several groups of adaptive Gaussian white noise and averaging the results to obtain the IMF components; its distinctive residual calculation makes the decomposition complete, mitigates the mode mixing inherent in EMD and greatly reduces the reconstruction error, so that the reconstructed signal is almost identical to the original signal. $L(t)$ is defined as the original load sequence; $E_i(\cdot)$ is the operator yielding the i-th component of an EMD decomposition; $w_i(t)$ is a group of Gaussian white noise whose length matches that of $L(t)$; $\varepsilon_i$ is the white noise amplitude coefficient of stage i; and $\bar{I}_k(t)$ is the k-th component of the CEEMDAN decomposition.
The specific flow of the CEEMDAN algorithm in step 1 comprises the following steps:
step 1.1: generate M groups of Gaussian white noise $\{w_1(t),w_2(t),\ldots,w_M(t)\}$ and superimpose them on the load to obtain the noisy load curves $\{L(t)+\varepsilon_0 w_1(t),L(t)+\varepsilon_0 w_2(t),\ldots,L(t)+\varepsilon_0 w_M(t)\}$; decompose each curve by EMD to obtain the first IMF components $\{I_{1,1},I_{2,1},\ldots,I_{M,1}\}$, and take their mean as the first CEEMDAN component, i.e.:
$$\bar{I}_1(t)=\frac{1}{M}\sum_{i=1}^{M} I_{i,1}(t)=\frac{1}{M}\sum_{i=1}^{M} E_1\big(L(t)+\varepsilon_0 w_i(t)\big) \quad (1)$$
step 1.2: compute the residual sequence $r_1(t)$ of stage 1, i.e.:
$$r_1(t)=L(t)-\bar{I}_1(t) \quad (2)$$
step 1.3: decompose the M sequences $\{r_1(t)+\varepsilon_1 E_1(w_i(t)),\ i=1,2,\ldots,M\}$ by EMD, stopping each decomposition once its first IMF component is obtained; average the M IMF components to obtain the second component $\bar{I}_2(t)$, i.e.:
$$\bar{I}_2(t)=\frac{1}{M}\sum_{i=1}^{M} E_1\big(r_1(t)+\varepsilon_1 E_1(w_i(t))\big) \quad (3)$$
step 1.4: for stage k, the residual and the next component are obtained by equations (4)-(5):
$$r_k(t)=r_{k-1}(t)-\bar{I}_k(t) \quad (4)$$
$$\bar{I}_{k+1}(t)=\frac{1}{M}\sum_{i=1}^{M} E_1\big(r_k(t)+\varepsilon_k E_k(w_i(t))\big) \quad (5)$$
step 1.5: repeat step 1.4 until the number of extrema n of the residual sequence $r_k(t)$ falls below a threshold, at which point the CEEMDAN decomposition is complete and $L(t)$ has been decomposed into a series of K modal components $\bar{I}_k(t)$ and a residual $R(t)$, i.e.:
$$L(t)=\sum_{k=1}^{K}\bar{I}_k(t)+R(t) \quad (6)$$
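Steps 1.1 to 1.5 can be sketched in a few dozen lines of standard-library Python. This is a simplified illustration under stated assumptions, not the implementation used in the invention: envelopes are built by linear interpolation instead of smooth (spline) interpolation, sifting runs for a fixed number of passes, and a single constant `eps` replaces the stage-dependent amplitude coefficients. All function and parameter names (`first_imf`, `kth_mode`, `ceemdan`, `n_trials`, `min_extrema`) are hypothetical.

```python
import random

def extrema(x):
    """Return (maxima, minima): index lists of interior local extrema."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] >= x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] <= x[i + 1]]
    return maxima, minima

def envelope(x, idx):
    """Piecewise-linear envelope through the points x[idx] (assumed simplification)."""
    out, j = [], 0
    for t in range(len(x)):
        if t <= idx[0]:
            out.append(x[idx[0]])
        elif t >= idx[-1]:
            out.append(x[idx[-1]])
        else:
            while idx[j + 1] < t:
                j += 1
            a, b = idx[j], idx[j + 1]
            out.append(x[a] + (x[b] - x[a]) * (t - a) / (b - a))
    return out

def first_imf(x, n_sift=8):
    """Operator E_1: approximate first IMF via a fixed number of sifting passes."""
    h = list(x)
    for _ in range(n_sift):
        maxima, minima = extrema(h)
        if len(maxima) < 2 or len(minima) < 2:
            break
        upper, lower = envelope(h, maxima), envelope(h, minima)
        h = [v - 0.5 * (u + l) for v, u, l in zip(h, upper, lower)]
    return h

def kth_mode(x, k):
    """Operator E_k: k-th EMD mode, found by peeling off k-1 first IMFs."""
    r = list(x)
    for _ in range(k - 1):
        r = [a - b for a, b in zip(r, first_imf(r))]
    return first_imf(r)

def column_mean(rows):
    """Point-wise mean of several equally long series."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def ceemdan(load, n_trials=10, eps=0.2, max_modes=5, min_extrema=3, seed=0):
    """Return (modes, residual) following steps 1.1-1.5 with a constant eps."""
    rng = random.Random(seed)
    noise = [[rng.gauss(0.0, 1.0) for _ in load] for _ in range(n_trials)]
    # Steps 1.1-1.2: the first mode is the mean first IMF of the noisy copies.
    modes = [column_mean([first_imf([v + eps * w for v, w in zip(load, ws)])
                          for ws in noise])]
    residual = [v - m for v, m in zip(load, modes[0])]
    k = 1
    # Steps 1.3-1.5: iterate until the residual has too few extrema.
    while len(modes) < max_modes and sum(map(len, extrema(residual))) >= min_extrema:
        trials = [first_imf([r + eps * e for r, e in zip(residual, kth_mode(ws, k))])
                  for ws in noise]
        modes.append(column_mean(trials))
        residual = [r - m for r, m in zip(residual, modes[-1])]
        k += 1
    return modes, residual
```

Because each residual is defined as the previous residual minus the newly averaged mode, summing the returned modes and the residual recovers the input series up to floating-point rounding, which is the completeness property the text attributes to the residual calculation.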
The time-series permutation entropy algorithm in step 2 is specifically as follows: the time-series permutation entropy is defined to measure the complexity of a time series and serves as the basis for merging and recombining subcomponents. When the number of subcomponents is large, subcomponents whose permutation entropy values differ only slightly have similar load fluctuation patterns and can be merged and recombined, which saves computing resources and time in the subsequent prediction work.
The specific flow of the time-series permutation entropy algorithm in step 2 comprises the following steps:
step 2.1: for a one-dimensional time-series load $x_{Load}=\{x_1,x_2,\ldots,x_N\}$, reconstruct its phase space as a two-dimensional matrix X, i.e.:
$$X=\begin{bmatrix} x_1 & x_{1+\tau} & \cdots & x_{1+(L-1)\tau}\\ x_2 & x_{2+\tau} & \cdots & x_{2+(L-1)\tau}\\ \vdots & \vdots & & \vdots\\ x_{N-(L-1)\tau} & x_{N-(L-2)\tau} & \cdots & x_N \end{bmatrix} \quad (7)$$
in the formula: L is the embedding dimension, which determines the number of samples in each row vector; τ is the delay, i.e. the number of samples between adjacent elements of a row;
step 2.2: sort the elements of each row vector $X_i$ of the reconstructed matrix X in descending order to obtain a set of matrix element coordinate indices $\{(i,j_1),(i,j_2),\ldots,(i,j_L)\}$ satisfying
$$x_{i+(j_1-1)\tau}\ge x_{i+(j_2-1)\tau}\ge\cdots\ge x_{i+(j_L-1)\tau}$$
when element values are equal, the element with the larger column index is ranked first;
step 2.3: for any row vector $X_i$, define the corresponding load fluctuation pattern $S_i=\{j_1,j_2,\ldots,j_L\}$; there are L! possible fluctuation patterns in total. Count the probabilities $\{P_1,P_2,\ldots,P_C\}$ of all fluctuation patterns occurring in X (with $C\le L!$), and define the permutation entropy H(L) of the time-series load $x_{Load}=\{x_1,x_2,\ldots,x_N\}$ as:
$$H(L)=-\sum_{c=1}^{C}P_c\ln P_c \quad (8)$$
$$H=\frac{H(L)}{\ln(L!)} \quad (9)$$
Equation (9) normalizes the permutation entropy to between 0 and 1: the closer H is to 1, the richer the fluctuation patterns; the closer H is to 0, the more monotonous the fluctuation pattern.
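The permutation entropy of steps 2.1 to 2.3, and its use as a merging criterion, can be sketched as follows. The entropy function follows the description directly (delay embedding, descending sort with the larger index first on ties, normalization by ln(L!)). The merging helper is an assumption: the text only says that subcomponents with close entropy values may be merged, so the threshold `tol` and the rule of chaining neighbours in entropy order are hypothetical choices.

```python
import math
from collections import Counter

def permutation_entropy(series, dim=3, tau=1):
    """Normalized permutation entropy H in [0, 1] of a 1-D series."""
    n_rows = len(series) - (dim - 1) * tau
    patterns = Counter()
    for i in range(n_rows):
        row = [series[i + j * tau] for j in range(dim)]  # one row of the matrix X
        # Descending sort; on ties the larger column index ranks first (step 2.2).
        order = tuple(sorted(range(dim), key=lambda j: (-row[j], -j)))
        patterns[order] += 1
    probs = [count / n_rows for count in patterns.values()]
    h = sum(-p * math.log(p) for p in probs)             # entropy of the patterns
    return h / math.log(math.factorial(dim))             # normalize by ln(L!)

def merge_by_entropy(components, tol=0.1, dim=3, tau=1):
    """Sum components whose entropies lie within tol of their neighbour (tol is assumed)."""
    scored = sorted(((permutation_entropy(c, dim, tau), c) for c in components),
                    key=lambda item: item[0])
    groups = [[scored[0]]]
    for item in scored[1:]:
        if item[0] - groups[-1][-1][0] < tol:
            groups[-1].append(item)
        else:
            groups.append([item])
    # Each merged component is the point-wise sum of the series in its group.
    return [[sum(vals) for vals in zip(*(c for _, c in group))] for group in groups]
```

A strictly increasing series produces a single ordinal pattern and therefore an entropy of 0, while white noise spreads over all L! patterns and scores close to 1.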
The specific flow of the data information optimization and extraction algorithm based on CatBoost feature selection in step 3 comprises the following steps:
step 3.1: compared with the traditional gradient boosting decision tree algorithm, CatBoost handles categorical features such as time and weather better. The traditional Greedy TS (greedy target statistics) method uses the mean label value of each category as the criterion for node splitting, which is prone to conditional shift when the training and test sample sets are distributed differently. CatBoost reduces the influence of noise and low-frequency category samples on the distribution by adding a prior distribution, as shown in equation (10):
$$\hat{x}_{\sigma_p,k}=\frac{\sum_{j=1}^{p-1}\left[x_{\sigma_j,k}=x_{\sigma_p,k}\right]Y_{\sigma_j}+a\cdot P}{\sum_{j=1}^{p-1}\left[x_{\sigma_j,k}=x_{\sigma_p,k}\right]+a} \quad (10)$$
in the formula: $\sigma=\{\sigma_1,\sigma_2,\ldots,\sigma_n\}$ is a permutation of the training samples; $[\cdot]=1$ when $x_{\sigma_j,k}=x_{\sigma_p,k}$, otherwise $[\cdot]=0$; $Y_{\sigma_j}$ is the category label value; P is the prior; a is the weight coefficient; $\hat{x}_{\sigma_p,k}$ is the average label value computed over the training set; $x_{\sigma_p,k}$ is the categorical feature value in the training set;
step 3.2: CatBoost can evaluate feature importance during training, and several feature selection strategies can be built on it. PVC represents the average change in the predicted value of the CatBoost model per unit change in a feature value; the more important the feature is to the model, the larger its PVC, as shown in equations (11)-(12). LFC reflects how much a feature accelerates model convergence by comparing the CatBoost loss function with and without that feature, as shown in equation (13):
$$PVC=\sum_{trees,\,leaves}\left[W_l\,(V_l-\bar{V})^2+W_r\,(V_r-\bar{V})^2\right] \quad (11)$$
$$\bar{V}=\frac{W_l V_l+W_r V_r}{W_l+W_r} \quad (12)$$
in the formula, $W_l$, $V_l$, $W_r$, $V_r$ are the weight and target value of the left leaf and the weight and target value of the right leaf, respectively;
$$LFC=L(X)-L(X_i) \quad (13)$$
in the formula, X is the input set with N feature components $\{x_1,x_2,\ldots,x_N\}$; $X_i$ is the input set with N-1 feature components $\{x_1,x_2,\ldots,x_{i-1},x_{i+1},\ldots,x_N\}$; $L(\cdot)$ is the loss function value of the model for the given input features. The evaluation index I, a weighted combination of PVC and LFC, combines the strengths of both in different application scenarios to reflect feature importance comprehensively, as shown in equation (14):
$$I=a\cdot PVC+b\cdot LFC \quad (14)$$
in the formula: a and b are weight coefficients; adjusting them raises the relative importance of the PVC or LFC index to suit different application scenarios;
step 3.3: the feature selection process is completed by recursive feature elimination, which searches for the optimal feature subset with a greedy strategy, repeatedly building a model and removing the least important feature.
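Step 3.1 describes smoothing per-category label means with a prior. A minimal sketch of one common form of this encoding, the ordered target statistic, is given below: each sample is encoded using only the samples that precede it in a permutation, plus a prior term weighted by `a`. The function name, the `order` argument and the default identity permutation are hypothetical, and the sketch assumes this ordered form is what equation (10) expresses.

```python
def ordered_target_statistic(categories, labels, prior, a=1.0, order=None):
    """Encode a categorical column with prior-smoothed label means along a permutation."""
    order = list(range(len(categories))) if order is None else list(order)
    sums, counts = {}, {}
    encoded = [None] * len(categories)
    for idx in order:
        cat = categories[idx]
        # Only samples that precede idx in the permutation contribute.
        encoded[idx] = (sums.get(cat, 0.0) + a * prior) / (counts.get(cat, 0) + a)
        sums[cat] = sums.get(cat, 0.0) + labels[idx]
        counts[cat] = counts.get(cat, 0) + 1
    return encoded
```

With few preceding samples of a category the encoding stays close to the prior, which is how rare (low-frequency) categories are kept from distorting the encoded distribution.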
The recursive feature elimination method in step 3.3 specifically comprises the following steps:
step 3.31: initialize parameters: take the load data and the associated influencing features $X=\{x_1,x_2,\ldots,x_N\}$ as the independent variables and the data to be predicted $Y=\{y_1,y_2,\ldots,y_M\}$ as the dependent variable;
step 3.32: generate the CatBoost model: the first stage generates a regression tree with a greedy algorithm, computing the MSE error of the different features $x_i$ in the feature set X and selecting the feature $x_i$ with the minimum MSE error to construct the optimal tree model; the second stage is gradient boosting, in which new regression trees are built successively along the gradient descent direction of the current regression tree, and the regression trees are finally combined to obtain the CatBoost gradient boosting regression tree model;
step 3.33: remove features: compute the importance measures $\{I_1,I_2,\ldots,I_N\}$ of all features $\{x_1,x_2,\ldots,x_N\}$ by equation (14), and sort them in descending order of importance, $I_{k_1}\ge I_{k_2}\ge\ldots\ge I_{k_N}$, to obtain the sorted result $\{x_{k_1},x_{k_2},\ldots,x_{k_N}\}$. Record the prediction accuracy $p_N$ of the model on the test set at this point and the feature combination $X_N=\{x_{k_1},x_{k_2},\ldots,x_{k_N}\}$. Remove the feature with the lowest importance, $x_{k_N}$, and check whether the number of remaining features $X_{N-1}=\{x_{k_1},x_{k_2},\ldots,x_{k_{N-1}}\}$ equals 1: if not, input them into a new CatBoost model for training and repeat steps 3.32 to 3.33; if so, proceed to step 3.34;
step 3.34: select the optimal feature subset: sort the load prediction accuracies $p_N,p_{N-1},\ldots,p_1$ obtained with different numbers of features in descending order; supposing the highest prediction accuracy $p_j=\max\{p_N,p_{N-1},\ldots,p_1\}$ is obtained with j input features, the corresponding optimal feature input is $X_j=\{x_{k_1},x_{k_2},\ldots,x_{k_j}\}$.
The overall process of feature selection of the Catboost model of the invention is shown in FIG. 2.
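The elimination loop of steps 3.31 to 3.34 can be sketched independently of any particular model library. In this illustration the model training is hidden behind a hypothetical callback `evaluate(subset)` that returns the test accuracy and a per-feature importance mapping; in the embodiment that callback would train a CatBoost model and compute the index I. The default weights in `combined_importance` are assumptions.

```python
def combined_importance(pvc, lfc, a=0.5, b=0.5):
    """Evaluation index I = a*PVC + b*LFC; the default weights are assumed."""
    return a * pvc + b * lfc

def recursive_feature_elimination(features, evaluate):
    """Greedy backward elimination following steps 3.31-3.34.

    evaluate(subset) is a hypothetical callback that trains a model on `subset`
    and returns (accuracy, {feature: importance}).
    """
    current = list(features)
    history = []
    while True:
        accuracy, importance = evaluate(current)
        history.append((accuracy, list(current)))
        if len(current) == 1:
            break
        # Drop the feature with the lowest importance and retrain.
        weakest = min(current, key=lambda f: importance[f])
        current.remove(weakest)
    # Keep the subset with the highest recorded accuracy.
    return max(history, key=lambda item: item[0])
```

This sketch also records the accuracy of the final single-feature model, so every subset size from N down to 1 takes part in the comparison.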
The air conditioner load prediction model based on the XGBoost algorithm in step 4 specifically comprises the following steps:
step 4.1: XGBoost is an ensemble learning algorithm that uses tree models as base learners; its principle is to build multiple base learners and combine their training results to improve overall model performance. Building on the traditional gradient boosting decision tree (GBDT), XGBoost applies a second-order Taylor expansion to the loss function and adds a regularization term to the model, which reduces model complexity while accelerating convergence. Given an air conditioner load data set with N samples to be predicted and M features, $D=\{(x_i,y_i): i=1,2,\ldots,N,\ x_i\in R^M,\ y_i\in R\}$, the ensemble prediction model with K classification and regression trees (CART) is:
$$y_i^*=\sum_{k=1}^{K}f_k(x_i),\quad f_k\in F \quad (15)$$
in the formula, $y_i^*$ is the air conditioner load prediction of the XGBoost model; K is the number of decision trees; $f_k$ is the k-th CART decision tree, each $f_k$ having its own independent tree structure and leaf nodes with different weights; F is the space of decision trees, defined as:
$$F=\{f(x)=\omega_{q(x)}\},\quad q: R^M\rightarrow T,\ \omega\in R^T \quad (16)$$
in the formula: q denotes the structure of each independent decision tree, which maps every sample to its corresponding leaf node; $\omega_{q(x_i)}$ then maps the information contained in that leaf node to a specific numerical value;
step 4.2: an objective function is defined to measure the deviation between the model prediction and the actual air conditioner load, and the XGBoost model is trained to minimize it; the objective function is defined as:
$$Obj=\sum_{i=1}^{N}l(y_i,y_i^*)+\sum_{k=1}^{K}\Omega(f_k) \quad (17)$$
in the formula: $l(\cdot)$ is the loss function, for which the mean square error is used; the regularization term $\Omega(\cdot)$ is defined as:
$$\Omega(f)=\gamma T+\frac{1}{2}\lambda\sum_{j=1}^{T}w_j^2 \quad (18)$$
in the formula, T is the number of leaf nodes, and changing it sets the complexity of the tree structure; $w_j$ is the weight of the j-th leaf node, and keeping the weights small effectively prevents overfitting; γ and λ are penalty coefficients, and changing their values sets the relative importance of the two penalty terms;
step 4.3: based on the forward stagewise algorithm, the objective function is minimized by optimizing the newly added CART decision tree $f_t$; at step t, after dropping constant terms and applying the second-order Taylor expansion, the objective function $Obj^{(t)}$ is:
$$Obj^{(t)}=\sum_{j=1}^{T}\left[\Big(\sum_{i\in I_j}g_i\Big)w_j+\frac{1}{2}\Big(\sum_{i\in I_j}h_i+\lambda\Big)w_j^2\right]+\gamma T \quad (19)$$
in the formula: $I_j$ is the set of indices of all samples mapped to the j-th leaf node by the CART decision tree; $g_i=\partial l(y_i,y_i^{*(t-1)})/\partial y_i^{*(t-1)}$ and $h_i=\partial^2 l(y_i,y_i^{*(t-1)})/\partial (y_i^{*(t-1)})^2$ are the first and second derivatives of the loss function, respectively. Equation (19) is quadratic in $w_j$, so the optimal leaf node weight for a given CART decision tree is:
$$w_j^*=-\frac{\sum_{i\in I_j}g_i}{\sum_{i\in I_j}h_i+\lambda} \quad (20)$$
Substituting equation (20) into equation (19) yields the optimal objective function value for that CART decision tree:
$$Obj^*=-\frac{1}{2}\sum_{j=1}^{T}\frac{\big(\sum_{i\in I_j}g_i\big)^2}{\sum_{i\in I_j}h_i+\lambda}+\gamma T \quad (21)$$
Combined with equation (15), the XGBoost model is trained with equation (21) as the objective function; the training process is shown in FIG. 3. The final training result is the air conditioner load time-series data set produced by the air conditioner load prediction model, recorded as:
Figure BDA0004024068620000175
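The closed-form quantities of step 4.3 are easy to check numerically. The sketch below assumes the standard XGBoost forms for the optimal leaf weight and the structure score, and takes the squared error $l=(y-y^*)^2$ as the concrete loss, so that $g_i=2(y_i^*-y_i)$ and $h_i=2$. The function names and the `split_gain` helper are hypothetical illustrations, not part of the patent text.

```python
def squared_error_grad_hess(y_true, y_pred):
    """g_i and h_i for the loss l = (y - y_pred)^2 (assumed form of the MSE loss)."""
    g = [2.0 * (p - y) for y, p in zip(y_true, y_pred)]
    h = [2.0] * len(y_true)
    return g, h

def optimal_leaf_weight(g, h, lam):
    """Optimal weight w* = -sum(g) / (sum(h) + lambda) for the samples in one leaf."""
    return -sum(g) / (sum(h) + lam)

def structure_score(leaves, lam, gamma):
    """Optimal objective value of a tree whose leaves hold (g, h) lists."""
    fit = sum(sum(g) ** 2 / (sum(h) + lam) for g, h in leaves)
    return -0.5 * fit + gamma * len(leaves)

def split_gain(left, right, lam, gamma):
    """Reduction in the objective when one leaf is split into `left` and `right`."""
    parent = (left[0] + right[0], left[1] + right[1])
    return structure_score([parent], lam, gamma) - structure_score([left, right], lam, gamma)
```

With λ = 0 the optimal leaf weight reduces to the mean residual of the samples in the leaf, and a split is only worthwhile when its gain remains positive after paying the per-leaf penalty γ.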

Claims (8)

1. A residential air conditioner load prediction model considering frequency-domain data characteristic decomposition, characterized in that it comprises the following steps:
step 1: a complete ensemble empirical mode decomposition with adaptive noise algorithm, namely the CEEMDAN algorithm, is adopted to convert the original air conditioner load data into components with different fluctuation periods;
step 2: a time-series permutation entropy algorithm, namely the PE algorithm, is introduced into the field of time-series air conditioner load prediction to merge and reconstruct the subcomponent modal features obtained from frequency-domain decomposition;
step 3: to reduce overfitting and enhance the generalization ability of the model, a data information optimization and extraction algorithm based on CatBoost feature selection is adopted;
step 4: considering the influence of high-dimensional external features on the air conditioner load pattern, an air conditioner load prediction model based on the XGBoost algorithm is constructed.
2. The residential air conditioner load prediction model considering frequency-domain data characteristic decomposition according to claim 1, wherein the complete ensemble empirical mode decomposition with adaptive noise in step 1 is specifically as follows: the traditional EMD algorithm is improved by adding Gaussian white noise to give the CEEMDAN algorithm, which adds several groups of adaptive Gaussian white noise and averages the results to obtain the IMF components; its distinctive residual calculation makes the decomposition complete, mitigates the mode mixing inherent in EMD and greatly reduces the reconstruction error, so that the reconstructed signal is almost identical to the original signal; $L(t)$ is defined as the original load sequence; $E_i(\cdot)$ is the operator yielding the i-th component of an EMD decomposition; $w_i(t)$ is a group of Gaussian white noise whose length matches that of $L(t)$; $\varepsilon_i$ is the white noise amplitude coefficient of stage i; and $\bar{I}_k(t)$ is the k-th component of the CEEMDAN decomposition.
3. The residential air conditioner load prediction model considering frequency-domain data characteristic decomposition according to claim 2, wherein the specific flow of the CEEMDAN algorithm in step 1 comprises the following steps:
step 1.1: generate M groups of Gaussian white noise $\{w_1(t),w_2(t),\ldots,w_M(t)\}$ and superimpose them on the load to obtain the noisy load curves $\{L(t)+\varepsilon_0 w_1(t),L(t)+\varepsilon_0 w_2(t),\ldots,L(t)+\varepsilon_0 w_M(t)\}$; decompose each curve by EMD to obtain the first IMF components $\{I_{1,1},I_{2,1},\ldots,I_{M,1}\}$, and take their mean as the first CEEMDAN component, i.e.:
$$\bar{I}_1(t)=\frac{1}{M}\sum_{i=1}^{M} I_{i,1}(t)=\frac{1}{M}\sum_{i=1}^{M} E_1\big(L(t)+\varepsilon_0 w_i(t)\big) \quad (1)$$
step 1.2: compute the residual sequence $r_1(t)$ of stage 1, i.e.:
$$r_1(t)=L(t)-\bar{I}_1(t) \quad (2)$$
step 1.3: decompose the M sequences $\{r_1(t)+\varepsilon_1 E_1(w_i(t)),\ i=1,2,\ldots,M\}$ by EMD, stopping each decomposition once its first IMF component is obtained; average the M IMF components to obtain the second component $\bar{I}_2(t)$, i.e.:
$$\bar{I}_2(t)=\frac{1}{M}\sum_{i=1}^{M} E_1\big(r_1(t)+\varepsilon_1 E_1(w_i(t))\big) \quad (3)$$
step 1.4: for stage k, the residual and the next component are obtained by equations (4)-(5):
$$r_k(t)=r_{k-1}(t)-\bar{I}_k(t) \quad (4)$$
$$\bar{I}_{k+1}(t)=\frac{1}{M}\sum_{i=1}^{M} E_1\big(r_k(t)+\varepsilon_k E_k(w_i(t))\big) \quad (5)$$
step 1.5: repeat step 1.4 until the number of extrema n of the residual sequence $r_k(t)$ falls below a threshold, at which point the CEEMDAN decomposition is complete and $L(t)$ has been decomposed into a series of K modal components $\bar{I}_k(t)$ and a residual $R(t)$, i.e.:
$$L(t)=\sum_{k=1}^{K}\bar{I}_k(t)+R(t) \quad (6)$$
4. The residential air conditioner load prediction model considering frequency-domain data characteristic decomposition according to claim 1, wherein the time-series permutation entropy algorithm in step 2 is specifically as follows: the time-series permutation entropy is defined to measure the complexity of a time series and serves as the basis for merging and recombining subcomponents; when the number of subcomponents is large, subcomponents whose permutation entropy values differ only slightly have similar load fluctuation patterns and can be merged and recombined, which saves computing resources and time in the subsequent prediction work.
5. The residential air conditioner load prediction model considering frequency-domain data characteristic decomposition according to claim 4, wherein the specific flow of the time-series permutation entropy algorithm in step 2 comprises the following steps:
step 2.1: for a one-dimensional time-series load $x_{Load}=\{x_1,x_2,\ldots,x_N\}$, reconstruct its phase space as a two-dimensional matrix X, i.e.:
$$X=\begin{bmatrix} x_1 & x_{1+\tau} & \cdots & x_{1+(L-1)\tau}\\ x_2 & x_{2+\tau} & \cdots & x_{2+(L-1)\tau}\\ \vdots & \vdots & & \vdots\\ x_{N-(L-1)\tau} & x_{N-(L-2)\tau} & \cdots & x_N \end{bmatrix} \quad (7)$$
in the formula: L is the embedding dimension, which determines the number of samples in each row vector; τ is the delay, i.e. the number of samples between adjacent elements of a row;
step 2.2: sort the elements of each row vector $X_i$ of the reconstructed matrix X in descending order to obtain a set of matrix element coordinate indices $\{(i,j_1),(i,j_2),\ldots,(i,j_L)\}$ satisfying
$$x_{i+(j_1-1)\tau}\ge x_{i+(j_2-1)\tau}\ge\cdots\ge x_{i+(j_L-1)\tau}$$
when element values are equal, the element with the larger column index is ranked first;
step 2.3: for any row vector $X_i$, define the corresponding load fluctuation pattern $S_i=\{j_1,j_2,\ldots,j_L\}$; there are L! possible fluctuation patterns in total; count the probabilities $\{P_1,P_2,\ldots,P_C\}$ of all fluctuation patterns occurring in X (with $C\le L!$), and define the permutation entropy H(L) of the time-series load $x_{Load}=\{x_1,x_2,\ldots,x_N\}$ as:
$$H(L)=-\sum_{c=1}^{C}P_c\ln P_c \quad (8)$$
$$H=\frac{H(L)}{\ln(L!)} \quad (9)$$
equation (9) normalizes the permutation entropy to between 0 and 1: the closer H is to 1, the richer the fluctuation patterns; the closer H is to 0, the more monotonous the fluctuation pattern.
6. The residential air conditioner load prediction model considering frequency-domain data characteristic decomposition according to claim 1, wherein the specific flow of the data information optimization and extraction algorithm based on CatBoost feature selection in step 3 comprises the following steps:
step 3.1: compared with the traditional gradient boosting decision tree algorithm, CatBoost handles categorical features such as time and weather better; the traditional Greedy TS (greedy target statistics) method uses the mean label value of each category as the criterion for node splitting, which is prone to conditional shift when the training and test sample sets are distributed differently; CatBoost reduces the influence of noise and low-frequency category samples on the distribution by adding a prior distribution, as shown in equation (10):
$$\hat{x}_{\sigma_p,k}=\frac{\sum_{j=1}^{p-1}\left[x_{\sigma_j,k}=x_{\sigma_p,k}\right]Y_{\sigma_j}+a\cdot P}{\sum_{j=1}^{p-1}\left[x_{\sigma_j,k}=x_{\sigma_p,k}\right]+a} \quad (10)$$
in the formula: $\sigma=\{\sigma_1,\sigma_2,\ldots,\sigma_n\}$ is a permutation of the training samples; $[\cdot]=1$ when $x_{\sigma_j,k}=x_{\sigma_p,k}$, otherwise $[\cdot]=0$; $Y_{\sigma_j}$ is the category label value; P is the prior; a is the weight coefficient; $\hat{x}_{\sigma_p,k}$ is the average label value computed over the training set; $x_{\sigma_p,k}$ is the categorical feature value in the training set;
step 3.2: CatBoost can evaluate feature importance during training, and several feature selection strategies can be built on it; PVC represents the average change in the predicted value of the CatBoost model per unit change in a feature value, and the more important the feature is to the model, the larger its PVC, as shown in equations (11)-(12); LFC reflects how much a feature accelerates model convergence by comparing the CatBoost loss function with and without that feature, as shown in equation (13):
$$PVC=\sum_{trees,\,leaves}\left[W_l\,(V_l-\bar{V})^2+W_r\,(V_r-\bar{V})^2\right] \quad (11)$$
$$\bar{V}=\frac{W_l V_l+W_r V_r}{W_l+W_r} \quad (12)$$
in the formula, $W_l$, $V_l$, $W_r$, $V_r$ are the weight and target value of the left leaf and the weight and target value of the right leaf, respectively;
$$LFC=L(X)-L(X_i) \quad (13)$$
in the formula, X is the input set with N feature components $\{x_1,x_2,\ldots,x_N\}$; $X_i$ is the input set with N-1 feature components $\{x_1,x_2,\ldots,x_{i-1},x_{i+1},\ldots,x_N\}$; $L(\cdot)$ is the loss function value of the model for the given input features; the evaluation index I, a weighted combination of PVC and LFC, combines the strengths of both in different application scenarios to reflect feature importance comprehensively, as shown in equation (14):
$$I=a\cdot PVC+b\cdot LFC \quad (14)$$
in the formula: a and b are weight coefficients; adjusting them raises the relative importance of the PVC or LFC index to suit different application scenarios;
step 3.3: the feature selection process is completed by recursive feature elimination, which searches for the optimal feature subset with a greedy strategy, repeatedly building a model and removing the least important feature.
7. The residential air conditioner load prediction model considering frequency-domain data characteristic decomposition according to claim 6, wherein the recursive feature elimination method in step 3.3 specifically comprises the following steps:
step 3.31: initialize parameters: take the load data and the associated influencing features $X=\{x_1,x_2,\ldots,x_N\}$ as the independent variables and the data to be predicted $Y=\{y_1,y_2,\ldots,y_M\}$ as the dependent variable;
step 3.32: generate the CatBoost model: the first stage generates a regression tree with a greedy algorithm, computing the MSE error of the different features $x_i$ in the feature set X and selecting the feature $x_i$ with the minimum MSE error to construct the optimal tree model; the second stage is gradient boosting, in which new regression trees are built successively along the gradient descent direction of the current regression tree, and the regression trees are finally combined to obtain the CatBoost gradient boosting regression tree model;
step 3.33: remove features: compute the importance measures $\{I_1,I_2,\ldots,I_N\}$ of all features $\{x_1,x_2,\ldots,x_N\}$ by equation (14), and sort them in descending order of importance, $I_{k_1}\ge I_{k_2}\ge\ldots\ge I_{k_N}$, to obtain the sorted result $\{x_{k_1},x_{k_2},\ldots,x_{k_N}\}$; record the prediction accuracy $p_N$ of the model on the test set at this point and the feature combination $X_N=\{x_{k_1},x_{k_2},\ldots,x_{k_N}\}$; remove the feature with the lowest importance, $x_{k_N}$, and check whether the number of remaining features $X_{N-1}=\{x_{k_1},x_{k_2},\ldots,x_{k_{N-1}}\}$ equals 1: if not, input them into a new CatBoost model for training and repeat steps 3.32 to 3.33; if so, proceed to step 3.34;
step 3.34: select the optimal feature subset: sort the load prediction accuracies $p_N,p_{N-1},\ldots,p_1$ obtained with different numbers of features in descending order; supposing the highest prediction accuracy $p_j=\max\{p_N,p_{N-1},\ldots,p_1\}$ is obtained with j input features, the corresponding optimal feature input is $X_j=\{x_{k_1},x_{k_2},\ldots,x_{k_j}\}$.
8. The residential air conditioning load prediction model considering frequency domain data characteristic decomposition as claimed in claim 1, wherein: the air conditioner load prediction model based on the XGboost algorithm in the step 4 specifically comprises the following steps:
step 4.1: the XGBoost is an integrated learning algorithm taking a tree model as a basic model, the principle of the XGBoost is that the whole performance of the model is improved by building and integrating a plurality of basic learning device training results, a traditional gradient improvement decision tree (GBDT) is used as a basis, the XGBOSst performs second-order Taylor expansion aiming at a loss function of the XGBOSst, in addition, a regular term is added in the model, the model is accelerated to converge while the complexity of the model is effectively reduced, the number of samples to be predicted is given as N, and an air conditioner load data set with the characteristic number of M is as follows: d = { (x) i ,y i ):i=1,2,...,n,x i ∈R M ,y i The method belongs to the field of the following integrated prediction models, wherein the integrated prediction models belong to the group R and have K classification regression decision trees (CART):
Figure FDA0004024068610000057
in the formula, y i * Representing an air conditioner load prediction result of the XGBoost model; k is the decision tree number size; f. of k Represents the kth CART decision tree, each f k All have corresponding independent decision tree structures and leaf nodes with different weightsPoint; f is a set space representing a decision tree, and its specific meaning is:
Figure FDA0004024068610000061
In the formula: q represents respective independent decision tree structures, in which all samples can find the leaf nodes corresponding to the samples under the decision tree through mapping and pass through omega q(xi) Further mapping information contained in the leaf node into a specific numerical value;
step 4.2: calculating the deviation between the model prediction result and the real air conditioner load by defining an objective function, and training the XGBoost model by taking a minimized loss function as a target, wherein the objective function is defined as:
Figure FDA0004024068610000062
in the formula: l (-) represents the loss function error, and adopts the mean square error; the regularization term Ω (·) is defined as:
$$\Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2} \tag{18}$$
In the formula, T represents the number of leaf nodes, and the complexity of the tree structure can be adjusted by changing the number of leaf nodes; $w_j$ represents the weight of the j-th leaf node, and keeping the weights small effectively prevents overfitting; γ and λ are penalty coefficients, and the relative importance of the two penalty terms can be set by changing their values;
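As a rough illustration of step 4.2, the sketch below evaluates a regularized objective of this kind in plain Python. It assumes the loss is the per-sample squared error summed over all samples, and that each tree is penalized by γ times its leaf count plus λ/2 times the sum of its squared leaf weights. The function names and the numbers are hypothetical.

```python
from typing import List, Sequence


def omega(leaf_weights: Sequence[float], gamma: float, lam: float) -> float:
    """Penalty of one tree: gamma * T + 0.5 * lam * sum_j w_j^2."""
    num_leaves = len(leaf_weights)
    return gamma * num_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    trees_leaf_weights: List[Sequence[float]],
    gamma: float,
    lam: float,
) -> float:
    """Squared-error loss summed over samples plus the penalty of every tree."""
    loss = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    penalty = sum(omega(w, gamma, lam) for w in trees_leaf_weights)
    return loss + penalty


# Hypothetical values: two samples, two trees with two leaves each.
obj = objective(
    y_true=[1.0, 2.0],
    y_pred=[1.5, 1.5],
    trees_leaf_weights=[[0.5, 1.5], [-0.25, 0.25]],
    gamma=1.0,
    lam=2.0,
)
```

Raising `gamma` makes every additional leaf more expensive, while raising `lam` shrinks the leaf weights; this is the trade-off between the two penalty terms that the text refers to.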
step 4.3: based on the forward stagewise algorithm, the objective function is minimized by optimizing the newly added CART decision tree $f_t$; at step t, after removing the constant term and applying the second-order Taylor expansion, the objective function $Obj^{(t)}$ is:
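$$Obj^{(t)} = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^{2} \right] + \gamma T \tag{19}$$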
in the formula: i is j Representing all sample number sets mapped into jth leaf nodes through the CART decision tree;
Figure FDA0004024068610000065
individual watchFirst and second derivatives of the loss function are shown; formula (20) relates to w j The optimal leaf node weight under a certain specific CART decision tree can be obtained by the following steps:
$$w_j^{*} = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \tag{20}$$
Substituting equation (20) into equation (19) yields the corresponding optimal objective function under the specific CART decision tree:
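$$Obj^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^{2}}{\sum_{i \in I_j} h_i + \lambda} + \gamma T \tag{21}$$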
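The closed-form results of step 4.3 can be checked numerically. The sketch below assumes the per-sample loss is the squared error $(y_i - y_i^{*})^2$, so that the first derivative is $2(y_i^{*} - y_i)$ and the second derivative is 2. It sums the derivatives per leaf, computes the optimal leaf weights, and compares the step objective evaluated at those weights with the closed-form optimum. All names and sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def grad_hess(y_true: Sequence[float], y_prev: Sequence[float]) -> Tuple[List[float], List[float]]:
    """First and second derivatives of the squared error w.r.t. the previous prediction."""
    g = [2.0 * (yp - yt) for yt, yp in zip(y_true, y_prev)]
    h = [2.0 for _ in y_true]
    return g, h


def leaf_sums(g, h, leaf_index: Sequence[int], num_leaves: int):
    """Sum g_i and h_i over the samples I_j that fall into each leaf j."""
    G, H = [0.0] * num_leaves, [0.0] * num_leaves
    for gi, hi, j in zip(g, h, leaf_index):
        G[j] += gi
        H[j] += hi
    return G, H


def optimal_weights(G, H, lam: float) -> List[float]:
    """Optimal leaf weights: w_j* = -G_j / (H_j + lam)."""
    return [-Gj / (Hj + lam) for Gj, Hj in zip(G, H)]


def step_objective(G, H, w, lam: float, gamma: float) -> float:
    """Second-order objective at step t for given leaf weights w."""
    return sum(Gj * wj + 0.5 * (Hj + lam) * wj * wj for Gj, Hj, wj in zip(G, H, w)) + gamma * len(G)


def optimal_objective(G, H, lam: float, gamma: float) -> float:
    """Closed-form optimum: -0.5 * sum_j G_j^2 / (H_j + lam) + gamma * T."""
    return -0.5 * sum(Gj * Gj / (Hj + lam) for Gj, Hj in zip(G, H)) + gamma * len(G)


# Hypothetical data: four samples, previous prediction 0, two leaves.
y_true = [1.0, 3.0, 2.0, 6.0]
y_prev = [0.0, 0.0, 0.0, 0.0]
leaf_index = [0, 0, 1, 1]  # leaf reached by each sample

g, h = grad_hess(y_true, y_prev)
G, H = leaf_sums(g, h, leaf_index, num_leaves=2)
w_star = optimal_weights(G, H, lam=4.0)
obj_at_w_star = step_objective(G, H, w_star, lam=4.0, gamma=1.0)
obj_closed_form = optimal_objective(G, H, lam=4.0, gamma=1.0)
```

With these values the two objective computations agree, which is what substituting the optimal weights back into the step objective is expected to give.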
Combining formula (15), the XGBoost model is trained with formula (21) as the objective function; the final training result is the air conditioning load time-series data set produced by the air conditioning load prediction model, recorded as:
Figure FDA0004024068610000071
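The training procedure of steps 4.1 to 4.3 can be sketched end to end in plain Python. This is a simplified illustration rather than the XGBoost library or the patented model. It assumes a squared-error loss and depth-1 trees (T = 2), uses no learning rate, and requires that at least one feature takes two distinct values. Each round picks the split with the lowest closed-form objective and sets the leaf weights to their optimal values. The data and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

# (feature index, threshold, left leaf weight, right leaf weight)
Stump = Tuple[int, float, float, float]


def predict_one(stumps: List[Stump], x: Sequence[float]) -> float:
    """Sum of the outputs of all stumps added so far."""
    return sum(wl if x[m] < thr else wr for m, thr, wl, wr in stumps)


def fit_stump(X, g, h, lam: float, gamma: float) -> Stump:
    """Choose the depth-1 split with the lowest closed-form objective (T = 2)."""
    best_score, best = float("inf"), None
    for m in range(len(X[0])):
        # Candidate thresholds: every distinct value except the smallest,
        # so that both leaves are non-empty.
        for thr in sorted({row[m] for row in X})[1:]:
            GL = sum(gi for row, gi in zip(X, g) if row[m] < thr)
            HL = sum(hi for row, hi in zip(X, h) if row[m] < thr)
            GR, HR = sum(g) - GL, sum(h) - HL
            score = -0.5 * (GL * GL / (HL + lam) + GR * GR / (HR + lam)) + gamma * 2
            if score < best_score:
                best_score = score
                best = (m, thr, -GL / (HL + lam), -GR / (HR + lam))
    return best


def fit(X, y, K: int, lam: float = 1.0, gamma: float = 0.0) -> List[Stump]:
    """Forward stagewise training: add one tree per round, K rounds in total."""
    stumps: List[Stump] = []
    for _ in range(K):
        y_prev = [predict_one(stumps, x) for x in X]
        g = [2.0 * (yp - yt) for yt, yp in zip(y, y_prev)]  # first derivatives
        h = [2.0] * len(y)                                  # second derivatives
        stumps.append(fit_stump(X, g, h, lam, gamma))
    return stumps


def sse(stumps: List[Stump], X, y) -> float:
    """Sum of squared errors of the current ensemble on (X, y)."""
    return sum((yt - predict_one(stumps, x)) ** 2 for x, yt in zip(X, y))


# Hypothetical samples: [temperature, hour] -> load.
X = [[26.0, 10.0], [28.0, 14.0], [31.0, 15.0], [33.0, 20.0]]
y = [0.4, 0.6, 1.4, 1.8]

model = fit(X, y, K=5)
forecast = [predict_one(model, x) for x in X]
```

On this toy data the training error falls as more trees are added, because each round reduces the squared error on the training samples as long as some residual remains.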
CN202211700889.5A 2022-12-28 2022-12-28 Resident air conditioner load prediction model considering frequency domain data characteristic decomposition Pending CN115860260A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211700889.5A CN115860260A (en) 2022-12-28 2022-12-28 Resident air conditioner load prediction model considering frequency domain data characteristic decomposition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211700889.5A CN115860260A (en) 2022-12-28 2022-12-28 Resident air conditioner load prediction model considering frequency domain data characteristic decomposition

Publications (1)

Publication Number Publication Date
CN115860260A true CN115860260A (en) 2023-03-28

Family

ID=85655623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211700889.5A Pending CN115860260A (en) 2022-12-28 2022-12-28 Resident air conditioner load prediction model considering frequency domain data characteristic decomposition

Country Status (1)

Country Link
CN (1) CN115860260A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116706902A (en) * 2023-08-03 2023-09-05 国网湖北省电力有限公司营销服务中心(计量中心) Domestic electricity optimizing method for regional house, electronic equipment and computer readable medium
CN117894491A (en) * 2024-03-15 2024-04-16 济南宝林信息技术有限公司 Physiological monitoring data processing method for assessing mental activities

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116706902A (en) * 2023-08-03 2023-09-05 国网湖北省电力有限公司营销服务中心(计量中心) Domestic electricity optimizing method for regional house, electronic equipment and computer readable medium
CN116706902B (en) * 2023-08-03 2023-11-14 国网湖北省电力有限公司营销服务中心(计量中心) Domestic electricity optimizing method for regional house, electronic equipment and computer readable medium
CN117894491A (en) * 2024-03-15 2024-04-16 济南宝林信息技术有限公司 Physiological monitoring data processing method for assessing mental activities
CN117894491B (en) * 2024-03-15 2024-06-11 济南宝林信息技术有限公司 Physiological monitoring data processing method for assessing mental activities

Similar Documents

Publication Publication Date Title
CN111860982B (en) VMD-FCM-GRU-based wind power plant short-term wind power prediction method
CN108805188B (en) Image classification method for generating countermeasure network based on feature recalibration
CN115860260A (en) Resident air conditioner load prediction model considering frequency domain data characteristic decomposition
CN111697621B (en) Short-term wind power prediction method based on EWT-PDBN combination
CN109886464B (en) Low-information-loss short-term wind speed prediction method based on optimized singular value decomposition generated feature set
CN109784473A (en) A kind of short-term wind power prediction method based on Dual Clocking feature learning
CN112232244A (en) Fault diagnosis method for rolling bearing
CN114399032B (en) Method and system for predicting metering error of electric energy meter
CN110991721A (en) Short-term wind speed prediction method based on improved empirical mode decomposition and support vector machine
CN114358389B (en) Short-term power load prediction method combining VMD decomposition and time convolution network
CN110766060B (en) Time series similarity calculation method, system and medium based on deep learning
CN114912077B (en) Sea wave forecasting method integrating random search and mixed decomposition error correction
CN112434891A (en) Method for predicting solar irradiance time sequence based on WCNN-ALSTM
CN109447333A (en) A kind of Time Series Forecasting Methods and device based on random length fuzzy information granule
CN113411216A (en) Network flow prediction method based on discrete wavelet transform and FA-ELM
CN116050621A (en) Multi-head self-attention offshore wind power ultra-short-time power prediction method integrating lifting mode
CN111141879B (en) Deep learning air quality monitoring method, device and equipment
CN116629431A (en) Photovoltaic power generation amount prediction method and device based on variation modal decomposition and ensemble learning
CN117239722A (en) System wind load short-term prediction method considering multi-element load influence
CN112464981A (en) Self-adaptive knowledge distillation method based on space attention mechanism
CN117034055A (en) L-converter-based short-term photovoltaic power generation power prediction method
CN117407660B (en) Regional sea wave forecasting method based on deep learning
Luo et al. A novel nonlinear combination model based on support vector machine for stock market prediction
CN117407704A (en) Renewable energy source generation power prediction method, computer equipment and storage medium thereof
CN113496255B (en) Power distribution network mixed observation point distribution method based on deep learning and decision tree driving

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination