CN112329813A

CN112329813A - Energy consumption prediction feature extraction method and system

Info

Publication number: CN112329813A
Application number: CN202011051505.2A
Authority: CN
Inventors: 陈志文; 梁可天; 邓仕均; 阳春华; 彭涛; 蒋朝辉; 桂卫华
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2020-09-29
Filing date: 2020-09-29
Publication date: 2021-02-05
Anticipated expiration: 2040-09-29
Also published as: CN112329813B

Abstract

The invention relates to the field of central air conditioner energy consumption prediction and discloses a feature extraction method and system for energy consumption prediction, which rapidly screen the input features used for energy consumption prediction and improve the generalization performance of the energy consumption prediction algorithm. The method comprises: collecting historical operation data of the central air conditioning system to be analyzed and preprocessing the data to obtain an initial feature set; training a gradient boosting tree energy consumption prediction model on the initial feature set and calculating the contribution degree of each input feature; screening the features according to the contribution degrees to obtain an optimized feature set; optimizing the gradient boosting tree energy consumption prediction model according to the optimized feature set and obtaining predicted values from the optimized model; calculating the contribution degrees and the mean square error of the predicted values; and judging them against a preset feature screening termination condition to obtain an optimal feature set.

Description

Energy consumption prediction feature extraction method and system

Technical Field

The invention relates to the field of central air conditioner energy consumption prediction, in particular to a method and a system for extracting characteristics for energy consumption prediction.

Background

The central air conditioning system is one of the main energy consumers in public buildings, so methods for predicting its energy consumption and optimizing its control have attracted wide attention. A central air conditioning system generally consists of several refrigerating units, water pumps, cooling towers and other energy-consuming equipment, and its working state is affected by factors such as user demand, season and weather. Its operation data is therefore complex, and the pattern of energy consumption change is difficult to analyze and model. To predict the energy consumption of a central air conditioning system accurately, features must be extracted and selected from the operation data. In existing data-driven energy consumption prediction methods for central air conditioning systems, input features are mostly handled by expert experience and manual screening, which is inefficient and generalizes poorly across systems, so the generalization performance of the energy consumption prediction is weak. A multi-objective optimization method can also be used to optimize the feature combination, but because the number of original features is large, the optimization takes a very long time. A rapid feature extraction method for central air conditioner energy consumption prediction is therefore needed, so that features can be extracted and screened quickly and the generalization performance of the energy consumption prediction algorithm can be improved.

Disclosure of Invention

In view of the above problems, the invention provides a method and a system for rapidly extracting energy consumption prediction features of a central air conditioner, so as to rapidly screen the input features used for energy consumption prediction and improve the generalization performance of the energy consumption prediction algorithm.

In order to achieve the above object, the present invention provides a method for extracting features for energy consumption prediction, comprising the steps of:

S1: acquiring historical operation data of the central air conditioning system to be analyzed, and preprocessing the historical operation data to obtain an initial feature set;

S2: training a gradient boosting tree energy consumption prediction model on the initial feature set, and calculating the contribution degree of each input feature;

S3: screening the features according to the contribution degrees to obtain an optimized feature set;

S4: optimizing the gradient boosting tree energy consumption prediction model according to the optimized feature set, and obtaining predicted values from the optimized gradient boosting tree energy consumption prediction model;

S5: calculating the contribution degrees and the mean square error of the predicted values; judging whether they meet a preset feature screening termination condition; if so, obtaining the optimal feature set from the screening result; if not, returning to S4 until the optimal feature set is obtained.

Preferably, the historical operation data includes one or any combination of: the historical input power, chilled water outlet temperature set value, chilled water outlet temperature and chilled water inlet temperature of each refrigerating unit; the water supply temperature, water return temperature, water supply pressure, water return pressure, valve opening and instantaneous flow of the chilled-water main pipe; the input power, working frequency, water inlet pressure and water outlet pressure of each chilled-water pump; the input power and working frequency of each cooling tower; and the outdoor temperature, humidity, dew point temperature and wet bulb temperature.

Preferably, the step of preprocessing the historical operating data to obtain an initial feature set specifically includes the following steps:

S11: dividing the collected historical operation data into a plurality of subsequences at a set time interval, and filling the missing values of each acquisition item in a preset manner;

S12: merging the same acquisition items of all chilled-water pumps under the same chilled-water supply main pipe into a first average value, and merging the same acquisition items of all cooling pumps and cooling towers under the same main pipe into a second average value;

S13: selecting a prediction target item Y from the acquisition items, expressed as:

Y＝[y₀,y₁,y₂,...,y_L]；

where L is the length of the operation data, y₀ is the initial value of the prediction target item, and y_l is the l-th value of the prediction target item, with l = 0, 1, …, L;

S14: retaining the data of all acquisition items at the instants where the value of the prediction target item is not 0 and deleting the remaining data, the length of the deleted data being M; the data of the acquisition items other than the prediction target item form the initial feature set X₀, recorded as:

X₀＝[x¹,x²,...,x^N] (1)

where xⁱ is the column of X₀ holding the data of the i-th acquisition item, xⁱ_j is the j-th data of the i-th acquisition item, and N is the number of acquisition items.
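Steps S12 to S14 can be illustrated with a minimal sketch. It assumes the operation data is held as a list of per-instant dictionaries; the function name `build_initial_feature_set`, the column names and the `merge_groups` argument are hypothetical and chosen only for this illustration.

```python
def build_initial_feature_set(records, target, merge_groups):
    """Return (feature_names, X0, Y) from per-instant records.

    records:      list of dicts, one per sampling instant (hypothetical layout)
    target:       name of the prediction target item
    merge_groups: {merged_name: [item names averaged into one feature]}
    """
    merged_items = {c for cols in merge_groups.values() for c in cols}
    feature_names = None
    X0, Y = [], []
    for row in records:
        if row[target] == 0:
            continue  # S14: discard instants where the target item is 0
        features = {k: v for k, v in row.items()
                    if k != target and k not in merged_items}
        for merged_name, cols in merge_groups.items():
            # S12: identical items of parallel devices become one average
            features[merged_name] = sum(row[c] for c in cols) / len(cols)
        if feature_names is None:
            feature_names = sorted(features)
        X0.append([features[name] for name in feature_names])
        Y.append(row[target])  # S13: the prediction target item
    return feature_names or [], X0, Y


records = [
    {"chiller_power": 0.0, "pump1_freq": 0.0, "pump2_freq": 0.0, "outdoor_temp": 28.0},
    {"chiller_power": 310.0, "pump1_freq": 40.0, "pump2_freq": 44.0, "outdoor_temp": 31.0},
    {"chiller_power": 295.0, "pump1_freq": 38.0, "pump2_freq": 42.0, "outdoor_temp": 30.0},
]
names, X0, Y = build_initial_feature_set(
    records, "chiller_power", {"pump_freq_mean": ["pump1_freq", "pump2_freq"]}
)
```

Each row of `X0` is one retained sampling instant and each column is one acquisition item, which matches the column-wise reading of X₀ used for the contribution degrees.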

Preferably, the filling the missing value of each acquisition item in a preset manner specifically includes the following steps:

calculating the Euclidean distances between the non-missing part of each acquisition item and the corresponding part of all subsequences, taking the average of the K nearest subsequences as a reference value, and filling the missing part of the acquisition item with the corresponding reference value.
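The filling rule can be sketched as follows for a single acquisition item. This is only an illustration under stated assumptions: missing samples are marked with `None`, only complete subsequences are used as references, and the name `fill_missing` is hypothetical.

```python
import math


def fill_missing(subsequences, target_index, k):
    """Fill the None entries of one subsequence from its k nearest neighbours."""
    target = subsequences[target_index]
    known = [i for i, v in enumerate(target) if v is not None]
    missing = [i for i, v in enumerate(target) if v is None]
    if not missing:
        return list(target)

    candidates = []
    for idx, seq in enumerate(subsequences):
        if idx == target_index or any(v is None for v in seq):
            continue  # assumption: only complete subsequences act as references
        # Euclidean distance over the positions where the target has data
        dist = math.sqrt(sum((seq[i] - target[i]) ** 2 for i in known))
        candidates.append((dist, idx))
    candidates.sort()
    nearest = [subsequences[idx] for _, idx in candidates[:k]]
    if not nearest:
        raise ValueError("no complete subsequence available as reference")

    filled = list(target)
    for i in missing:
        # reference value: average of the k nearest subsequences at this position
        filled[i] = sum(seq[i] for seq in nearest) / len(nearest)
    return filled


days = [
    [10.0, 11.0, 12.0, 13.0],
    [10.0, 11.0, 14.0, 13.0],
    [30.0, 31.0, 32.0, 33.0],
    [10.0, 11.0, None, 13.0],
]
filled = fill_missing(days, target_index=3, k=2)
```

In this toy data the first two subsequences match the known part of the fourth exactly, so the gap is filled with the mean of 12.0 and 14.0.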

Preferably, the S2 specifically includes the following steps:

training the constructed gradient boosting tree model with X₀ and Y under preset conditions, and, after training, calculating the first contribution degree and the second contribution degree of each feature in X₀ according to:

Imp_g(xⁿ)＝gain(xⁿ) (2)

Imp_s(xⁿ)＝split(xⁿ) (3)

where xⁿ is the n-th column of X₀, Imp_g(xⁿ) denotes the first contribution degree of the n-th feature, Imp_s(xⁿ) denotes the second contribution degree of the n-th feature, gain(xⁿ) is the average information gain obtained when the n-th feature is used to split child nodes in the gradient boosting tree model, and split(xⁿ) is the number of times the n-th feature is used to split child nodes in the gradient boosting tree model;

wherein the preset conditions include: the loss function consists of the L1 norm of all parameters in the model and the mean square error between the model output values and the actual values; and the maximum depth of all base learners in the gradient boosting tree model is limited to P.
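Formulas (2) and (3) can be illustrated without a trained model. The sketch below assumes the splits of all trees are available as `(feature, gain)` pairs; that input format and the name `contribution_degrees` are hypothetical.

```python
from collections import defaultdict


def contribution_degrees(split_records, feature_names):
    """Return (imp_g, imp_s): average gain and split count per feature.

    split_records: iterable of (feature_name, information_gain) pairs,
                   one per internal node over all trees (hypothetical format)
    """
    total_gain = defaultdict(float)
    split_count = defaultdict(int)
    for feature, gain in split_records:
        total_gain[feature] += gain
        split_count[feature] += 1

    imp_g, imp_s = {}, {}
    for name in feature_names:
        n = split_count[name]
        imp_s[name] = n                                   # formula (3)
        imp_g[name] = total_gain[name] / n if n else 0.0  # formula (2)
    return imp_g, imp_s


splits = [("return_temp", 8.0), ("return_temp", 4.0), ("supply_temp", 3.0)]
imp_g, imp_s = contribution_degrees(
    splits, ["return_temp", "supply_temp", "humidity"]
)
```

A feature that is never used for a split gets 0 for both contribution degrees. When LightGBM is used, `Booster.feature_importance(importance_type="gain")` reports the total gain of the splits that use a feature, so it would have to be divided by the `"split"` count to obtain an average gain.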

Preferably, the S3 specifically includes the following steps:

S31: deleting the features whose first contribution degree and second contribution degree are both 0;

S32: calculating the average of the first contribution degrees of all features and denoting the set of features whose first contribution degree is greater than this average as F₁; calculating the average of the second contribution degrees of all features and denoting the set of features whose second contribution degree is greater than this average as F₂;

S33: selecting the features contained in F₁ and F₂, merging the two sets, to obtain the primary optimized feature set X₁.
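A minimal sketch of S31 to S33, taking the two contribution degrees as dictionaries keyed by feature name. It assumes the averages in S32 are taken over the features that survive S31, which the text does not state explicitly; the name `screen_features` is hypothetical.

```python
def screen_features(imp_g, imp_s):
    """Return the features kept by one round of contribution-based screening."""
    # S31: drop features whose two contribution degrees are both 0
    kept = [f for f in imp_g if not (imp_g[f] == 0 and imp_s[f] == 0)]
    if not kept:
        return []

    # S32: features above the mean of each contribution degree
    mean_g = sum(imp_g[f] for f in kept) / len(kept)
    mean_s = sum(imp_s[f] for f in kept) / len(kept)
    f1 = {f for f in kept if imp_g[f] > mean_g}
    f2 = {f for f in kept if imp_s[f] > mean_s}

    # S33: merge F1 and F2, preserving the original feature order
    return [f for f in kept if f in f1 or f in f2]


imp_g = {"a": 9.0, "b": 1.0, "c": 2.0, "d": 0.0}
imp_s = {"a": 2, "b": 10, "c": 3, "d": 0}
selected = screen_features(imp_g, imp_s)
```

Here feature `a` is kept for its gain, `b` for its split count, `c` is below both averages and `d` is removed in S31.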

Preferably, the S4 specifically includes the following steps:

training the constructed gradient boosting tree model with X₁ and Y, and, after training, calculating the first contribution degree and the second contribution degree of each feature according to formula (2) and formula (3), and calculating the mean square error of the model output values, which serve as the predicted values.

Preferably, the preset feature screening termination condition includes that the difference between the maximum value and the minimum value of the first contribution degree or the second contribution degree is smaller than Q times of the maximum value, wherein Q is a decimal number between 0 and 1; the total number of features is less than R, wherein R is an integer; the mean square error of the model output values is greater than T, where T is a real number.
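The termination test can be sketched as a single predicate. The sketch assumes that screening stops as soon as any one of the three conditions holds; the function name `should_stop` and the sample contribution values are hypothetical, while Q = 0.5, R = 5 and T = 300 are the settings used in the embodiment.

```python
def should_stop(imp_g, imp_s, mse, q, r, t):
    """Return True when feature screening should terminate."""

    def spread_is_small(values):
        values = list(values)
        if not values:
            return True
        # (max - min) of a contribution degree is below Q times its max
        return max(values) - min(values) < q * max(values)

    flat = spread_is_small(imp_g.values()) or spread_is_small(imp_s.values())
    too_few = len(imp_g) < r   # total number of features below R
    too_poor = mse > t         # mean square error above T
    return flat or too_few or too_poor


six_g = {"a": 10.0, "b": 9.0, "c": 2.0, "d": 8.0, "e": 7.0, "f": 6.0}
six_s = {"a": 40, "b": 10, "c": 25, "d": 30, "e": 20, "f": 15}
three_g = {"a": 10.0, "b": 9.0, "c": 8.0}
three_s = {"a": 40, "b": 35, "c": 30}

keep_going = should_stop(six_g, six_s, mse=225.0, q=0.5, r=5, t=300.0)
finished = should_stop(three_g, three_s, mse=249.0, q=0.5, r=5, t=300.0)
```

With six unevenly important features none of the conditions holds, whereas three features already fall below R = 5.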

Preferably, the S5 specifically includes the following steps:

S51: judging whether the current features meet the feature screening termination condition; if so, terminating the feature screening, the optimal feature set being X₁; otherwise, screening the features again and recording the resulting feature set as the secondary optimized feature set X₂;

S52: recalculating the predicted values with the secondary optimized feature set X₂ as input and Y as output, and judging again whether the feature screening termination condition is met; if so, terminating the feature screening, the optimal feature set being X₂; otherwise, repeating S3 and S4 to obtain further optimized feature sets until the feature screening termination condition is met, the optimal feature set obtained being X_R, where R is the total number of optimization rounds.
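The iteration over S3, S4 and S5 can be sketched as a loop around a training callback. Everything named here is an assumption made for illustration: `fit` stands in for training the gradient boosting tree model and returning both contribution degrees and the mean square error, `toy_fit` replaces real training with fixed scores, and the `max_rounds` limit and the early exit when screening no longer changes the set are safeguards that the text does not describe.

```python
def screen(imp_g, imp_s):
    """S3: drop all-zero features, keep those above either mean."""
    kept = [f for f in imp_g if imp_g[f] != 0 or imp_s[f] != 0]
    if not kept:
        return []
    mean_g = sum(imp_g[f] for f in kept) / len(kept)
    mean_s = sum(imp_s[f] for f in kept) / len(kept)
    return [f for f in kept if imp_g[f] > mean_g or imp_s[f] > mean_s]


def stop(imp_g, imp_s, mse, q, r, t):
    """S5: assumed to terminate when any one condition holds."""
    def flat(degrees):
        v = list(degrees.values())
        return max(v) - min(v) < q * max(v)
    return flat(imp_g) or flat(imp_s) or len(imp_g) < r or mse > t


def select_features(fit, features, q, r, t, max_rounds=20):
    """Repeat screening and retraining until the termination condition holds."""
    current = list(features)
    imp_g, imp_s, _ = fit(current)            # S2: train on the initial set
    for _ in range(max_rounds):
        candidate = screen(imp_g, imp_s)      # S3: screen by contribution
        if not candidate or candidate == current:
            break                             # safeguard, not in the source
        current = candidate
        imp_g, imp_s, mse = fit(current)      # S4: retrain on the reduced set
        if stop(imp_g, imp_s, mse, q, r, t):  # S5: termination check
            break
    return current


# Toy stand-in for model training: fixed, made-up scores per feature.
GAIN = {"return_temp": 9.0, "supply_temp": 8.0, "pump_power": 7.0,
        "humidity": 1.0, "valve": 0.5, "noise": 0.0}
SPLIT = {"return_temp": 30, "supply_temp": 25, "pump_power": 20,
         "humidity": 5, "valve": 2, "noise": 0}


def toy_fit(features):
    mse = 200.0 + 10.0 * (len(GAIN) - len(features))
    return ({f: GAIN[f] for f in features},
            {f: SPLIT[f] for f in features},
            mse)


best = select_features(toy_fit, list(GAIN), q=0.5, r=5, t=300.0)
```

Passing the trainer as a callback keeps the loop independent of any particular gradient boosting library.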

As a general inventive concept, the present invention also provides a feature extraction system for energy consumption prediction, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.

The invention has the following beneficial effects:

The invention provides a feature extraction method and system for energy consumption prediction. The feature contribution degrees are calculated by a gradient boosting tree model constrained by preset conditions, wherein the preset conditions include: the loss function consists of the L1 norm of all parameters in the model and the mean square error between the model output values and the actual values; and the maximum depth of all base learners in the gradient boosting tree model is limited to P. The feature contribution degrees are then used for feature screening, which effectively increases the speed of feature extraction and improves the generalization performance of the energy consumption prediction algorithm.

The present invention will be described in further detail below with reference to the accompanying drawings.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:

FIG. 1 is a flowchart of a method for extracting features for energy consumption prediction according to a preferred embodiment of the present invention;

FIG. 2 shows the first contribution degree of all features in the initial feature set of the preferred embodiment of the present invention;

FIG. 3 shows the second contribution degree of all features in the initial feature set of the preferred embodiment of the present invention;

FIG. 4 shows the first contribution degree and the second contribution degree of all features in the fourth optimized feature set of the preferred embodiment of the present invention.

Detailed Description

The embodiments of the invention will be described in detail below with reference to the drawings, but the invention can be implemented in many different ways as defined and covered by the claims.

Example 1

Referring to fig. 1, the present embodiment provides a method for extracting features for energy consumption prediction, including the following steps:

S4: optimizing the gradient boosting tree energy consumption prediction model according to the optimized feature set, and obtaining predicted values from the optimized gradient boosting tree energy consumption prediction model;

The feature extraction method for energy consumption prediction can quickly screen the input features used for energy consumption prediction and improve the generalization performance of the energy consumption prediction algorithm.

In practical application, the above steps of the feature extraction method may be further refined; the refined embodiment is as follows:

First, it should be noted that the acquisition items in this embodiment specifically include one or any combination of: the historical input power, chilled water outlet temperature set value, chilled water outlet temperature, chilled water inlet temperature, cooling water outlet temperature and cooling water inlet temperature of each refrigerating unit; the water supply temperature, water return temperature, water supply pressure, water return pressure, valve opening and instantaneous flow of the chilled-water main pipe; the input power, working frequency, water inlet pressure and water outlet pressure of each chilled-water pump; the input power and working frequency of each cooling tower; and the outdoor temperature, humidity, dew point temperature and wet bulb temperature.

Further, in this embodiment, the step S1 specifically includes the following steps:

S11: dividing the collected operation data into a plurality of subsequences at a set time interval, and filling the missing values of each acquisition item in a preset manner;

Y＝[y₀,y₁,y₂,...,y_L]；

where xⁱ_j is the j-th data of the i-th acquisition item, and N is the number of acquisition items, that is, the number of features.

The missing values of each acquisition item are filled in the preset manner as follows: the Euclidean distances between the non-missing part of each acquisition item and the corresponding part of all subsequences are calculated, the average of the K nearest subsequences is taken as a reference value, and the missing part of the acquisition item is filled with the corresponding reference value.

Further, the constructed gradient boosting tree model is trained with X₀ and Y under the preset conditions, and, after training, the first contribution degree and the second contribution degree of each feature in X₀ are calculated according to:

Imp_g(xⁿ)＝gain(xⁿ) (2)

Imp_s(xⁿ)＝split(xⁿ) (3)

In this embodiment, S3 specifically includes the following steps:

S32: calculating the average of the first contribution degrees of all features and denoting the set of features whose first contribution degree is greater than this average as F₁; calculating the average of the second contribution degrees of all features and denoting the set of features whose second contribution degree is greater than this average as F₂;

As a preferred implementation manner of this embodiment, S4 specifically includes the following steps:

training the constructed gradient boosting tree model with X₁ and Y, and, after training, calculating the first contribution degree and the second contribution degree of each feature according to formula (2) and formula (3), and calculating the mean square error of the model output values, which serve as the predicted values.

It should be noted that, in this embodiment, the preset feature screening termination condition includes that a difference between a maximum value and a minimum value of the first contribution degree or the second contribution degree is smaller than Q times of the maximum value, where Q is a decimal number between 0 and 1; the total number of features is less than R, wherein R is an integer; the mean square error of the model output values is greater than T, where T is a real number. In specific implementation, the preset feature screening termination condition may be one of the above conditions or a combination of any several of the above conditions.

In this embodiment, S5 specifically includes the following steps:

S51: judging whether the current features meet the feature screening termination condition; if so, terminating the feature screening, the optimal feature set being X₁; otherwise, screening the features again and recording the resulting feature set as the secondary optimized feature set X₂;

S52: recalculating the predicted values with the secondary optimized feature set X₂ as input and Y as output, and judging again whether the feature screening termination condition is met; if so, terminating the feature screening, the optimal feature set being X₂; otherwise, repeating S3 and S4 to obtain further optimized feature sets until the feature screening termination condition is met, the optimal feature set obtained being X_R, where R is the total number of optimization rounds.

Further, this embodiment describes and verifies the method of the invention using a particular central air conditioning system as an example. The main equipment of this system comprises 4 refrigerating units, 4 chilled-water pumps, 4 cooling pumps and 16 cooling towers; all chilled-water pumps are connected to the same chilled-water main pipe, and all cooling pumps are connected to the same cooling-water main pipe. The system operation data has 178 initial features and a sampling interval of 5 minutes. The refrigerating units start every morning and stop every night, and the daily shutdown time is not fixed.

Taking the energy consumption prediction of one refrigerating unit in the central air conditioning system as an example, 90 days of operation data of the system, from July to September, are first collected and divided into subsequences at 24-hour intervals. Where missing values exist, the 10 nearest subsequences are found by Euclidean distance matching and their average is used to fill the missing values. Since the chilled water and the cooling water each use a single main pipe, the related acquisition items of all chilled-water pumps and of all cooling pumps are merged respectively; for example, the working frequencies of the 4 chilled-water pumps are averaged into one feature, and the working frequencies of the 16 cooling towers are averaged into one feature. The data recorded while the host is shut down are deleted to obtain the initial feature set, which contains 120 features and 7500 samples. The input power of the refrigerating unit is taken as the prediction target; the samples are split randomly at a ratio of 8:2, with 6000 samples drawn as the training data set and the remaining 1500 samples used as the test data set.
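The 8:2 random split of the 7500 samples can be reproduced with a short sketch. The helper name `train_test_split` and the fixed seed are assumptions made only so the illustration is repeatable; integers stand in for the real samples.

```python
import random


def train_test_split(samples, train_ratio=0.8, seed=0):
    """Randomly split samples into training and test lists."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded only for repeatability
    cut = round(len(samples) * train_ratio)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


samples = list(range(7500))  # placeholders for the 7500 retained samples
train, test = train_test_split(samples)
```

The two lists are disjoint and together cover every sample, with 6000 in the training set and 1500 in the test set.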

A LightGBM model is built on the training set, with the maximum depth of all base learners limited to 5. After training, the mean square error of the predicted values on the test set is 206 and the average relative error is 0.3%, and the first and second contribution degrees of all features are calculated. FIG. 2 shows the first contribution degree of all features in the initial feature set and FIG. 3 shows the second contribution degree, with the dashed line marking the mean value. As FIG. 2 and FIG. 3 show, the two calculation methods give different contribution degrees for the same feature, so both are used for feature screening to avoid losing useful information. The first and second contribution degrees select 11 and 23 features respectively, 9 of which are selected by both, and merging them gives a primary optimized feature set with 26 features.
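The two error measures reported for the test set can be sketched as follows. The text does not define the average relative error, so the sketch assumes it is the mean of |actual − predicted| / |actual|; the function names and sample values are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Assumed definition: mean of |actual - predicted| / |actual|."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


actual = [200.0, 400.0, 500.0]      # made-up input power values
predicted = [210.0, 380.0, 500.0]   # made-up model outputs
mse = mean_squared_error(actual, predicted)
mre = mean_relative_error(actual, predicted)
```

The relative error is well defined here because the preprocessing removes every instant at which the prediction target is 0.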

The LightGBM model is retrained on the primary optimized feature set. After training, the mean square error of the predicted values on the test set is 225 and the average relative error is 0.5%, and the feature contribution degrees after the first optimization are calculated. The feature screening termination condition is set as follows: the difference between the maximum and the minimum of the first or second contribution degree is less than 0.5 times the maximum; the total number of features is less than 5; the mean square error of the model output values is greater than 300. Checking against this condition shows that the optimized feature set can be optimized further. After four rounds of optimization in total, the fourth optimized feature set has 3 input features: the return water temperature of the chilled-water main pipe, the supply water temperature of the chilled-water main pipe, and the three-phase active power of the chilled-water pump. The mean square error of the predicted values on the test set is 249 and the average relative error is 0.6%. FIG. 4 shows the first and second contribution degrees of all features in the fourth optimized feature set; as can be seen from FIG. 4, every remaining feature has a large contribution degree. After four rounds of optimization, 3 key features are selected from the 120 features of the initial feature set in less than 300 seconds in total, whereas a multi-objective optimization algorithm needs more than 6 hours for the same task and its final result is consistent with the optimal feature set of this method. The method of the invention is therefore superior in the speed of feature extraction.

Example 2

Corresponding to the above method embodiment, this embodiment provides a feature extraction system for energy consumption prediction, which comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.

The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method for extracting features for energy consumption prediction is characterized by comprising the following steps:

2. The method for extracting characteristics for energy consumption prediction according to claim 1, wherein the historical operating data includes one or a combination of any of historical input power, a chilled water outlet temperature set value, a chilled water outlet temperature, a chilled water inlet temperature, a water supply temperature, a water return pressure, a valve opening, and an instantaneous flow rate of each refrigeration unit, and further includes one or a combination of any of input power, an operating frequency, a water inlet pressure, and a water outlet pressure of each refrigeration pump, input power, and an operating frequency of each cooling tower, and further includes one or a combination of any of outdoor temperature, humidity, dew point temperature, and wet bulb temperature.

3. The method for extracting features for energy consumption prediction according to claim 2, wherein the step of preprocessing the historical operating data to obtain an initial feature set specifically comprises the steps of:

Y＝[y₀,y₁,y₂,...,y_L]；

S14: retaining the data of all acquisition items at the instants where the value of the prediction target item is not 0 and deleting the remaining data, the length of the deleted data being recorded as M; the data of the acquisition items other than the prediction target item form the initial feature set X₀, recorded as:

in the formula (I), the compound is shown in the specification,

4. The method for extracting features for energy consumption prediction according to claim 3, wherein the filling of the missing values of each of the collection items in a preset manner comprises the following steps:

5. The method for extracting features for energy consumption prediction according to claim 3, wherein the step S2 specifically includes the steps of:

Imp_g(xⁿ)＝gain(xⁿ) (2)

Imp_s(xⁿ)＝split(xⁿ) (3)

6. The method for extracting features for energy consumption prediction according to claim 5, wherein the step S3 specifically includes the steps of:

7. The method for extracting features for energy consumption prediction according to claim 6, wherein the step S4 specifically includes the steps of:

8. The method for extracting features for energy consumption prediction according to claim 7, wherein the preset feature screening termination condition includes that a difference between a maximum value and a minimum value of the first contribution degree or the second contribution degree is smaller than Q times of the maximum value, where Q is a decimal between 0 and 1; the total number of features is less than R, wherein R is an integer; the mean square error of the model output values is greater than T, where T is a real number.

9. The method for extracting features for energy consumption prediction according to claim 8, wherein the step S5 specifically includes the steps of:

S52: recalculating the predicted values with the secondary optimized feature set X₂ as input and Y as output, and judging again whether the feature screening termination condition is met; if so, terminating the feature screening, the optimal feature set being X₂; otherwise, repeating S3 and S4 to obtain further optimized feature sets until the feature screening termination condition is met, the optimal feature set obtained being X_R, where R is the total number of optimization rounds.

10. A system for extracting features for energy consumption prediction, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any of the preceding claims 1 to 9 when executing the computer program.