CN108921222A - Air-conditioning energy consumption feature selection method based on big data - Google Patents
Air-conditioning energy consumption feature selection method based on big data
- Publication number
- CN108921222A (application CN201810730455.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- energy consumption
- data
- air
- algorithm
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/251—Fusion techniques of input or preprocessed data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Abstract
The invention discloses an air-conditioning energy consumption feature selection method based on big data. Air-conditioning energy consumption operating data are collected in advance and preprocessed. From the preprocessed energy consumption feature set, the Boruta feature selection algorithm and the lasso regression algorithm each produce their own energy consumption feature subset, extracting the important features that influence the research target. The feature subsets obtained by these two methods of different nature are then fused, in combination with expert opinion, by taking their intersection, which yields the final key features. The invention uses two mainstream feature selection methods: the lasso regression algorithm and the Boruta feature selection algorithm. Because the two algorithms differ fundamentally, combining them avoids the limitations of a single method, effectively addresses the redundancy problem of big data, and reduces the complexity of the air-conditioning energy consumption data model.
Description
Technical field
The present invention relates to the technical field of central air-conditioning energy consumption research and to data mining methods in a big data context, and in particular to an air-conditioning energy consumption feature selection method based on big data.
Background art
Since the beginning of the 21st century, building automation systems (BAS) have provided the information technology platform needed to diagnose and optimize building system performance. A BAS stores large volumes of actual building operating data, but these data are seldom fully used. Energy metering of central air-conditioning systems accumulates large amounts of high-dimensional real-time energy consumption data, and conventional methods struggle to discover and summarize the knowledge these data contain. Data mining, an emerging multidisciplinary technology, has opened new prospects for modeling highly nonlinear systems, and its application to central air-conditioning is increasingly studied.
An important topic in air-conditioning energy consumption research is the energy consumption feature variables of the central air-conditioning system. At present, the factors that influence energy consumption differ from one central air-conditioning system to another, and a generally applicable air-conditioning energy consumption feature selection method is lacking.
Summary of the invention
In research on central air-conditioning energy consumption feature variables, the energy consumption model is a multi-parameter problem involving both external and internal parameters. Establishing a reliable and generally applicable air-conditioning energy consumption feature selection framework based on data mining is of great significance for operational energy-saving strategies. The present invention provides an air-conditioning energy consumption feature extraction method based on big data that reduces the redundancy of big data and adds external features to the conventional internal energy consumption features, giving a more accurate energy consumption feature model.
The present invention is achieved through the following technical solution:
An air-conditioning energy consumption feature selection method based on big data, comprising the following steps:
Step 1: perform a preliminary screening of the feature data set using expert opinion;
Step 2: preprocess the feature data set that has passed the preliminary screening;
Step 3: based on the preprocessed feature set, extract a new feature subset 1 using the Boruta feature selection algorithm;
Step 4: based on the preprocessed feature set, extract a new feature subset 2 using the lasso feature selection algorithm;
Step 5: based on feature subset 1 from step 3 and feature subset 2 from step 4, and in combination with expert opinion, obtain the set of key air-conditioning energy consumption features by taking the intersection.
Preferably, the preprocessing of step 2 comprises the following steps:
Step 2.1: set limit ranges and remove outliers;
Step 2.2: use a decision tree to obtain the operating data under steady-state system conditions;
Step 2.3: average the data over 5-minute intervals and remove duplicate points;
Step 2.4: merge the data and perform data augmentation;
Step 2.5: fill in missing data by interpolation.
Preferably, extracting the new feature subset 1 with the Boruta feature selection algorithm in step 3 comprises the following steps:
Step 3.1: add randomness to the given data set by creating shuffled copies of the features (shadow features);
Step 3.2: train a random forest classifier on the extended data set and assess the importance of each feature, a higher score meaning a more important feature;
Step 3.3: check whether each original feature has a higher importance than the best shadow feature, and progressively delete the features deemed highly unimportant;
Step 3.4: stop when all features have been confirmed or rejected, or when the algorithm reaches a specified limit on the number of random forest runs.
Preferably, extracting the new feature subset 2 with the lasso feature selection algorithm in step 4 specifically uses the absolute value function of the model coefficients as a penalty to shrink the coefficients, so that some regression coefficients become smaller or are set to 0.
The present invention has the following advantages and beneficial effects:
The invention uses two mainstream feature selection methods: the lasso regression algorithm and the Boruta feature selection algorithm. Because the two algorithms differ fundamentally, combining them avoids the limitations of a single method, effectively addresses the redundancy problem of big data, and reduces the complexity of the air-conditioning energy consumption data model. The invention also does not require extensive expert domain knowledge: it sets aside complex formula calculations and approaches the problem from the data, laying a foundation for deriving better energy-saving strategies later. The Boruta feature selection algorithm used in the data mining is a wrapper algorithm built around random forests that reduces the redundancy of the data, while the lasso regression algorithm removes collinear attributes and noise attributes, reducing interference with the data analysis.
Brief description of the drawings
The drawings described here provide a further understanding of the embodiments of the invention and form part of this application; they do not limit the embodiments of the invention. In the drawings:
Fig. 1 is a functional block diagram of the feature selection method of the invention.
Fig. 2 is the feature subset importance ranking obtained with the Boruta algorithm.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the invention clearer, the invention is described in further detail below with reference to an embodiment and the drawings. The exemplary embodiment and its description serve only to explain the invention and do not limit it.
Embodiment
As shown in Fig. 1, this embodiment provides an air-conditioning energy consumption feature selection method based on big data. All tests of the method were run on the same computer, configured as follows: Intel(R) Core(TM) i5-7400, 8 GB memory, Windows 10 operating system.
The test data are operating data from the central air-conditioning system of a shopping mall in Guanghan City, 4032 samples in total. The data collection points are shown in Table 1.
Table 1: Data collection points
Step 1: As Table 1 shows, there are many sensor data points; the following operations are carried out to find representative features. The data points retained after the first round of expert screening are shown in Table 2.
Table 2: Data points after the first expert screening
The data points retained after the second round of expert screening are shown in Table 3. Energy consumption feature analysis is carried out on the Table 3 data points to find the feature subset that influences the air-conditioning load rate and power.
Table 3: Data points after the second expert screening
Step 2: Data taken directly from the real world are often incomplete, noisy, and inconsistent. The preprocessing steps are as follows:
2.1. Set limit ranges and remove outliers;
2.2. Use a decision tree to obtain the operating data under steady-state system conditions;
2.3. Average the data over 5-minute intervals and remove duplicate points;
2.4. Merge the data and perform data augmentation;
2.5. Fill in missing data by interpolation.
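The following is a minimal sketch of steps 2.1, 2.3, and 2.5 using pandas, under stated assumptions: the column name `supply_temp`, the limit range, and the choice to mask out-of-range readings before averaging are all hypothetical and are not taken from the patent. The decision-tree steady-state filter (2.2) and the merging and augmentation step (2.4) are not specified in enough detail to sketch and are omitted.

```python
import numpy as np
import pandas as pd

def preprocess(df, limits):
    """Sketch of steps 2.1, 2.3 and 2.5 for a time-indexed sensor table."""
    # 2.3 (part): drop repeated timestamps, keeping the first reading
    out = df[~df.index.duplicated(keep="first")].sort_index()
    # 2.1: readings outside the limit range are treated as missing
    for col, (low, high) in limits.items():
        out[col] = out[col].where(out[col].between(low, high))
    # 2.3: average over 5-minute intervals (missing values are skipped)
    out = out.resample("5min").mean()
    # 2.5: fill the remaining gaps by linear interpolation
    return out.interpolate(method="linear")

# Hypothetical one-minute readings with an outlier, a gap and a duplicate row
index = pd.date_range("2018-01-01 00:00", periods=20, freq="1min")
values = 7.0 + 0.1 * np.arange(20)
values[3] = 999.0          # sensor glitch, outside the limit range
values[5:10] = np.nan      # five minutes of missing data
raw = pd.DataFrame({"supply_temp": values}, index=index)
raw = pd.concat([raw, raw.iloc[[0]]])  # duplicated first reading

clean = preprocess(raw, limits={"supply_temp": (0.0, 50.0)})
print(clean)
```

In this toy input the second 5-minute interval has no valid readings, so its value is interpolated from the neighbouring intervals.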
Step 3: Find the feature subset that influences the air-conditioning load rate and power. The Boruta feature selection algorithm works as follows:
First, it adds randomness to the given data set by creating shuffled copies of all features (the shadow features).
Then it trains a random forest classifier on the extended data set and applies a feature importance measure (commonly the mean squared residual) to assess the importance of each feature; a higher score means a more important feature.
In each iteration, it checks whether each original feature has a higher importance than the best shadow feature (that is, whether its score exceeds the maximum shadow feature score) and progressively removes the features deemed highly unimportant.
Finally, the algorithm stops when all features have been confirmed or rejected, or when it reaches a specified limit on the number of random forest runs.
The program implementing the Boruta algorithm proceeds as follows:
3.1. Shuffle the values of each feature of the feature matrix X, and concatenate the shuffled features (shadow features) with the original features (real features) to form a new feature matrix;
3.2. Use the new feature matrix as input to train a model that can output feature_importance;
3.3. Compute the Z_score of each real feature and shadow feature;
3.4. Take the maximum Z_score among the shadow features and denote it Z_max;
3.5. Mark real features whose Z_score is greater than Z_max as "important"; mark real features whose Z_score is significantly less than Z_max as "unimportant" and remove them permanently from the feature set;
3.6. Repeat 3.1 to 3.5 until every feature is marked "important" or "unimportant";
3.7. Obtain the feature importance ranking, shown in Fig. 2.
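The shadow-feature loop in steps 3.1 to 3.6 can be sketched as below. This is a simplified illustration, not the patent's program: it assumes scikit-learn's `RandomForestRegressor` (a regressor is used here because load rate and power are continuous targets), compares raw impurity-based importances instead of Z-scores, and replaces Boruta's statistical test with a plain hit-rate threshold. The function name `boruta_sketch`, its parameters, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def boruta_sketch(X, y, n_rounds=10, min_hit_rate=0.8, seed=0):
    """Return indices of features that beat the best shadow in most rounds."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for round_idx in range(n_rounds):
        # 3.1: shuffle each column independently to build the shadow features
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        extended = np.hstack([X, shadows])
        # 3.2: train a model that exposes feature importances
        forest = RandomForestRegressor(
            n_estimators=50, random_state=seed + round_idx
        )
        forest.fit(extended, y)
        importance = forest.feature_importances_
        # 3.3-3.5 (simplified): a real feature scores a hit when its
        # importance exceeds the best shadow importance in this round
        best_shadow = importance[n_features:].max()
        hits += importance[:n_features] > best_shadow
    # 3.6 (simplified): keep features that won in most rounds
    return [j for j in range(n_features) if hits[j] / n_rounds >= min_hit_rate]

# Synthetic data: only columns 0 and 1 actually drive the target
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)
selected = boruta_sketch(X, y)
print(selected)
```

A fuller implementation would also reject features permanently once they fall significantly below the best shadow, as step 3.5 describes, so that later rounds run on a smaller feature matrix.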
Step 4: Extract a new feature subset using the lasso algorithm.
The lasso (least absolute shrinkage and selection operator; Tibshirani, 1996) is a shrinkage estimator. It uses the absolute value function of the model coefficients as a penalty to shrink the coefficients, making some regression coefficients smaller and setting some coefficients of small absolute value exactly to 0. By constructing a penalty function it obtains a more parsimonious model, so it retains the advantages of subset shrinkage; it is a biased estimator suited to data with multicollinearity.
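In symbols, one common way to write the lasso estimate is given below. The notation is assumed here and does not appear in the original text: $n$ samples, $p$ features, response $y_i$, feature values $x_{ij}$, intercept $\beta_0$, and penalty weight $\lambda \ge 0$.

$$
\hat{\beta} = \arg\min_{\beta_0,\,\beta} \left\{ \frac{1}{2n} \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Big)^{2} + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\}
$$

The first term is the usual least-squares fit; the second is the absolute-value penalty. Larger values of $\lambda$ shrink the coefficients more strongly and set more of them exactly to 0, which is what makes the lasso usable for feature selection.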
The main steps of the program implementing the lasso are as follows:
4.1. Convert the data set to comma-separated CSV format;
4.2. In R, read the data and convert them to matrix form;
4.3. Call the lars function and determine the step at which the Cp value is smallest;
4.4. Determine the selected variables and compute their weight coefficients;
4.5. Obtain the feature subset and regression weights; the weights of the optimal feature subset are shown in Table 4.
Table 4: Optimal feature subset weights
Feature | Weight |
---|---|
x1 | 0.3651524 |
x3 | -45.5834741 |
x4 | 65.7041534 |
x5 | 10.6551992 |
x6 | 3.5123054 |
x9 | 13.5124891 |
x10 | 5.6813998 |
x11 | -15.4818693 |
x12 | -1.0127342 |
x14 | -70.1725065 |
x15 | 31.4574995 |
x16 | 27.3613065 |
x17 | 2.3798402 |
x18 | -15.531637 |
x19 | 21.5408714 |
x20 | -15.2884767 |
x21 | -72.6756832 |
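To illustrate how a lasso fit yields a weight table of this kind, with some features dropped because their coefficients are exactly 0, here is a small NumPy sketch. It is not the patent's program: the source calls R's lars function and picks the step with the smallest Cp, whereas this sketch swaps in plain coordinate descent with a fixed penalty `lam`. The function names, the penalty value, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward 0 by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 on standardised data."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardise the features
    y = y - y.mean()                          # centring removes the intercept
    n, p = X.shape
    beta = np.zeros(p)
    col_scale = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            # residual with feature j's current contribution added back
            partial = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ partial / n
            beta[j] = soft_threshold(rho, lam) / col_scale[j]
    return beta

# Synthetic data: only columns 0 and 2 influence the target
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 4.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.1, size=200)

beta = lasso_coordinate_descent(X, y, lam=0.5)
selected = [j for j in range(X.shape[1]) if beta[j] != 0.0]
print(selected, np.round(beta, 3))
```

The surviving coefficients are shrunk toward 0 relative to the true values, which is the bias the text refers to when it calls the lasso a biased estimator.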
Step 5: In combination with expert opinion, the key attributes are selected by intersection fusion; the resulting key features of central air-conditioning energy consumption are shown in Table 5.
Table 5: Key energy consumption attributes
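The intersection fusion of step 5 can be sketched with Python sets. The patent does not say how expert opinion enters the fusion, so the `expert_keep` and `expert_drop` arguments below are one hypothetical reading (experts may force a feature in or veto it); the feature names in the usage example are likewise made up and do not reproduce the patent's subsets.

```python
def fuse_feature_subsets(boruta_subset, lasso_subset,
                         expert_keep=(), expert_drop=()):
    """Intersect the two subsets, then apply hypothetical expert overrides."""
    common = set(boruta_subset) & set(lasso_subset)
    fused = (common | set(expert_keep)) - set(expert_drop)
    # Sort x1, x2, ..., x10 numerically rather than alphabetically
    return sorted(fused, key=lambda name: int(name.lstrip("x")))

# Hypothetical subsets for illustration only
boruta_subset = ["x1", "x3", "x4", "x9"]
lasso_subset = ["x1", "x3", "x5", "x9"]
key_features = fuse_feature_subsets(
    boruta_subset, lasso_subset, expert_keep=["x5"], expert_drop=["x9"]
)
print(key_features)  # ['x1', 'x3', 'x5']
```

Taking the intersection keeps only features that both methods agree on, which is the conservative choice; a union would retain more features at the cost of more redundancy.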
The specific embodiment above describes the purpose, technical solution, and beneficial effects of the invention in further detail. It should be understood that the foregoing is only a specific embodiment of the invention and does not limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (4)
1. An air-conditioning energy consumption feature selection method based on big data, characterized by comprising the following steps:
Step 1: perform a preliminary screening of the feature data set using expert opinion;
Step 2: preprocess the feature data set that has passed the preliminary screening;
Step 3: based on the preprocessed feature set, extract a new feature subset 1 using the Boruta feature selection algorithm;
Step 4: based on the preprocessed feature set, extract a new feature subset 2 using the lasso feature selection algorithm;
Step 5: based on feature subset 1 from step 3 and feature subset 2 from step 4, and in combination with expert opinion, obtain the set of key air-conditioning energy consumption features by taking the intersection.
2. The air-conditioning energy consumption feature selection method based on big data according to claim 1, characterized in that the preprocessing of step 2 comprises the following steps:
Step 2.1: set limit ranges and remove outliers;
Step 2.2: use a decision tree to obtain the operating data under steady-state system conditions;
Step 2.3: average the data over 5-minute intervals and remove duplicate points;
Step 2.4: merge the data and perform data augmentation;
Step 2.5: fill in missing data by interpolation.
3. The air-conditioning energy consumption feature selection method based on big data according to claim 1, characterized in that extracting the new feature subset 1 with the Boruta feature selection algorithm in step 3 comprises the following steps:
Step 3.1: add randomness to the given data set by creating shuffled copies of the features (shadow features);
Step 3.2: train a random forest classifier on the extended data set and assess the importance of each feature, a higher score meaning a more important feature;
Step 3.3: check whether each original feature has a higher importance than the best shadow feature, and progressively delete the features deemed highly unimportant;
Step 3.4: stop when all features have been confirmed or rejected, or when the algorithm reaches a specified limit on the number of random forest runs.
4. The air-conditioning energy consumption feature selection method based on big data according to claim 1, characterized in that extracting the new feature subset 2 with the lasso feature selection algorithm in step 4 specifically uses the absolute value function of the model coefficients as a penalty to shrink the coefficients, so that some regression coefficients become smaller or are set to 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810730455.7A CN108921222A (en) | 2018-07-05 | 2018-07-05 | A kind of air-conditioning energy consumption feature selection approach based on big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810730455.7A CN108921222A (en) | 2018-07-05 | 2018-07-05 | A kind of air-conditioning energy consumption feature selection approach based on big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108921222A true CN108921222A (en) | 2018-11-30 |
Family
ID=64424904
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810730455.7A Pending CN108921222A (en) | 2018-07-05 | 2018-07-05 | A kind of air-conditioning energy consumption feature selection approach based on big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108921222A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140207714A1 (en) * | 2013-01-18 | 2014-07-24 | International Business Machines Corporation | Transductive lasso for high-dimensional data regression problems |
CN107169284A (en) * | 2017-05-12 | 2017-09-15 | 北京理工大学 | A kind of biomedical determinant attribute system of selection |
CN107273387A (en) * | 2016-04-08 | 2017-10-20 | 上海市玻森数据科技有限公司 | Towards higher-dimension and unbalanced data classify it is integrated |
CN107730154A (en) * | 2017-11-23 | 2018-02-23 | 安趣盈(上海)投资咨询有限公司 | Based on the parallel air control application method of more machine learning models and system |
CN107909077A (en) * | 2017-10-10 | 2018-04-13 | 安徽信息工程学院 | Feature selection approach based on rarefaction theory in the case of semi-supervised |
CN107992447A (en) * | 2017-12-13 | 2018-05-04 | 电子科技大学 | A kind of feature selecting decomposition method applied to river level prediction data |
-
2018
- 2018-07-05 CN CN201810730455.7A patent/CN108921222A/en active Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140207714A1 (en) * | 2013-01-18 | 2014-07-24 | International Business Machines Corporation | Transductive lasso for high-dimensional data regression problems |
CN107273387A (en) * | 2016-04-08 | 2017-10-20 | 上海市玻森数据科技有限公司 | Towards higher-dimension and unbalanced data classify it is integrated |
CN107169284A (en) * | 2017-05-12 | 2017-09-15 | 北京理工大学 | A kind of biomedical determinant attribute system of selection |
CN107909077A (en) * | 2017-10-10 | 2018-04-13 | 安徽信息工程学院 | Feature selection approach based on rarefaction theory in the case of semi-supervised |
CN107730154A (en) * | 2017-11-23 | 2018-02-23 | 安趣盈(上海)投资咨询有限公司 | Based on the parallel air control application method of more machine learning models and system |
CN107992447A (en) * | 2017-12-13 | 2018-05-04 | 电子科技大学 | A kind of feature selecting decomposition method applied to river level prediction data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20181130 |