CN108921222A - Big-data-based feature selection method for air-conditioning energy consumption - Google Patents

Big-data-based feature selection method for air-conditioning energy consumption

Info

Publication number
CN108921222A
Authority
CN
China
Prior art keywords
feature
energy consumption
data
air
algorithm
Prior art date
Legal status
Pending
Application number
CN201810730455.7A
Other languages
Chinese (zh)
Inventor
李碧军
史翔
何彬
陈耕
Current Assignee
Sichuan Terry Zhihui Technology Co Ltd
Original Assignee
Sichuan Terry Zhihui Technology Co Ltd
Priority date
2018-07-05
Filing date
2018-07-05
Publication date
2018-11-30
Application filed by Sichuan Terry Zhihui Technology Co Ltd
Priority to CN201810730455.7A
Publication of CN108921222A
Legal status: Pending (current)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
                • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
              • G06F18/25 Fusion techniques
                • G06F18/251 Fusion techniques of input or preprocessed data
                • G06F18/253 Fusion techniques of extracted features
                • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data

Abstract

The invention discloses a big-data-based feature selection method for air-conditioning energy consumption. Air-conditioning energy-consumption operating data are collected in advance and preprocessed. From the preprocessed energy-consumption feature set, two feature subsets are created, one with the Boruta feature selection algorithm and one with Lasso regression, to extract the important features that influence the research target. The two subsets, obtained by methods of different nature, are then fused in combination with expert opinion using intersection-based classification to obtain the final key features. The invention thus uses two principal feature selection methods: Lasso regression and the Boruta feature selection algorithm. Because the two algorithms differ fundamentally, the method avoids the limitations of relying on a single method, effectively addresses data redundancy in big data, and reduces the complexity of the air-conditioning energy-consumption data model.

Description

Big-data-based feature selection method for air-conditioning energy consumption
Technical field
The present invention relates to the technical field of central air-conditioning energy research and to data mining methods in a big-data context, and more particularly to a big-data-based feature selection method for air-conditioning energy consumption.
Background art
Since the beginning of the 21st century, building automation systems (BAS) have provided the information technology platform needed to diagnose and optimize building system performance. A BAS stores large volumes of actual operating data, but these data are seldom fully exploited. Metering of central air-conditioning system energy consumption has accumulated large amounts of high-dimensional, real-time energy data, and conventional methods struggle to discover and summarize the knowledge these data contain. Data mining, an emerging interdisciplinary technology, offers new prospects for modelling highly nonlinear systems, and applications of and research on data mining in the field of central air conditioning are growing steadily.
In air-conditioning energy consumption research, one important topic is the set of energy-consumption feature variables of a central air-conditioning system. Because central air-conditioning systems differ from one another, the factors that drive their energy consumption also differ, and at present there is no general-purpose method for selecting air-conditioning energy-consumption features.
Summary of the invention
In research on the energy-consumption feature variables of central air-conditioning systems, the energy consumption model involves many parameters, both external and internal. Establishing a reliable, general-purpose framework for selecting air-conditioning energy-consumption features based on data mining is significant for operational energy-saving strategies. The present invention provides a big-data-based method for extracting air-conditioning energy-consumption features that reduces big-data redundancy and adds external features to the conventional internal energy-consumption features, yielding a more accurate energy-consumption feature model.
The present invention is achieved through the following technical solution:
A big-data-based feature selection method for air-conditioning energy consumption comprises the following steps:
Step 1: perform preliminary screening of the feature data set using expert opinion;
Step 2: preprocess the preliminarily screened feature data set;
Step 3: based on the preprocessed feature set, extract a new feature subset 1 using the Boruta feature selection algorithm;
Step 4: based on the preprocessed feature set, extract a new feature subset 2 using the Lasso feature selection algorithm;
Step 5: based on feature subset 1 from step 3 and feature subset 2 from step 4, and in combination with expert opinion, obtain the key feature set for air-conditioning energy consumption by intersection-based classification.
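Step 5 does not spell out the intersection-based classification rule. One plausible reading, sketched below purely as an assumption and not as the patent's procedure, keeps every feature selected by both methods automatically and keeps a feature selected by only one method only if an expert approves it. The names fuse_feature_subsets and expert_approved are hypothetical.

```python
def fuse_feature_subsets(subset1, subset2, expert_approved=()):
    """Assumed intersection-based fusion rule (hypothetical helper).

    Returns (key_features, rejected_features), both sorted.
    """
    s1, s2, approved = set(subset1), set(subset2), set(expert_approved)
    agreed = s1 & s2            # selected by both Boruta (subset 1) and Lasso (subset 2)
    disputed = s1 ^ s2          # selected by exactly one of the two methods
    key = agreed | (disputed & approved)   # expert opinion decides the disputed ones
    return sorted(key), sorted(disputed - approved)
```

For example, fuse_feature_subsets({"x1", "x3", "x4"}, {"x3", "x4", "x5"}, expert_approved={"x5"}) returns (["x3", "x4", "x5"], ["x1"]): x3 and x4 are agreed on, x5 is rescued by the expert, and x1 is dropped.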
Preferably, the preprocessing in step 2 specifically comprises the following steps:
Step 2.1, set limit ranges and exclude outliers;
Step 2.2, use a decision tree to obtain operating data under steady system conditions;
Step 2.3, average the data over 5-minute intervals and remove repeated points;
Step 2.4, merge the data and perform data augmentation;
Step 2.5, fill in missing data by interpolation.
Preferably, extracting the new feature subset 1 with the Boruta feature selection algorithm in step 3 specifically comprises the following steps:
Step 3.1, add randomness to the given data set by creating shuffled copies of the features (shadow features);
Step 3.2, train a random forest classifier on the extended data set to assess the importance of each feature; the higher the importance, the more important the feature;
Step 3.3, check whether each original feature has higher importance than the best shadow feature, and progressively remove features deemed highly unimportant;
Step 3.4, stop the algorithm when all features have been confirmed or rejected, or when it reaches a specified limit on random forest runs.
Preferably, extracting the new feature subset 2 with the Lasso feature selection algorithm in step 4 specifically uses the absolute values of the model coefficients as a penalty to compress the coefficients, shrinking some regression coefficients or setting them to 0.
The present invention has the following advantages and beneficial effects:
The present invention uses two principal feature selection methods: Lasso regression and the Boruta feature selection algorithm. The two algorithms differ fundamentally, which avoids the limitations of relying on a single method, effectively addresses data redundancy in big data, and reduces the complexity of the air-conditioning energy-consumption data model. In addition, the invention does not require extensive expert domain knowledge: it sets aside complex formula-based calculation and approaches the problem from the data, laying a foundation for deriving better energy-saving strategies later. The Boruta feature selection algorithm used here is a wrapper algorithm built around random forests that removes redundancy from the data, while Lasso regression removes collinear and noisy attributes, reducing their interference with the analysis.
Description of the drawings
The drawings described herein provide a further understanding of the embodiments of the present invention and form part of this application; they do not limit the embodiments of the present invention. In the drawings:
Fig. 1 is a functional block diagram of the feature selection method of the present invention.
Fig. 2 is the feature-subset importance ranking obtained with the Boruta algorithm according to the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to an embodiment and the drawings. The exemplary embodiment and its description serve only to explain the invention and do not limit it.
Embodiment
As shown in Fig. 1, this embodiment provides a big-data-based feature selection method for air-conditioning energy consumption. The method was tested on a single computer configured with an Intel(R) Core(TM) i5-7400 processor, 8 GB of memory, and the Windows 10 operating system.
The test data are operating data from the central air-conditioning system of a shopping mall in Guanghan City, 4032 samples in total.
The collected data points are listed in Table 1.
Table 1: Data collection points
Step 1: As the table shows, there are many sensor data points. To identify representative features, the following operations are performed. The data points retained after a first round of expert-opinion screening are listed in Table 2.
Table 2: Data points after the first round of expert-opinion screening
The data points retained after a second round of expert-opinion screening are listed in Table 3. Energy-consumption feature analysis is carried out on the Table 3 data points to find the feature subset that influences the air-conditioning load rate and power.
Table 3: Data points after the second round of expert-opinion screening
Step 2: Data taken directly from the real world are often incomplete, noisy, and inconsistent. The preprocessing steps are as follows:
2.1, set limit ranges and exclude outliers;
2.2, use a decision tree to obtain operating data under steady system conditions;
2.3, average the data over 5-minute intervals and remove repeated points;
2.4, merge the data and perform data augmentation;
2.5, fill in missing data by interpolation.
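A minimal sketch of steps 2.1, 2.3 and 2.5 for a single sensor channel follows. It assumes timestamps in seconds and a per-channel valid range [lo, hi] chosen by the engineer, since the patent gives no limits; preprocess_channel and its parameters are hypothetical. Step 2.2 (decision-tree steady-state filtering) and step 2.4 (merging and augmentation) are not reproduced because the patent does not specify them.

```python
import numpy as np

def preprocess_channel(t_sec, values, lo, hi, bin_sec=300):
    """Hypothetical helper: range filter, de-duplication, 5-minute means, interpolation."""
    t = np.asarray(t_sec, dtype=float)
    v = np.asarray(values, dtype=float)

    # Step 2.1: readings outside the assumed valid range [lo, hi] become NaN (outliers).
    v = np.where((v >= lo) & (v <= hi), v, np.nan)

    # Step 2.3, part 1: drop repeated timestamps, keeping the first reading of each.
    t, first = np.unique(t, return_index=True)   # t is now sorted
    v = v[first]

    # Step 2.3, part 2: mean of the valid readings inside each 5-minute bin.
    bins = np.floor((t - t[0]) / bin_sec).astype(int)
    n_bins = bins[-1] + 1
    valid = ~np.isnan(v)
    sums = np.bincount(bins[valid], weights=v[valid], minlength=n_bins)
    counts = np.bincount(bins[valid], minlength=n_bins)
    has_data = counts > 0
    if not has_data.any():
        raise ValueError("no readings inside the valid range")
    means = np.full(n_bins, np.nan)
    means[has_data] = sums[has_data] / counts[has_data]

    # Step 2.5: fill empty bins by linear interpolation (np.interp clamps at the edges).
    idx = np.arange(n_bins)
    means = np.interp(idx, idx[has_data], means[has_data])
    return t[0] + idx * bin_sec, means
```

For example, timestamps [0, 60, 60, 120, 310, 620, 910] with readings [20, 22, 30, 24, 500, 26, 28] and a 0 to 50 range give bin starts [0, 300, 600, 900] and means [22, 24, 26, 28]: the duplicate at t = 60 and the outlier 500 are discarded, and the now-empty second bin is interpolated.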
Step 3: find the feature subset that influences the air-conditioning load rate and power. The Boruta feature selection algorithm works as follows:
First, it adds randomness to the given data set by creating shuffled copies of all features (the shadow features).
Then, it trains a random forest classifier on the extended data set and applies a feature importance measure (commonly the mean squared residual) to evaluate the importance of each feature; a higher value means a more important feature.
In each iteration, it checks whether each original feature has higher importance than the best shadow feature (i.e., whether the feature's score exceeds the maximum shadow feature score) and progressively removes features deemed highly unimportant.
Finally, the algorithm stops when all features have been confirmed or rejected, or when it reaches a specified limit on random forest runs.
The Boruta algorithm is implemented in a program as follows:
3.1, shuffle the values of each feature in the feature matrix X, and concatenate the shuffled features (shadow features) with the original features (real features) to form a new feature matrix;
3.2, using the new feature matrix as input, train a model that can output feature_importance;
3.3, compute the Z_score of each real feature and each shadow feature;
3.4, find the maximum Z_score among the shadow features and denote it Z_max;
3.5, mark real features whose Z_score is greater than Z_max as "important", and mark real features whose Z_score is significantly less than Z_max as "unimportant" and permanently remove them from the feature set;
3.6, repeat 3.1 to 3.5 until all features are marked "important" or "unimportant";
3.7, obtain the feature-set importance ranking shown in Fig. 2.
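The steps above can be sketched as follows. This is a dependency-light illustration under stated assumptions, not the patent's program: the importance model is pluggable, and the default abs_corr_importance (absolute Pearson correlation with the target) is a stand-in substituted for the random-forest importance the patent specifies. Raw importance replaces the Z_score, and "significantly greater/less than Z_max" in step 3.5 is decided with a binomial test on each feature's hit count (the approach of the original Boruta algorithm, here without multiple-testing correction), with alpha = 0.05 chosen for illustration. All function names are hypothetical.

```python
import math
import numpy as np

def abs_corr_importance(X, y):
    """Stand-in importance (NOT a random forest): |Pearson correlation| of each column with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def binom_tail(k, n):
    """P(H >= k) for H ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, i) for i in range(int(k), n + 1)) / 2 ** n

def boruta_select(X, y, importance=abs_corr_importance, max_iter=100, alpha=0.05, seed=0):
    """Return one status per column: 1 = important, -1 = unimportant, 0 = undecided."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    status = np.zeros(X.shape[1], dtype=int)
    hits = np.zeros(X.shape[1], dtype=int)
    for it in range(1, max_iter + 1):
        active = np.flatnonzero(status != -1)        # rejected columns are dropped for good
        if not np.any(status[active] == 0):
            break                                     # step 3.6: every feature is decided
        real = X[:, active]
        shadow = rng.permuted(real, axis=0)           # step 3.1: shuffle each column independently
        imp = importance(np.hstack([real, shadow]), y)    # steps 3.2-3.3
        real_imp, shadow_imp = imp[:active.size], imp[active.size:]
        hits[active] += real_imp > shadow_imp.max()   # step 3.4: beat the best shadow (Z_max)?
        for j in active:                              # step 3.5: binomial test on hit counts
            if status[j] == 0:
                if binom_tail(hits[j], it) < alpha:
                    status[j] = 1
                elif binom_tail(it - hits[j], it) < alpha:
                    status[j] = -1
    return status
```

To follow the patent more closely, a random-forest importance can be passed instead, e.g. importance=lambda X, y: RandomForestRegressor(n_estimators=200).fit(X, y).feature_importances_ with scikit-learn. Note that with alpha = 0.05 a feature needs at least five consecutive hits before it can be confirmed.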
Step 4: extract a new feature subset using the Lasso algorithm.
The Lasso (least absolute shrinkage and selection operator; Tibshirani, 1996) is a shrinkage estimation method. It uses the absolute values of the model coefficients as a penalty to compress the coefficients, shrinking some regression coefficients and driving some coefficients with small absolute values exactly to 0. By constructing a penalty function it obtains a more parsimonious model, so it retains the advantage of subset shrinkage and is a biased estimator suited to data with multicollinearity.
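For reference, the penalized form of the Lasso described above, and Mallows's C_p criterion that step 4.3 uses to choose the lars step, are written below in their usual textbook form; the patent states neither its exact scaling nor the lars package's degrees-of-freedom convention, so treat both as assumptions. Here n is the number of samples, p the number of candidate features, and λ ≥ 0 sets the penalty strength: larger λ drives more β_j exactly to 0. Tibshirani's original paper states the equivalent constrained form, minimizing the residual sum of squares subject to Σ|β_j| ≤ t. In C_p, RSS_k is the residual sum of squares of a model with k parameters and σ̂² is the error variance estimated from the full least-squares model.

```latex
\hat{\beta}(\lambda) = \arg\min_{\beta_0,\,\beta}\;
  \frac{1}{2n}\sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

C_p = \frac{\mathrm{RSS}_k}{\hat{\sigma}^2} - n + 2k
```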
The main steps of the Lasso program are as follows:
4.1. convert the data set to CSV format (comma-separated);
4.2. read the data in R and convert it to matrix form;
4.3. call the lars function and determine the step with the smallest Cp value;
4.4. determine the selected variables and compute their weight coefficients;
4.5. obtain the feature subset and its regression weights; the weights of the optimal feature subset are shown in Table 4.
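The patent's program uses R's lars function and selects the step with minimum Cp. As a hedged stand-in, the sketch below solves the same L1-penalized least-squares problem by cyclic coordinate descent (a different algorithm from LARS) for one hand-picked penalty lam; the Cp-based selection is not reproduced, and lasso_coordinate_descent is a hypothetical name. Columns are standardized internally (assumed non-constant) and coefficients are returned on the original scale, so features whose coefficient comes back exactly 0 drop out of feature subset 2.

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, max_iter=1000, tol=1e-8):
    """Minimise (1/(2n))*||y_c - Z b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Z is X with standardised columns and y_c is y centred, so the intercept is
    not penalised. Returns the coefficients on the original scale of X.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    mean, scale = X.mean(axis=0), X.std(axis=0)
    Z = (X - mean) / scale              # each column: mean 0 and (1/n) * Z_j . Z_j = 1
    r = y - y.mean()                    # residual of the all-zero model
    b = np.zeros(p)
    for _ in range(max_iter):
        largest_step = 0.0
        for j in range(p):
            rho = Z[:, j] @ r / n + b[j]                      # least-squares target for b_j
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-thresholding
            step = new_bj - b[j]
            if step != 0.0:
                r -= step * Z[:, j]     # keep the residual consistent with b
                b[j] = new_bj
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return b / scale
```

In practice lam would be tuned rather than fixed, for example over a grid scored with the C_p criterion, which plays the role of the step selection in 4.3.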
Table 4: Weights of the optimal feature subset
Feature   Weight
x1          0.3651524
x3        -45.5834741
x4         65.7041534
x5         10.6551992
x6          3.5123054
x9         13.5124891
x10         5.6813998
x11       -15.4818693
x12        -1.0127342
x14       -70.1725065
x15        31.4574995
x16        27.3613065
x17         2.3798402
x18       -15.531637
x19        21.5408714
x20       -15.2884767
x21       -72.6756832
Step 5: in combination with expert opinion, the determinant attributes are selected by intersection-based fusion, finally yielding the key energy-consumption features of the central air-conditioning system listed in Table 5.
Table 5: Energy-consumption determinant attributes
The specific embodiment described above further details the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit its scope of protection; any modification, equivalent substitution, improvement, or the like made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (4)

1. A big-data-based feature selection method for air-conditioning energy consumption, characterized in that it comprises the following steps:
Step 1: performing preliminary screening of a feature data set using expert opinion;
Step 2: preprocessing the preliminarily screened feature data set;
Step 3: based on the preprocessed feature set, extracting a new feature subset 1 using the Boruta feature selection algorithm;
Step 4: based on the preprocessed feature set, extracting a new feature subset 2 using the Lasso feature selection algorithm;
Step 5: based on feature subset 1 obtained in step 3 and feature subset 2 obtained in step 4, and in combination with expert opinion, obtaining a key feature set for air-conditioning energy consumption by intersection-based classification.
2. The big-data-based feature selection method for air-conditioning energy consumption according to claim 1, characterized in that the preprocessing in step 2 specifically comprises the following steps:
Step 2.1, setting limit ranges and excluding outliers;
Step 2.2, obtaining operating data under steady system conditions using a decision tree;
Step 2.3, averaging the data over 5-minute intervals and removing repeated points;
Step 2.4, merging the data and performing data augmentation;
Step 2.5, filling in missing data by interpolation.
3. The big-data-based feature selection method for air-conditioning energy consumption according to claim 1, characterized in that extracting the new feature subset 1 with the Boruta feature selection algorithm in step 3 specifically comprises the following steps:
Step 3.1, adding randomness to the given data set by creating shuffled copies of the features (shadow features);
Step 3.2, training a random forest classifier on the extended data set to assess the importance of each feature, a higher importance indicating a more important feature;
Step 3.3, checking whether each original feature has higher importance than the best shadow feature, and progressively removing features deemed highly unimportant;
Step 3.4, stopping the algorithm when all features have been confirmed or rejected, or when the algorithm reaches a specified limit on random forest runs.
4. The big-data-based feature selection method for air-conditioning energy consumption according to claim 1, characterized in that extracting the new feature subset 2 with the Lasso feature selection algorithm in step 4 specifically uses the absolute values of the model coefficients as a penalty to compress the model coefficients, shrinking some regression coefficients or setting them to 0.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810730455.7A CN108921222A (en) 2018-07-05 2018-07-05 Big-data-based feature selection method for air-conditioning energy consumption

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810730455.7A CN108921222A (en) 2018-07-05 2018-07-05 Big-data-based feature selection method for air-conditioning energy consumption

Publications (1)

Publication Number Publication Date
CN108921222A (en) 2018-11-30

Family

ID=64424904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810730455.7A Pending CN108921222A (en) 2018-07-05 2018-07-05 Big-data-based feature selection method for air-conditioning energy consumption

Country Status (1)

Country Link
CN (1) CN108921222A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140207714A1 (en) * 2013-01-18 2014-07-24 International Business Machines Corporation Transductive lasso for high-dimensional data regression problems
CN107169284A (en) * 2017-05-12 2017-09-15 北京理工大学 Method for selecting key biomedical attributes
CN107273387A (en) * 2016-04-08 2017-10-20 上海市玻森数据科技有限公司 Ensemble classification for high-dimensional and imbalanced data
CN107730154A (en) * 2017-11-23 2018-02-23 安趣盈(上海)投资咨询有限公司 Parallel risk-control application method and system based on multiple machine learning models
CN107909077A (en) * 2017-10-10 2018-04-13 安徽信息工程学院 Feature selection method based on sparsification theory in the semi-supervised setting
CN107992447A (en) * 2017-12-13 2018-05-04 电子科技大学 Feature selection and decomposition method applied to river water level prediction data

Similar Documents

Publication Publication Date Title
CN107563381B (en) Multi-feature fusion target detection method based on full convolution network
Peng Zipf’s law for Chinese cities: Rolling sample regressions
CN101409634B (en) Quantitative analysis tools and method for internet news influence based on information retrieval
CN107169628A (en) A kind of distribution network reliability evaluation method based on big data mutual information attribute reduction
CN107918664B (en) Social network data differential privacy protection method based on uncertain graph
CN106228068A (en) Android malicious code detecting method based on composite character
CN106372239A (en) Social network event correlation analysis method based on heterogeneous network
CN108711103A (en) Personal loan repays Risk Forecast Method, device, computer equipment and medium
CN106067034A (en) A kind of distribution network load curve clustering method based on higher dimensional matrix characteristic root
CN111309777A (en) Report data mining method for improving association rule based on mutual exclusion expression
CN110365603A (en) A kind of self adaptive network traffic classification method open based on 5G network capabilities
CN107465691A (en) Network attack detection system and detection method based on router log analysis
US20230169244A1 (en) Method for evaluating fracture connectivity and optimizing fracture parameters based on complex network theory
CN102663083A (en) Large-scale social network information extraction method based on distributed computation
CN107358534A (en) The unbiased data collecting system and acquisition method of social networks
CN103440308B (en) A kind of digital thesis search method based on form concept analysis
CN107861965A (en) Data intelligence recognition methods and system
CN102208027A (en) Method for evaluating land utilization spatial pattern based on clearance degree dimension
CN108921222A (en) Big-data-based feature selection method for air-conditioning energy consumption
CN105139373A (en) Objective non-reference image quality evaluation method based on independent subspace analysis
CN106550387B (en) A kind of wireless sensor network routing layer QoS evaluating method
CN113706459B (en) Detection and simulation repair device for abnormal brain area of autism patient
CN115272776A (en) Hyperspectral image classification method based on double-path convolution and double attention and storage medium
CN107506361A (en) Raster data polymerization and device, raster data decoupling method and apparatus and system
CN106447112A (en) Construction land scale prediction method of multiple cities based on stack limited Boltzmann machine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181130