CN107301604A - Multi-model fusion estimation system - Google Patents


Info

Publication number
CN107301604A
Authority
CN
China
Prior art keywords
energy
model
feature
estimation system
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710756125.0A
Other languages
Chinese (zh)
Inventor
戴佳毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hualong Chongqing Strong Chongqing Credit Management Co Ltd
Original Assignee
Hualong Chongqing Strong Chongqing Credit Management Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hualong Chongqing Strong Chongqing Credit Management Co Ltd
Priority to CN201710756125.0A
Publication of CN107301604A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/02: Agriculture; Fishing; Forestry; Mining

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Marketing (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Tourism & Hospitality (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Animal Husbandry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Agronomy & Crop Science (AREA)
  • Mining & Mineral Resources (AREA)
  • Marine Sciences & Fisheries (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A multi-model fusion estimation system. The invention addresses two problems in existing energy efficiency evaluation: the features used to calculate energy efficiency are difficult to select, and model evaluation results are inaccurate. The steps of the invention are: Step one: normalize the data to obtain a normalized training set. Step two: perform feature selection on the normalized training set obtained in step one, selecting features with a fusion method that combines information gain and kernel principal component analysis; the features are ranked by information gain, and the ranking is then verified by principal component analysis. Step three: build an evaluation model based on multi-classifier fusion according to steps one and two, and obtain the classification results of the energy efficiency evaluation. Step four: perform cluster analysis on the classification results obtained in step three to obtain the final clustering result. The invention is applicable to the field of energy efficiency evaluation.

Description

Multi-model fusion estimation system
Technical field
The present invention relates to a multi-model fusion estimation system.
Background art
As energy and environmental problems become increasingly prominent, energy efficiency evaluation methods are receiving growing attention. Many scholars internationally have studied the improvement of energy utilization efficiency and the potential for energy saving from different perspectives. Taking China as an example, the economy has maintained rapid growth in recent years, but the mode of economic growth remains extensive: resource and energy consumption are high, utilization rates are low, environmental pollution is serious, and energy utilization efficiency still lags behind internationally. At present, China's unreasonable coal-dominated energy consumption structure seriously affects energy utilization efficiency across the whole energy system and poses a challenge to sustainable social development. It is therefore necessary to identify the key factors influencing energy efficiency and to quantify the degree of influence of each factor. Current quantitative studies of energy utilization efficiency mostly evaluate energy efficiency values using data envelopment analysis (DEA). Some scholars have also studied, on the basis of total-factor energy efficiency calculations, the influence of factors such as industrial structure, technological progress, and degree of openness on energy efficiency. In addition, because of China's regional complexity and unbalanced spatial development, many scholars have used inter-regional and inter-provincial energy panel data to analyze differences in energy efficiency between regions or provinces, and have obtained effective calculation and evaluation methods. However, energy efficiency calculated from different energy indicators cannot truly reflect the actual factors that influence energy efficiency.
Summary of the invention
The purpose of the invention is to solve the problems that the features used to calculate energy efficiency are difficult to select and that model evaluation results are inaccurate, by providing a multi-model fusion estimation system.
A multi-model fusion estimation system comprises the following steps:
The overall strategy of the classification modeling of the invention is as follows. The feature values of the data are standardized in preprocessing so that feature selection can be carried out correctly. On this basis, the data set is labeled with classes, giving the class labels from which the classification algorithms learn on the training set. The multi-classifier fusion classification model used by the invention is then obtained through comparative analysis and can be used for prediction.
Step one: The data are normalized to obtain a normalized training set;
Step two: Feature selection is performed on the normalized training set obtained in step one;
Step three: An evaluation model based on multi-classifier fusion is built according to steps one and two, and the classification results of the energy efficiency evaluation are obtained;
Step four: Cluster analysis is performed on the classification results obtained in step three to obtain the final clustering result.
The beneficial effects of the invention are:
The invention proposes an energy efficiency evaluation method based on a multi-model fusion strategy. It establishes a classification model based on a multi-classifier fusion strategy, used to predict whether energy efficiency values are high or low, and also establishes a fusion model of cluster analysis methods, which can distinguish provinces with high energy efficiency from provinces with low energy efficiency. A case study is then carried out on the evaluation of China's energy utilization efficiency. First, energy efficiency data for 24 provinces over 9 years are collected, and two feature identification methods are used to determine the key factors influencing energy efficiency. Next, the goodness of fit of the established fusion classification model is compared and analyzed, and the model is used to predict whether energy efficiency is high or low. Then, based on the multi-model fusion clustering strategy, provinces with high energy efficiency are further distinguished from provinces with low energy efficiency. Finally, corresponding improvement proposals are given for the overall energy efficiency development problems identified for China. Experimental results show that the multi-model fusion strategy achieves better classification prediction and cluster analysis results than single-model methods. The invention therefore has good practical engineering application value.
1) The candidate features for calculating energy efficiency can be effectively screened to find the principal factors that influence energy efficiency.
2) Three single-classifier models and a multi-classifier fusion model are built for inter-provincial energy efficiency in China. The numerical results of classification and prediction show that the multi-classifier fusion model predicts energy efficiency classes better than any single model and classifies high and low energy efficiency values more accurately.
3) Based on the multi-model fusion cluster analysis method, the differences and patterns of change in energy efficiency across China's regions are identified, so that cause analyses and development suggestions can be given accordingly.
Brief description of the drawings
Fig. 1 is a flow chart of the parallel fusion strategy based on three classifiers.
Fig. 2 is a flow chart of the multi-model fusion cluster analysis strategy.
Detailed description of the embodiments
Embodiment one: The specific steps of a multi-model fusion estimation system are:
Step one: The data are normalized to obtain a normalized training set;
Step two: Feature selection is performed on the normalized training set obtained in step one;
Step three: An evaluation model based on multi-classifier fusion is built according to steps one and two, and the classification results of the energy efficiency evaluation are obtained;
Step four: Cluster analysis is performed on the classification results obtained in step three to obtain the final clustering result.
Embodiment two: This embodiment differs from embodiment one in that the data in step one specifically include: primary energy output, total energy consumption, energy consumption elasticity, GDP, energy industry investment, energy consumption per unit of gross output value, capital stock, and sulfur dioxide emission coefficient.
Other steps and parameters are the same as in embodiment one.
Embodiment three: This embodiment differs from embodiment one or two in that the specific process of normalizing the data in step one to obtain the normalized training set is:
Panel data for multiple provinces, municipalities, and autonomous regions across the country are collected, and the data are standardized in preprocessing. Standardization scales the data proportionally to remove their unit limitations and converts them into dimensionless pure values, which makes comparison and weighting easier. 0-1 standardization (also called normalization) is the most typical standardization method; it applies a linear transformation to the original data so that the results fall in the interval [0,1]. Since the feature values in the data set used by the invention are all positive, a simplified transfer function is used to normalize each component. Given N samples, the m-th feature of each sample is processed as expressed in formula (1):
After preprocessing, the feature values are distributed in the interval [0,1], where xim* is the normalized value of the m-th feature of the i-th sample and xim is the original value of the m-th feature of the i-th sample.
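Formula (1) is not reproduced in this text, so the exact simplified transfer function is unknown. As a minimal sketch, the code below assumes the standard 0-1 (min-max) transform applied column by column; the function name `min_max_normalize` and the sample values are hypothetical and only illustrate how each feature ends up in [0,1].

```python
def min_max_normalize(rows):
    """Scale each feature column of `rows` (N samples x M features) into [0, 1].

    Assumption: standard min-max scaling, not necessarily the patent's formula (1).
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        scaled = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant column carries no information; map it to 0.0
            # instead of dividing by zero.
            scaled.append((value - low) / span if span else 0.0)
        normalized.append(scaled)
    return normalized


# Illustrative values only (3 samples, 2 features on different scales).
samples = [[10.0, 200.0], [20.0, 400.0], [30.0, 800.0]]
normalized = min_max_normalize(samples)
```

After this step both columns share the same dimensionless range, so features measured in different units can be compared and weighted directly.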
Other steps and parameters are the same as in embodiment one or two.
Embodiment four: This embodiment differs from embodiments one to three in that the specific process, in step two, of performing feature selection on the normalized training set obtained in step one is:
The various factors that influence energy efficiency are considered, a feature space is built, the corresponding data are collected, the sample data are made dimensionless, and feature selection is performed. To make the feature selection result more accurate, the invention selects the final features with a fusion strategy that combines information gain and kernel principal component analysis. First, the feature ranking is computed using information gain; then principal component analysis is used for verification.
Features are selected with the fusion method that combines information gain and kernel principal component analysis: the information gains obtained for the different features are sorted in descending order to give a ranking of relative feature importance, and principal component analysis is used for verification.
Kernel principal component analysis (KPCA) is a nonlinear extension of principal component analysis (PCA). KPCA maps the original vectors into a high-dimensional space F through a mapping function Φ and performs PCA on F, which extracts the information in the indicators to the greatest extent. Suppose x1, x2, ..., xM are the training samples, and let {xi} denote the input space. The basic idea of KPCA is to map the input space into a high-dimensional space (often called the feature space) by some implicit mapping and to perform PCA in that feature space.
Suppose the corresponding mapping is Φ. The kernel function K implicitly realizes the mapping from a point x to F, and the data in the feature space obtained by this mapping satisfy the centering condition [15], that is,
Then the covariance matrix in the feature space is:
Now solve for the eigenvalues λ ≥ 0 of C and the eigenvectors V ∈ F \ {0} that satisfy CV = λV. Since every eigenvector can be expressed as a linear combination of Φ(x1), Φ(x2), ..., Φ(xM), we have
where v = 1, 2, ..., M. Defining the M × M matrix K, the eigenvalues and eigenvectors can be obtained. For a test sample, the projection onto the eigenvector Vk in the feature space is
Replacing the inner product with the kernel function gives
The kernel matrix can be further corrected to
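The displayed formulas of this derivation are not reproduced in the text. For reference, the standard kernel PCA derivation, which the surrounding sentences appear to follow step by step, can be written as below. The expansion coefficients \alpha_i, the component index k, and the matrix \mathbf{1}_M (the M × M matrix whose entries all equal 1/M) are notation assumed here rather than taken from the source.

```latex
\begin{align}
\sum_{i=1}^{M} \Phi(x_i) &= 0
  && \text{(centering condition)} \\
C &= \frac{1}{M} \sum_{j=1}^{M} \Phi(x_j)\,\Phi(x_j)^{\mathsf T}
  && \text{(covariance matrix in } F\text{)} \\
V &= \sum_{i=1}^{M} \alpha_i\,\Phi(x_i), \qquad
\lambda\,\bigl(\Phi(x_v)\cdot V\bigr) = \Phi(x_v)\cdot C V, \quad v = 1,\dots,M \\
K_{ij} &= \Phi(x_i)\cdot\Phi(x_j) = K(x_i, x_j), \qquad
M\lambda\,\alpha = K\alpha \\
V^{k}\cdot\Phi(x) &= \sum_{i=1}^{M} \alpha_i^{k}\,\bigl(\Phi(x_i)\cdot\Phi(x)\bigr)
  = \sum_{i=1}^{M} \alpha_i^{k}\,K(x_i, x) \\
\tilde{K} &= K - \mathbf{1}_M K - K\,\mathbf{1}_M + \mathbf{1}_M K\,\mathbf{1}_M
  && \text{(corrected kernel matrix)}
\end{align}
```

The last line covers the practical case in which the mapped data are not already centered in F: the correction centers them using only kernel evaluations, so Φ never has to be computed explicitly.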
Other steps and parameters are the same as in one of embodiments one to three.
Embodiment five: This embodiment differs from embodiments one to four in that the specific process of computing the feature ranking using information gain is:
Feature selection searches the possible feature subsets of the data set and chooses, according to some rule, a group of effective features so as to reduce the dimension of the feature space. At the same time, removing redundant information from the feature space avoids its influence on classification prediction, which improves the prediction accuracy and computational efficiency of the classification algorithm. Information gain (IG) is one of the most commonly used methods for feature selection.
In information gain, the criterion is how much information a feature brings to the classification system: the more information it brings, the more important the feature. For a given feature, the amount of information in the system differs depending on whether the feature is present, and the difference between the two is the amount of information that the feature contributes to the system. The amount of information is measured by entropy.
Let the feature space be X and let Xm be the m-th feature of the samples. Its information gain IG(Xm) is:
IG(Xm) = H(C) - H(C | Xm)
where C denotes the class variable, H(C) denotes the information entropy of C, and H(C | Xm) denotes the conditional entropy of the class C given the feature Xm.
If the class C takes n values, each with probability p(Cj), j = 1, 2, ..., n, then H(C) is:
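The formula for H(C) is not reproduced in this text; the usual Shannon entropy is H(C) = -Σ p(Cj) log p(Cj). The sketch below assumes that definition with base-2 logarithms and assumes each feature has already been discretized; the function names and the toy data are hypothetical. It computes IG(Xm) = H(C) - H(C | Xm) and ranks features in descending order of gain.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C) in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Xm) = H(C) - H(C | Xm) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # H(C | Xm): entropy within each feature value, weighted by its frequency.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: 0 = high energy efficiency, 1 = low energy efficiency.
labels = [0, 0, 1, 1]
informative = ["low", "low", "high", "high"]   # determines the class exactly
uninformative = ["a", "b", "a", "b"]           # independent of the class
gains = {
    "informative": information_gain(informative, labels),
    "uninformative": information_gain(uninformative, labels),
}
# Descending order of information gain gives the relative importance ranking.
ranking = sorted(gains, key=gains.get, reverse=True)
```

A feature that fully determines the class recovers all of H(C), while a feature independent of the class has zero gain, which is why a gain threshold can be used to filter features.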
Other steps and parameters are the same as in one of embodiments one to four.
Embodiment six: This embodiment differs from embodiments one to five in that the specific process, in step three, of building the multi-classifier fusion evaluation model according to steps one and two (that is, the parallel fusion of three trained models: the J48 model among decision tree algorithms, the JRip model among rule-based classification algorithms, and the LogitBoost meta-learner based on a meta-learning strategy) and obtaining the classification results of the energy efficiency evaluation is:
The invention selects three algorithms that classify well in many fields: a decision tree algorithm, a rule-based classification algorithm, and a meta-learner based on a meta-learning strategy.
A decision tree is an instance-based inductive learning algorithm that infers classification rules, represented as a decision tree, from a set of unordered, irregular tuples. It works top-down and recursively: attribute values are compared at the internal nodes of the tree, and branches descend from each node according to the different attribute values. Each non-leaf node of the tree (including the root) corresponds to a test on one non-class attribute of the training samples, each branch of a non-leaf node corresponds to one outcome of that test, and each leaf node represents a class or a class distribution. Each path from the root to a leaf corresponds to one classification rule, and the whole decision tree corresponds to a set of disjunctive rules. The invention uses the widely used C4.5 algorithm. C4.5 was proposed as an improvement on the earlier ID3 algorithm; it selects test attributes by information gain ratio, which equals the ratio of the information gain to the split information. In the invention, C4.5 is implemented with the J48 decision tree.
Rule-based classification classifies using a set of if ... then rules. The invention builds the rules with the JRip classifier, which implements the RIPPER algorithm. RIPPER uses a class-based ordering scheme: rules belonging to the same class appear together in the rule set, and the rules are then sorted together according to the class to which they belong. The relative order of rules within a class is unimportant because they belong to the same class. The algorithm extracts rules directly from the data; when extracting rules for a class y, all training records of class y are treated as positive examples and the training records of the other classes as negative examples.
Meta-learning learns again, or repeatedly, on the basis of earlier learning results to obtain the final result. AdaBoost, an improved machine learning method proposed by Freund and Schapire, is widely used in practice. Its basic idea is to build a basic "weak classifier" from the available sample data set and call it repeatedly, assigning greater weight to the samples misclassified in each round so that later rounds concentrate on the samples that are hard to classify. After many rounds, the weak classifiers of all rounds are combined by weighting into a "strong classifier".
Multi-classifier fusion strategies can generally be summarized as serial fusion and parallel fusion. Parallel fusion avoids the inconsistent classification results that different orderings cause in serial fusion, and the classifiers do not influence one another. The invention therefore uses parallel fusion to classify on the attributes of the influencing factors. In the design of a parallel fusion classifier, the results of the different classifiers may deviate from one another, so voting is needed to give the final result. Simple voting is an intuitive and efficient strategy; the weights of the different classifiers are equal, which makes the classification results more interpretable. To improve the average classification performance, the data need to be selected more randomly, so the invention selects data by ten-fold cross-validation. The classification result is the average over the 10 runs, and the base classifiers do not influence one another. The multi-model fusion strategy based on these three commonly used base classifiers is shown in Fig. 1.
The analysis of energy efficiency is reduced to a two-class problem: the instances in the data set are divided into two classes, high energy efficiency and low energy efficiency. The number of classes is set to 2 and the class label takes the values 0 and 1, where 0 represents high energy efficiency and 1 represents low energy efficiency.
Among the many classification algorithms, the invention selects three that classify well in many fields, namely a decision tree algorithm, a rule-based classification algorithm, and a meta-learner based on a meta-learning strategy, and fuses the three to obtain a better evaluation model based on multi-classifier fusion.
Using 10-fold cross-validation, classification models are trained with each of the three methods J48, LogitBoost, and JRip on the training set obtained in steps one and two, to ensure the generalization performance of the models.
Parallel fusion is then applied. Because the results of the different classifiers may deviate from one another, the final result is given by voting. Simple voting is an intuitive and efficient strategy; the weights of the different classifiers are equal, which makes the classification results more interpretable, and the classification result is the average of the classification results obtained in the 10 tests.
The J48 model among decision tree algorithms, the JRip model among rule-based classification algorithms, and the LogitBoost meta-learner based on a meta-learning strategy are each trained on the normalized training set obtained in step one, giving three trained models.
Each model is trained with the features selected in step two as its input variables and the class 0 or 1 as its output, where 0 represents high energy efficiency and 1 represents low energy efficiency. The training strategy is 10-fold cross-validation.
Whenever a new sample is tested, it is input into each of the three trained models to obtain three results, and the classification result is obtained by voting (majority rule, in which the minority yields to the majority).
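The parallel fusion with equal-weight voting described above can be sketched as follows. This is a minimal illustration, not the trained J48, JRip, and LogitBoost models: the three functions are hypothetical stand-ins with made-up thresholds, and only the voting logic reflects the text.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers (equal weights)."""
    return Counter(predictions).most_common(1)[0][0]


def fuse(classifiers, sample):
    """Run every base classifier on the sample independently, then vote."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical stand-ins for the three trained base models; each maps a
# feature dict to 0 (high energy efficiency) or 1 (low energy efficiency).
def tree_model(s):
    return 1 if s["F6"] > 0.5 else 0


def rule_model(s):
    return 1 if s["F6"] > 0.4 and s["F8"] > 0.3 else 0


def boosted_model(s):
    return 1 if s["F8"] > 0.7 else 0


sample = {"F6": 0.6, "F8": 0.5}   # normalized feature values, illustrative only
label = fuse([tree_model, rule_model, boosted_model], sample)
```

With three voters and two classes a tie cannot occur, which is one practical reason an odd number of base classifiers suits simple voting.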
Other steps and parameters are the same as in one of embodiments one to five.
Embodiment seven: This embodiment differs from embodiments one to six in that the specific process, in step four, of performing cluster analysis on the classification results obtained in step three to obtain the final clustering result is:
The invention selects three clustering algorithms, Simple K-means, EM, and FCM, as the basis of the fusion.
Simple K-means is the k-means clustering algorithm. The number of clusters k is specified first, and k samples are taken at random as the initial cluster centers. The distance from each sample to each cluster center is computed and each sample is assigned to the nearest center. After all samples have been assigned, the cluster centers are recomputed. This process is repeated until the cluster centers no longer change, and the resulting k clusters are the final clustering result.
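The k-means loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: squared Euclidean distance, an optional `initial_centers` argument added here so the example is reproducible, and toy two-dimensional points; none of these details come from the source.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, initial_centers=None, max_iter=100, seed=0):
    """Cluster points (tuples) into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    # Take k samples at random as initial centers unless some are supplied.
    centers = list(initial_centers) if initial_centers else rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:              # centers stopped changing
            break
        centers = new_centers
    return labels, centers


points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
          (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
labels, centers = k_means(points, k=2, initial_centers=[points[0], points[3]])
```

Because the result depends on the initial centers, running with several random starts and comparing the outcomes is a common safeguard.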
EM algorithm: The expectation-maximization (EM) algorithm finds maximum likelihood or maximum a posteriori estimates of parameters in probabilistic models. It can be viewed as a successive approximation algorithm. The model parameters are not known in advance, so a set of parameters is chosen at random, or an initial parameter λ0 is roughly given beforehand. The most probable state corresponding to this set of parameters is determined, and the probability of each possible outcome is computed for every training sample. In the current state, the parameters are corrected using the samples to re-estimate the parameter λ, and the state of the model is redetermined under the new parameters. Through repeated iterations, cycling until some convergence condition is met, the model parameters gradually approach the true parameters.
FCM clustering method: Professor Zadeh of the University of California, Berkeley first proposed the concept of the fuzzy set. After more than ten years of development, fuzzy set theory was gradually applied in a variety of practical fields. To overcome the shortcoming of either-or classification, cluster analysis based on fuzzy set theory emerged. Cluster analysis carried out with the methods of fuzzy mathematics is fuzzy cluster analysis. FCM is an algorithm that uses degrees of membership to determine the extent to which each data point belongs to a cluster, and is an improvement on traditional hard clustering algorithms.
To make the clustering result more credible, the multi-model fusion cluster analysis method used by the invention is as follows. Because both Simple K-means and EM cluster by partitioning, they are chosen as the basic clustering methods. The two algorithms are wrapped with Make Density Based Clusterer so that a discrete distribution or a symmetric normal distribution can be fitted to each cluster. Clustering proceeds gradually from the whole to the local, with strong local search ability and fast convergence. The clustering results on which the two methods agree are picked out as the preliminary fusion clustering result, and the FCM clustering method is then used for verification to give the final fusion clustering result, as shown in Fig. 2.
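The step of picking out the results on which Simple K-means and EM agree can be sketched as below. The source does not say how agreement is determined, so this sketch assumes that cluster numbers from the two algorithms are first aligned by the best-matching permutation and that samples with equal aligned labels form the preliminary result. The label lists are hypothetical, and the FCM verification step is not shown.

```python
from itertools import permutations


def align_labels(reference, other, k):
    """Relabel `other` with the permutation that best matches `reference`.

    Cluster numbers are arbitrary, so cluster 0 of one algorithm may be
    cluster 1 of the other; trying all k! permutations is fine for small k.
    """
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(r == perm[o] for r, o in zip(reference, other)),
    )
    return [best[o] for o in other]


def agreed_samples(labels_a, labels_b, k):
    """Map sample index -> cluster for samples on which both clusterings agree."""
    aligned_b = align_labels(labels_a, labels_b, k)
    return {i: a for i, (a, b) in enumerate(zip(labels_a, aligned_b)) if a == b}


kmeans_labels = [0, 0, 0, 1, 1, 1]   # hypothetical Simple K-means output
em_labels = [1, 1, 0, 0, 0, 0]       # hypothetical EM output, numbering swapped
preliminary = agreed_samples(kmeans_labels, em_labels, k=2)
```

Samples missing from `preliminary` (index 2 here) are the disputed ones, which is where a third method such as FCM would be consulted.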
The samples of the high energy efficiency class in the classification results of step three are analyzed again by a 2-cluster clustering process, which further subdivides the samples in the high-efficiency class. The samples with relatively low energy efficiency are screened out and returned to the low-efficiency class as a correction to step three, giving a more accurate result.
The three clustering algorithms Simple K-means, EM, and FCM serve as the basis of the fusion. The multi-model fusion cluster analysis method is as follows. Because both Simple K-means and EM cluster by partitioning, they are chosen as the basic clustering methods. The two algorithms are wrapped with Make Density Based Clusterer so that a discrete distribution or a symmetric normal distribution can be fitted to each cluster. The clustering results on which the two methods agree are picked out as the preliminary fusion clustering result, and the FCM clustering method is then used for verification to give the final fusion clustering result.
Other steps and parameters are the same as in one of embodiments one to six.
Example one:
Acquisition of the example data samples and construction of the feature space
The invention collects panel data from 2005 to 2013 for 24 provinces, municipalities, and autonomous regions of China (excluding Tibet, Hong Kong, Macao, Taiwan, Jilin, Heilongjiang, Guizhou, Yunnan, Gansu, and Qinghai). Based on results in the literature, the feature space selected by the invention comprises eight factors: primary energy output (F1), total energy consumption (F2), energy consumption elasticity (F3), GDP (F4), energy industry investment (F5), energy consumption per unit of gross output value (F6), capital stock (F7), and sulfur dioxide emission coefficient (F8):
F1: The qualified products that enterprises (units) producing primary energy obtain within the reporting period by extracting energy existing in nature, such as raw coal mined by coal mines, crude oil extracted from oil fields, natural gas extracted from gas fields, and electricity generated by hydropower plants.
F2: The quantity of the various energy products actually consumed by energy-using units within the statistical reporting period, summed by the prescribed calculation method and converted into the required unit of measurement.
F3: The ratio of the growth rate of energy consumption to the growth rate of the national economy.
F4: The market value of all final products and services produced by all resident units of a country (within its national borders) in a given period. GDP is the core indicator of national economic accounting and an important indicator of a country's macroeconomic condition.
F5: The capital invested in the energy industry.
F6: The energy consumed by a country to produce one unit of GDP in a given period, that is, the ratio of total energy consumption to GDP.
F7: All the capital resources currently held by enterprises, that is, the sum of all kinds of capital already invested in enterprises. Because it exists in the form of assets, it is also called asset stock. According to its state in the production process, it can be divided into two categories: asset stock participating in reproduction, and idle asset stock, which includes idle factory buildings, machinery, and equipment.
F8: The quantity of sulfur dioxide emitted per unit of energy when each energy source is burned or used.
Feature selecting result and analysis
First, nondimensionalization processing is carried out to acquired sample data.Then, then carry out feature selecting calculate analysis.By Influenceed in the height of energy efficiency by factors, therefore multiple indexs will be considered to the measurement of energy efficiency, herein On the basis of, identify key influence factor, and uneven the making a prediction to each department future source of energy efficiency accordingly.
Following existing research conclusions on how to choose the information gain threshold, the six features whose information gain exceeds 0.0025 were selected and ranked. The result was further verified using principal component analysis, and the final result is shown in Table 1:
Table 1. Ranking of the features by information gain with respect to the class
The feature selection results show that the six features retained in Table 1 correlate strongly with the class attribute and are the key influencing factors of energy efficiency. Among them, F6 has the greatest influence, followed by F8, F7, F4, F1 and F3, five features whose influence on energy efficiency is similar. The two features F5 and F2 were filtered out of the data set because they have almost no influence on energy efficiency.
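The selection rule described above (compute the information gain IG = H(C) - H(C | X) of each feature, keep those above 0.0025, sort in descending order) can be sketched as follows. This is an illustration under assumptions: the features are taken to be already discretized, the logarithm is base 2, and the toy columns are invented. The principal component verification step is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(C) of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(X) = H(C) - H(C | X) for one discretized feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, threshold=0.0025):
    """Keep features whose gain exceeds the threshold, highest gain first."""
    gains = {name: information_gain(vals, labels) for name, vals in columns.items()}
    kept = [(name, gain) for name, gain in gains.items() if gain > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Toy data: F_a separates the classes perfectly, F_b carries no information.
labels = [0, 0, 1, 1]
columns = {"F_a": ["low", "low", "high", "high"], "F_b": ["x", "y", "x", "y"]}
ranked = select_features(columns, labels)
print(ranked)
```

In the toy data only `F_a` survives the threshold, in the same way that F5 and F2 are dropped in the text.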
Classification results and analysis
The analysis of energy efficiency in the present invention can likewise be treated as a two-class problem: the instances in the data set are divided into a high-energy-efficiency class and a low-energy-efficiency class. The number of classes is therefore set to 2, and the class label takes the values 0 and 1, where 0 denotes high energy efficiency and 1 denotes low energy efficiency. F6, F8, F7, F4, F1 and F3 are then selected as the key influencing factors of energy efficiency, and the two attributes F5 and F2 are removed from the data set.
General-purpose metrics are chosen to assess the performance of the three classifiers used in the experiment: precision (precision rate, PR), recall (recall rate, RR) and F-measure. Precision and recall are computed from four quantities used in ROC curve analysis: true positives (TP), false positives (FP), false negatives (FN) and true negatives (TN). The F-measure (FM), the harmonic mean of precision and recall, is then taken as the key indicator of classifier performance. Table 2 shows the classification results on the energy efficiency influencing-factor data set for the fused classifier and the single classifiers, where OVSM and MCF denote the best single-model result and the multi-model fusion result, respectively.
Table 2. Classification results on the data set using the three classifiers
Table 2 shows that the fusion model outperforms the single classifiers. It is therefore the best classifier for classifying energy efficiency on the key influencing factors selected above, and this classification model can be used to predict whether energy efficiency is high or low for other provinces and cities not included in the data set, or for data from other periods.
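For reference, the three metrics used to compare the classifiers can be computed from the confusion counts as in the sketch below. The counts in the example are invented and do not reproduce Table 2.

```python
def precision_recall_f(tp: int, fp: int, fn: int):
    """Return (precision, recall, F-measure) from confusion counts.

    The F-measure is the harmonic mean of precision and recall. Ratios
    with an empty denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
pr, rr, fm = precision_recall_f(tp=8, fp=2, fn=2)
print(pr, rr, fm)
```

True negatives do not enter any of the three formulas, which is why the function takes only TP, FP and FN.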
Prediction results and analysis
Energy efficiency influencing-factor data for 2013 were collected for six provinces: Jilin, Heilongjiang, Guizhou, Yunnan, Gansu and Qinghai. After the six feature values were standardized, predictions were made with the classification model described above. The results are shown in Table 3 below.
Table 3. Prediction results on the test set using the three classifiers
Table 3 shows that, for every province, the predictions of the fused classifier model and of the single classifier models are consistent. Jilin, Heilongjiang, Yunnan and Gansu are predicted as class 0, i.e. high energy efficiency, while Guizhou and Qinghai are predicted as class 1, i.e. low energy efficiency. Moreover, the prediction confidence of the fused classification model is higher than the best value among the single models, so its predictions are more readily accepted than those of a single model.
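The claims state that the outputs of the three classifiers are combined by weighted voting, but the source does not say how the weights are chosen. The sketch below therefore uses one assumed scheme: each model votes with a weight such as its cross-validated F-measure, and the share of the winning weight is reported as a simple confidence value. The weights shown are invented.

```python
def weighted_vote(predictions, weights):
    """Combine 0/1 predictions from several models by weighted voting.

    Returns (winning_label, confidence), where confidence is the share of
    the total weight that backs the winning label. The weighting scheme
    is an assumption, not the patent's stated method.
    """
    if len(predictions) != len(weights):
        raise ValueError("one weight is required per prediction")
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    winner = max(scores, key=scores.get)
    return winner, scores[winner] / sum(scores.values())


# Three hypothetical models: two predict class 0, one predicts class 1.
label, confidence = weighted_vote([0, 1, 0], [0.90, 0.80, 0.85])
print(label, round(confidence, 3))
```

When all three models agree, as they do for the six provinces above, the confidence under this scheme is 1.0 regardless of the weights.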
Multi-model fusion strategy and clustering result analysis
First, the clustering results obtained with the two individual algorithms, Simple K-means and EM, are as follows:
1) K-means divides the 216 instances into 2 classes. Cluster0 contains 140 instances, 65% of all instances; cluster1 contains 76 instances, 35% of all instances. Comparing the data in the data set year by year, the overall picture is as follows. F6 of cluster0 is lower than that of cluster1, i.e. less energy is consumed to produce one unit of GDP. F3 is lower than in cluster1, i.e. at the same rate of economic growth the energy consumption of cluster0 instances grows more slowly. F8 is also lower than in cluster1, i.e. less sulfur dioxide is emitted per unit of energy burned in cluster0, so its pollution of the environment is lower. It can therefore be determined that cluster0 instances form the energy-efficient class and cluster1 instances form the energy-inefficient class.
2) EM clustering also divides the 216 instances into 2 classes: cluster0 contains 118 instances, 55% of all instances, and cluster1 contains 98 instances, 45% of all instances. Comparing the data for the same year, F6 and F8 of cluster0 are generally lower than those of cluster1, i.e. energy is used efficiently in economic growth while environmental pollution is also relatively low. Cluster0 instances are thereby determined to be the energy-efficient class, and cluster1 instances the energy-inefficient class.
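Cluster ids such as cluster0 and cluster1 carry no meaning of their own, so each clustering has to be interpreted by comparing feature levels between clusters, as done above with F6, F3 and F8. The sketch below shows one possible rule, assuming that the cluster with the lower mean of a chosen feature (for example F6) is the efficient one. The function name and the data are hypothetical.

```python
def name_clusters(rows, assignments, feature_index):
    """Map each cluster id to 'efficient' or 'inefficient'.

    The cluster with the lowest mean of the chosen feature is called
    'efficient'; every other cluster is called 'inefficient'. This rule
    is an assumption for illustration.
    """
    sums, counts = {}, {}
    for row, cluster in zip(rows, assignments):
        sums[cluster] = sums.get(cluster, 0.0) + row[feature_index]
        counts[cluster] = counts.get(cluster, 0) + 1
    means = {cluster: sums[cluster] / counts[cluster] for cluster in sums}
    lowest = min(means, key=means.get)
    return {c: ("efficient" if c == lowest else "inefficient") for c in means}


# Hypothetical F6 values (energy per unit of GDP) and cluster assignments.
rows = [[0.8], [0.9], [1.6], [1.8]]
assignments = [1, 1, 0, 0]
names = name_clusters(rows, assignments, feature_index=0)
print(names)
```

Naming clusters this way makes the outputs of two algorithms comparable even when they number their clusters differently.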
Based on the initial fusion strategy shown in Fig. 2, a preliminary clustering result of higher precision was obtained, as shown in Table 4:
Table 4. EM clustering results for the energy efficiency of China's provinces
The clustering result obtained by fusing EM and K-means was further verified with FCM, and the result is consistent with Table 4. The table shows that the number of energy-inefficient instances increases over time, and that most of the change comes from provinces moving from efficient to inefficient, such as Liaoning, Shanghai, Zhejiang, Hubei, Hunan, Sichuan and Shaanxi. Provinces such as Beijing, Fujian, Hainan and Jiangxi remain in a high-efficiency state over the long term, while energy use in Shanxi, Shandong and Guangdong remains inefficient over the long term. As to the causes, the cross-sectional differences between regions can be attributed to differences in economic structure: regions whose pillar industries are technology-intensive generally have high energy efficiency, whereas regions whose pillar industries are traditional manufacturing and processing generally have low energy efficiency. Moreover, although national data show that energy consumption per unit of GDP has fallen, the energy consumption elasticity has kept fluctuating, the cost of environmental pollution control is rising, and energy losses increase year by year. Fundamentally, China's energy structure has long been unreasonable, with coal as the main energy source, and its economic development relies mainly on resource consumption rather than on technological progress and management innovation. It is therefore necessary to optimize the energy structure and transform the mode of economic growth. Relying on science and technology to sustain higher economic growth with a lower energy consumption elasticity is the key to improving energy efficiency to the greatest extent.
Taking nine years of energy-efficiency data for China's provinces as an example, this embodiment studies an energy efficiency analysis and evaluation method based on a multi-model fusion strategy and reaches the following conclusions:
1) Based on the factors mentioned in the collected literature, feature selection combining information gain with principal component analysis identified the determinants of energy efficiency: six determinants were identified out of eight factors.
2) Three single classifier models and a multi-classifier fusion model were built for energy efficiency across China's provinces. The numerical results of classification and prediction show that the multi-classifier fusion model classifies and predicts energy efficiency better than the single models.
3) Based on the multi-model fused clustering analysis, the differences in energy efficiency among China's regions and their pattern of change were found, and a corresponding analysis of causes and development suggestions were given.
Therefore, the direction for improving China's energy efficiency is to focus on the key influencing factors of energy efficiency, and to optimize the energy structure and transform the mode of economic growth in a scientific and targeted way. Technological invention and creation, particularly in the field of energy technology, should be encouraged and supported, and technological innovation should be promoted in every link of energy utilization, so that higher economic growth is sustained with a lower energy consumption elasticity. In addition, the energy consumption structure needs to be optimized and adjusted on the principle of balancing traditional and new energy sources, taking into account the energy supply and demand situation and energy utilization technology.
The present invention may have various other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make various corresponding changes and modifications according to the invention, and all such changes and modifications shall fall within the scope of protection of the appended claims.

Claims (7)

1. A multi-model fusion estimation system, characterized in that the multi-model fusion estimation system comprises the following steps:
Step one: normalize the data to obtain a normalized training set;
Step two: perform feature selection on the normalized training set obtained in step one;
Step three: build an evaluation model based on multi-classifier fusion according to step one and step two, and obtain the classification results of the energy efficiency evaluation;
Step four: perform cluster analysis on the classification results obtained in step three to obtain the final clustering result.
2. The multi-model fusion estimation system according to claim 1, characterized in that the data in step one specifically include: primary energy output, total energy consumption, energy consumption elasticity, GDP, energy industry investment, energy consumption per unit of GDP, capital stock and sulfur dioxide emission coefficient.
3. The multi-model fusion estimation system according to claim 2, characterized in that the detailed process of normalizing the data in step one to obtain the normalized training set is: the standardization preprocessing of the data, also called normalization, uses a simplified transfer function to standardize the data; given N samples, the m-th feature of each sample is processed as shown in formula (1):

$$x_{im}^{*} = \frac{x_{im}}{\sum_{i=1}^{N} x_{im}}, \quad i = 1, 2, \ldots, N \tag{1}$$

After preprocessing, the feature values are distributed in the interval [0, 1], where $x_{im}^{*}$ is the value of the m-th feature of the i-th sample after normalization and $x_{im}$ is the original value of the m-th feature of the i-th sample.
4. The multi-model fusion estimation system according to claim 3, characterized in that the detailed process of performing feature selection on the normalized training set obtained in step one in step two is: features are chosen by a fusion method that combines information gain with kernel principal component analysis; the information gain of each feature is obtained using information gain, the features are sorted in descending order, and principal component analysis is used for verification.
5. The multi-model fusion estimation system according to claim 4, characterized in that the detailed process of obtaining the information gain of each feature using information gain is: let the feature space be X; for the m-th feature $X_m$ of the samples, its information gain $IG(X_m)$ is:

$$IG(X_m) = H(C) - H(C \mid X_m)$$

where C denotes the required classification classes, $H(C)$ denotes the information entropy of class C, and $H(C \mid X_m)$ denotes the information entropy of belonging to class C given feature $X_m$; if class C takes n values, each with probability $p(C_j)$, $j = 1, 2, \ldots, n$, then $H(C)$ is:

$$H(C) = -\sum_{j=1}^{n} p(C_j) \log p(C_j)$$
6. The multi-model fusion estimation system according to claim 5, characterized in that the detailed process of building the multi-classifier fusion evaluation model according to step one and step two in step three, and obtaining the classification results of the energy efficiency evaluation, is: on the normalized training set obtained in step one, train a J48 model from the decision-tree algorithms, a LogitBoost model from the rule-based classification algorithms, and a JRip learner based on a meta-learning strategy, obtaining 3 models; the features selected in step two are used as the model input, and the model output is class 0 or 1, where 0 denotes high energy efficiency and 1 denotes low energy efficiency; the training strategy is 10-fold cross-validation; whenever a new sample is tested, it is input into each of the 3 models to obtain 3 results, and the classification result is obtained by weighted voting.
7. The multi-model fusion estimation system according to claim 6, characterized in that the detailed process of performing cluster analysis on the classification results obtained in step three in step four, and obtaining the final clustering result, is: the high-energy-efficiency samples in the classification results obtained in step three are analyzed again; the identical clustering results obtained with the k-means clustering algorithm and the EM algorithm are taken as the initial fused clustering result, and the FCM clustering method is then used for verification, giving the final fused clustering result.
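The initial fusion in claim 7 keeps the clustering results on which k-means and EM agree. A minimal Python sketch of that agreement step follows. It assumes both clusterings have already been mapped to the same class names, and it does not implement the FCM verification. The function name and the labels are illustrative.

```python
def fuse_by_agreement(labels_a, labels_b):
    """Split sample indices into agreed and disputed assignments.

    Returns (agreed, disputed): `agreed` maps each index on which both
    clusterings give the same class to that class, and `disputed` lists
    the indices on which they differ.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same samples")
    agreed, disputed = {}, []
    for index, (a, b) in enumerate(zip(labels_a, labels_b)):
        if a == b:
            agreed[index] = a
        else:
            disputed.append(index)
    return agreed, disputed


# Hypothetical class names produced by two clustering algorithms.
kmeans_names = ["efficient", "efficient", "inefficient", "inefficient"]
em_names = ["efficient", "inefficient", "inefficient", "inefficient"]
agreed, disputed = fuse_by_agreement(kmeans_names, em_names)
print(agreed, disputed)
```

In this sketch the disputed samples are simply reported. How the patent resolves them beyond the FCM check is not stated in the text.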
CN201710756125.0A 2017-08-29 2017-08-29 Multi-model fusion estimation system Pending CN107301604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710756125.0A CN107301604A (en) 2017-08-29 2017-08-29 Multi-model fusion estimation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710756125.0A CN107301604A (en) 2017-08-29 2017-08-29 Multi-model fusion estimation system

Publications (1)

Publication Number Publication Date
CN107301604A true CN107301604A (en) 2017-10-27

Family

ID=60132562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710756125.0A Pending CN107301604A (en) 2017-08-29 2017-08-29 Multi-model fusion estimation system

Country Status (1)

Country Link
CN (1) CN107301604A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229850A (en) * 2018-01-31 2018-06-29 厦门奥普拓自控科技有限公司 City-level energy consumption, environment protection digital management method and system based on industrial production network
CN109815989A (en) * 2018-12-28 2019-05-28 重庆华龙强渝信用管理有限公司 A kind of multi-model fusion estimation system
CN110009030A (en) * 2019-03-29 2019-07-12 华南理工大学 Sewage treatment method for diagnosing faults based on stacking meta learning strategy
CN110009030B (en) * 2019-03-29 2021-03-30 华南理工大学 Sewage treatment fault diagnosis method based on stacking meta-learning strategy
CN110194041A (en) * 2019-05-19 2019-09-03 瑞立集团瑞安汽车零部件有限公司 The adaptive bodywork height adjusting method of Multi-source Information Fusion
CN110378389A (en) * 2019-06-24 2019-10-25 苏州浪潮智能科技有限公司 A kind of Adaboost classifier calculated machine creating device
CN110322150A (en) * 2019-07-04 2019-10-11 优估(上海)信息科技有限公司 A kind of signal auditing method, device and server
CN110322150B (en) * 2019-07-04 2023-04-18 优估(上海)信息科技有限公司 Information auditing method, device and server
CN113255778A (en) * 2021-05-28 2021-08-13 广汽本田汽车有限公司 Welding spot quality detection method and device based on multi-model fusion and storage medium
CN113341704A (en) * 2021-05-28 2021-09-03 北京理工大学 Composite cycle energy conversion system
CN113392642A (en) * 2021-06-04 2021-09-14 北京师范大学 System and method for automatically labeling child-bearing case based on meta-learning
CN113392642B (en) * 2021-06-04 2023-06-02 北京师范大学 Automatic labeling system and method for child care cases based on meta learning

Similar Documents

Publication Publication Date Title
CN106845717A (en) A kind of energy efficiency evaluation method based on multi-model convergence strategy
CN107301604A (en) Multi-model fusion estimation system
CN111178624B (en) New product demand prediction method
CN106528874B (en) The CLR multi-tag data classification method of big data platform is calculated based on Spark memory
CN106022509A (en) Power distribution network space load prediction method taking region and load property dual differences into consideration
CN109902953A (en) A kind of classification of power customers method based on adaptive population cluster
CN103309953A (en) Method for labeling and searching for diversified pictures based on integration of multiple RBFNN classifiers
CN111178611A (en) Method for predicting daily electric quantity
CN109919236A (en) A kind of BP neural network multi-tag classification method based on label correlation
CN112418485A (en) Household load prediction method and system based on load characteristics and power consumption behavior mode
CN109816010A (en) A kind of CART increment study classification method based on selective ensemble for flight delay prediction
Wang et al. Design of the Sports Training Decision Support System Based on the Improved Association Rule, the Apriori Algorithm.
CN115131131A (en) Credit risk assessment method for unbalanced data set multi-stage integration model
CN106097094A (en) A kind of man-computer cooperation credit evaluation new model towards medium-sized and small enterprises
Qi et al. An interval-valued data classification method based on the unified representation frame
CN112785156B (en) Industrial collar and sleeve identification method based on clustering and comprehensive evaluation
CN104217296A (en) Listed company performance comprehensive evaluation method
CN111737924B (en) Method for selecting typical load characteristic transformer substation based on multi-source data
CN113591947A (en) Power data clustering method and device based on power consumption behaviors and storage medium
CN113762703A (en) Method and device for determining enterprise portrait, computing equipment and storage medium
CN117034046A (en) Flexible load adjustable potential evaluation method based on ISODATA clustering
CN114372835B (en) Comprehensive energy service potential customer identification method, system and computer equipment
Mao et al. Naive Bayesian algorithm classification model with local attribute weighted based on KNN
CN109992592A (en) Impoverished College Studentss recognition methods based on campus consumption card pipelined data
CN115829683A (en) Power integration commodity recommendation method and system based on inverse reward learning optimization

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171027