CN107301604A - Multi-model fusion estimation system - Google Patents
- Publication number
- CN107301604A (application number CN201710756125.0A)
- Authority
- CN
- China
- Prior art keywords
- energy
- model
- feature
- estimation system
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/02—Agriculture; Fishing; Forestry; Mining
Landscapes
- Business, Economics & Management (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Marketing (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Economics (AREA)
- General Health & Medical Sciences (AREA)
- Human Resources & Organizations (AREA)
- Tourism & Hospitality (AREA)
- Primary Health Care (AREA)
- Strategic Management (AREA)
- Animal Husbandry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Agronomy & Crop Science (AREA)
- Mining & Mineral Resources (AREA)
- Marine Sciences & Fisheries (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A multi-model fusion estimation system. The invention addresses two problems: features for calculating energy efficiency are difficult to select, and model evaluation results are inaccurate. The steps are as follows. Step 1: normalize the data to obtain a normalized training set. Step 2: perform feature selection on the normalized training set from Step 1, using a fusion method that combines information gain with kernel principal component analysis; a feature ranking is computed using information gain and then verified using principal component analysis. Step 3: build a multi-classifier fusion evaluation model according to Steps 1 and 2 to obtain the classification results of the energy efficiency evaluation. Step 4: perform cluster analysis on the classification results from Step 3 to obtain the final clustering result. The invention applies to the field of energy efficiency evaluation.
Description
Technical field
The present invention relates to a multi-model fusion estimation system.
Background
As energy and environmental problems become increasingly prominent, methods for evaluating energy efficiency are receiving growing attention. Many scholars internationally have studied the improvement of energy utilization efficiency and energy-saving potential from different perspectives. Taking China as an example, the economy has maintained strong, rapid growth in recent years, but the mode of economic growth remains extensive: resource and energy consumption is high, utilization rates are low, and environmental pollution is serious. Energy utilization efficiency still lags behind world levels. At present, China's unreasonable coal-dominated energy consumption structure seriously affects the utilization efficiency of the entire energy system and poses a challenge to sustainable social development. It is therefore necessary to identify the key factors that influence energy efficiency and to quantify the influence of each factor. Current quantitative studies of energy utilization efficiency mostly evaluate energy efficiency values using data envelopment analysis (DEA). Some scholars have also studied, on the basis of total-factor energy efficiency calculations, the influence of factors such as industrial structure, technological progress, and degree of openness on energy efficiency. In addition, because of the complexity of China's regions and the imbalance of their spatial development, many scholars have used inter-regional and inter-provincial energy panel data to analyze differences in energy efficiency between regions or provinces, and have obtained effective calculation and evaluation methods. However, calculating energy efficiency from differing energy indicators cannot truly reflect the actual factors that influence energy efficiency.
Summary of the invention
The purpose of the invention is to solve the problems that features for calculating energy efficiency are difficult to select and that model evaluation results are inaccurate, by proposing a multi-model fusion estimation system.
A multi-model fusion estimation system comprises the following steps.
The overall classification modeling strategy of the invention is as follows. The feature values of the data are first standardized so that feature selection can be carried out correctly. On this basis, the data set is labeled with classes, providing the class labels from which the classification algorithms learn on the training set. A multi-classifier fusion classification model is then obtained through comparative analysis and can be used for prediction.
Step 1: Normalize the data to obtain a normalized training set;
Step 2: Perform feature selection on the normalized training set obtained in Step 1;
Step 3: Build a multi-classifier fusion evaluation model according to Steps 1 and 2 to obtain the classification results of the energy efficiency evaluation;
Step 4: Perform cluster analysis on the classification results obtained in Step 3 to obtain the final clustering result.
The beneficial effects of the invention are as follows:
The invention proposes an energy efficiency evaluation method based on a multi-model fusion strategy. It establishes a classification model based on a multi-classifier fusion strategy, used to predict whether energy efficiency values are high or low, and also a fusion model of cluster analysis methods, which separates provinces with high energy efficiency from those with low efficiency. A case study was carried out on the evaluation of energy utilization efficiency in China. First, energy efficiency data for 24 provinces over 9 years were collected, and two feature identification methods were used to determine the key factors influencing energy efficiency. Next, the goodness of fit of the fused classification model was analyzed comparatively, and the model was used to predict high or low energy efficiency. Then, based on the multi-model fusion clustering strategy, provinces with high energy efficiency were further distinguished from those with low efficiency. Finally, improvement proposals were given for the overall energy efficiency development problems identified for China. The experimental results show that the multi-model fusion strategy achieves better classification prediction and clustering performance than single-model methods. The invention therefore has good practical engineering application value.
1) The candidate features for calculating energy efficiency can be screened effectively to find the main factors that influence energy efficiency.
2) Three single-classifier models and a multi-classifier fusion model were built for inter-provincial energy efficiency in China. The numerical results of classification and prediction show that the multi-classifier fusion model predicts the energy efficiency class better than a single model and classifies high and low energy efficiency values more accurately.
3) Based on the multi-model fusion clustering analysis method, the differences and patterns of change in energy efficiency across China's regions were found, so that cause analysis and development suggestions can be given accordingly.
Brief description of the drawings
Fig. 1 is a flow chart of the parallel fusion strategy based on three classifiers.
Fig. 2 is a flow chart of the multi-model fusion clustering analysis strategy.
Detailed description of embodiments
Embodiment 1: The specific steps of a multi-model fusion estimation system are:
Step 1: Normalize the data to obtain a normalized training set;
Step 2: Perform feature selection on the normalized training set obtained in Step 1;
Step 3: Build a multi-classifier fusion evaluation model according to Steps 1 and 2 to obtain the classification results of the energy efficiency evaluation;
Step 4: Perform cluster analysis on the classification results obtained in Step 3 to obtain the final clustering result.
Embodiment 2: This embodiment differs from Embodiment 1 in that the data in Step 1 specifically include: primary energy output, total energy consumption, energy consumption elasticity coefficient, GDP, energy industry investment, energy consumption per unit of total output value, capital stock, and sulfur dioxide emission coefficient.
Other steps and parameters are the same as in Embodiment 1.
Embodiment 3: This embodiment differs from Embodiment 1 or 2 in that the specific process in Step 1 of normalizing the data to obtain a normalized training set is:
Panel data for multiple provinces, municipalities, and autonomous regions nationwide are collected and preprocessed by standardization. Standardization scales the data proportionally to remove the unit constraints of the data, converting them into dimensionless pure values that are convenient to compare and weight. 0-1 standardization (also called normalization) is the most typical data standardization method; it applies a linear transformation to the original data so that the results fall in the interval [0, 1]. Because the feature values in the data set used by the invention are all positive, a simplified conversion function is used to normalize each component. Given N samples, the m-th feature of each sample is processed as expressed in formula (1):
After preprocessing, the feature values are distributed in the interval [0, 1], where xim* is the normalized value of the m-th feature of the i-th sample and xim is the original value of the m-th feature of the i-th sample.
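Formula (1) itself is not reproduced in this text. As a minimal sketch, assuming the standard 0-1 (min-max) form, each feature column could be scaled as follows. The function name `min_max_normalize` and the sample values are illustrative only, and the "simplified" function the patent uses for positive data may differ (for example, it may divide by the column maximum).

```python
def min_max_normalize(samples):
    """Scale every feature column of `samples` (a list of rows) into [0, 1]."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in samples:
        new_row = []
        for value, low, high in zip(row, lows, highs):
            span = high - low
            # A constant column carries no information; map it to 0.0.
            new_row.append((value - low) / span if span else 0.0)
        normalized.append(new_row)
    return normalized


# Hypothetical raw values: two features for three samples.
raw = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
scaled = min_max_normalize(raw)
```

Each column is scaled independently, so features measured in different units become comparable before feature selection.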
Other steps and parameters are the same as in Embodiment 1 or 2.
Embodiment 4: This embodiment differs from any of Embodiments 1 to 3 in that the specific process in Step 2 of performing feature selection on the normalized training set obtained in Step 1 is:
The various factors that influence energy efficiency are considered, a feature space is established, the corresponding data are collected, the sample data are made dimensionless, and feature selection is carried out. To make the feature selection result more accurate, the invention uses a fusion strategy that combines information gain with kernel principal component analysis to choose the final features. First, a feature ranking is computed using information gain; then principal component analysis is used for verification.
Features are selected using the fusion method that combines information gain and kernel principal component analysis: the information gain of each feature is computed and the values are sorted in descending order to obtain a ranking of relative feature importance, which is then verified using principal component analysis.
Kernel principal component analysis (KPCA) is a nonlinear extension of principal component analysis (PCA). KPCA maps the original vectors into a high-dimensional space F through a mapping function Φ and performs PCA in F, which extracts the information in the indicators to the greatest extent. Suppose x1, x2, ..., xM are training samples and the input space is denoted {xi}. The basic idea of KPCA is to map the input space implicitly into a high-dimensional space (often called the feature space) and to carry out PCA in that feature space.
Suppose the corresponding mapping is Φ. The mapping from a point x to F is realized implicitly through the kernel function K, and the data in the feature space obtained by this mapping satisfy the centering condition [15], that is,
Then the covariance matrix in the feature space is:
Now find the eigenvalues λ ≥ 0 and eigenvectors V ∈ F\{0} of C such that CV = λV. Considering that every eigenvector can be expressed as a linear combination of Φ(x1), Φ(x2), ..., Φ(xM), we have
where v = 1, 2, ..., M. Defining the M × M matrix K, the eigenvalues and eigenvectors can be obtained. The projection of a test sample onto the eigenvector Vk of the feature space is
Replacing the inner product with the kernel function gives
Further, the kernel matrix can be corrected to
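The equations referred to in the KPCA passage above are not reproduced in this text. For orientation, the standard KPCA relations that match the surrounding definitions are sketched below. This is a reconstruction under the assumption that the patent follows the usual derivation; the symbols α (expansion coefficients) and 1_M (the M × M matrix with every entry equal to 1/M) do not appear in the surviving text and are introduced here for illustration.

```latex
% Centering condition in the feature space
\sum_{i=1}^{M} \Phi(x_i) = 0

% Covariance matrix in the feature space
C = \frac{1}{M} \sum_{j=1}^{M} \Phi(x_j)\,\Phi(x_j)^{T}

% Eigenvalue problem projected onto the mapped samples, with V in their span
\lambda \,\bigl(\Phi(x_v) \cdot V\bigr) = \Phi(x_v) \cdot C V, \qquad v = 1, 2, \ldots, M
V = \sum_{i=1}^{M} \alpha_i \,\Phi(x_i)

% Kernel matrix and the resulting eigenvalue problem
K_{ij} = \Phi(x_i) \cdot \Phi(x_j), \qquad M \lambda \,\alpha = K \alpha

% Projection of a test sample x onto the k-th eigenvector
V^{k} \cdot \Phi(x) = \sum_{i=1}^{M} \alpha_i^{k} \,\bigl(\Phi(x_i) \cdot \Phi(x)\bigr)
                    = \sum_{i=1}^{M} \alpha_i^{k} \,K(x_i, x)

% Corrected (centered) kernel matrix when the centering condition does not hold
\tilde{K} = K - 1_M K - K 1_M + 1_M K 1_M, \qquad (1_M)_{ij} = \frac{1}{M}
```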
Other steps and parameters are the same as in any of Embodiments 1 to 3.
Embodiment 5: This embodiment differs from any of Embodiments 1 to 4 in that the specific process of computing the feature ranking using information gain is:
Feature selection searches the possible feature sets in the data set and chooses a group of effective features according to some criterion, so as to reduce the dimensionality of the feature space. At the same time, removing redundant information from the feature space avoids its influence on classification prediction, which improves the prediction accuracy and computational efficiency of the classification algorithm. Information gain (IG) is the most commonly used feature selection method.
In information gain, the criterion is how much information a feature brings to the classification system: the more information it brings, the more important the feature. For a given feature, the amount of information in the system differs depending on whether the feature is present, and the difference is the amount of information the feature contributes to the system. The amount of information is measured by entropy.
Let the feature space be X and the m-th feature of a sample be Xm. Its information gain IG(Xm) is:
IG(Xm) = H(C) − H(C | Xm)
where C denotes the class variable, H(C) is the information entropy of C, and H(C | Xm) is the conditional entropy of the class given the feature Xm;
If the class C takes n values, each with probability p(Cj), j = 1, 2, ..., n, then H(C) is:
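The formula for H(C) is not reproduced in this text; the standard Shannon entropy, H(C) = −Σ p(Cj) log2 p(Cj), is assumed in the sketch below, which ranks features by information gain. The feature names `F_a` and `F_b` and their values are hypothetical, and the features are assumed to be already discretized (continuous features such as the normalized ones here would first need binning).

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C), in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Xm) = H(C) - H(C | Xm) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Class labels: 0 = high energy efficiency, 1 = low energy efficiency.
labels = [0, 0, 1, 1]
features = {
    "F_a": ["low", "low", "high", "high"],  # separates the classes perfectly
    "F_b": ["low", "high", "low", "high"],  # carries no class information
}
gains = {name: information_gain(vals, labels) for name, vals in features.items()}
# Descending order of information gain gives the relative importance ranking.
ranking = sorted(gains, key=gains.get, reverse=True)
```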
Other steps and parameters are the same as in any of Embodiments 1 to 4.
Embodiment 6: This embodiment differs from any of Embodiments 1 to 5 in that the specific process in Step 3 of building the multi-classifier fusion evaluation model according to Steps 1 and 2 (that is, the parallel fusion of the trained J48 model of the decision tree algorithm, the JRip model of the rule-based classification algorithm, and the LogitBoost learner based on the meta-learning strategy) and obtaining the classification results of the energy efficiency evaluation is:
The invention selects three algorithms that classify well in many fields: the decision tree algorithm, the rule-based classification algorithm, and the meta-learner based on a meta-learning strategy.
The classifying rules of decision tree representation is inferred in group.It uses top-down recursive fashion, is saved in the inside of decision tree
Point carries out the comparison of property value, and according to different property values from the node to inferior division.In tree each nonleaf node (including
Root node) correspond to the test that training sample concentrates a non-category attribute, one of each branch correspondence attribute of nonleaf node
Test result, each leaf node then represents a class or class distribution.From root to one classification of paths correspondence of leaf node
Rule, whole decision tree just correspond to one group of expression formula rule of extracting.The present invention uses extensive C4.5 algorithms.C4.5 algorithms are
It is improved and proposes for previous ID3 algorithms, it uses the method choice testing attribute based on information gain-ratio, information
Ratio of profit increase is equal to ratio of the information gain to segmentation information amount.In the present invention, C4.5 is realized with J48 decision trees.
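The gain ratio criterion described above can be sketched as follows. This is an illustrative calculation only, not the J48 implementation: it handles discrete attributes and omits the threshold search and pruning that a full C4.5 performs. The function names and sample values are assumptions.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(items).values())


def gain_ratio(feature_values, labels):
    """Information gain divided by split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(g) / total * entropy(g)
                                 for g in groups.values())
    split_info = entropy(feature_values)  # entropy of the partition itself
    return gain / split_info if split_info > 0 else 0.0


labels = [0, 0, 1, 1]
two_way = ["a", "a", "b", "b"]    # clean two-way split
four_way = ["a", "b", "c", "d"]   # one branch per sample
```

Both attributes have the same information gain on this data, but the four-way split has twice the split information, so its gain ratio is halved. This is how the ratio penalizes attributes that fragment the data into many small branches.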
Rule-based classification classifies using a set of if ... then rules. The invention uses the JRip classifier to build the rules, implemented by the RIPPER algorithm. RIPPER uses a class-based ordering scheme: rules belonging to the same class appear together in the rule set, and the rules are then sorted collectively according to the class they belong to. The relative order among rules of the same class is unimportant, because they belong to the same class. The algorithm extracts rules directly from the data; when extracting rules for class y, all training records of class y are treated as positive examples and the training records of the other classes are treated as negative examples.
Meta-learning learns again, or repeatedly, on the basis of earlier learning results to obtain the final result. AdaBoost, an improved machine learning method proposed by Freund and Schapire, is widely used in practice. Its basic idea is as follows: build a basic "weak classifier" from the available sample data set and call it repeatedly; in each round, give the misclassified samples larger weights so that the classifier pays more attention to samples that are hard to classify; after many rounds, combine the weak classifiers from all rounds into a "strong classifier" by weighting.
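The reweighting idea can be sketched with a small AdaBoost over threshold stumps on one-dimensional data. This illustrates the paragraph above only; it is not the LogitBoost learner the invention trains. Labels are assumed to be +1/−1, and the data and function names are hypothetical.

```python
import math


def train_stump(xs, ys, weights):
    """Pick the threshold/polarity stump with the lowest weighted error."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            error = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    """Return a list of (alpha, threshold, polarity) weak classifiers."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Misclassified samples get larger weights for the next round.
        for i, (x, y) in enumerate(zip(xs, ys)):
            pred = polarity if x >= threshold else -polarity
            weights[i] *= math.exp(-alpha * y * pred)
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all weak classifiers."""
    score = sum(alpha * (pol if x >= thr else -pol)
                for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data: no single threshold separates the two labels.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
```

No single stump fits this data, but the weighted combination of three stumps classifies every training point correctly.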
Multi-classifier fusion strategies can generally be divided into serial fusion and parallel fusion. Parallel fusion avoids the inconsistent classification results that serial fusion can produce when the order of the classifiers differs, and the classifiers do not influence one another. The invention therefore classifies the attributes of the influencing factors using parallel fusion. In designing a parallel fusion classifier, the results of different classifiers may differ, so voting is needed to produce the final result. Simple voting is an intuitive and efficient strategy in which all classifiers carry equal weight, which makes the classification results more interpretable. To obtain better average classification performance, the data must be selected more randomly, so the invention chooses the data by 10-fold cross-validation. The classification result is the average of 10 classification runs, and the base classifiers are independent of one another. The multi-model fusion strategy based on the three common base classifiers above is shown in Fig. 1.
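A minimal sketch of parallel fusion with equal-weight voting follows. The three lambda functions are hypothetical stand-ins for the trained base models; their thresholds and the sample values are invented for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by most classifiers (equal weights)."""
    return Counter(predictions).most_common(1)[0][0]


def fuse(classifiers, sample):
    """Run every base classifier independently, then vote on the result."""
    return majority_vote([clf(sample) for clf in classifiers])


# Hypothetical stand-ins for the three trained base models.
# Output 0 = high energy efficiency, 1 = low energy efficiency.
tree_like = lambda s: 0 if s["F6"] < 0.5 else 1
rule_like = lambda s: 0 if s["F3"] < 0.4 else 1
boost_like = lambda s: 0 if s["F3"] + s["F6"] < 1.0 else 1

sample = {"F3": 0.6, "F6": 0.2}
result = fuse([tree_like, rule_like, boost_like], sample)
```

With three binary classifiers a tie is impossible, so the majority label is always defined.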
The analysis of energy efficiency is reduced to a two-class problem: the instances in the data set are divided into two classes, high energy efficiency and low energy efficiency. The number of classes is set to 2 and the class label takes the values 0 and 1, where 0 denotes high energy efficiency and 1 denotes low energy efficiency.
There are many classification algorithms. The invention selects three that classify well in many fields (the decision tree algorithm, the rule-based classification algorithm, and the meta-learner based on a meta-learning strategy) and fuses the three effectively to obtain a better evaluation model based on multi-classifier fusion.
Using 10-fold cross-validation, classification models are trained on the training set obtained in Steps 1 and 2 with each of the three methods J48, LogitBoost, and JRip, to ensure the generalization performance of the models.
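The 10-fold split can be sketched as below. The helper `k_fold_indices` is illustrative; the sample count of 216 is simply 24 provinces times 9 years and assumes no missing records.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 24 provinces x 9 years = 216 samples (assumed complete).
splits = list(k_fold_indices(216, k=10))
```

Every sample is used for testing exactly once and for training in the other nine folds.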
Parallel fusion is then applied. Because the results of different classifiers may differ, the final result is produced by voting. Simple voting is an intuitive and efficient strategy in which all classifiers carry equal weight, which makes the classification results more interpretable; the classification result is the average of the classification results of the 10 tests.
The normalized training set obtained in Step 1 is used to train the J48 model of the decision tree algorithm, the JRip model of the rule-based classification algorithm, and the LogitBoost learner based on the meta-learning strategy, yielding three trained models;
For each model, the features selected in Step 2 are used as the model input variables and the model output is the class 0 or 1, where 0 denotes high energy efficiency and 1 denotes low energy efficiency; the training strategy used is 10-fold cross-validation;
Whenever a new sample is tested, it is input to each of the three trained models to obtain three results, and the classification result is obtained by equal-weight voting (majority voting).
Other steps and parameters are the same as in any of Embodiments 1 to 5.
Embodiment 7: This embodiment differs from any of Embodiments 1 to 6 in that the specific process in Step 4 of performing cluster analysis on the classification results obtained in Step 3 to obtain the final clustering result is:
The invention uses three clustering algorithms, Simple K-means, EM, and FCM, as the basis for fusion.
Simple K-means is the k-means clustering algorithm. First the number of clusters k is specified and k samples are taken at random as the initial cluster centers. The distance from each sample to each cluster center is computed and each sample is assigned to a cluster. After all samples have been assigned, the cluster centers are recomputed. This process is repeated until the cluster centers no longer change; the resulting k clusters are the final clustering result.
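The loop described above can be sketched in plain Python. By default the initial centers are k random samples, as in the text; the example passes explicit initial centers through the illustrative `init` parameter so that the run is reproducible. The data points are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, init=None, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so centers are too
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(dim) / len(members)
                                   for dim in zip(*members))
    return labels, centers


points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
          (0.9, 0.8), (0.8, 0.9), (0.85, 0.85)]
labels, centers = k_means(points, k=2, init=[points[0], points[3]])
```

The result of k-means depends on the initial centers, which is one reason the invention cross-checks it against other clustering methods.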
EM algorithm: the expectation-maximization (EM) algorithm finds maximum likelihood or maximum a posteriori estimates of parameters in a probabilistic model. It can be viewed as a successive approximation algorithm. The model parameters are not known in advance, so a set of parameters is chosen at random or some initial parameters λ0 are given roughly. The most probable state corresponding to this set of parameters is determined, and the probability of each possible outcome is computed for each training sample. In the current state the parameters are corrected using the samples to re-estimate λ, and the state of the model is redetermined under the new parameters. Iterating in this way until some convergence condition is satisfied makes the model parameters gradually approach the true parameters.
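As a minimal sketch of this iteration, the code below fits a two-component one-dimensional Gaussian mixture. The choice of a Gaussian mixture, the fixed iteration count in place of a convergence test, the variance floor `min_var`, and the data are all assumptions made for illustration.

```python
import math


def em_gmm_1d(data, iterations=50, min_var=1e-3):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    means = [min(data), max(data)]  # rough initial parameters (lambda_0)
    variances = [1.0, 1.0]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each sample.
        resp = []
        for x in data:
            dens = [w * math.exp(-(x - m) ** 2 / (2 * v))
                    / math.sqrt(2 * math.pi * v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate the parameters from the weighted samples.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            spread = sum(r[k] * (x - means[k]) ** 2
                         for r, x in zip(resp, data)) / nk
            variances[k] = max(min_var, spread)  # floor avoids collapse
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return means, variances, weights, labels


data = [0.8, 0.9, 1.0, 1.1, 1.2, 4.8, 4.9, 5.0, 5.1, 5.2]
means, variances, weights, labels = em_gmm_1d(data)
```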
FCM clustering method: Professor Zadeh of the University of California, Berkeley, first proposed the concept of the "fuzzy set". After more than ten years of development, fuzzy set theory was gradually applied in many practical fields. To overcome the drawback of either-or classification, cluster analysis based on fuzzy set theory emerged. Cluster analysis carried out with the methods of fuzzy mathematics is fuzzy cluster analysis. FCM is an algorithm that uses membership degrees to determine the degree to which each data point belongs to a cluster; it is an improvement on traditional hard clustering algorithms.
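The membership idea can be sketched with a one-dimensional fuzzy c-means using the standard update rules. The fuzzifier m = 2, the fixed iteration count, the initial centers, and the data are assumptions; the patent does not state its FCM settings.

```python
def fuzzy_c_means(data, centers, m=2.0, iterations=50):
    """1-D fuzzy c-means; returns (membership matrix, centers)."""
    centers = list(centers)
    for _ in range(iterations):
        # Membership of each point in each cluster; every row sums to 1.
        memberships = []
        for x in data:
            dists = [max(abs(x - c), 1e-12) for c in centers]  # avoid /0
            row = [1.0 / sum((d_i / d_j) ** (2.0 / (m - 1.0)) for d_j in dists)
                   for d_i in dists]
            memberships.append(row)
        # Centers are means weighted by membership raised to the power m.
        for j in range(len(centers)):
            num = sum((u[j] ** m) * x for u, x in zip(memberships, data))
            den = sum(u[j] ** m for u in memberships)
            centers[j] = num / den
    return memberships, centers


data = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]
memberships, centers = fuzzy_c_means(data, centers=[0.0, 1.0])
# A hard label, if needed, is the cluster with the largest membership.
hard_labels = [row.index(max(row)) for row in memberships]
```

Unlike k-means, every point keeps a graded membership in each cluster, so borderline samples can be identified by memberships close to 0.5.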
To make the clustering result more credible, the invention uses the following multi-model fusion clustering analysis method. Because both Simple K-means and EM cluster using partition-based methods, they are chosen as the basic clustering methods. Both algorithms are wrapped with MakeDensityBasedClusterer so that a discrete distribution or a symmetric normal distribution can be fitted to each cluster. This realizes clustering progressively from the whole to the local, with strong local search ability and fast convergence. The cluster results on which both methods agree are picked out as the preliminary fusion clustering result, which is then verified using the FCM clustering method to give the final fusion clustering result, as shown in Fig. 2.
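One way to read this fusion step is sketched below for the two-cluster case. The text does not say how cluster labels from different algorithms are matched or how disagreements are handled, so the label alignment in `align_labels` and the use of `None` for samples without agreement are assumptions, and the label lists are invented.

```python
def align_labels(reference, other):
    """Relabel a 2-cluster result to agree with `reference` where possible."""
    same = sum(r == o for r, o in zip(reference, other))
    if same >= len(reference) - same:
        return list(other)
    return [1 - o for o in other]


def fuse_clusterings(kmeans_labels, em_labels, fcm_labels):
    """Return one label per sample, or None where the methods disagree."""
    em = align_labels(kmeans_labels, em_labels)
    fcm = align_labels(kmeans_labels, fcm_labels)
    fused = []
    for k, e, f in zip(kmeans_labels, em, fcm):
        preliminary = k if k == e else None  # K-means and EM must agree
        # FCM check: keep the preliminary label only if FCM confirms it.
        fused.append(preliminary if preliminary == f else None)
    return fused


kmeans_labels = [0, 0, 0, 1, 1, 1]
em_labels = [1, 1, 1, 0, 0, 1]    # same grouping, labels swapped, one differs
fcm_labels = [0, 0, 1, 1, 1, 1]
fused = fuse_clusterings(kmeans_labels, em_labels, fcm_labels)
```

Samples marked `None` are those on which the methods disagree and would need separate handling.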
The high-energy-efficiency samples in the classification results of Step 3 are analyzed again by a 2-cluster clustering process, which further subdivides the samples in the high-efficiency class: those with lower energy efficiency are filtered out and returned to the low-efficiency class, as a correction to Step 3, to obtain a more accurate result.
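The correction step can be sketched as follows. For brevity the sketch clusters a single hypothetical efficiency score per sample with a one-dimensional 2-means, whereas the invention clusters on the selected features with the fused clustering method; the function names and values are assumptions.

```python
def two_means_1d(values, iterations=20):
    """Split values into a lower group (0) and a higher group (1)."""
    low, high = min(values), max(values)
    groups = [0] * len(values)
    for _ in range(iterations):
        groups = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        lows = [v for v, g in zip(values, groups) if g == 0]
        highs = [v for v, g in zip(values, groups) if g == 1]
        low = sum(lows) / len(lows)
        high = sum(highs) / len(highs) if highs else high
    return groups


def refine_high_class(class_labels, scores):
    """Move the lower cluster of class 0 (high) into class 1 (low)."""
    high_idx = [i for i, c in enumerate(class_labels) if c == 0]
    refined = list(class_labels)
    values = [scores[i] for i in high_idx]
    if len(values) < 2 or min(values) == max(values):
        return refined  # nothing to subdivide
    for i, g in zip(high_idx, two_means_1d(values)):
        if g == 0:  # sample lies in the cluster around the lower center
            refined[i] = 1
    return refined


class_labels = [0, 0, 0, 0, 1, 1]            # Step 3 output (0 = high)
scores = [0.9, 0.85, 0.4, 0.95, 0.2, 0.3]    # hypothetical efficiency scores
refined = refine_high_class(class_labels, scores)
```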
Simple K-means, EM, and FCM are selected as the basis for fusion. The multi-model fusion clustering analysis method is as follows: because Simple K-means and EM both cluster by partitioning, they are chosen as the base clustering methods. Both are wrapped with Make Density Based Clusterer so that a discrete distribution or a symmetric normal distribution can be fitted to each cluster. The cluster assignments on which the two agree are selected as the preliminary fusion clustering result, which is then checked with the FCM clustering method to give the final fusion clustering result.
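The agreement step of this fusion can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: cluster ids are arbitrary per algorithm, so the sketch assumes an analyst has already decided which cluster id each algorithm uses for the high-efficiency class. The function name `fuse_clusterings` and its parameters are hypothetical.

```python
def fuse_clusterings(labels_a, labels_b, efficient_a, efficient_b):
    """Keep only the instances on which two clusterings agree.

    labels_a, labels_b: per-instance cluster ids from the two algorithms.
    efficient_a, efficient_b: the cluster id that each algorithm's result
        has been interpreted as the high-efficiency class (an assumption
        supplied by the caller, since cluster numbering is arbitrary).
    Returns (agreed, disputed): agreed maps instance index to "efficient"
    or "inefficient"; disputed lists the indices left for later checking.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must cover the same instances")
    agreed, disputed = {}, []
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        is_efficient_a = (a == efficient_a)
        is_efficient_b = (b == efficient_b)
        if is_efficient_a == is_efficient_b:
            agreed[i] = "efficient" if is_efficient_a else "inefficient"
        else:
            disputed.append(i)
    return agreed, disputed
```

The agreed instances play the role of the preliminary fusion result; a third clustering, such as FCM, could then be compared against them in the same way as a check.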
The other steps and parameters are the same as in any one of embodiments one to six.
Embodiment one:
Example data acquisition and feature space construction
The present invention collects panel data for 2005 to 2013 covering 24 provinces, municipalities, and autonomous regions of China (excluding Tibet, Hong Kong, Macao, Taiwan, Jilin, Heilongjiang, Guizhou, Yunnan, Gansu, and Qinghai). Based on findings in the literature, the feature space selected by the present invention comprises eight factors: primary energy output (F1), total energy consumption (F2), energy consumption elasticity (F3), GDP (F4), energy industry investment (F5), energy consumption per unit of GDP (F6), capital stock (F7), and sulfur dioxide emission coefficient (F8):
F1: Qualified products that enterprises (units) producing primary energy obtain during the reporting period by extracting energy that exists in nature, such as raw coal mined by collieries, crude oil extracted from oil fields, natural gas extracted from gas fields, and electricity generated by hydroelectric plants.
F2: The total quantity of the various energy products actually consumed by energy-using units during the statistical reporting period, summed according to the prescribed calculation method and converted to the required unit of measurement.
F3: The ratio of the growth rate of energy consumption to the growth rate of the national economy.
F4: The market value of all final goods and services produced by all resident units of a country (within its borders) over a given period. GDP is the core indicator of national economic accounting and an important indicator of a country's macroeconomic condition.
F5: Capital invested in the energy industry.
F6: The energy consumed to produce one unit of GDP in a country over a given period, that is, the ratio of total energy consumption to GDP.
F7: All the capital resources an enterprise currently holds, that is, the sum of all types of capital invested in the enterprise. Because it exists in the form of assets, it is also called the asset stock. According to its state in the production process, it falls into two classes: asset stock participating in reproduction, and idle asset stock, which includes idle plant, machinery, and equipment.
F8: The quantity of sulfur dioxide emitted per unit of energy when each energy source is burned or used.
Feature selection results and analysis
First, the collected sample data are made dimensionless. Feature selection is then computed and analyzed. Because energy efficiency is influenced by many factors, measuring it requires considering multiple indicators; on this basis, the key influencing factors are identified and used to predict whether the future energy efficiency of each region will be high or low.
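The dimensionless processing mentioned above can be illustrated with a short sketch. It assumes the processing is the sum normalization of formula (1) in the claims, where each value is divided by the sum of its feature over all N samples, and it assumes non-negative feature values so that results stay in [0, 1]. The function name `sum_normalize` and the sample numbers are hypothetical.

```python
def sum_normalize(samples):
    """Divide every feature value by that feature's sum over all samples.

    samples: list of rows, one row per sample, one column per feature.
    Returns a new list of rows; each column of the result sums to 1.
    """
    n_features = len(samples[0])
    col_sums = [sum(row[m] for row in samples) for m in range(n_features)]
    if any(s == 0 for s in col_sums):
        raise ValueError("a feature that sums to zero cannot be normalized")
    return [[row[m] / col_sums[m] for m in range(n_features)]
            for row in samples]


# Illustrative values only, not data from the patent.
normalized = sum_normalize([[1.0, 10.0], [3.0, 30.0]])
```

After this step, features measured in very different units (for example, output in tonnes and investment in currency) are on a comparable scale before feature selection.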
Following the research conclusions in the existing literature on setting the information gain threshold, the 6 features whose information gain exceeds 0.0025 were selected and ranked. The ranking was further verified with principal component analysis; the final result is shown in Table 1:
Table 1. Ranking of features by information gain with respect to the class
The feature selection results show that the 6 features retained in Table 1 correlate strongly with the class attribute and are the key influencing factors of energy efficiency. Among them, F6 has the greatest influence, followed by F8, F7, F4, F1, and F3, five features whose influence on energy efficiency is similar. The two features F5 and F2 were filtered out of the data set; they have almost no influence on energy efficiency.
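The information-gain ranking described above can be sketched as follows, using IG(X) = H(C) - H(C | X). The sketch assumes each feature has already been discretized into bins (the patent does not say how continuous values are binned) and assumes base-2 logarithms. The function names and the example gains are hypothetical; only the 0.0025 threshold comes from the text.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C) of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(X) = H(C) - H(C | X) for one discretized feature X."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # H(C | X): entropy within each feature value, weighted by its frequency.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(gains, threshold=0.0025):
    """Return feature names with gain above the threshold, largest first."""
    kept = [(name, g) for name, g in gains.items() if g > threshold]
    return [name for name, _ in sorted(kept, key=lambda item: item[1],
                                       reverse=True)]
```

A feature that splits the classes perfectly has a gain equal to H(C), while a feature unrelated to the class has a gain near zero and falls below the threshold.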
Classification results and analysis
The analysis of energy efficiency in the present invention can likewise be reduced to a two-class problem: the instances in the data set are divided into a high-energy-efficiency class and a low-energy-efficiency class. The number of classes is therefore set to 2, and the class label takes the values 0 and 1, where 0 denotes high energy efficiency and 1 denotes low energy efficiency. F6, F8, F7, F4, F1, and F3 are then selected as the key influencing factors of energy efficiency, and the two attributes F5 and F2 are removed from the data set.
General-purpose metrics are chosen: precision rate (PR), recall rate (RR), and F-measure are used to assess the performance of the three classifiers used in the experiment. Precision and recall are computed from four quantities used in ROC curve analysis: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). The F-measure (FM), the harmonic mean of precision and recall, is taken as the key indicator of classifier performance. Table 2 shows the classification results on the energy-efficiency influencing-factor data set for the fused classifier and the single classifiers, where OVSM and MCF denote the best single-model result and the multi-model fusion result, respectively.
Table 2. Classification results on the data set using the three classifiers
Table 2 shows that the fusion model outperforms the single classifiers, which means it is the best classifier for classifying energy efficiency according to the key influencing factors selected above. This classification model can be used to predict whether energy efficiency is high or low for provinces and municipalities not included in the data set, or for data from other years.
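The three metrics used in this comparison follow directly from the confusion-matrix counts. The sketch below uses the standard definitions; the function name and the counts in the usage example are hypothetical and are not taken from Table 2.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion-matrix counts.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    F-measure = harmonic mean of precision and recall
    Ratios with a zero denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative counts only.
pr, rr, fm = precision_recall_f(tp=8, fp=2, fn=2)
```

Because the F-measure is a harmonic mean, it stays low whenever either precision or recall is low, which is why it serves as a single summary figure for comparing classifiers.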
Prediction results and analysis
Data on the energy-efficiency influencing factors for the six provinces of Jilin, Heilongjiang, Guizhou, Yunnan, Gansu, and Qinghai in 2013 were collected. After the 6 feature values were standardized, predictions were made with the classification model described above; the results are shown in Table 3 below.
Table 3. Prediction results on the test set using the three classifiers
Table 3 shows that the fused classifier model and the single classifier models give consistent predictions for each province. Jilin, Heilongjiang, Yunnan, and Gansu are predicted as class 0, that is, high energy efficiency, whereas Guizhou and Qinghai are predicted as class 1, that is, low energy efficiency. Moreover, the prediction confidence of the fused classification model is higher than the best single-model prediction, so its predictions are more readily accepted than those of a single model.
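The claims describe the fused prediction as a weighted vote over the three model outputs. A minimal sketch of such a vote is given below. The patent does not state how the weights are chosen, so the weights here are hypothetical (for instance, each model's cross-validated F-measure could be used), and the returned "confidence" is simply the winner's share of the total weight, an illustrative definition rather than the patent's.

```python
def weighted_vote(predictions, weights):
    """Combine class predictions from several models by weighted voting.

    predictions: one class label (0 or 1) per model.
    weights: one non-negative weight per model (hypothetical values).
    Returns (winning_label, share_of_total_weight). Ties go to the
    smallest label.
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    if sum(weights) <= 0:
        raise ValueError("weights must sum to a positive value")
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    winner = max(sorted(totals), key=lambda label: totals[label])
    return winner, totals[winner] / sum(totals.values())


# Three hypothetical models: two vote for class 0, one for class 1.
label, share = weighted_vote([0, 1, 0], [0.9, 0.8, 0.7])
```

With weights like these, a single strong model can still outvote two weak ones, which is the practical difference from a simple majority vote.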
Multi-model fusion strategy and clustering result analysis
First, the clustering results of the two individual algorithms, Simple K-means and EM, are as follows:
1) K-means divides the 216 instances into 2 classes. Cluster0 contains 140 instances, 65% of all instances; cluster1 contains 76 instances, 35% of all instances. Comparing the data in the data set year by year, the overall picture is: F6 is lower for cluster0 than for cluster1, that is, less energy is consumed to produce one unit of GDP; F3 is lower than for cluster1, so at the same rate of national economic growth the energy consumption of cluster0 instances grows more slowly; F8 is also relatively lower than for cluster1, that is, cluster0 emits less sulfur dioxide per unit of energy burned, so its pollution burden on the environment is lower. It can therefore be determined that cluster0 instances form the energy-efficient class and cluster1 instances the energy-inefficient class.
2) EM clustering also divides the 216 instances into 2 classes: cluster0 contains 118 instances, 55% of all instances, and cluster1 contains 98 instances, 45%. Comparing data from the same year in the data set, F6 and F8 are generally lower for cluster0 than for cluster1, that is, energy is used efficiently in economic growth while environmental pollution is also lower. Cluster0 instances are thereby determined to be the energy-inefficient class and cluster1 instances the energy-efficient class.
Based on the preliminary fusion strategy shown in Fig. 2, a preliminary clustering result of higher precision is obtained, as shown in Table 4:
Table 4. EM clustering results for the energy efficiency of China's provinces
The clustering result obtained by fusing EM and K-means was further verified with FCM, and the result is consistent with Table 4. The table shows that the number of energy-inefficient instances increases over time, and most transitions are from efficient to inefficient, for example Liaoning, Shanghai, Zhejiang, Hubei, Hunan, Sichuan, and Shaanxi. Provinces that have long remained in the energy-efficient state include Beijing, Fujian, Hainan, and Jiangxi, whereas energy use in Shanxi, Shandong, and Guangdong has long been inefficient. As to the causes, the cross-sectional differences between regions can be attributed to differences in economic structure: regions whose pillar industries are technology-intensive generally have high energy efficiency, while regions whose pillar industries are traditional manufacturing and processing generally have low energy efficiency. Moreover, although national data show that energy consumption per unit of GDP has fallen, the energy consumption elasticity has kept fluctuating, the cost of controlling environmental pollution is rising, and energy losses are increasing year by year. At root, China's energy structure has long been unbalanced, with coal as the main energy source, and its mode of economic development has relied mainly on resource consumption rather than on technological progress and management innovation. It is therefore necessary to optimize the energy structure and transform the mode of economic growth; relying on science and technology to sustain higher economic growth with a lower energy consumption elasticity is the key to substantially improving energy efficiency.
Taking 9 years of energy-efficiency data for China's provinces as an example, this embodiment studied an energy-efficiency analysis and evaluation method based on a multi-model fusion strategy and reached the following conclusions:
1) Based on the factors mentioned in the collected literature, feature selection was implemented by combining information gain with principal component analysis, and the determinants of energy efficiency were found: six determinants were identified from eight factors.
2) Three single classifier models and a multi-classifier fusion model were built for the energy efficiency of China's provinces. The numerical results of classification and prediction show that the multi-classifier fusion model classifies and predicts energy efficiency better than the single models.
3) Based on the multi-model fusion clustering analysis method, the differences and patterns of change in energy efficiency across China's regions were found, and the corresponding cause analysis and development suggestions were given.
Therefore, the direction for improving China's energy efficiency is: focus on the key influencing factors of energy efficiency and, scientifically and in a targeted way, optimize the energy structure and transform the mode of economic growth. Technological invention and creation (particularly in energy technology) should be encouraged and supported, and technological innovation should be promoted in every link of energy use, so as to sustain higher economic growth with a lower energy consumption elasticity. In addition, the energy consumption structure should be optimized and adjusted on the principle of balancing traditional and new energy sources, taking into account the energy supply and demand situation and energy utilization technology.
The present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and modifications according to the present invention, but all such changes and modifications shall fall within the scope of protection of the claims appended to the present invention.
Claims (7)
1. A multi-model fusion estimation system, characterized in that the multi-model fusion estimation system comprises the following steps:
Step one: normalize the data to obtain a normalized training set;
Step two: perform feature selection on the normalized training set obtained in step one;
Step three: build a multi-classifier fusion evaluation model according to step one and step two, and obtain the classification results of the energy-efficiency evaluation;
Step four: perform cluster analysis on the classification results obtained in step three to obtain the final clustering result.
2. The multi-model fusion estimation system according to claim 1, characterized in that the data in step one specifically comprise: primary energy output, total energy consumption, energy consumption elasticity, GDP, energy industry investment, energy consumption per unit of GDP, capital stock, and sulfur dioxide emission coefficient.
3. The multi-model fusion estimation system according to claim 2, characterized in that the specific process of normalizing the data in step one to obtain the normalized training set is: data standardization preprocessing, also called normalization, standardizes the data using a simplified transfer function; given N samples, the m-th feature of each sample is processed as shown in formula (1):
$x_{im}^{*} = \dfrac{x_{im}}{\sum_{i=1}^{N} x_{im}}, \quad i = 1, 2, \ldots, N \qquad (1)$
After preprocessing, the feature values lie in the interval [0, 1], where $x_{im}^{*}$ is the normalized value of the m-th feature of the i-th sample and $x_{im}$ is the original value of the m-th feature of the i-th sample.
4. The multi-model fusion estimation system according to claim 3, characterized in that the specific process of performing feature selection in step two on the normalized training set obtained in step one is: features are selected with a fusion method that combines information gain with kernel principal component analysis; the information gain of each feature is obtained, the features are sorted in descending order of information gain, and the result is checked using principal component analysis.
5. The multi-model fusion estimation system according to claim 4, characterized in that the specific process of obtaining the information gain of each feature is: let the feature space be X and let $X_m$ be the m-th feature of a sample; its information gain $IG(X_m)$ is:
$IG(X_m) = H(C) - H(C \mid X_m)$
where C denotes the required class variable, $H(C)$ denotes the information entropy of C, and $H(C \mid X_m)$ denotes the information entropy of the class C given the feature $X_m$. If the class C takes n values, each with probability $p(C_j)$, $j = 1, 2, \ldots, n$, then $H(C)$ is:
$H(C) = -\sum_{j=1}^{n} p(C_j) \log p(C_j)$
6. The multi-model fusion estimation system according to claim 5, characterized in that the specific process in step three of building the multi-classifier fusion evaluation model according to step one and step two and obtaining the classification results of the energy-efficiency evaluation is: the J48 model from decision-tree algorithms, the LogitBoost model from rule-based classification algorithms, and the JRip learner based on a meta-learning strategy are each trained on the normalized training set obtained in step one, yielding 3 models; the features selected in step two are the model inputs, and the model output is the class 0 or 1, where 0 denotes high energy efficiency and 1 denotes low energy efficiency; the training strategy is 10-fold cross-validation; whenever a new sample is tested, it is input to each of the 3 trained models to obtain 3 results, and the classification result is obtained by weighted voting.
7. The multi-model fusion estimation system according to claim 6, characterized in that the specific process in step four of performing cluster analysis on the classification results obtained in step three to obtain the final clustering result is: the high-energy-efficiency samples in the classification results obtained in step three are analyzed again; the identical clustering results obtained with the k-means clustering algorithm and the EM algorithm are taken as the preliminary fusion clustering result, which is then checked with the FCM clustering method to obtain the final fusion clustering result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710756125.0A CN107301604A (en) | 2017-08-29 | 2017-08-29 | Multi-model fusion estimation system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107301604A true CN107301604A (en) | 2017-10-27 |
Family
ID=60132562
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107301604A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20171027 |