CN105469219A - Method for processing power load data based on decision tree - Google Patents
- Publication number
- CN105469219A (application CN201511021630.8A)
- Authority
- CN
- China
- Prior art keywords
- sample
- data
- attribute
- value
- decision tree
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0637—Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
- G06Q10/06375—Prediction of business process outcome or impact based on a proposed change
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- General Physics & Mathematics (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Theoretical Computer Science (AREA)
- Marketing (AREA)
- Entrepreneurship & Innovation (AREA)
- Educational Administration (AREA)
- Tourism & Hospitality (AREA)
- Physics & Mathematics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Development Economics (AREA)
- Game Theory and Decision Science (AREA)
- Public Health (AREA)
- Water Supply & Treatment (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method for processing power load data based on a decision tree. Missing attribute values are filled in using the sample similarity principle, and the completed data are applied to power load forecasting, which improves both the accuracy of the historical load data and the precision of the forecast. The feasibility and accuracy of the method are verified through an example simulation analysis, and the method has practical value.
Description
Technical field
The present invention provides a method for processing power load data based on a decision tree, and belongs to the field of power grid load forecasting.
Background art
Power load forecasting is an important task in a power dispatching system. A forecast is derived from historical demand data and various other influencing factors, so its precision depends to a large extent on the accuracy of the historical data.
Existing power load forecasting methods rely on data mining. Data mining techniques assume that every attribute value is known and determined. In practice, however, and especially in large enterprises that collect hundreds of millions of records every day, some attribute values of some samples are often missing. A value may be missing because the attribute does not apply to the sample, because it was not recorded when the sample was collected, or because of human error when the data were entered into the database. Simply removing records with missing values from the information system not only wastes resources but may also discard valuable information hidden in those records, and with it the rules that data mining is meant to find. Handling missing attribute values incorrectly, on the other hand, introduces new noise, leads data mining to wrong results and distorts the analysis. Real-world data are often incomplete, inconsistent and noisy; data preprocessing improves data quality and thereby the effectiveness and accuracy of the mining process. High-quality decisions come from high-quality data. How to handle missing data correctly is therefore an important problem in data mining preprocessing, a key step in the whole data mining and knowledge discovery process, and critical to the final analysis result.
The "divide and conquer" decision tree method was developed and refined by J. R. Quinlan of the University of Sydney, Australia. In 1986 he described the ID3 algorithm in the journal Machine Learning; based on information entropy theory, it was the earliest and most influential decision tree algorithm of its time. ID3 uses information gain as the criterion for selecting the test attribute. Because the information gain measure favors attributes with many values, and an attribute with more values is not necessarily the best one, the algorithm has a certain bias. It can also handle only discrete-valued attributes and does not address missing values in the training set, so ID3 leaves room for improvement. The C4.5 algorithm improves on ID3: it handles continuous-valued as well as discrete-valued attributes, and it uses the information gain ratio as the criterion for selecting the test attribute. The information gain ratio is computed as follows:
Let $S$ be a set of $s$ data samples, and suppose the class attribute takes $n$ distinct values, corresponding to $n$ classes $C_i$, $i \in \{1, 2, \ldots, n\}$. Let $s_i$ be the number of samples in class $C_i$. The amount of information needed to classify a given data object is:
$$I(s_1, s_2, \ldots, s_n) = -\sum_{i=1}^{n} p_i \log_2 p_i \quad (1)$$
where $p_i$ is the probability that an arbitrary data object belongs to class $C_i$, estimated as $s_i / s$, and $I(s_1, s_2, \ldots, s_n)$ is the information content of the sample set, i.e. its expected information.
Suppose attribute $A$ takes $m$ distinct values $a_1, a_2, \ldots, a_m$. Attribute $A$ partitions $S$ into $m$ subsets $S_1, S_2, \ldots, S_m$, where $S_j$ contains the samples of $S$ for which $A$ takes the value $a_j$. If $A$ is chosen as the test attribute, let $s_{ij}$ be the number of samples in subset $S_j$ that belong to class $C_i$. The information entropy of the partition induced by $A$ is then:
$$E(A) = \sum_{j=1}^{m} p_j \, I(s_{1j}, s_{2j}, \ldots, s_{nj}) \quad (2)$$
where $E(A)$ is the entropy of the subsets and $p_j = (s_{1j} + s_{2j} + \cdots + s_{nj}) / s$ is the weight of the $j$-th subset, i.e. the number of samples for which $A$ takes the value $a_j$ divided by the total number of samples in $S$. For a given subset $S_j$, the information is:
$$I(s_{1j}, s_{2j}, \ldots, s_{nj}) = -\sum_{i=1}^{n} p_{ij} \log_2 p_{ij} \quad (3)$$
where $p_{ij} = s_{ij} / |S_j|$ is the probability that an arbitrary sample in subset $S_j$ belongs to class $C_i$. The information gain $Gain(A)$ obtained by splitting the sample set at the current branch node on attribute $A$ is:
$$Gain(A) = I(s_1, s_2, \ldots, s_n) - E(A) \quad (4)$$
The information gain ratio is computed as:
$$GainRatio(A) = \frac{Gain(A)}{SplitInfo(A)}, \qquad SplitInfo(A) = -\sum_{j=1}^{m} p_j \log_2 p_j \quad (5)$$
where $SplitInfo(A)$ is the split information of C4.5, i.e. the entropy of the partition of $S$ induced by $A$.
As can be seen, the information gain ratio used by C4.5 expresses the proportion of useful information produced by a branch: the larger the value, the more useful information the branch contains. Although C4.5 improves on ID3, its method for filling in missing attribute values is still not fully developed.
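The quantities discussed above can be sketched in a few lines of Python. This is a minimal illustration of the standard C4.5 definitions, not the patent's MATLAB program; the function names `entropy` and `gain_ratio` and the toy data are hypothetical. The second call shows the bias of plain information gain toward many-valued attributes: an identifier-like attribute gets the highest gain but a lower gain ratio.

```python
import math
from collections import Counter


def entropy(labels):
    """Expected information of a list of labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """Return (Gain(A), GainRatio(A)) for attribute values paired with class labels."""
    total = len(labels)
    # Partition the class labels by attribute value: one subset S_j per value a_j.
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    # E(A): entropy of each subset weighted by p_j = |S_j| / |S|.
    e_a = sum(len(s) / total * entropy(s) for s in subsets.values())
    gain = entropy(labels) - e_a
    # SplitInfo(A): entropy of the partition itself (standard C4.5 definition).
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Hypothetical toy data: a two-valued attribute against a load class.
day_group = [1, 1, 1, 1, 2, 2]
load_class = [3, 3, 4, 4, 1, 1]
gain, ratio = gain_ratio(day_group, load_class)

# An identifier-like attribute with one distinct value per sample.
id_gain, id_ratio = gain_ratio(list(range(6)), load_class)
```

On this toy data the identifier-like attribute has the larger information gain but the smaller gain ratio, which is the reason C4.5 prefers the ratio as its selection criterion.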
Summary of the invention
Object of the invention: the present invention provides a method for processing power load data based on a decision tree that improves the accuracy of historical load data.
Technical solution: the method for processing power load data based on a decision tree comprises the following steps:
1) grouping the samples in training set T that have a determined value for a given attribute into a determined-value sample set;
2) calculating the similarity between each missing-value sample in training set T and the determined-value samples;
3) filling in the missing attribute value of each missing-value sample with the attribute value of the determined-value sample that has the greatest similarity to it.
Preferably, the similarity $D(s'_i, s_j)$ is expressed in terms of $|A_{ij}|$, $|A|$ and a weight coefficient $\delta_{ij}$, where $s_j$ is the $j$-th sample in the determined-value sample set; $s'_i$ is the $i$-th sample in the missing-value sample set; $D(s'_i, s_j)$ is the similarity between $s_j$ and $s'_i$; $A$ is the set of all attributes in the training set; $A_{ij} = \{a \in A \mid a_i = a_j\}$ is the set of attributes on which $s'_i$ and $s_j$ take identical, determined values; $|A|$ and $|A_{ij}|$ are the numbers of elements in the respective sets; and $\delta_{ij}$ is the weight coefficient.
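As a rough illustration of how such a similarity could be evaluated, the sketch below counts the attributes on which two samples agree. The exact formula is not reproduced in the text, so the form used here, weight times $|A_{ij}| / |A|$, is an assumption, as are the `MISSING` marker, the function name `similarity` and the sample values.

```python
MISSING = None  # hypothetical marker for an unknown attribute value


def similarity(missing_sample, known_sample, attributes, weight=1.0):
    """Assumed similarity: weight * |A_ij| / |A| (the patent's exact formula may differ)."""
    # A_ij: attributes whose values are determined in both samples and identical.
    matching = [
        a for a in attributes
        if missing_sample[a] is not MISSING and missing_sample[a] == known_sample[a]
    ]
    return weight * len(matching) / len(attributes)


attributes = ["temperature", "humidity", "day_type"]
s_missing = {"temperature": 12, "humidity": MISSING, "day_type": 5}
s_known = {"temperature": 12, "humidity": 60, "day_type": 5}
d = similarity(s_missing, s_known, attributes)  # 2 of the 3 attributes agree
```

Exact equality is a coarse test for continuous attributes such as temperature; a practical implementation would probably compare binned values or use a tolerance.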
Beneficial effects: the invention fills in missing attribute values using the sample similarity principle and applies the completed data to power load forecasting. This improves both the accuracy of the historical load data and the precision of the load forecast. Analysis of a simulation example demonstrates the feasibility and accuracy of the method, which has practical value.
Embodiment
The present invention is further illustrated below with reference to specific embodiments. These embodiments serve only to illustrate the invention and do not limit its scope; modifications of equivalent form made by those skilled in the art after reading this disclosure all fall within the scope defined by the appended claims.
The invention uses the sample similarity principle to fill in missing attribute values: missing data are corrected according to the similarity between known samples and samples with missing values, which improves the accuracy of the raw data and hence the precision of power load forecasting.
Let $A$ be an attribute of training set $T$ with values $a_1, a_2, \ldots, a_m$. Let $s$ denote a determined-value sample and $s'$ a missing-value sample. Define the subset $T' = \{s \in T \mid a_x \neq \text{unknown},\ x = 1, 2, \ldots, m\}$ of $T$, i.e. the set of all samples whose value for attribute $A$ is determined. The similarity $D(s'_i, s_j)$ between a missing-value sample $s'_i$ in training set $T$ and a determined-value sample $s_j$ in subset $T'$ is then expressed in terms of $|A_{ij}|$, $|A|$ and a weight coefficient $\delta_{ij}$, where $A$ is the set of all attributes in the training set; $A_{ij} = \{a \in A \mid a_i = a_j\}$ is the set of attributes on which $s'_i$ and $s_j$ take identical, determined values; $|A|$ and $|A_{ij}|$ are the numbers of elements in the respective sets; and $\delta_{ij}$ is the weight coefficient.
The attribute value of the sample $s_j$ in subset $T'$ that has the greatest similarity to $s'_i$ is taken as the attribute value of $s'_i$, which fills in the missing value; at the same time, $s'_i$ is deleted from the other nodes of the decision tree. This is repeated until all missing values have been filled in.
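The filling step can be sketched as a nearest-sample lookup. The code below is a simplified illustration under stated assumptions: samples are dictionaries, `MISSING` marks an unknown value, and the default `match_count` similarity is a stand-in that simply counts agreeing attributes rather than the patent's weighted measure. The removal of $s'_i$ from other decision tree nodes is not modeled.

```python
MISSING = None  # hypothetical marker for an unknown attribute value


def match_count(a, b, attributes):
    """Stand-in similarity: number of attributes with identical, known values."""
    return sum(1 for k in attributes if a[k] is not MISSING and a[k] == b[k])


def impute(training_set, attribute, attributes, similarity=match_count):
    """Fill missing values of `attribute` from the most similar determined-value sample."""
    # Subset T': samples whose value for `attribute` is determined.
    known = [s for s in training_set if s[attribute] is not MISSING]
    filled = []
    for s in training_set:
        if s[attribute] is MISSING and known:
            best = max(known, key=lambda k: similarity(s, k, attributes))
            s = {**s, attribute: best[attribute]}  # copy, so the input is not mutated
        filled.append(s)
    return filled


# Hypothetical records; the values are illustrative, not taken from Table 1.
T = [
    {"temperature": 10, "day_type": 1, "humidity": 55},
    {"temperature": 14, "day_type": 6, "humidity": 70},
    {"temperature": 14, "day_type": 6, "humidity": MISSING},
]
completed = impute(T, "humidity", ["temperature", "day_type", "humidity"])
```

Passing the similarity as a parameter keeps the lookup independent of the particular measure, so a weighted version can be substituted without changing `impute`.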
This filling principle is suitable only when few values are missing. When the database holds little data and many values are missing, the method may bias the analysis result. If, however, a database with a large amount of data has many missing attribute values, the data have lost their research value in any case, and in practice this situation rarely arises during data collection.
Finally, an example is given. Table 1 lists the historical load data of Jiangsu Province from March 1 to March 14, 2013. Missing attribute values are filled in using the sample similarity principle described above, the C4.5 algorithm is then used to build a decision tree, and the tree is used to forecast future power load. In Table 1, "?" marks a missing value.
Table 1: historical load data
First, the target attribute and the condition attributes are determined. The table contains only temperature, relative humidity, day type and load, so based on experience temperature, relative humidity and day type are taken as condition attributes and load as the target attribute.
The day-type attribute is not continuous, and the decision tree algorithm cannot recognize its values directly, so it must be converted into a form the algorithm can use. Here the numbers 1, 2, 3, 4, 5, 6 and 7 replace Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday respectively, which turns values the decision tree cannot recognize into values it can.
Next, temperature, humidity and load are all continuous attributes. Temperature and relative humidity can be used in the algorithm directly, because C4.5 handles continuous attributes. Load, however, is the target attribute, which the algorithm cannot process directly in continuous form, so it must be discretized. The invention divides load evenly into four classes. All load values in the example lie in the interval [42833, 54542], which is divided into four parts: [42833, 45760], [45760, 48687], [48687, 51614] and [51614, 54542]. These four classes are denoted 1, 2, 3 and 4 respectively.
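The two encoding steps can be sketched in Python as follows. The names `DAY_CODES` and `load_class` are illustrative and not from the patent. The interval edges are the ones listed in the text; assigning a value that falls exactly on a shared edge to the upper class is an assumption, since the closed intervals in the text overlap at their endpoints.

```python
from bisect import bisect_right

# Day type encoded as 1..7, Monday through Sunday, as described in the text.
DAY_CODES = {
    "Monday": 1, "Tuesday": 2, "Wednesday": 3, "Thursday": 4,
    "Friday": 5, "Saturday": 6, "Sunday": 7,
}

LOAD_MIN, LOAD_MAX = 42833, 54542
INNER_EDGES = [45760, 48687, 51614]  # boundaries between classes 1|2, 2|3, 3|4


def load_class(load):
    """Map a load value to class 1..4; a value on a shared edge goes to the upper class."""
    if not LOAD_MIN <= load <= LOAD_MAX:
        raise ValueError(f"load {load} outside [{LOAD_MIN}, {LOAD_MAX}]")
    return bisect_right(INNER_EDGES, load) + 1


# Hypothetical record; the load value is illustrative, not taken from Table 1.
record = {"day_type": DAY_CODES["Friday"], "load": load_class(50000)}
```

Here `record` evaluates to `{"day_type": 5, "load": 3}`, since 50000 lies in the third interval.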
Finally, following the formulas and method above, the C4.5 algorithm is implemented and simulated in MATLAB. The preprocessed data are fed into the program, a decision tree is obtained from the results, rules are derived from the tree, and these rules are used to forecast the province's load from March 15 to March 28, 2013.
Claims (2)
1. A method for processing power load data based on a decision tree, characterized by comprising the following steps:
1) grouping the samples in training set T that have a determined value for a given attribute into a determined-value sample set;
2) calculating the similarity between each missing-value sample in training set T and the determined-value samples;
3) filling in the missing attribute value of each missing-value sample with the attribute value of the determined-value sample that has the greatest similarity to it.
2. The method for processing power load data based on a decision tree according to claim 1, characterized in that the similarity $D(s'_i, s_j)$ is expressed in terms of $|A_{ij}|$, $|A|$ and a weight coefficient $\delta_{ij}$, where $s_j$ is the $j$-th sample in the determined-value sample set; $s'_i$ is the $i$-th sample in the missing-value sample set; $D(s'_i, s_j)$ is the similarity between $s_j$ and $s'_i$; $A$ is the set of all attributes in the training set; $A_{ij} = \{a \in A \mid a_i = a_j\}$ is the set of attributes on which $s'_i$ and $s_j$ take identical, determined values; $|A|$ and $|A_{ij}|$ are the numbers of elements in the respective sets; and $\delta_{ij}$ is the weight coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511021630.8A CN105469219A (en) | 2015-12-31 | 2015-12-31 | Method for processing power load data based on decision tree |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511021630.8A CN105469219A (en) | 2015-12-31 | 2015-12-31 | Method for processing power load data based on decision tree |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105469219A true CN105469219A (en) | 2016-04-06 |
Family
ID=55606887
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201511021630.8A Pending CN105469219A (en) | 2015-12-31 | 2015-12-31 | Method for processing power load data based on decision tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105469219A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106844781A (en) * | 2017-03-10 | 2017-06-13 | 广州视源电子科技股份有限公司 | Data processing method and device |
CN107220734A (en) * | 2017-06-26 | 2017-09-29 | 江南大学 | CNC Lathe Turning process Energy Consumption Prediction System based on decision tree |
CN108011367A (en) * | 2017-12-04 | 2018-05-08 | 贵州电网有限责任公司电力科学研究院 | A kind of Characteristics of Electric Load method for digging based on depth decision Tree algorithms |
CN108062560A (en) * | 2017-12-04 | 2018-05-22 | 贵州电网有限责任公司电力科学研究院 | A kind of power consumer feature recognition sorting technique based on random forest |
CN108539738A (en) * | 2018-05-10 | 2018-09-14 | 国网山东省电力公司电力科学研究院 | A kind of short-term load forecasting method promoting decision tree based on gradient |
CN109242174A (en) * | 2018-08-27 | 2019-01-18 | 广东工业大学 | A kind of adaptive division methods of seaonal load based on decision tree |
CN109446730A (en) * | 2018-12-05 | 2019-03-08 | 新奥数能科技有限公司 | Generating set rate of load condensate missing value complement based on short-term equipment operating data recruits method |
CN112613584A (en) * | 2021-01-07 | 2021-04-06 | 国网上海市电力公司 | Fault diagnosis method, device, equipment and storage medium |
CN115048815A (en) * | 2022-08-11 | 2022-09-13 | 广州海颐软件有限公司 | Database-based intelligent simulation management system and method for power service |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559199A (en) * | 2013-09-29 | 2014-02-05 | 北京航空航天大学 | Web information extraction method and web information extraction device |
CN104318332A (en) * | 2014-10-29 | 2015-01-28 | 国家电网公司 | Power load predicting method and device |
Non-Patent Citations (2)
Title |
---|
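Lin Zhen et al.: "Research on the optimization of decision-tree-based data mining algorithms", Modern Computer (Professional Edition) * |
Guo Jingfeng et al.: "Research on a decision-tree-based method for filling in missing data values", Computer Engineering & Science * |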
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106844781B (en) * | 2017-03-10 | 2020-04-21 | 广州视源电子科技股份有限公司 | Data processing method and device |
CN106844781A (en) * | 2017-03-10 | 2017-06-13 | 广州视源电子科技股份有限公司 | Data processing method and device |
CN107220734A (en) * | 2017-06-26 | 2017-09-29 | 江南大学 | CNC Lathe Turning process Energy Consumption Prediction System based on decision tree |
WO2019001220A1 (en) * | 2017-06-26 | 2019-01-03 | 江南大学 | Decision tree based turning process energy consumption prediction system and method of numerically controlled lathe |
CN107220734B (en) * | 2017-06-26 | 2020-05-12 | 江南大学 | Numerical control lathe turning process energy consumption prediction system based on decision tree |
CN108011367A (en) * | 2017-12-04 | 2018-05-08 | 贵州电网有限责任公司电力科学研究院 | A kind of Characteristics of Electric Load method for digging based on depth decision Tree algorithms |
CN108062560A (en) * | 2017-12-04 | 2018-05-22 | 贵州电网有限责任公司电力科学研究院 | A kind of power consumer feature recognition sorting technique based on random forest |
CN108011367B (en) * | 2017-12-04 | 2020-12-18 | 贵州电网有限责任公司电力科学研究院 | Power load characteristic mining method based on depth decision tree algorithm |
CN108539738B (en) * | 2018-05-10 | 2020-04-21 | 国网山东省电力公司电力科学研究院 | Short-term load prediction method based on gradient lifting decision tree |
CN108539738A (en) * | 2018-05-10 | 2018-09-14 | 国网山东省电力公司电力科学研究院 | A kind of short-term load forecasting method promoting decision tree based on gradient |
CN109242174A (en) * | 2018-08-27 | 2019-01-18 | 广东工业大学 | A kind of adaptive division methods of seaonal load based on decision tree |
CN109446730A (en) * | 2018-12-05 | 2019-03-08 | 新奥数能科技有限公司 | Generating set rate of load condensate missing value complement based on short-term equipment operating data recruits method |
CN109446730B (en) * | 2018-12-05 | 2022-11-29 | 新奥数能科技有限公司 | Short-term equipment operation data-based generator set load factor missing value recruitment method |
CN112613584A (en) * | 2021-01-07 | 2021-04-06 | 国网上海市电力公司 | Fault diagnosis method, device, equipment and storage medium |
CN115048815A (en) * | 2022-08-11 | 2022-09-13 | 广州海颐软件有限公司 | Database-based intelligent simulation management system and method for power service |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160406 |