CN105469219A - Method for processing power load data based on decision tree - Google Patents

Method for processing power load data based on decision tree

Info

Publication number
CN105469219A
Authority
CN
China
Prior art keywords
sample
data
attribute
value
decision tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201511021630.8A
Other languages
Chinese (zh)
Inventor
沈培锋
余昆
宁艺飞
陈星莺
嵇文路
周冬旭
王春宁
罗兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
State Grid Jiangsu Electric Power Co Ltd
Hohai University HHU
Nanjing Power Supply Co of Jiangsu Electric Power Co
Original Assignee
State Grid Corp of China SGCC
State Grid Jiangsu Electric Power Co Ltd
Hohai University HHU
Nanjing Power Supply Co of Jiangsu Electric Power Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, State Grid Jiangsu Electric Power Co Ltd, Hohai University HHU, Nanjing Power Supply Co of Jiangsu Electric Power Co filed Critical State Grid Corp of China SGCC
Priority to CN201511021630.8A priority Critical patent/CN105469219A/en
Publication of CN105469219A publication Critical patent/CN105469219A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0637Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G06Q10/06375Prediction of business process outcome or impact based on a proposed change
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06Energy or water supply

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Public Health (AREA)
  • Water Supply & Treatment (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for processing power load data based on a decision tree. Missing attribute values are completed using the sample similarity principle, and the completed data are applied to power load forecasting, which improves both the accuracy of the historical load data and the precision of the load forecast. The feasibility and accuracy of the method are verified through simulation analysis of an example, and the method has practical value.

Description

A method for processing power load data based on a decision tree
Technical field
The present invention proposes a method for processing power load data based on a decision tree, and belongs to the field of power grid load forecasting.
Background art
Power load forecasting is a very important task in a power dispatching system. Load forecasting is based on historical demand data and various other related influencing factors, so its precision depends to a large extent on the accuracy of the historical data.
Existing power load forecasting methods use data mining techniques. Data mining assumes that all attribute values are known and determined. In many cases, however, and especially in large enterprises that collect hundreds of millions of data records every day, some attribute values of some samples are missing. A value may be missing because the attribute does not apply to the sample, because it was not recorded when the sample was collected, or because of human error when the data were entered into the database. Removing the records with missing values from the information system not only wastes resources but may also discard valuable information hidden in those records, and with it the rules that data mining seeks. On the other hand, handling missing attribute values incorrectly introduces new noise, causes data mining to produce wrong results, and affects the analysis. Real-world data are often incomplete, inconsistent, or noisy; data preprocessing improves data quality and therefore the effectiveness and accuracy of the data mining process. High-quality decisions come from high-quality data. How to handle missing data correctly is therefore an important problem in data mining preprocessing, a key step in the whole process of data mining and knowledge discovery, and critical to the final analysis results.
The "divide and conquer" approach of decision trees was developed and refined by J. R. Quinlan of the University of Sydney, Australia. In 1986 he published a paper in the journal Machine Learning describing the ID3 algorithm, which is based on information entropy theory and was the earliest and most influential decision tree algorithm of its time. ID3 uses information gain as the criterion for selecting the test attribute. Because the information gain measure favors attributes with many values, and an attribute with more values is not necessarily the best one, the algorithm has a certain bias. It can also handle only discrete-valued attributes and does not consider missing values in the training set, so ID3 needed further improvement. The C4.5 algorithm improves on ID3: it handles continuous-valued attributes as well as discrete ones. C4.5 uses the information gain ratio as the criterion for selecting the test attribute, computed as follows:
Let S be a set of s data samples. Suppose the class attribute can take n distinct values, corresponding to n distinct classes $C_i$, $i \in \{1, 2, 3, \ldots, n\}$. Let $s_i$ be the number of samples in class $C_i$. The amount of information needed to classify a given data object is then:
$$I(s_1, s_2, \ldots, s_n) = -\sum_{i=1}^{n} p_i \log_2 p_i \tag{1}$$
In the formula, $p_i$ is the probability that any data object belongs to class $C_i$, computed as $s_i / s$; $I(s_1, s_2, \ldots, s_n)$ is the information content of the sample set, i.e. the expected information of the class attribute.
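As a minimal sketch of formula (1), assuming the class counts $s_1, \ldots, s_n$ are available as a plain list, the information content could be computed as follows. The function name `information` is illustrative and does not come from the patent.

```python
import math
from typing import Sequence


def information(counts: Sequence[int]) -> float:
    """Information content I(s_1, ..., s_n) of formula (1), in bits.

    counts[i] is the number of samples in class C_i. Classes with zero
    samples are skipped, following the usual convention 0 * log2(0) = 0.
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("at least one sample is required")
    result = 0.0
    for count in counts:
        if count > 0:
            p = count / total  # p_i = s_i / s
            result -= p * math.log2(p)
    return result
```

Two equally sized classes, `information([5, 5])`, give 1 bit, while a set that contains a single class gives 0.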
Let attribute A have m distinct values $a_1, a_2, \ldots, a_m$. Attribute A partitions S into m subsets $S_1, S_2, \ldots, S_m$, where $S_j$ contains the samples in S for which A takes the value $a_j$. If A is selected as the test attribute, let $s_{ij}$ be the number of samples in subset $S_j$ that belong to class $C_i$. The information entropy of the subsets produced by partitioning on A is then:
$$E(A) = \sum_{j=1}^{m} p_j \, I(s_{1j}, \ldots, s_{nj}) \tag{2}$$
In the formula, E(A) is the information entropy of the subsets, and $p_j$ is the weight of the j-th subset: the number of samples for which attribute A takes the value $a_j$ divided by the total number of samples in S. For a given subset $S_j$, the information value is:
$$I(s_{1j}, s_{2j}, \ldots, s_{nj}) = -\sum_{i=1}^{n} p_{ij} \log_2 p_{ij} \tag{3}$$
In the formula, $p_{ij} = s_{ij} / |S_j|$ is the probability that any data sample in subset $S_j$ belongs to class $C_i$. The information gain Gain(A) obtained by using attribute A to partition the sample set at the current branch node is:
$$\mathrm{Gain}(A) = I(s_1, s_2, \ldots, s_n) - E(A) \tag{4}$$
The information gain ratio is computed as:
$$\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{I(A)} \tag{5}$$
As can be seen, the information gain ratio used by the C4.5 algorithm represents the proportion of useful information produced by a branch: the larger the value, the more useful information the branch contains. Although C4.5 improves on ID3, its method for completing missing attribute values is not sufficiently developed.
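Formulas (1) to (5) can be put together in a short sketch. The text does not define I(A) in formula (5); the code below assumes it is the split information of standard C4.5, that is, the entropy of the subset sizes $|S_j|$. The names `information` and `gain_ratio` are illustrative only.

```python
import math
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def information(counts: Sequence[int]) -> float:
    """Entropy in bits of a list of counts, as in formulas (1) and (3)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def gain_ratio(values: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """Gain ratio of one attribute, following formulas (1) to (5).

    values[k] is the value of attribute A for sample k and labels[k]
    is the class of sample k.
    """
    s = len(labels)
    # Formula (1): information of the whole sample set.
    base = information(list(Counter(labels).values()))
    # Partition the class labels by attribute value (subsets S_1 .. S_m).
    subsets = defaultdict(list)
    for value, label in zip(values, labels):
        subsets[value].append(label)
    # Formulas (2) and (3): weighted information of the subsets.
    expected = sum(
        len(sub) / s * information(list(Counter(sub).values()))
        for sub in subsets.values()
    )
    gain = base - expected  # formula (4)
    # ASSUMPTION: I(A) is the C4.5 split information (entropy of subset sizes).
    split = information([len(sub) for sub in subsets.values()])
    # An attribute with a single value cannot split the set; report 0.
    return gain / split if split > 0 else 0.0
```

An attribute that separates the classes perfectly into two equal halves yields a ratio of 1, and an attribute that is independent of the class yields 0.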
Summary of the invention
Object of the invention: the present invention proposes a method for processing power load data based on a decision tree that improves the accuracy of historical load data.
Technical solution: the present invention proposes a method for processing power load data based on a decision tree, comprising the following steps:
1) placing the samples in training set T for which a given attribute has a determined value into a determined-value sample set;
2) calculating the similarity between each missing-value sample and each determined-value sample in training set T;
3) completing the attribute of the missing-value sample with the attribute value of the determined-value sample that has the maximum similarity to it.
Preferably, the similarity is:
$$D(s'_i, s_j) = \frac{|A_{ij}|}{|A|} + \delta_{ij}$$
$$\delta_{ij} = \begin{cases} 1, & d(s'_i) = d(s_j) \\ 0, & d(s'_i) \neq d(s_j) \end{cases}$$
In the formula, $s_j$ is the j-th sample in the determined-value sample set, $s'_i$ is the i-th sample in the missing-value sample set, and $D(s'_i, s_j)$ is the similarity between $s_j$ and $s'_i$. A denotes the set of all attributes in the training data set; $A_{ij} = \{a \in A \mid a_i = a_j\}$ denotes the set of attributes on which $s'_i$ and $s_j$ take the same, determined value; $|A|$ and $|A_{ij}|$ are the numbers of elements in the respective sets; and $\delta_{ij}$ is a weight coefficient.
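The similarity above can be sketched in a few lines. The text does not define d(·); this sketch assumes it is the value of the decision (target) attribute of a sample. It also assumes that samples are dictionaries, that a missing value is stored as `None`, and that A is the list of condition attributes passed in by the caller. All names are illustrative.

```python
from typing import Any, Mapping, Sequence

MISSING = None  # hypothetical marker for a missing ("?") value


def similarity(
    missing_sample: Mapping[str, Any],
    known_sample: Mapping[str, Any],
    attributes: Sequence[str],
    decision: str,
) -> float:
    """D(s'_i, s_j) = |A_ij| / |A| + delta_ij.

    attributes: the attribute set A (ASSUMED: the condition attributes).
    decision:   the attribute read by d(.) (ASSUMED: the target attribute).
    """
    # |A_ij|: attributes on which both samples hold the same determined value.
    shared = sum(
        1
        for a in attributes
        if missing_sample[a] is not MISSING
        and missing_sample[a] == known_sample[a]
    )
    # delta_ij: 1 when both samples have the same decision value, else 0.
    delta = 1 if missing_sample[decision] == known_sample[decision] else 0
    return shared / len(attributes) + delta
```

With three condition attributes, two matching determined values, and equal decision values, the similarity is 2/3 + 1.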
Beneficial effects: the present invention uses the sample similarity principle to complete missing attribute values and applies the result to power load forecasting. This improves both the accuracy of the historical load data and the precision of the load forecast. Analysis of a simulation example demonstrates the feasibility and accuracy of the method, which has practical value.
Embodiments
The present invention is further illustrated below with reference to specific embodiments. These embodiments serve only to illustrate the invention and do not limit its scope. After reading the present disclosure, modifications by those skilled in the art to various equivalent forms of the invention all fall within the scope defined by the appended claims of this application.
The present invention uses the sample similarity principle to complete missing attribute values: missing data are corrected according to the similarity between known sample data and the samples with missing data, which improves the accuracy of the raw data and thereby the precision of power load forecasting.
Let A be an attribute of training set T with values $a_1, a_2, \ldots, a_m$. Define s as a determined-value sample and s' as a missing-value sample. From T, define the subset $T' = \{s \in T \mid a_x \neq \text{unknown}\ (x = 1, 2, \ldots, m)\}$, i.e. the set of all samples for which the value of attribute $a_x$ is determined. The similarity between a missing-value sample s' in training set T and a determined-value sample s in subset T' is then:
$$D(s'_i, s_j) = \frac{|A_{ij}|}{|A|} + \delta_{ij}$$
$$\delta_{ij} = \begin{cases} 1, & d(s'_i) = d(s_j) \\ 0, & d(s'_i) \neq d(s_j) \end{cases} \tag{6}$$
In the formula, $D(s'_i, s_j)$ is the similarity between $s'_i$ and sample $s_j$. A denotes the set of all attributes in the training data set; $A_{ij} = \{a \in A \mid a_i = a_j\}$ denotes the set of attributes on which $s'_i$ and $s_j$ take the same, determined value; $|A|$ and $|A_{ij}|$ are the numbers of elements in the respective sets; and $\delta_{ij}$ is a weight coefficient.
The attribute value of the sample $s_j$ in subset T' that has the maximum similarity to $s'_i$ is taken as the attribute value of $s'_i$, which completes the missing value; at the same time, $s'_i$ is deleted from the other nodes of the decision tree. This is repeated until all missing values in the data have been completed.
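The completion step can be sketched as follows, under the same assumptions as formula (6) read literally: samples are dictionaries, a missing value is stored as `None`, and d(·) is taken to be the target attribute. Ties are broken in favor of the first most-similar sample, which the text does not specify, and the deletion of $s'_i$ from other tree nodes is not modeled. The sample values in the usage note are made up for illustration and are not from Table 1.

```python
from typing import Any, Dict, List, Sequence

MISSING = None  # hypothetical marker for a missing ("?") value


def similarity(s_missing, s_known, attributes, decision) -> float:
    """D(s'_i, s_j) = |A_ij| / |A| + delta_ij (d(.) ASSUMED to be the target)."""
    shared = sum(
        1
        for a in attributes
        if s_missing[a] is not MISSING and s_missing[a] == s_known[a]
    )
    delta = 1 if s_missing[decision] == s_known[decision] else 0
    return shared / len(attributes) + delta


def complete_attribute(
    samples: Sequence[Dict[str, Any]],
    attributes: Sequence[str],
    decision: str,
    attribute: str,
) -> List[Dict[str, Any]]:
    """Fill missing values of one attribute from the most similar sample.

    Returns new dictionaries; the input samples are left unchanged.
    """
    # Subset T': samples whose value of `attribute` is determined.
    known = [s for s in samples if s[attribute] is not MISSING]
    completed = []
    for s in samples:
        if s[attribute] is MISSING and known:
            # max() keeps the first sample among equally similar ones.
            best = max(known, key=lambda k: similarity(s, k, attributes, decision))
            s = {**s, attribute: best[attribute]}
        completed.append(s)
    return completed
```

For instance, a sample `{"temp": 10, "humidity": None, "day": 5, "load": 2}` placed next to `{"temp": 10, "humidity": 60, "day": 5, "load": 2}` and `{"temp": 15, "humidity": 40, "day": 6, "load": 1}` receives humidity 60, because the first sample scores 2/3 + 1 and the second scores 0.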
The missing-value completion principle above is suitable only when few values are missing. When the database holds little data and many values are missing, the method may bias the analysis results. However, if a database with a large amount of data has many missing attribute values, such data lose their research significance and value, and in practice this situation generally does not arise during data acquisition.
Finally, an example is given. Table 1 shows the historical load data of Jiangsu Province from March 1 to March 14, 2013. Missing attribute values are completed using the sample similarity principle above, and the C4.5 decision tree algorithm is then used to build a decision tree and forecast the future power load. In Table 1, "?" indicates that the value is missing.
Table 1: historical load data
First, the target attribute and the condition attributes are determined. The table contains only temperature, relative humidity, day type, and load data, so by experience temperature, relative humidity, and day type are taken as condition attributes and load is taken as the target attribute.
The day-type attribute is not continuous data, and the decision tree algorithm cannot recognize its values directly, so they must be converted into a form the decision tree can use. Here the numbers 1, 2, 3, 4, 5, 6, and 7 replace Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday respectively, converting values the decision tree cannot recognize into values it can.
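The coding 1 to 7 for Monday to Sunday coincides with the ISO weekday numbering, so one possible sketch, assuming the day type is derived from the calendar date of each record, relies on `datetime.date.isoweekday`. The function name `day_type_code` is illustrative.

```python
from datetime import date


def day_type_code(day: date) -> int:
    """Return 1 for Monday through 7 for Sunday, as in the coding above."""
    return day.isoweekday()


# First day of the example period; 1 March 2013 was a Friday.
print(day_type_code(date(2013, 3, 1)))  # prints 5
```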
Second, the data in the table show that temperature, humidity, and load are continuous attributes. Temperature and relative humidity can be used in the algorithm directly, because the C4.5 decision tree algorithm handles continuous attribute values. Load, however, is the target attribute, which the algorithm cannot process directly, so the load data must be discretized. The present invention divides the load evenly into four classes. All load values in the example lie in the interval [42833, 54542], so the interval is divided into four parts, giving four types: [42833, 45760], [45760, 48687], [48687, 51614], and [51614, 54542], which are represented by 1, 2, 3, and 4 respectively.
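The discretization into four load classes can be sketched with the standard `bisect` module. The interval boundaries are those given in the text. Because adjacent intervals share their endpoints, this sketch assumes that a load exactly on a shared boundary belongs to the lower class; the text does not say which class such a value receives. The name `load_class` is illustrative.

```python
import bisect

LOWER, UPPER = 42833, 54542          # overall load range from the text
INNER_EDGES = [45760, 48687, 51614]  # boundaries between the four classes


def load_class(load: float) -> int:
    """Map a load value to class 1, 2, 3, or 4.

    ASSUMPTION: a value equal to a shared boundary goes to the lower class.
    """
    if not LOWER <= load <= UPPER:
        raise ValueError(f"load {load} is outside [{LOWER}, {UPPER}]")
    # bisect_left counts the inner edges strictly below the load value.
    return bisect.bisect_left(INNER_EDGES, load) + 1
```

Under this convention `load_class(45760)` is 1 and `load_class(45761)` is 2.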
Finally, based on the formulas and method above, the C4.5 decision tree algorithm is programmed and simulated in MATLAB. The processed data are fed into the program, a decision tree is obtained from the results, and rules are derived from the decision tree. These rules can then be used to forecast the province's load from March 15 to March 28, 2013.

Claims (2)

1. A method for processing power load data based on a decision tree, characterized by comprising the following steps:
1) placing the samples in training set T for which a given attribute has a determined value into a determined-value sample set;
2) calculating the similarity between each missing-value sample and each determined-value sample in training set T;
3) completing the attribute of the missing-value sample with the attribute value of the determined-value sample that has the maximum similarity to it.
2. The method for processing power load data based on a decision tree according to claim 1, characterized in that the similarity is:
$$D(s'_i, s_j) = \frac{|A_{ij}|}{|A|} + \delta_{ij}$$
$$\delta_{ij} = \begin{cases} 1, & d(s'_i) = d(s_j) \\ 0, & d(s'_i) \neq d(s_j) \end{cases}$$
In the formula, $s_j$ is the j-th sample in the determined-value sample set, $s'_i$ is the i-th sample in the missing-value sample set, and $D(s'_i, s_j)$ is the similarity between $s_j$ and $s'_i$. A denotes the set of all attributes in the training data set; $A_{ij} = \{a \in A \mid a_i = a_j\}$ denotes the set of attributes on which $s'_i$ and $s_j$ take the same, determined value; $|A|$ and $|A_{ij}|$ are the numbers of elements in the respective sets; and $\delta_{ij}$ is a weight coefficient.
CN201511021630.8A 2015-12-31 2015-12-31 Method for processing power load data based on decision tree Pending CN105469219A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201511021630.8A CN105469219A (en) 2015-12-31 2015-12-31 Method for processing power load data based on decision tree

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201511021630.8A CN105469219A (en) 2015-12-31 2015-12-31 Method for processing power load data based on decision tree

Publications (1)

Publication Number Publication Date
CN105469219A true CN105469219A (en) 2016-04-06

Family

ID=55606887

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201511021630.8A Pending CN105469219A (en) 2015-12-31 2015-12-31 Method for processing power load data based on decision tree

Country Status (1)

Country Link
CN (1) CN105469219A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559199A (en) * 2013-09-29 2014-02-05 北京航空航天大学 Web information extraction method and web information extraction device
CN104318332A (en) * 2014-10-29 2015-01-28 国家电网公司 Power load predicting method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
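林震 等 (Lin Zhen et al.): "基于决策树的数据挖掘算法优化研究" [Research on the optimization of decision-tree-based data mining algorithms], 《现代计算机(专业版)》 [Modern Computer (Professional Edition)] *
郭景峰 等 (Guo Jingfeng et al.): "基于决策树的数据遗失值填充方法的研究" [Research on decision-tree-based methods for filling missing data values], 《计算机工程与科学》 [Computer Engineering & Science] *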

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106844781B (en) * 2017-03-10 2020-04-21 广州视源电子科技股份有限公司 Data processing method and device
CN106844781A (en) * 2017-03-10 2017-06-13 广州视源电子科技股份有限公司 Data processing method and device
CN107220734A (en) * 2017-06-26 2017-09-29 江南大学 CNC Lathe Turning process Energy Consumption Prediction System based on decision tree
WO2019001220A1 (en) * 2017-06-26 2019-01-03 江南大学 Decision tree based turning process energy consumption prediction system and method of numerically controlled lathe
CN107220734B (en) * 2017-06-26 2020-05-12 江南大学 Numerical control lathe turning process energy consumption prediction system based on decision tree
CN108011367A (en) * 2017-12-04 2018-05-08 贵州电网有限责任公司电力科学研究院 A kind of Characteristics of Electric Load method for digging based on depth decision Tree algorithms
CN108062560A (en) * 2017-12-04 2018-05-22 贵州电网有限责任公司电力科学研究院 A kind of power consumer feature recognition sorting technique based on random forest
CN108011367B (en) * 2017-12-04 2020-12-18 贵州电网有限责任公司电力科学研究院 Power load characteristic mining method based on depth decision tree algorithm
CN108539738B (en) * 2018-05-10 2020-04-21 国网山东省电力公司电力科学研究院 Short-term load prediction method based on gradient lifting decision tree
CN108539738A (en) * 2018-05-10 2018-09-14 国网山东省电力公司电力科学研究院 A kind of short-term load forecasting method promoting decision tree based on gradient
CN109242174A (en) * 2018-08-27 2019-01-18 广东工业大学 A kind of adaptive division methods of seaonal load based on decision tree
CN109446730A (en) * 2018-12-05 2019-03-08 新奥数能科技有限公司 Generating set rate of load condensate missing value complement based on short-term equipment operating data recruits method
CN109446730B (en) * 2018-12-05 2022-11-29 新奥数能科技有限公司 Short-term equipment operation data-based generator set load factor missing value recruitment method
CN112613584A (en) * 2021-01-07 2021-04-06 国网上海市电力公司 Fault diagnosis method, device, equipment and storage medium
CN115048815A (en) * 2022-08-11 2022-09-13 广州海颐软件有限公司 Database-based intelligent simulation management system and method for power service

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160406
