CN109948913A - Multi-source feature power consumer composite portrait system based on a two-layer XGBoost algorithm - Google Patents

Info

Publication number
CN109948913A
CN109948913A CN201910154105.5A CN201910154105A CN109948913A CN 109948913 A CN109948913 A CN 109948913A CN 201910154105 A CN201910154105 A CN 201910154105A CN 109948913 A CN109948913 A CN 109948913A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910154105.5A
Other languages
Chinese (zh)
Inventor
颜宏文
何子佳
马瑞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a method and framework for determining a multi-source feature power consumer portrait based on a two-layer XGBoost algorithm. Horizontally, the method and framework build a label system for two distinct user groups, individuals and enterprises; vertically, they build it along three directions: basic attributes, behavioral habits, and cognitive characteristics. The horizontal and vertical dimensions intersect to form a multi-source label system. To make effective use of the multi-source features and to address the high-dimensionality problem, a multi-view fusion framework is composed from six angles: category features, cluster features, time scale, text features, statistical features, and numerical features. A model is then built on the two-layer XGBoost algorithm to determine the final composite power consumer portrait.

Description

Multi-source feature power consumer composite portrait system based on a two-layer XGBoost algorithm
Technical field
This application relates to the technical field of electric power, and more specifically to a method and system for determining a multi-source feature power consumer composite portrait based on a two-layer XGBoost algorithm.
Background
A user portrait is an important application of big data technology. Its goal is to establish descriptive label attributes for a user along many dimensions and to use these labels to outline the user's real personal characteristics. The portrait can then be used to mine user needs and analyze user preferences and, by matching against it, to provide users with more efficient and better targeted information delivery and a user experience closer to their personal habits.
With the deepening of informatization and power system reform, users demand ever higher power quality. Now that big data is widely applied, power grid enterprises, as data-intensive enterprises, can make full use of their electric power big data to study power consumer portraits based on data mining, realize the data value of the power industry, and formulate marketing strategies that combine differentiation with precision. This lets them adapt to growing and changing user demand and also gain a first-mover advantage in the electricity retail market after the retail side is opened up.
However, the variability of electricity users' behavioral habits prevents grid enterprises from understanding users in real time. Users change constantly, and enterprises must change their service strategies accordingly and understand users' real needs. Enterprises need to segment users and provide differentiated services to different types of users. Grid enterprises need user portraits to understand users comprehensively and find target users, so as to achieve precision marketing.
Existing user classification methods have the following problems:
(1) Labels are closed and incomplete; "closed" means that the number of labels is limited and fixed.
(2) The algorithms used are monotonous, and their accuracy and generalization ability are low.
(3) User portraits are updated with a lag and cannot reflect users' latest behavior in real time.
Summary of the invention
I. Establishing the multi-view fusion framework
To make effective use of the multi-source features and to address the high-dimensionality problem, a multi-view fusion framework is composed from six angles: category features, cluster features, time scale, text features, statistical features, and numerical features.
1. Category features
In terms of user groups, electricity use can be divided into personal use and enterprise use. Personal use is further divided into homeowner use and tenant use; enterprise use can be classified by industry and nature, such as industry, agriculture, accommodation and catering, the software industry, public infrastructure, hospitals, government, and schools. The types of business that users inquire about also reflect different needs: some users report faults for repair, while others care more about electricity charges and consumption. Based on this analysis, the application processes categorical fields such as electricity type and urban/rural category as follows. For a given field, each user may correspond to multiple category values. To make full use of the information expressed across these values, the application represents them with a bag-of-categories model. The bag-of-categories model is an improvement on the bag-of-words model: besides considering whether each sample has category X (has_x), it also considers the number of times (count_x) and the proportion (ratio_x) with which category X occurs for the sample.
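The bag-of-categories encoding described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `bag_of_categories`, the `vocabulary` argument, and the example category values are assumptions introduced here; only the `has_x`, `count_x`, and `ratio_x` feature names come from the text.

```python
from collections import Counter


def bag_of_categories(values, vocabulary):
    """Encode one user's multi-valued category field as has/count/ratio features."""
    counts = Counter(values)
    total = len(values)
    features = {}
    for category in vocabulary:
        count = counts.get(category, 0)
        features[f"has_{category}"] = 1 if count > 0 else 0
        features[f"count_{category}"] = count
        # Share of this user's records that fall into the category.
        features[f"ratio_{category}"] = count / total if total else 0.0
    return features


# Hypothetical user with three records in an "electricity type" field.
row = bag_of_categories(
    ["residential", "residential", "commercial"],
    vocabulary=["residential", "commercial", "industrial"],
)
```

Compared with a plain one-hot or bag-of-words encoding, the count and ratio columns keep how strongly a user is associated with each category, not just whether the category occurs.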
2. Cluster coding
Some coded fields take too many values; the power supply unit code alone has hundreds. Statistical observation shows that these fields follow fixed coding rules. The codes can be clustered using these rules and then processed as category features, which achieves feature dimensionality reduction.
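The text does not state what the coding rules are. As a sketch under the assumption that codes sharing a leading prefix belong to the same group (a hypothetical rule, as are the example codes and the `prefix_len` parameter), the clustering step might look like this:

```python
def cluster_codes(codes, prefix_len=3):
    """Map each raw code to a cluster label taken from its leading characters.

    Assumption: the coding rule is positional, so a shared prefix means a
    shared group. A real rule would have to be read off the actual codes.
    """
    return {code: code[:prefix_len] for code in codes}


# Hypothetical power supply unit codes.
codes = ["0610101", "0610102", "0610203", "0720001"]
mapping = cluster_codes(codes, prefix_len=3)
clusters = sorted(set(mapping.values()))
```

The cluster labels can then be fed through the same category-feature encoding as any other categorical field, with far fewer distinct values than the raw codes.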
3. Time scale
Power demand differs across seasons and times, and the number of electricity-charge-sensitive users also differs across time periods. Given the importance of the time factor, a large number of informative and effective features are constructed on the time dimension. For electricity users, category features are constructed at three granularities (hour, day, and month), together with three binary features indicating whether a date falls in the first, middle, or last ten days of the month. Relationships among multiple timestamps must also be considered, so the constructed features also include cluster features, statistical features, and numerical features at the month, day, and hour granularities.
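A minimal sketch of the per-timestamp features follows. The feature names and the exact period boundaries (days 1 to 10, 11 to 20, and 21 to month end) are assumptions made for illustration; the text only names the three granularities and the three ten-day flags.

```python
from datetime import datetime


def time_features(ts):
    """Derive category and ten-day-period features from one timestamp."""
    return {
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        # Assumed boundaries: days 1-10, 11-20, and 21 to the end of the month.
        "is_early_month": int(ts.day <= 10),
        "is_mid_month": int(11 <= ts.day <= 20),
        "is_late_month": int(ts.day >= 21),
    }


feats = time_features(datetime(2018, 7, 23, 14, 5))
```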
4. Text features
When constructing the demand sensitivity, outage sensitivity, and electricity charge sensitivity user labels, the content of accepted work orders and 95598 call records captures what the user said and reflects the user's real needs; it is a direct indication of whether the user is sensitive. The basic format is "[demand] main content", for example "[troubleshooting] fault description / user requirement / user mood" or "[time-of-use tariff] user asks whether the time-of-use tariff has been activated / time-of-use tariff policy / time-of-use tariff scope and conditions". For each user's text content, three kinds of text features are extracted: tf-idf features over unigrams and bigrams; statistical features such as text length and word count; and a demand score feature. The demand score reflects how sensitive a demand (the content inside "[ ]") is.
It is calculated from the following quantities: needs_num_positive(n), the number of times demand n occurs in positive examples; needs_num_sum(n), the total number of times demand n occurs; and α, a smoothing factor.
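The formula for the demand score is not shown in the text. One plausible reading, assumed here and not confirmed by the source, is a smoothed positive rate, needs_num_positive(n) / (needs_num_sum(n) + α). The sketch below uses that assumed form; the helper names and the sample records are hypothetical.

```python
import re
from collections import Counter

# Matches the "[demand]" tag at the start of a record.
TAG = re.compile(r"^\[([^\]]+)\]")


def demand_tag(text):
    """Return the demand inside the leading square brackets, or None."""
    match = TAG.match(text)
    return match.group(1) if match else None


def demand_scores(records, alpha=1.0):
    """records: iterable of (text, is_positive). Returns {demand: score}."""
    positive, total = Counter(), Counter()
    for text, is_positive in records:
        tag = demand_tag(text)
        if tag is None:
            continue
        total[tag] += 1
        if is_positive:
            positive[tag] += 1
    # Assumed form: positives / (total + alpha); alpha damps rare demands.
    return {tag: positive[tag] / (total[tag] + alpha) for tag in total}


records = [
    ("[troubleshooting] outage reported", True),
    ("[troubleshooting] meter fault", True),
    ("[troubleshooting] line check", False),
    ("[time-of-use tariff] asks about policy", False),
]
scores = demand_scores(records, alpha=1.0)
```

Under this reading, the smoothing factor keeps a demand that appears only once or twice from receiving an extreme score.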
5. Statistical features
To portray users better, the application takes grid business needs fully into account and derives a series of statistical features from the original fields. For example, in the 95598 call records, it counts the number of days between a user's first and last records and the number of call records; in the electricity charge information, it counts the number of receivable charge records, the number of months in which each user has a payment record, and the difference between receivable and paid penalties; in the electricity consumption information, it counts, per user per year, the total zero-consumption days, the total high-consumption days, the longest run of consecutive zero-consumption days, the longest run of consecutive high-consumption days, and the number of low-voltage days in the transformer district.
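Two of the derived statistics above, the span between first and last call records and the longest run of zero-consumption days, can be sketched as below. The function names, feature keys, and sample data are illustrative assumptions.

```python
from datetime import date


def span_days(record_dates):
    """Days between a user's first and last record."""
    return (max(record_dates) - min(record_dates)).days


def longest_zero_run(daily_kwh):
    """Longest run of consecutive days with zero consumption."""
    best = current = 0
    for value in daily_kwh:
        current = current + 1 if value == 0 else 0
        best = max(best, current)
    return best


# Hypothetical call dates and daily consumption for one user.
calls = [date(2018, 3, 1), date(2018, 3, 15), date(2018, 4, 2)]
usage = [5.2, 0, 0, 0, 3.1, 0, 0, 4.0]
user_stats = {
    "call_count": len(calls),
    "call_span_days": span_days(calls),
    "zero_days_total": sum(1 for v in usage if v == 0),
    "zero_days_longest_run": longest_zero_run(usage),
}
```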
6. Numerical features
For numeric data, common statistics often reveal deeper information: the standard deviation reflects the dispersion of the data, and the median reflects the central tendency of its distribution. Therefore, for the multiple numeric values associated with a user, basic statistical features (maximum, minimum, mean, median, standard deviation) are constructed over fields including work order time, call duration, total electricity consumption, electricity charge amount, monthly consumption growth rate, annual consumption growth rate, capacity utilization, and energy consumption grade.
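The five basic statistics can be computed per user and per field with the standard library, as in this sketch. The function name and the sample call durations are assumptions; the choice of the population standard deviation over the sample standard deviation is also an assumption, since the text does not say which is used.

```python
import statistics


def basic_stats(values):
    """Basic statistical features for one user's values of a numeric field."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Population standard deviation; statistics.stdev gives the sample version.
        "std": statistics.pstdev(values),
    }


# Hypothetical call durations (minutes) for one user.
call_durations = [2, 4, 4, 4, 5, 5, 7, 9]
duration_stats = basic_stats(call_durations)
```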
II. Establishing the power consumer label library
Horizontally, the application builds the label system for two distinct user groups, individuals and enterprises; vertically, it builds it along three directions: basic attributes, behavioral habits, and cognitive characteristics. The horizontal and vertical dimensions intersect to form the multi-source label system, as shown in the following table:
Description of the drawings
To make the purpose, technical solution, and effects of the application clearer, the application provides the following drawings:
Fig. 1 is a flowchart of the steps of the method for determining the multi-source feature power consumer portrait based on the two-layer XGBoost algorithm.
Fig. 2 is a flowchart of the steps of the framework for the multi-source feature power consumer portrait based on the two-layer XGBoost algorithm.
Detailed description
The application mainly uses the XGBoost algorithm to classify power consumers according to different label combinations. XGBoost is an ensemble learning algorithm that extends and improves GBDT, uses decision trees (CART) as base learners, and combines multiple influencing factors. It adds a regularization term to the objective function when seeking the optimal solution, and it offers customizable loss functions, a standardized regularization term, handling of sparse features, handling of missing values, and a parallelized algorithm design. Through the iterative combination of weak classifiers it helps avoid overfitting, and it has the advantages of fast execution, good classification performance, support for custom loss functions, and high versatility.
The specific steps of the two-layer XGBoost model are as follows:
Step 1: Regularized learning objective
For a data set $D = \{(x_i, y_i)\}$ with n samples and m features ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), the tree ensemble model sums K additive functions to obtain the final output:
$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
where $F = \{f(x) = w_{q(x)}\}$ ($q: \mathbb{R}^m \to T$, $w \in \mathbb{R}^T$) is the space of regression trees; q is the structure of a tree, which maps each sample to the corresponding leaf index; and T is the number of leaves in the tree.
Each $f_k$ corresponds to an independent tree structure q and leaf weights w. Unlike a decision tree, each regression tree carries a continuous score on each leaf; $w_i$ denotes the score on the i-th leaf. A given sample is classified into a leaf of each tree by that tree's decision rules, and the scores on the corresponding leaves are summed to give the final prediction. To learn the set of functions used in the model, the following regularized objective is minimized:
$$L(\phi) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k), \quad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$$
where l is a differentiable convex loss function that measures the difference between the prediction $\hat{y}_i$ and the target $y_i$, and Ω is a penalty term, with regularization coefficients γ and λ, that limits the complexity of the model. The additional regularization term smooths the final learned weights to avoid overfitting. The regularized objective tends to select a model that is simple and predicts well.
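As an illustration of how a tree ensemble forms its prediction and how the complexity penalty is counted, the sketch below represents each tree as a nested dictionary (a representation assumed here, not taken from the text) and uses the standard XGBoost penalty γT + ½λ‖w‖², where T is the number of leaves and w the leaf scores.

```python
def predict_tree(tree, x):
    """Follow the decision rules to a leaf and return its score."""
    while "leaf" not in tree:
        branch = "left" if x[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]


def leaf_weights(tree):
    """Collect the scores of all leaves of a tree."""
    if "leaf" in tree:
        return [tree["leaf"]]
    return leaf_weights(tree["left"]) + leaf_weights(tree["right"])


def omega(tree, gamma=1.0, lam=1.0):
    """Complexity penalty gamma * T + 0.5 * lambda * sum(w^2)."""
    w = leaf_weights(tree)
    return gamma * len(w) + 0.5 * lam * sum(v * v for v in w)


# Two hypothetical depth-1 trees.
trees = [
    {"feature": 0, "threshold": 10.0, "left": {"leaf": 0.5}, "right": {"leaf": -0.5}},
    {"feature": 1, "threshold": 3.0, "left": {"leaf": 0.25}, "right": {"leaf": 1.0}},
]
x = [4.0, 7.0]
y_hat = sum(predict_tree(t, x) for t in trees)  # sum of leaf scores over K trees
penalty = sum(omega(t) for t in trees)
```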
Step 2: Gradient tree boosting algorithm
The tree ensemble model takes functions as parameters, so it cannot be optimized with traditional optimization methods in Euclidean space. Instead, the model is trained additively. Let $\hat{y}_i^{(t)}$ denote the prediction for the i-th instance at the t-th iteration; $f_t$ is added so as to minimize the following objective:
$$L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
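The model is optimized by repeatedly adding the $f_t$ that most improves it. In general, a second-order approximation can be used to optimize the objective quickly:
$$L^{(t)} \simeq \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$
where $g_i = \partial_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is the first-order partial derivative and $h_i = \partial^2_{\hat{y}^{(t-1)}} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is the second-order partial derivative of the loss with respect to the previous prediction.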
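The text does not name a specific loss function. As a worked example, assuming the logistic (log) loss for binary labels, the first- and second-order gradient statistics with respect to the raw score have the closed forms g = p − y and h = p(1 − p), where p is the sigmoid of the raw score. The function name below is hypothetical.

```python
import math


def logistic_grad_hess(y_true, raw_scores):
    """First- and second-order gradients of log loss w.r.t. the raw score."""
    grads, hessians = [], []
    for y, score in zip(y_true, raw_scores):
        p = 1.0 / (1.0 + math.exp(-score))  # predicted probability
        grads.append(p - y)
        hessians.append(p * (1.0 - p))
    return grads, hessians


# Before any tree is added, every raw score is 0, so p = 0.5 for each sample.
g, h = logistic_grad_hess([1, 0], [0.0, 0.0])
```

These per-sample values are all that each boosting round needs from the loss, which is why a custom loss only has to supply its gradient and second derivative.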
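Removing the constant term gives the simplified objective at step t:
$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)$$
Define $I_j = \{ i \mid q(x_i) = j \}$ as the instance set of leaf j. Expanding $\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$, the objective $\tilde{L}^{(t)}$ can be rewritten as:
$$\tilde{L}^{(t)} = \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T$$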
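For a fixed structure $q(x)$, the optimal weight $w_j^*$ of leaf j can be calculated as
$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
and the corresponding optimal value of the objective is
$$\tilde{L}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
where λ and γ are the regularization coefficients of the penalty term Ω. This value can be used as a scoring function to evaluate the quality of a tree structure q. The score is similar to the impurity measures used to evaluate decision trees; the difference is that it applies to a wider range of objective functions.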
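In standard XGBoost notation, with G_j and H_j the sums of first- and second-order gradients over the instances in leaf j, the optimal leaf weight is −G_j / (H_j + λ) and the structure score is −½ Σ_j G_j² / (H_j + λ) + γT. The sketch below evaluates both for hand-picked numbers; the function names and sample values are illustrative.

```python
def leaf_weight(g_sum, h_sum, lam=1.0):
    """Optimal weight of a leaf: -G / (H + lambda)."""
    return -g_sum / (h_sum + lam)


def structure_score(leaves, lam=1.0, gamma=0.0):
    """Objective value of a fixed tree structure; lower is better.

    leaves: list of (G_j, H_j) pairs, one per leaf.
    """
    quality = sum(g * g / (h + lam) for g, h in leaves)
    return -0.5 * quality + gamma * len(leaves)


# Two leaves with hypothetical gradient sums.
leaves = [(-4.0, 3.0), (2.0, 1.0)]
weights = [leaf_weight(g, h) for g, h in leaves]
score = structure_score(leaves, lam=1.0, gamma=0.5)
```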
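In general, it is not feasible to enumerate all possible tree structures. A greedy algorithm is used instead: it starts from a single leaf and iteratively adds branches. Let $I_L$ and $I_R$ be the instance sets of the left and right nodes after a split, and let $I = I_L \cup I_R$. The reduction in loss after the split, which is commonly used to evaluate candidate splits, is given by:
$$L_{split} = \frac{1}{2} \left[ \frac{\left( \sum_{i \in I_L} g_i \right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left( \sum_{i \in I_R} g_i \right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left( \sum_{i \in I} g_i \right)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma$$
where λ and γ are the regularization coefficients of the penalty term Ω.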
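A minimal sketch of the greedy split search on a single feature follows. It scans the sorted feature values, accumulates the left-side gradient sums, and scores each boundary with the standard XGBoost gain, ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) − G²/(H+λ)] − γ. The function names and toy data are assumptions.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Loss reduction obtained by splitting a node into left and right children."""
    def term(g, h):
        return g * g / (h + lam)

    parent = term(g_left + g_right, h_left + h_right)
    return 0.5 * (term(g_left, h_left) + term(g_right, h_right) - parent) - gamma


def best_split(values, grads, hessians, lam=1.0, gamma=0.0):
    """Exact greedy search over one feature; returns (gain, threshold)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    g_total, h_total = sum(grads), sum(hessians)
    g_left = h_left = 0.0
    best = (0.0, None)  # threshold stays None if no split has positive gain
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        g_left += grads[i]
        h_left += hessians[i]
        if values[i] == values[nxt]:
            continue  # cannot split between two equal values
        gain = split_gain(g_left, h_left, g_total - g_left, h_total - h_left, lam, gamma)
        if gain > best[0]:
            best = (gain, (values[i] + values[nxt]) / 2)
    return best


values = [1.0, 2.0, 3.0, 4.0]
grads = [-1.0, -1.0, 1.0, 1.0]
hessians = [1.0, 1.0, 1.0, 1.0]
gain, threshold = best_split(values, grads, hessians, lam=1.0)
```

In this toy data the gradients change sign between 2.0 and 3.0, so the midpoint 2.5 separates them and yields the largest gain.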
Step 3: Weighted quantiles
To speed up the algorithm, an approximate algorithm can be used to find suitable candidate split points quickly. Candidate split points are usually generated from the quantiles (for example, percentiles) of each feature, so that the candidates are distributed evenly over the data. To this end, a rank function $r_k: \mathbb{R} \to [0, +\infty)$ is defined:
$$r_k(z) = \frac{1}{\sum_{(x,h) \in D_k} h} \sum_{(x,h) \in D_k,\, x < z} h$$
where $D_k = \{(x_{1k}, h_1), (x_{2k}, h_2), \ldots, (x_{nk}, h_n)\}$ holds the k-th feature value of each training sample together with its second-order gradient statistic. Using this function, the candidate split points $\{s_{k1}, s_{k2}, \ldots, s_{kl}\}$ are chosen to satisfy $|r_k(s_{k,j}) - r_k(s_{k,j+1})| < \epsilon$, where ε is an approximation factor that controls how many candidate split points are generated.
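The rank function and the ε condition can be illustrated with an exact, in-memory pass over the sorted values. This is a simplification assumed for illustration, not the streaming quantile sketch that XGBoost itself uses, and it can only keep every gap below ε when no single feature value carries ε or more of the total weight. The function names and sample data are hypothetical.

```python
def weighted_ranks(values, hessians):
    """Distinct sorted values paired with r_k(z): share of hessian weight below z."""
    total = sum(hessians)
    pairs = sorted(zip(values, hessians))
    ranks, below, i = [], 0.0, 0
    while i < len(pairs):
        z = pairs[i][0]
        ranks.append((z, below / total))
        while i < len(pairs) and pairs[i][0] == z:
            below += pairs[i][1]
            i += 1
    return ranks


def candidate_splits(values, hessians, eps):
    """Greedily thin the distinct values while keeping adjacent rank gaps below eps."""
    ranks = weighted_ranks(values, hessians)
    chosen = [ranks[0]]
    for current, following in zip(ranks[1:], ranks[2:]):
        # Keep `current` if skipping it would open a rank gap of eps or more.
        if following[1] - chosen[-1][1] >= eps:
            chosen.append(current)
    if len(ranks) > 1:
        chosen.append(ranks[-1])
    return [z for z, _ in chosen]


values = [1, 2, 3, 4, 5, 6]
hessians = [1, 1, 1, 1, 2, 2]
rank_of = dict(weighted_ranks(values, hessians))
splits = candidate_splits(values, hessians, eps=0.3)
```

Because the ranks are weighted by the second-order gradients, regions where samples carry more weight receive more candidate points than a plain count-based quantile would give them.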
Step 4: Following steps 1 to 3, the XGBoost algorithm is run for 2000 iterations. All features used in tree splits during this training are retained and used as the input layer of the second-layer XGBoost and bagging blended model, which continues the iteration, in order to improve the accuracy of the model.
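The hand-off between the two layers can be sketched as a feature-retention step: collect every feature that appears in a split of the first-layer trees, then keep only those columns as input for the second layer. The nested-dictionary tree representation, the function names, and the toy data are assumptions; the sketch shows only the data flow, not the training of either layer or the bagging blend.

```python
def used_features(tree, found=None):
    """Collect the indices of features that appear in any split of a tree."""
    found = set() if found is None else found
    if "leaf" not in tree:
        found.add(tree["feature"])
        used_features(tree["left"], found)
        used_features(tree["right"], found)
    return found


def select_columns(rows, columns):
    """Keep only the given feature columns of each sample."""
    return [[row[c] for c in columns] for row in rows]


# Hypothetical first-layer trees over five input features.
first_layer = [
    {"feature": 0, "threshold": 1.5,
     "left": {"leaf": 0.2},
     "right": {"feature": 3, "threshold": 0.5,
               "left": {"leaf": -0.1}, "right": {"leaf": 0.4}}},
    {"feature": 3, "threshold": 2.0, "left": {"leaf": 0.1}, "right": {"leaf": -0.3}},
]
X = [[1.0, 9.0, 8.0, 0.2, 7.0], [2.0, 6.0, 5.0, 3.0, 4.0]]

kept = sorted(set().union(*(used_features(t) for t in first_layer)))
X_second_layer = select_columns(X, kept)
```

Features that no first-layer tree ever split on are dropped, so the second layer trains on a smaller input that the first layer has already found useful.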
Finally, the invention takes massive data collection and processing methods into account, builds the power consumer label system on machine learning and deep learning theory, and uses the two-layer XGBoost algorithm, a relatively recent machine learning and data mining algorithm with fast training, strong generalization ability, and the capacity to process large volumes of data. The results can serve grid demand-side management, the electricity market, and other areas, which helps improve the economic benefit of power enterprises and the satisfaction of power consumers.

Claims (4)

1. A method for determining a multi-source feature power consumer portrait based on a two-layer XGBoost algorithm, characterized by comprising the steps of:
(1) building a label system horizontally for two distinct user groups, individuals and enterprises, and vertically along three directions, namely basic attributes, behavioral habits, and cognitive characteristics, the horizontal and vertical dimensions intersecting to form a multi-source label system;
(2) making effective use of the multi-source features and addressing the high-dimensionality problem by composing a multi-view fusion framework for the users' basic attributes, behavioral habits, and cognitive characteristics from six angles: category features, cluster features, time scale, text features, statistical features, and numerical features;
(3) building a model based on the two-layer XGBoost algorithm and determining the final power consumer portrait.
2. The method for determining a multi-source feature power consumer portrait based on a two-layer XGBoost algorithm as claimed in claim 1, wherein the final power consumer portrait is formed by using different label combinations as input layers, training a model for each combination, and combining the training results.
3. A framework for a multi-source feature power consumer portrait based on a two-layer XGBoost algorithm, characterized in that:
(1) the first-layer XGBoost model mainly comprises a regularized learning objective, a gradient boosting algorithm, and weighted quantiles;
(2) all features used in tree splits during training of the first-layer XGBoost model are retained as the input layer of the second-layer XGBoost and bagging blended model, which continues the iteration;
(3) the result output by the second-layer model is the final prediction result.
4. The framework for a multi-source feature power consumer portrait based on a two-layer XGBoost algorithm as claimed in claim 3, characterized in that the final prediction result mainly comprises the user's electricity charge sensitivity, demand sensitivity, outage sensitivity, risk grade, credit grade, basic attributes, behavioral habits, and cognitive characteristics.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910154105.5A CN109948913A (en) 2019-03-01 2019-03-01 Multi-source feature power consumer composite portrait system based on a two-layer XGBoost algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190628