CN109948913A - Multi-source feature power consumer composite portrait system based on a double-layer XGBoost algorithm - Google Patents
- Publication number
- CN109948913A (application number CN201910154105.5A)
- Authority
- CN
- China
- Prior art keywords
- feature
- deck
- power consumer
- double
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The application discloses a method and framework for determining a multi-source feature power consumer portrait based on a double-layer XGBoost algorithm. Horizontally, the method and framework build a label system separately for two user groups, individuals and enterprises; vertically, they cover three dimensions: basic attributes, behavioral habits, and cognitive characteristics. Crossing the two axes yields a multi-source label system. In addition, to make effective use of multi-source features and to address the high-dimensionality problem, a multi-view fusion framework is built from six perspectives: category features, cluster features, time scale, text features, statistical features, and numerical features. A model is then built on the double-layer XGBoost algorithm to determine the final composite power consumer portrait.
Description
Technical field
This application relates to the technical field of electric power, and more specifically to a method and system for determining a multi-source feature power consumer composite portrait based on a double-layer XGBoost algorithm.
Background
User portraits are an important application of big data technology. The goal is to build descriptive label attributes for users along many dimensions and to use these labels to outline each user's real characteristics. With user portraits, one can mine user needs, analyze user preferences, and, by matching against the portrait, deliver information to users more efficiently and in a more targeted way, along with a user experience closer to their personal habits.
With the continued advance of informatization and power system reform, users demand ever higher power quality. Now that big data is widely applied, power grid enterprises, as data-intensive enterprises, can make full use of their electric power big data to study power consumer portraits through data mining. This unlocks the data value of the power industry and supports marketing strategies that combine differentiation with precision, so that an enterprise can adapt to growing changes in user demand and also gain a first-mover advantage in the electricity retail market once the retail side is opened up.
However, the behavior of electricity users is variable, which prevents power grid enterprises from understanding users in real time. Because users keep changing, enterprises must also adjust their service strategies and understand users' real needs. Enterprises need to segment users and provide differentiated services to different types of users. Through user portraits, power grid enterprises can understand users comprehensively and identify target users, thereby achieving precision marketing.
Existing user classification methods have the following problems:
(1) Labels are closed and incomplete; "closed" means that the number of labels is limited and fixed.
(2) The algorithms used are simplistic, and their accuracy and generalization ability are low.
(3) User portraits are updated with a lag and cannot reflect users' latest behavior in real time.
Summary of the invention
I. Establishing the multi-view fusion framework
To make effective use of multi-source features and to address the high-dimensionality problem, a multi-view fusion framework is built from six perspectives: category features, cluster features, time scale, text features, statistical features, and numerical features.
1. Category features
In terms of user groups, electricity use can be divided into personal and enterprise use. Personal use is further divided into homeowner and tenant use, while enterprise use can be classified by industry and nature, for example industry, agriculture, accommodation and catering, the software industry, public infrastructure, hospitals, government, and schools. The types of service that users inquire about also reflect different needs: some users report faults for repair, while others care more about electricity charges and consumption. Based on this analysis, the application processes categorical fields such as electricity use type and urban/rural category as follows. For a given user, the same field may correspond to multiple category values. To make full use of the information expressed across these values, the application represents them with a bag-of-categories model. The bag-of-categories model is an improvement on the bag-of-words model: besides whether each sample has category X (has_x), it also considers the count (count_x) and the ratio (ratio_x) of category X for each sample.
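The bag-of-categories encoding described above can be sketched as follows. This is only an illustration: the function name `bag_of_categories`, the sample category names, and the choice to compute the ratio over all of a user's values in the field are assumptions, not details given in the application.

```python
from collections import Counter

def bag_of_categories(values, vocabulary):
    """Encode one user's category values for a single field.

    values: category values observed for the user (may repeat).
    vocabulary: ordered list of all categories X to encode.
    Returns has_x, count_x and ratio_x for every category.
    """
    counts = Counter(values)
    total = len(values)
    features = {}
    for cat in vocabulary:
        count = counts.get(cat, 0)
        features[f"has_{cat}"] = int(count > 0)
        features[f"count_{cat}"] = count
        # Assumed definition: share of this category among the user's values.
        features[f"ratio_{cat}"] = count / total if total else 0.0
    return features

# Hypothetical user with four records in the "electricity use type" field.
example = bag_of_categories(
    ["residential", "residential", "commercial", "residential"],
    ["residential", "commercial", "industrial"],
)
```

Compared with a plain one-hot encoding, the count and ratio columns preserve how strongly a user is associated with each category when a field holds several values.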
2. Cluster coding
Some coded fields take too many values; the power supply unit code alone has hundreds of values. Statistical observation shows that these fields follow fixed coding rules. These rules can be used to cluster the codes, and the clustered codes are then processed as category features, which reduces feature dimensionality.
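The application does not state what the coding rules are. As a minimal sketch, assume the rule is that codes sharing a leading prefix belong to the same unit group; the prefix length, the helper names, and the sample codes below are all hypothetical.

```python
from collections import defaultdict

def cluster_codes_by_prefix(codes, prefix_len=3):
    """Map each raw code to a cluster label given by its leading characters."""
    return {code: code[:prefix_len] for code in codes}

def cluster_sizes(mapping):
    """Count how many raw codes fall into each cluster."""
    sizes = defaultdict(int)
    for cluster in mapping.values():
        sizes[cluster] += 1
    return dict(sizes)

# Hypothetical power supply unit codes.
codes = ["3340101", "3340102", "3340205", "3350101", "3350307"]
mapping = cluster_codes_by_prefix(codes, prefix_len=3)
sizes = cluster_sizes(mapping)
```

The cluster labels can then be encoded like any other category field, so five distinct codes collapse to two categories in this example.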
3. Time scale
Power demand differs across seasons and times, and the number of electricity-charge-sensitive users also differs across time periods. Given the importance of the time factor, a large number of informative and effective features are constructed along the time dimension. For electricity users, category features are constructed at three granularities (hour, day, and month), together with three binary features indicating whether a date falls in the first, middle, or last ten days of the month. The relationships among multiple timestamps also need to be considered, so the constructed features include cluster features, statistical features, and numerical features at the month, day, and hour granularities.
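The per-timestamp part of these features can be sketched as below. The feature names and the day boundaries (days 1 to 10, 11 to 20, and 21 onward) are assumptions made for illustration.

```python
from datetime import datetime

def time_features(ts):
    """Derive hour/day/month categories and ten-day-period flags from a timestamp."""
    day = ts.day
    return {
        "hour": ts.hour,
        "day": day,
        "month": ts.month,
        # Assumed boundaries for the three ten-day periods of a month.
        "is_first_ten_days": int(day <= 10),
        "is_middle_ten_days": int(11 <= day <= 20),
        "is_last_ten_days": int(day >= 21),
    }

# Hypothetical record timestamp.
features = time_features(datetime(2018, 7, 23, 14, 5))
```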
4. Text features
When constructing the demand sensitivity, outage sensitivity, and electricity charge sensitivity labels, the work order acceptance content and the 95598 call records capture what users actually said. They reflect users' real needs and directly indicate whether a user is sensitive. The basic format is "[demand] main content", for example "[fault repair] fault description/user request/user emotion" or "[time-of-use tariff] user asks whether the time-of-use tariff has been enabled/time-of-use tariff policy/time-of-use tariff scope and conditions". Three kinds of text features are extracted from each user's text: tf-idf features over unigrams and bigrams; statistical features such as text length and word count; and a demand score feature. The demand score reflects how sensitive a demand (that is, the content inside "[]") is. Calculation:
Wherein needs_num_positive(n) is the number of times demand n occurs in positive examples, needs_num_sum(n) is the total number of times demand n occurs, and α is a smoothing factor.
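The exact expression for the demand score is not reproduced in this text. The sketch below therefore ASSUMES a common smoothed-ratio form, needs_num_positive(n) / (needs_num_sum(n) + α); the formula, the regular expression, and the sample records are illustrative assumptions only.

```python
import re
from collections import Counter

DEMAND_PATTERN = re.compile(r"\[([^\]]+)\]")

def extract_demand(text):
    """Return the demand inside the leading "[...]", or None if absent."""
    match = DEMAND_PATTERN.match(text.strip())
    return match.group(1) if match else None

def demand_scores(records, alpha=1.0):
    """records: iterable of (text, is_positive) pairs.

    ASSUMED formula: positive count / (total count + alpha).
    """
    positive = Counter()
    total = Counter()
    for text, is_positive in records:
        demand = extract_demand(text)
        if demand is None:
            continue
        total[demand] += 1
        if is_positive:
            positive[demand] += 1
    return {d: positive[d] / (total[d] + alpha) for d in total}

# Hypothetical records: (work order text, whether the user is a positive example).
records = [
    ("[fault repair] no power since morning", True),
    ("[fault repair] meter box damaged", True),
    ("[fault repair] street light out", False),
    ("[time-of-use tariff] asks about policy", False),
]
scores = demand_scores(records, alpha=1.0)
```

Under this assumed form, the smoothing factor keeps demands that occur only a few times from receiving extreme scores.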
5. Statistical features
To better characterize the user portrait, the application takes power grid business needs fully into account and derives a series of statistical features from the original fields. For example, from the 95598 call records: the number of days between a user's first and last record, and the number of call records per user. From the electricity charge information: the number of receivable-charge records, the number of months in which each user has a payment record, and the difference between the receivable and the paid late-payment penalty. From the electricity consumption information: for each user, the total number of zero-consumption days and high-consumption days in the year, the longest run of consecutive zero-consumption days in the year, the longest run of consecutive high-consumption days in the year, and the number of low-voltage days in the transformer district.
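Two of the derived statistics above, the interval between the first and last record and the longest run of zero-consumption days, can be sketched as follows. The helper names and the sample data are hypothetical.

```python
from datetime import date

def days_between_first_and_last(dates):
    """Number of days between the earliest and latest record dates."""
    return (max(dates) - min(dates)).days if dates else 0

def longest_run(values, predicate):
    """Length of the longest consecutive run of values satisfying predicate."""
    best = current = 0
    for value in values:
        current = current + 1 if predicate(value) else 0
        best = max(best, current)
    return best

# Hypothetical call dates and daily consumption (kWh) for one user.
call_dates = [date(2018, 3, 1), date(2018, 3, 15), date(2018, 4, 2)]
daily_kwh = [5.2, 0.0, 0.0, 3.1, 0.0, 0.0, 0.0, 4.8]

interval_days = days_between_first_and_last(call_dates)
zero_days = sum(1 for v in daily_kwh if v == 0)
longest_zero_run = longest_run(daily_kwh, lambda v: v == 0)
```

The same `longest_run` helper covers the high-consumption case by passing a different predicate, for example a threshold comparison.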
6. Numerical features
For numeric data, common statistics often reveal deeper information: the standard deviation reflects the dispersion of the data, and the median reflects the central tendency of its distribution. Therefore, for the multiple numeric values associated with each user, basic statistical features are constructed (maximum, minimum, mean, median, and standard deviation). The fields covered include work order time, call duration, total electricity consumption, electricity charge amount, monthly consumption growth rate, annual consumption growth rate, capacity utilization, and energy consumption grade.
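A minimal sketch of these five statistics for one numeric field, using Python's standard `statistics` module. The application does not say whether the population or the sample standard deviation is intended; the population variant is assumed here, and the sample values are hypothetical.

```python
import statistics

def basic_stats(values):
    """Maximum, minimum, mean, median and standard deviation of one field."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Assumption: population standard deviation.
        "std": statistics.pstdev(values),
    }

# Hypothetical monthly consumption values for one user.
monthly_kwh = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = basic_stats(monthly_kwh)
```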
II. Establishing the power consumer label library
Horizontally, the application builds a label system separately for two user groups, individuals and enterprises; vertically, it covers three dimensions: basic attributes, behavioral habits, and cognitive characteristics. Crossing the two axes yields the multi-source label system, as shown in the following table:
Brief description of the drawings
To make the purpose, technical solution, and effects of the application clearer, the following drawings are provided:
Fig. 1 is a flow chart of the steps of the method for determining the multi-source feature power consumer portrait based on the double-layer XGBoost algorithm.
Fig. 2 is a flow chart of the steps of the framework for the multi-source feature power consumer portrait based on the double-layer XGBoost algorithm.
Detailed description of the embodiments
The application mainly uses the XGBoost algorithm to classify power consumers according to different label combinations. XGBoost is an ensemble learning algorithm that extends and improves GBDT, uses decision trees (CART) as base learners, and combines many influencing factors. It adds a regularization term to the objective function when seeking the optimal solution, and its features include customizable loss functions, a normalized regularization term, handling of sparse features, handling of missing values, and a parallelized algorithm design. Through iterative computation over weak classifiers it effectively avoids overfitting, and it offers fast execution, good classification performance, support for custom loss functions, and high versatility.
The specific steps of the double-layer XGBoost model are as follows:
Step 1: Regularized learning objective
For a data set $D = \{(x_i, y_i)\}$ containing $n$ samples and $m$ features ($|D| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), the tree ensemble model sums $K$ additive functions to obtain the final output:
$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in F$$
where $F = \{f(x) = w_{q(x)}\}$ ($q: \mathbb{R}^m \to T$, $w \in \mathbb{R}^T$) is the space of regression trees; $q$ is the tree structure, which maps each sample to its corresponding leaf node; and $T$ is the number of leaf nodes.
Each $f_k$ corresponds to an independent tree structure $q$ and leaf weights $w$. Unlike a decision tree, each regression tree carries a continuous score on each leaf node; $w_i$ denotes the score on the $i$-th leaf. A given sample is routed to a leaf in each tree by that tree's decision rules, and the scores on the corresponding leaves are summed to give the final prediction. To learn the set of functions used in the model, the following regularized objective is minimized:
$$L(\phi) = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k} \Omega(f_k)$$
where $l$ is a differentiable convex loss function that measures the difference between the prediction and the target, and $\Omega$ is a penalty term that limits model complexity. The additional regularization term helps smooth the final learned weights and thus avoid overfitting. The regularized objective tends to select a model that is simple yet predicts well.
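As a small numerical illustration of Step 1, the sketch below sums leaf scores over a set of trees and evaluates a regularized objective. The penalty form γT + ½λ‖w‖² is the one commonly associated with XGBoost and is an assumption here, as are the two toy trees and the sample data.

```python
def predict(trees, x):
    """Sum the leaf scores that sample x reaches in every tree.

    Each tree is a (q, w) pair: q maps a sample to a leaf index and
    w lists the leaf scores.
    """
    return sum(w[q(x)] for q, w in trees)

def regularized_objective(trees, samples, loss, gamma=0.0, lam=1.0):
    """Training loss plus an assumed penalty gamma*T + 0.5*lam*sum(w^2) per tree."""
    data_loss = sum(loss(y, predict(trees, x)) for x, y in samples)
    penalty = sum(
        gamma * len(w) + 0.5 * lam * sum(v * v for v in w) for _, w in trees
    )
    return data_loss + penalty

# Two hypothetical one-split trees over samples with two features.
trees = [
    (lambda x: 0 if x[0] < 1.0 else 1, [-0.5, 0.5]),
    (lambda x: 0 if x[1] < 2.0 else 1, [0.2, -0.2]),
]
samples = [((0.5, 1.0), 0.0), ((1.5, 3.0), 1.0)]

def squared_error(y, y_hat):
    return (y - y_hat) ** 2

y_hat = [predict(trees, x) for x, _ in samples]
objective = regularized_objective(trees, samples, squared_error, gamma=0.1, lam=1.0)
```

Raising `gamma` or `lam` increases the penalty for trees with many leaves or large leaf scores, which is how the objective favors simpler models.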
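Step 2: Gradient tree boosting
The tree ensemble model contains functions as parameters and cannot be optimized in Euclidean space with traditional optimization methods. Instead, the model is trained additively. Let $\hat{y}_i^{(t)}$ denote the prediction for the $i$-th instance at the $t$-th iteration; $f_t$ is added to minimize the following objective:
$$L^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t)$$
By repeatedly adding the $f_t$ that most improves the model, the model is optimized. In general, a second-order approximation can be used to optimize the objective further:
$$L^{(t)} \simeq \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t)$$
where $g_i = \partial_{\hat{y}^{(t-1)}} l\big(y_i, \hat{y}^{(t-1)}\big)$ is the first-order partial derivative and $h_i = \partial^2_{\hat{y}^{(t-1)}} l\big(y_i, \hat{y}^{(t-1)}\big)$ is the second-order partial derivative of the loss.
Removing the constant term gives the simplified objective at step $t$:
$$\tilde{L}^{(t)} = \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t)$$
Define $I_j = \{ i \mid q(x_i) = j \}$ as the instance set of leaf node $j$. Taking the penalty as $\Omega(f_t) = \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2$, where $\gamma$ and $\lambda$ are regularization coefficients, the objective is rewritten as:
$$\tilde{L}^{(t)} = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \tfrac{1}{2} \Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2 \Big] + \gamma T$$
For a fixed structure $q(x)$, the optimal weight $w_j^*$ of leaf node $j$ can be calculated as:
$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
From this, the corresponding optimal value is:
$$\tilde{L}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$
This value can be used as a scoring function to evaluate the quality of a tree structure $q$. The score is similar to the evaluation index of a decision tree, except that it applies to a wider range of optimization objectives.
In general, it is impossible to enumerate all possible tree structures. Instead, a greedy algorithm is used that starts from a single leaf node and iteratively adds branches. Let $I_L$ and $I_R$ be the instance sets of the left and right nodes after a split, and let $I = I_L \cup I_R$. The loss reduction after the split, which is commonly used to evaluate candidate splits, is given by:
$$L_{split} = \frac{1}{2} \left[ \frac{\big(\sum_{i \in I_L} g_i\big)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\big(\sum_{i \in I_R} g_i\big)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\big(\sum_{i \in I} g_i\big)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma$$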
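The leaf weight and the greedy split evaluation of Step 2 can be sketched in a few lines. The formulas used are the standard XGBoost expressions, with `lam` and `gamma` as regularization coefficients; treating them as the ones intended here is an assumption, and the squared-error toy data is hypothetical.

```python
def leaf_weight(g, h, lam=1.0):
    """Optimal leaf weight -G / (H + lambda) for gradients g and Hessians h."""
    return -sum(g) / (sum(h) + lam)

def leaf_score(g, h, lam=1.0):
    """G^2 / (H + lambda), the per-leaf term of the structure score."""
    return sum(g) ** 2 / (sum(h) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Loss reduction obtained by splitting one leaf into a left and a right leaf."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma

def best_split(xs, g, h, lam=1.0, gamma=0.0):
    """Greedy search over thresholds of one feature; returns (gain, threshold)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    best = (float("-inf"), None)
    for pos in range(1, len(order)):
        lo, hi = xs[order[pos - 1]], xs[order[pos]]
        if lo == hi:
            continue  # no threshold can separate equal feature values
        left, right = order[:pos], order[pos:]
        gain = split_gain(
            [g[i] for i in left], [h[i] for i in left],
            [g[i] for i in right], [h[i] for i in right],
            lam, gamma,
        )
        if gain > best[0]:
            best = (gain, (lo + hi) / 2)
    return best

# Loss 0.5*(y - y_hat)^2 with initial prediction 0: g = y_hat - y, h = 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.0, 3.0, 3.0]
g = [0.0 - y for y in ys]
h = [1.0] * len(ys)
gain, threshold = best_split(xs, g, h, lam=1.0)
left_weight = leaf_weight(g[:2], h[:2], lam=1.0)
```

In this toy data the search places the threshold between the two groups of targets, and a split is only worthwhile when the gain stays positive after subtracting `gamma`.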
Step 3: Weighted quantiles
To speed up the algorithm, an approximate algorithm can be used to quickly find suitable candidate split points. Candidate split points are usually generated from the quantiles of the corresponding feature, so that they are distributed more evenly over the data set. To this end, define a rank function $r_k: \mathbb{R} \to [0, +\infty)$:
$$r_k(z) = \frac{1}{\sum_{(x,h) \in D_k} h} \sum_{(x,h) \in D_k,\, x < z} h$$
where $D_k = \{(x_{1k}, h_1), (x_{2k}, h_2), \ldots, (x_{nk}, h_n)\}$ contains the $k$-th feature value and the second derivative of each training sample. Using this formula, candidate split points $\{s_{k1}, s_{k2}, \ldots, s_{kl}\}$ can be determined that satisfy $|r_k(s_{k,j}) - r_k(s_{k,j+1})| < \varepsilon$, where $\varepsilon$ is an approximation factor used to generate the candidate split points.
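The rank function and the ε condition can be illustrated with an exact, in-memory sketch. It is not the streaming quantile sketch used inside XGBoost itself; the greedy selection rule, the function names, and the sample weights are assumptions made for illustration.

```python
def rank(pairs, z):
    """Share of total Hessian weight on samples whose feature value is below z."""
    total = sum(h for _, h in pairs)
    return sum(h for x, h in pairs if x < z) / total

def candidate_splits(pairs, eps):
    """Greedily thin the sorted distinct values.

    A value is kept when skipping it would push the rank gap between
    neighbouring candidates to eps or more; the minimum and maximum
    are always kept.
    """
    values = sorted({x for x, _ in pairs})
    candidates = [values[0]]
    for i in range(1, len(values)):
        is_last = i == len(values) - 1
        if is_last or rank(pairs, values[i + 1]) - rank(pairs, candidates[-1]) >= eps:
            candidates.append(values[i])
    return candidates

# Hypothetical (feature value, second derivative) pairs for one feature.
pairs = [(1.0, 0.1), (2.0, 0.1), (3.0, 0.1), (4.0, 0.1),
         (5.0, 0.2), (6.0, 0.2), (7.0, 0.1), (8.0, 0.1)]
eps = 0.35
candidates = candidate_splits(pairs, eps)
gaps = [rank(pairs, b) - rank(pairs, a) for a, b in zip(candidates, candidates[1:])]
```

A smaller `eps` yields more candidates and a finer approximation; samples with larger second derivatives pull the candidates closer together around them.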
Step 4: Following steps 1 to 3, run the XGBoost algorithm for 2000 iterations, retain all features used in tree splits during model training, and feed them as the input layer to the second-layer model, which blends XGBoost with bagging, for further iteration, thereby improving the accuracy of the model.
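The hand-off between the two layers can be sketched as a feature-selection step. The sketch assumes the first-layer model exposes a mapping from feature name to the number of splits that used it; in the xgboost Python package, `Booster.get_score(importance_type="weight")` returns a dictionary of this shape. The feature names, counts, and rows below are hypothetical, and training of the second-layer model is not shown.

```python
def used_features(split_counts):
    """Names of features that were used in at least one tree split."""
    return sorted(name for name, count in split_counts.items() if count > 0)

def project(rows, feature_names, keep):
    """Keep only the selected columns of each row, in the order given by keep."""
    index = {name: i for i, name in enumerate(feature_names)}
    return [[row[index[name]] for name in keep] for row in rows]

feature_names = ["has_residential", "ratio_commercial", "hour", "demand_score"]
# Hypothetical split counts reported by a trained first-layer model.
split_counts = {"ratio_commercial": 12, "demand_score": 31}
rows = [[1, 0.25, 14, 0.5], [0, 0.0, 9, 0.0]]

keep = used_features(split_counts)
second_layer_input = project(rows, feature_names, keep)
```

Features that the first layer never split on are dropped, so the second layer trains on a narrower input.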
Finally, the invention takes into account large-scale data sets and processing methods, builds a label system for power consumers using machine learning and deep learning theory, and adopts the double-layer XGBoost algorithm, a relatively new machine learning and data mining algorithm with advantages such as fast training, strong generalization, and the ability to handle large volumes of data. The results can serve areas such as grid demand-side management and the electricity market, which helps improve the economic benefits of power enterprises and the satisfaction of power consumers.
Claims (4)
1. A method for determining a multi-source feature power consumer portrait based on a double-layer XGBoost algorithm, characterized by comprising the steps of:
(1) horizontally, building a label system separately for two user groups, individuals and enterprises, and vertically covering three dimensions, namely basic attributes, behavioral habits, and cognitive characteristics, the two axes being crossed to build a multi-source label system;
(2) to make effective use of multi-source features and address the high-dimensionality problem, building a multi-view fusion framework that describes the user's basic attributes, behavioral habits, and cognitive characteristics from six perspectives: category features, cluster features, time scale, text features, statistical features, and numerical features;
(3) building a model based on the double-layer XGBoost algorithm and determining the final power consumer portrait.
2. The method for determining a multi-source feature power consumer portrait based on a double-layer XGBoost algorithm according to claim 1, wherein the final power consumer portrait is formed by using different label combinations as the input layer, training a model for each, and finally combining the training results.
3. A framework for a multi-source feature power consumer portrait based on a double-layer XGBoost algorithm, characterized in that:
(1) the first-layer XGBoost model mainly comprises a regularized learning objective, a gradient boosting algorithm, and weighted quantiles;
(2) all features used in tree splits during training of the first-layer XGBoost model are retained as the input layer of the second-layer model, which blends XGBoost with bagging, for further iteration;
(3) the output of the second-layer model is the final prediction result.
4. The framework for a multi-source feature power consumer portrait based on a double-layer XGBoost algorithm according to claim 3, characterized in that the final prediction result mainly comprises the user's electricity charge sensitivity, demand sensitivity, outage sensitivity, risk level, credit grade, basic attributes, behavioral habits, and cognitive characteristics.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910154105.5A CN109948913A (en) | 2019-03-01 | 2019-03-01 | A kind of multi-source feature power consumer composite portrait system based on double-deck xgboost algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109948913A true CN109948913A (en) | 2019-06-28 |
Family
ID=67007832
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910154105.5A Pending CN109948913A (en) | 2019-03-01 | 2019-03-01 | A kind of multi-source feature power consumer composite portrait system based on double-deck xgboost algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109948913A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20190628 |