CN113780677A - Prediction method and device for potential power repeated appeal user - Google Patents

Prediction method and device for potential power repeated appeal user

Info

Publication number
CN113780677A
Authority
CN
China
Prior art keywords
data
appeal
user
power
repeated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111125304.7A
Other languages
Chinese (zh)
Inventor
陈薇
李炳要
黄令忠
余梅梅
刘晓薇
侯玉
张昱波
成坤
李涛
许盖伦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Power Supply Bureau Co Ltd
Original Assignee
Shenzhen Power Supply Bureau Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Power Supply Bureau Co Ltd filed Critical Shenzhen Power Supply Bureau Co Ltd
Priority to CN202111125304.7A priority Critical patent/CN113780677A/en
Publication of CN113780677A publication Critical patent/CN113780677A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply

Abstract

The invention discloses a method and a device for predicting potential power repeated appeal users. The method comprises the following steps: step S1, respectively acquiring user panel data, appeal data, processing condition data and historical data of a target power user to be predicted; step S2, cleaning and characterizing the user panel data, the appeal data, the processing condition data and the historical data, and storing them in a test set database; step S3, inputting the data in the test set database into a pre-established prediction model for explaining the repeated appeal index, and outputting a repeated appeal prediction value of the power user; step S4, determining whether the repeated appeal prediction value of the power user is greater than a preset probability threshold, and if so, determining that the power user is a potential power repeated appeal user. The method makes it possible to perceive potential repeated appeal users in advance, thereby improving the management level, working efficiency and enterprise competitiveness.

Description

Prediction method and device for potential power repeated appeal user
Technical Field
The invention belongs to the technical field of power data application and analysis, and particularly relates to a method and a device for predicting potential power repeated appeal users.
Background
With the gradual awakening and continuous upgrading of the demands of new-type energy consumers, energy is no longer an undifferentiated everyday commodity. Customers' requirements for power supply service have shifted from "having power" to "having good power", their expectations of the company's customer service are higher, and the pursuit of convenience, individuality, openness and sharing has become the main characteristic of energy consumption. The service attribute of energy products is amplified, and continuously improving customer satisfaction has become a development requirement of the company. Managing and analyzing external customer appeals promotes the management and control of internal enterprise risks, improves service capability, and can further improve enterprise management. Repeated appeals by a user not only waste manpower and time, but can also generate negative emotions and cause complaints to escalate.
At present, research on repeated complaints or appeals is limited to extracting text content and judging whether a problem is a repeated complaint, so that corresponding follow-up measures can be formulated for management and control, the repeated complaint analyzed, and a rectification scheme drawn up. This approach still belongs to the category of after-the-fact rectification; it cannot perceive potential repeated appeal users in advance, and it cannot improve the management level or working efficiency.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a method and a device for predicting potential power repeated appeal users, so as to identify in advance the weak links in user experience and perception, improve the management level and improve working efficiency.
In order to solve the technical problem, the invention provides a method for predicting a potential power repeated appeal user, which comprises the following steps:
step S1, respectively acquiring user panel data characterizing the personal information and electricity consumption habits of a target power user to be predicted, appeal data characterizing the appeal content of the target power user, processing condition data characterizing the appeal processing condition of the target power user, and historical data characterizing the historical electricity consumption condition and historical appeal condition of the target power user;
step S2, cleaning and characterizing the user panel data, appeal data, processing condition data and historical data of the target power user, and storing the user panel data, appeal data, processing condition data and historical data into a test set database;
step S3, inputting data in the test set database into a pre-established prediction model for explaining repeated appeal indexes, and outputting repeated appeal prediction values of the power users;
step S4, determining whether the repeated appeal prediction value of the power consumer is greater than a preset probability threshold, and if so, determining that the power consumer is a potential power repeated appeal user.
Further, the user panel data specifically includes attribute indexes 1 to 4: electricity consumption property, power supply area, gender of the complainer and last-year electricity consumption; the appeal data specifically includes attribute indexes 5 to 19: internal route, external route, call duration, service class, secondary service class, tertiary service class, first appeal month, time period, "not accepted", "affected", "as soon as possible", "complaining of mood", "severe", "again", "complaint"; the processing condition data specifically includes attribute indexes 20 to 27: seat job, seat skill, processing completion time (including subsequent processing), whether subsequent processing is performed, subsequent feedback timeout, acceptor, filing time and distribution department; the historical data specifically includes attribute indexes 28 to 33: previous-year appeal amount, previous-year consultation amount, previous-year number of services handled, last-year number of services handled, last-year appeal amount and last-year consultation amount; the repeated appeal index 34, "REPEAT", is used as the target variable.
Further, the step S2 specifically includes: cleaning the original data and performing characterization processing on it, including converting text information to integers, clustering discrete data whose values belong to one class and labeling them in a uniform format, removing records with missing data, and storing the cleaned data in the test set database.
Further, the integer processing of the text information specifically replaces the actual data value with an integer value, and the clustering of discrete data whose values belong to one class specifically groups such data into a whole number of discrete levels.
Further, the step S3 specifically includes: inputting the data in the test set database into the pre-established prediction model, and using the R language to call the prcomp() command to perform principal component analysis on the data, or the factanal() command to perform factor analysis on the data, so as to eliminate insignificant variables in the training set index data set.
Further, the step S4 further includes: according to the repeated appeal probability interval, the potential repeated power appeal users are divided into multiple levels at preset intervals, and the higher the level is, the larger the repeated appeal probability is.
Further, the process of establishing the prediction model for interpreting the repeated appeal index specifically includes:
respectively acquiring, from historical data, user panel data characterizing the personal information and electricity consumption habits of a power user, appeal data characterizing the appeal content of the power user, processing condition data characterizing the appeal processing condition of the power user, and historical data characterizing the historical electricity consumption condition and historical appeal condition of the power user;
cleaning and characterizing the user panel data, the appeal data, the processing condition data and the historical data, and storing the user panel data, the appeal data, the processing condition data and the historical data into a training set database;
and training the data in the training set database to obtain a prediction model for explaining repeated appeal indexes.
Further, training the data in the training set database to obtain the prediction model for explaining the repeated appeal index specifically includes:
constructing an observation matrix by arranging the collected variables as $X = (x_1, x_2, \ldots, x_n)^{T}$;
by means of principal component analysis or factor analysis, removing the insignificant attribute indexes among $x_1, x_2, \ldots, x_n$ in the training set index data set and extracting $m$ explanatory variables $f_1, f_2, \ldots, f_m$ that effectively reflect the appeal data, wherein $n$ is the number of attribute indexes, $m$ is the number of extracted explanatory variables, $m$ is far less than $n$, and $x_n \in \{0, 1\}$ is the repeated appeal index of the training set;
letting $y = x_n$ denote the binary response variable with values in $\{0, 1\}$, with explanatory variables $f_1, f_2, \ldots, f_m$ and $F = (f_1, f_2, \ldots, f_m)^{T}$, the latent variable model is constructed as follows:
$y^{*} = F^{T}\beta + \varepsilon$, where $\beta$ is an $m \times 1$ vector and $\varepsilon$ represents a random interference term; letting $\alpha$ represent an unknown threshold parameter, define:
[Equation image BDA0003278650640000031: definition of $y$ in terms of $y^{*}$ and $\alpha$]
if $\varepsilon$ obeys a logistic distribution
[Equation image BDA0003278650640000032: logistic distribution of $\varepsilon$]
the conditional distribution of $y$ given $F$ is obtained, and each response probability is calculated using a logistic model:
[Equation image BDA0003278650640000033: response probability]
which is the prediction model for identifying power repeated appeal users.
Further, the step S3 outputs the repeated appeal prediction value of the power user by the following formula:
[Equation image BDA0003278650640000034: prediction formula]
wherein $\beta$ is an $m \times 1$ vector and $\alpha$ represents an unknown threshold parameter; $p = P\{y = 1 \mid F^{T} = (f_1, f_2, \ldots, f_m)\}$ represents the probability that $y = 1$ under the evaluation indexes $f_1, f_2, \ldots, f_m$, and serves as the repeated appeal prediction value of the power user.
The invention also provides a device for predicting potential power repeated appeal users, comprising:
the data acquisition unit is used for respectively acquiring user panel data characterizing the personal information and electricity utilization habits of a target power user to be predicted, appeal data characterizing the appeal content of the target power user, processing condition data characterizing the appeal processing condition of the target power user, and historical data characterizing the historical electricity utilization condition and historical appeal condition of the target power user;
the data processing unit is used for cleaning and characterizing the user panel data, the appeal data, the processing condition data and the historical data of the target power user and storing the user panel data, the appeal data, the processing condition data and the historical data into a test set database;
the calculation unit is used for inputting data in the test set database into a pre-established prediction model for explaining repeated appeal indexes and outputting repeated appeal prediction values of the power users;
the prediction unit is used for judging whether the repeated appeal prediction value of the power user is larger than a preset probability threshold value or not, and if yes, the power user is determined to be a potential power repeated appeal user.
Further, the data processing unit is specifically configured to clean the original data and perform characterization processing on it, including converting text information to integers, clustering discrete data whose values belong to one class and labeling them in a uniform format, removing records with missing data, and storing the cleaned data in the test set database.
Further, the integer processing of the text information specifically replaces the actual data value with an integer value, and the clustering of discrete data whose values belong to one class specifically groups such data into a whole number of discrete levels.
Further, the computing unit is specifically configured to input the data in the test set database into the pre-established prediction model, and to use the R language to call the prcomp() command to perform principal component analysis on the data, or the factanal() command to perform factor analysis on the data, so as to eliminate insignificant variables in the training set index data set.
Further, the prediction unit is further configured to divide the potential repeated power appeal users into multiple levels at preset intervals according to the repeated appeal probability interval, wherein a higher level indicates a higher repeated appeal probability.
The implementation of the invention has the following beneficial effects: by taking actual user appeal conditions into account, the probability of repeated appeal by a power user is calculated with a model algorithm written in the R language, and potential repeated appeal users are found; the weak links in user experience and perception are identified in advance, and communication quality is improved through differentiated management of users, which further improves the management level, working efficiency and enterprise competitiveness; the method replaces after-the-fact improvement with advance perception, accurately identifies escalated user complaints, customer problems and service risks, effectively promotes the proper resolution of user appeals, and continuously meets the growing electricity demands of the public.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for predicting a potential power repetitive appeal user according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a flow chart of constructing a prediction model according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a specific application flow of the prediction method for the potential power repetitive appeal user according to the embodiment of the present invention.
Detailed Description
The following description of the embodiments refers to the accompanying drawings, which are included to illustrate specific embodiments in which the invention may be practiced.
Data indexes of power appeal users are screened according to the principles of purposefulness, feasibility, focus, typicality and scientific soundness to obtain the training set attribute indexes; a potential power repeated appeal user identification model is built from the corresponding indexes using the R language; the model input data corresponding to the power appeal user to be identified are obtained, and the repeated appeal probability is calculated from the model input data. Potential repeated appeal users are thereby perceived, and the differentiation, pertinence, accuracy and effectiveness of customer service work are further improved.
Thus, referring to fig. 1, an embodiment of the present invention provides a method for predicting a potential power repeat appeal user, including:
step S1, respectively acquiring user panel data characterizing the personal information and electricity consumption habits of a target power user to be predicted, appeal data characterizing the appeal content of the target power user, processing condition data characterizing the appeal processing condition of the target power user, and historical data characterizing the historical electricity consumption condition and historical appeal condition of the target power user;
step S2, cleaning and characterizing the user panel data, appeal data, processing condition data and historical data of the target power user, and storing the user panel data, appeal data, processing condition data and historical data into a test set database;
step S3, inputting data in the test set database into a pre-established prediction model for explaining repeated appeal indexes, and outputting repeated appeal prediction values of the power users;
step S4, determining whether the repeated appeal prediction value of the power consumer is greater than a preset probability threshold, and if so, determining that the power consumer is a potential power repeated appeal user.
Specifically, in step S1 of this embodiment, the personal information and the electricity usage habits of the electricity consumers are captured from the marketing management system by using SQL statements, and a first type of "user panel data (X1)" is obtained; capturing power user appeal content from a client problem system to obtain second-type appeal data (X2); capturing power user appeal processing conditions from a client problem system to obtain a third type of processing condition data (X3); and capturing the historical electricity utilization condition and the historical appeal condition of the power consumer from the marketing management system to obtain a fourth type of historical data (X4).
As an example, the four types of data include 34 attribute indexes (actual operation need not be limited to these 34 attribute indexes): user panel data (attribute indexes 1-4: electricity consumption property, power supply area, complainer gender and last-year electricity consumption); appeal data (attribute indexes 5-19: internal route, external route, call duration, service class, secondary service class, tertiary service class, first appeal month, time period, "not accepted", "affected", "as soon as possible", "complaint mood", "severe", "again", "complaint"); processing condition data (attribute indexes 20-27: seat job, seat skill, processing completion time (including subsequent processing), whether subsequent processing is performed, subsequent feedback timeout, acceptor, filing time and distribution department); historical data (attribute indexes 28-33: previous-year appeal amount, previous-year consultation amount, previous-year number of services handled, last-year number of services handled, last-year appeal amount and last-year consultation amount). Because users with the same electricity consumption property, power supply area and complainer gender have essentially the same user panel data, the historical appeal and electricity consumption conditions are introduced for discrimination. Attribute index 34, "REPEAT", is used as the target variable, namely the repeated appeal index.
The original data are cleaned and characterized, that is, converted into feature indexes for model training: for example, text information is converted to integers, discrete data whose values belong to one class are clustered and labeled in a uniform format, records with missing data are removed, and the cleaned data are stored in the test set database.
Specifically, integer processing replaces the actual data value with an integer value. For example, the data of attribute index 1, "electricity consumption property", are converted into the integer values 1 to 7 according to the classes residential life, general industrial and commercial and other, large industrial electricity, commercial, other electricity, non-residential, and general industry; the attribute index 21, "seat job", is converted into the integer values 1 to 5 according to high-class seat, general, class, quality control seat and other. In the same way, related attribute indexes such as power supply area, customer gender, internal route and service class are converted to integers, and records with missing data are removed.
Clustering of discrete data whose values belong to one class works as follows. For example, the exact data of attribute index 7, "call duration", may be grouped into 12 discrete levels, namely 0 minutes, 0-2 minutes, 2-5 minutes, 5-8 minutes, 8-10 minutes, 10-12 minutes, 12-15 minutes, 15-20 minutes, 20-25 minutes, 25-30 minutes, and 30 minutes or more, with the content of each level replaced by a single integer value; the values may be converted into the integers 0, 2, 5, 8, 10, 12, 15, 20, 25, 30, 31 and 32. The attribute index 12, "appeal period", may be grouped into 5 discrete levels, namely 6:00-12:00, 12:00-14:00, 14:00-18:00, 18:00-22:00 and 22:00-6:00, with the content of each level replaced by a single integer value from 1 to 5. Similarly, other attribute indexes of this kind, such as event processing completion time, filing time and last-year electricity consumption, are classified and preprocessed according to unified standards, with full consideration of the user type.
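The cleaning steps described above can be sketched in R roughly as follows. This is only an illustration: the data frame `raw`, its column names and its values are hypothetical, and the bins are coded by their ordinal position rather than by the specific integer values listed in the text.

```r
# Hypothetical raw records; column names and values are illustrative only.
raw <- data.frame(
  seat_job     = c("senior", "general", "quality control", NA, "other"),
  call_minutes = c(0, 1.5, 7, 12, 45),
  appeal_hour  = c(7, 13, 15, 20, 23),
  stringsAsFactors = FALSE
)

# Remove records with missing data.
clean <- raw[complete.cases(raw), ]

# Integer processing: replace each text category with an integer code.
job_levels <- c("senior", "general", "class", "quality control", "other")
clean$seat_job <- as.integer(factor(clean$seat_job, levels = job_levels))

# Clustering of discrete data: bin call duration into ordered classes
# (0 min, 0-2, 2-5, ..., 25-30, more than 30), coded by bin position.
duration_breaks <- c(-Inf, 0, 2, 5, 8, 10, 12, 15, 20, 25, 30, Inf)
clean$call_class <- as.integer(cut(clean$call_minutes, breaks = duration_breaks))

# Appeal period: 6-12, 12-14, 14-18 and 18-22 map to 1-4; the 22:00-6:00
# period wraps around midnight, so it is assigned class 5 explicitly.
period <- cut(clean$appeal_hour, breaks = c(6, 12, 14, 18, 22),
              right = FALSE, labels = FALSE)
clean$period_class <- ifelse(is.na(period), 5L, period)

clean
```

Fixing the factor levels explicitly keeps the integer codes stable between the training and test sets, even when a category happens to be absent from one of them.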
The above data cleaning and processing are important for eliminating insignificant variables from a data set under actual business conditions. First, a large number of variables generally cannot be handled and interpreted in a reasonable manner. Second, some of the 33 variables, including categorical ones, depend on one another: for example, seat job and seat skill, secondary service class and tertiary service class, and whether subsequent processing is performed and subsequent feedback timeout. Therefore, this embodiment retains only the more important information they contain, to reduce the amount of analysis.
Step S2 further evaluates all appeal conditions to obtain the repeated appeal index "REPEAT", encodes the index as 0-1 data, and stores the data in the test set database.
Step S3 inputs the above data into the pre-established prediction model and uses the R language to call the prcomp() command to perform Principal Component Analysis, or the factanal() command to perform Factor Analysis, so as to eliminate insignificant variables in the test set index data set. The resulting principal components Fj are input into the pre-established prediction model; the glm() command is called in R to carry out logistic regression analysis with the Fj as explanatory variables, explaining the repeated appeal index "REPEAT" (namely the target variable).
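The text gives R code only for the prcomp() route. For the factanal() alternative, a minimal sketch on synthetic data might look like the following; the variable names, the choice of two factors and the data themselves are assumptions made purely for illustration.

```r
set.seed(1)
n <- 300

# Synthetic data driven by two latent factors (illustrative only, not patent data).
g1 <- rnorm(n)
g2 <- rnorm(n)
x <- cbind(
  a1 = g1 + rnorm(n, sd = 0.5), a2 = g1 + rnorm(n, sd = 0.5), a3 = g1 + rnorm(n, sd = 0.5),
  b1 = g2 + rnorm(n, sd = 0.5), b2 = g2 + rnorm(n, sd = 0.5), b3 = g2 + rnorm(n, sd = 0.5)
)

# Maximum-likelihood factor analysis with two factors and regression scores.
fa <- factanal(x, factors = 2, scores = "regression")
print(fa$loadings)

# The factor scores (one row per record) could then serve as the
# explanatory variables Fj in the logistic regression.
fscores <- fa$scores
```

In practice the number of factors would have to be chosen from the data, for instance from the test of sufficiency that factanal() reports.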
Please refer to fig. 2, the process of constructing the prediction model is as follows:
(1) similar to step S1, user panel data for representing personal information and electricity usage habits of each power user, appeal data for representing appeal content of each power user, processing condition data for representing appeal processing conditions of each power user, and historical data for representing historical electricity usage conditions and historical appeal conditions of each power user are respectively obtained; and then, similarly cleaning data, and constructing a new training set database after eliminating invalid data.
(2) According to the basic idea of principal component analysis, the index system represented by the original variables is reduced to an index system represented by the principal components. First, before principal component analysis is applied, the entire data set is scaled so that each variable has unit variance; the variance of the raw data in the chosen variables then has a reasonable interpretation. The variances of the components are then obtained as the result.
The R code is as follows:
# Input the original data set (the file paths are left empty in the source)
ticdata = read.table("")
testing = read.table("")
testing = as.matrix(testing)
# Use principal component analysis to select significant variables
tic = as.matrix(ticdata[, 1:33])    # the 33 attribute indexes; column 34 is REPEAT
pca = prcomp(tic, scale. = TRUE)    # scale each variable to unit variance
summary(pca)                        # variance explained by each component
The principle of principal component-based logistic model and estimation is as follows:
First, an observation matrix is constructed by arranging the collected variables as $X = (x_1, x_2, \ldots, x_n)^{T}$,
where $n$ is the number of attribute indexes. By Principal Component Analysis, the insignificant attribute indexes among $x_1, x_2, \ldots, x_n$ in the training set index data set are removed and $m$ (far less than $n$) explanatory variables $f_1, f_2, \ldots, f_m$ that effectively reflect the appeal data are extracted, where $x_n \in \{0, 1\}$ is the repeated appeal index of the training set.
Principal component analysis gives:
[Equation image BDA0003278650640000081: the principal components $f_1, \ldots, f_m$ as linear combinations of $x_1, \ldots, x_n$]
with the following characteristics:
(1) $f_i$ and $f_j$ ($i \neq j$) are uncorrelated, i.e., $\mathrm{Cov}(f_i, f_j) = 0$;
(2) $f_1$ has the largest variance among all linear combinations of $x_1, x_2, \ldots, x_n$ (with coefficients satisfying the above requirement), and so on, up to $f_m$, which has the largest variance among all linear combinations of $x_1, x_2, \ldots, x_n$ that are uncorrelated with $f_1, f_2, \ldots, f_{m-1}$.
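The two characteristics can be checked numerically. The sketch below uses synthetic correlated data (an assumption; no patent data are involved) and verifies that the component scores returned by prcomp() are mutually uncorrelated, that their variances decrease, and that each coefficient vector has unit length.

```r
set.seed(2)

# Synthetic correlated data (illustrative only).
z <- matrix(rnorm(200 * 4), ncol = 4)
mix <- matrix(c(1, 0.8, 0.3, 0,
                0, 1,   0.5, 0.2,
                0, 0,   1,   0.6,
                0, 0,   0,   1), ncol = 4)
x <- z %*% mix

pca <- prcomp(x, scale. = TRUE)
scores_cov <- cov(pca$x)

# Characteristic (1): covariances between different components are numerically zero.
off_diag <- scores_cov[upper.tri(scores_cov)]
max(abs(off_diag))

# Characteristic (2): component variances appear in decreasing order.
comp_var <- diag(scores_cov)
comp_var

# Each component's coefficient vector has unit length.
coef_norm <- colSums(pca$rotation^2)
coef_norm
```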
Let $y = x_n$ denote the binary response variable with values in $\{0, 1\}$, with explanatory variables $f_1, f_2, \ldots, f_m$ and $F = (f_1, f_2, \ldots, f_m)^{T}$. The latent variable model is constructed as follows:
$y^{*} = F^{T}\beta + \varepsilon$, where $\beta$ is an $m \times 1$ vector and $\varepsilon$ represents a random interference term; letting $\alpha$ represent an unknown threshold parameter, define:
[Equation image BDA0003278650640000091: definition of $y$ in terms of $y^{*}$ and $\alpha$]
if $\varepsilon$ obeys a logistic distribution
[Equation image BDA0003278650640000092: logistic distribution of $\varepsilon$]
the conditional distribution of $y$ given $F$ can be obtained, and each response probability can be calculated using a logistic model:
[Equation image BDA0003278650640000093: response probability]
which is the prediction model for identifying repeated appeal users.
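The equation images are not reproduced in this text. One reconstruction that is consistent with the definitions above, and with the expression exp(ytesting)/(1+exp(ytesting)) used in the R code of this embodiment, is sketched below. It assumes the common convention that $y = 1$ when the latent variable exceeds the threshold $\alpha$ and that $\varepsilon$ follows the standard logistic distribution; the original images may use a different but equivalent form.

```latex
y^{*} = F^{T}\beta + \varepsilon, \qquad
y = \begin{cases} 1, & y^{*} > \alpha \\ 0, & y^{*} \le \alpha \end{cases}

P(\varepsilon \le t) = \frac{e^{t}}{1 + e^{t}}

P(y = 1 \mid F) = P(\varepsilon > \alpha - F^{T}\beta)
                = 1 - \frac{e^{\alpha - F^{T}\beta}}{1 + e^{\alpha - F^{T}\beta}}
                = \frac{e^{F^{T}\beta - \alpha}}{1 + e^{F^{T}\beta - \alpha}}
```

Under this reading, the intercept estimated by glm() plays the role of $-\alpha$, and the remaining coefficients estimate $\beta$.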
(3) Method for constructing component weight
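From the results of the principal component analysis, it can be seen that the first several principal components explain 90% of the variance of the data. The R code is as follows:
rot = pca$rotation          # loadings of all principal components
x = pca$x                   # principal component scores
loading = rot[, 1:23]       # keep the first 23 principal components
loading
plot(pca)                   # plot the component variances
pcadata = tic %*% loading   # project the training data onto the retained components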
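The code above fixes the number of retained components at 23. A minimal sketch of how such a number could be derived from a 90% cumulative-variance criterion is given below; the data are synthetic and the names `tic_demo`, `pca_demo`, `k` and `loading_demo` are hypothetical.

```r
set.seed(3)

# Synthetic stand-in for the attribute index matrix (illustrative only).
tic_demo <- matrix(rnorm(500 * 10), ncol = 10)
tic_demo[, 2] <- tic_demo[, 1] + rnorm(500, sd = 0.3)   # introduce some correlation

pca_demo <- prcomp(tic_demo, scale. = TRUE)

# Cumulative proportion of variance explained by the leading components.
cum_var <- cumsum(pca_demo$sdev^2) / sum(pca_demo$sdev^2)

# Smallest number of components whose cumulative proportion reaches 90%.
k <- which(cum_var >= 0.90)[1]

# Loadings of the retained components.
loading_demo <- pca_demo$rotation[, 1:k, drop = FALSE]
k
```

The same cumulative proportions are printed in the last row of summary(pca_demo).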
(4) Method for constructing the logistic model algorithm based on principal components
Multiplying the data matrix by the loadings of these principal components yields new variables to which the logistic model can be fitted; logistic regression analysis is then performed with the Fj as explanatory variables. The R code is as follows:
# Construct logistic model and predict the testing data
logit = glm(ticdata[, 34] ~ as.matrix(pcadata), family = binomial)
summary(logit)
The logistic model obtained from the principal components is as follows:
$y^{*} = \varepsilon + F^{T}\beta$, where
$F^{T}\beta$ = 0.00786F1 - 0.26419F2 + 0.32771F3 + 0.83742F4 + 0.07617F5 + 0.12871F6 - 0.28977F7 + 0.664F8 + 0.02593F9 - 0.10943F10 + 0.1025F11 - 0.41541F12 - 0.54102F13 - 0.16840F14 - 0.13043F15 - 0.2424F16 - 0.27553F17 - 0.01689F18 + 0.10497F19 + 0.31261F20 + 0.14404F21 + 0.05996F22 - 0.2297F23.
The test set testing is then introduced, and the following equation:
[Equation image BDA0003278650640000101: prediction formula]
gives the probabilistic predictor (i.e., p) for "REPEAT", where $\beta$ is an $m \times 1$ vector and $\alpha$ represents an unknown threshold parameter; $p = P\{y = 1 \mid F^{T} = (f_1, f_2, \ldots, f_m)\}$ represents the probability that $y = 1$ under the explanatory variables $f_1, f_2, \ldots, f_m$, and serves as the repeated appeal prediction value of the power user.
The R code is as follows:
pcatesting = testing %*% loading                 # project the test data onto the retained components
c0 = logit$coefficients[1]                       # intercept
c0 = rep(c0, 1000)                               # replicate the intercept for 1000 test records
c = logit$coefficients[2:24]                     # coefficients of the 23 principal components
gtesting = as.matrix(pcatesting) %*% as.matrix(c)
ytesting = gtesting + c0                         # linear predictor
ptesting = exp(ytesting) / (1 + exp(ytesting))   # predicted probability of REPEAT
ptesting
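The manual computation above can be cross-checked against R's built-in prediction functions. The sketch below runs the whole chain on synthetic data; all names, sizes and coefficients are assumptions. It also differs from the code above in one respect: it uses predict() on the prcomp object, which applies the training centering and scaling to the test data, rather than multiplying the raw test matrix by the loadings.

```r
set.seed(4)

# Synthetic training and test data (illustrative only; not the patent's data).
n_train <- 400
n_test <- 50
p <- 5
x_train <- matrix(rnorm(n_train * p), ncol = p)
x_test <- matrix(rnorm(n_test * p), ncol = p)
y_train <- rbinom(n_train, 1, plogis(0.5 * x_train[, 1] - 0.8 * x_train[, 2]))

# Principal components of the training data; keep the first k.
pca <- prcomp(x_train, scale. = TRUE)
k <- 3
train_df <- as.data.frame(pca$x[, 1:k])
train_df$y <- y_train

# Logistic regression on the retained components.
fit <- glm(y ~ ., data = train_df, family = binomial)

# Project the test data with the training centering and scaling, then predict.
test_df <- as.data.frame(predict(pca, newdata = x_test)[, 1:k])
p_test <- predict(fit, newdata = test_df, type = "response")

# Manual version: intercept plus components times coefficients, then the logistic function.
eta <- coef(fit)[1] + as.matrix(test_df) %*% coef(fit)[-1]
p_manual <- as.vector(plogis(eta))

head(cbind(p_test, p_manual))
```

Both columns agree, since plogis(eta) is the same function as exp(eta)/(1+exp(eta)) but is less prone to overflow for large values of the linear predictor.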
it can be understood that if the factors such as the attribute field, the data time, the number of users in the training set are different, the calculated weights of the generated predictive model formula responses are different, but the predictive model formula responses are based on the factor analysis or the principal component analysis, and the construction method is the same.
Referring to fig. 3 again, in a specific application example of the present application, a user first calls for a problem of a hotline appeal of a service, according to the method of the present embodiment, user identification is performed through a power utilization address/label/user number, and then user panel data, appeal data, handling condition data, and history data of the user are obtained; inputting the predicted value into a pre-established prediction model to obtain a repeated appeal predicted value of the user, comparing the repeated appeal predicted value with a preset probability threshold, and if the predicted value is larger than the probability threshold, determining that the power user is a potential power repeated appeal user. After the potential power repeated appeal users are obtained, the client service personnel can respond to the client appeal in a differentiated mode according to the repeated appeal user management and control measures and schemes, so that appeal upgrading is avoided, the client appeal content is dug deeply, risk assessment is conducted on the client appeal content, and the internal management and control level of an enterprise is enhanced.
According to the repeated appeal situation of the training set and the practical use of p, potential power repeated appeal users can be divided into 5 grades at intervals of 0.2 over the repeated appeal probability interval [0, 1], namely: A1 [0, 0.2), A2 [0.2, 0.4), A3 [0.4, 0.6), A4 [0.6, 0.8), and A5 [0.8, 1]; a higher grade indicates a greater probability of repeated appeal. Taking p > 0.5 as the assumed threshold, applying the model to the test set yields 3 potential repeated appeal customers, all 3 of whom actually turned out to be repeated appeal customers, so the model predicts well. Considering that the attribute indexes in this embodiment are highly correlated and that some values are missing, effective indexes can be added to train the model further and obtain a more accurate prediction result.
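The grading and threshold rules described above can be sketched as follows. This is a minimal Python illustration; the function names `grade` and `is_potential_repeat_user` are hypothetical, and the 0.5 default threshold is simply the assumed value used in this embodiment.

```python
def grade(p):
    """Map a probability in [0, 1] to grades A1..A5 using 0.2-wide intervals."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Intervals are closed on the left; the last one, [0.8, 1], includes 1.
    lower_bounds = [0.2, 0.4, 0.6, 0.8]
    level = 1 + sum(p >= b for b in lower_bounds)
    return f"A{level}"

def is_potential_repeat_user(p, threshold=0.5):
    """A user is flagged only when the prediction strictly exceeds the threshold."""
    return p > threshold

for p in (0.05, 0.2, 0.55, 0.8, 1.0):
    print(p, grade(p), is_potential_repeat_user(p))
```

Comparing against the literal interval bounds, rather than dividing by 0.2 and truncating, avoids floating-point surprises at the interval edges.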
Corresponding to the method for predicting potential power repeated appeal users provided by the embodiment of the invention, a device for predicting potential power repeated appeal users is also provided, the device comprising:
a data acquisition unit, configured to respectively acquire user panel data characterizing the personal information and electricity usage habits of a target power user to be predicted, appeal data characterizing the appeal content of the target power user, processing condition data characterizing how the appeal of the target power user was processed, and historical data characterizing the historical electricity usage and historical appeals of the target power user;
a data processing unit, configured to clean and feature-process the user panel data, appeal data, processing condition data, and historical data of the target power user and store them in a test set database;
a calculation unit, configured to input the data in the test set database into a pre-established prediction model for explaining repeated appeal indexes and output the repeated appeal prediction value of the power user;
a prediction unit, configured to judge whether the repeated appeal prediction value of the power user is greater than a preset probability threshold and, if so, determine that the power user is a potential power repeated appeal user.
Further, the data processing unit is specifically configured to clean the raw data and perform feature processing on it, including converting text information to integers, clustering discrete data whose values fall into one class, unifying the format of identifiers, and removing missing data, and to store the cleaned data in a test set database.
Further, converting text information to integers specifically means replacing the actual data value with an integer value, and clustering discrete data whose values fall into one class specifically means dividing such data into different discrete levels, each represented by an integer.
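A minimal Python sketch of this kind of cleaning is shown below. It rests on assumptions: the "clustering" of discrete data is read here as binning numeric values into integer levels by cut points, and the field names (`usage_type`, `call_seconds`), cut points, and helper names are hypothetical rather than taken from the patent.

```python
def integer_encode(values):
    """Replace each distinct text value with an integer code (first seen = 1)."""
    codes = {}
    for v in values:
        codes.setdefault(v, len(codes) + 1)
    return [codes[v] for v in values], codes

def to_level(value, cut_points):
    """Map a numeric value to an integer level given ascending cut points."""
    return 1 + sum(value >= c for c in cut_points)

def drop_missing(records):
    """Keep only records in which no field is missing (None)."""
    return [r for r in records if all(v is not None for v in r.values())]

records = [
    {"usage_type": "residential", "call_seconds": 45},
    {"usage_type": "commercial", "call_seconds": 310},
    {"usage_type": "residential", "call_seconds": None},  # removed as missing
]

clean = drop_missing(records)
usage_codes, mapping = integer_encode([r["usage_type"] for r in clean])
duration_levels = [to_level(r["call_seconds"], [60, 180, 300]) for r in clean]
print(usage_codes, mapping, duration_levels)
```

The same code mapping would need to be reused for the training set and the test set so that an integer always stands for the same text value.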
Further, the calculation unit is specifically configured to input the data in the test set database into a pre-established prediction model, and to use the R language to call the prcomp() command to perform principal component analysis on the data or the factanal() command to perform factor analysis on the data, so as to eliminate insignificant variables in the training set index data set.
Further, the prediction unit is further configured to divide the potential repeated power appeal users into multiple levels at preset intervals according to the repeated appeal probability interval, wherein a higher level indicates a higher repeated appeal probability.
For the working principle and process of the present embodiment, please refer to the description of the first embodiment of the present invention, which is not repeated herein.
As can be seen from the above description, the present invention provides the following advantageous effects: by combining actual user appeal conditions, the probability that a power user will appeal repeatedly is calculated with a model algorithm written in the R language and potential repeated appeal users are identified, so that weak links in user experience and perception are found in advance; this supports differentiated management of users, improves communication quality, and thereby raises the management level, working efficiency, and competitiveness of the enterprise. The method replaces after-the-fact improvement with advance perception, accurately identifies escalated user complaints, customer problems, and service risks, helps user appeals to be resolved effectively and properly, and continuously meets the public's ever-growing electricity demand.
The above disclosure merely illustrates preferred embodiments of the present invention and is not intended to limit the scope of its claims.

Claims (14)

1. A method for predicting potential power repeated appeal users, characterized by comprising the following steps:
step S1, respectively acquiring user panel data characterizing the personal information and electricity usage habits of a target power user to be predicted, appeal data characterizing the appeal content of the target power user, processing condition data characterizing how the appeal of the target power user was processed, and historical data characterizing the historical electricity usage and historical appeals of the target power user;
step S2, cleaning and feature-processing the user panel data, appeal data, processing condition data, and historical data of the target power user, and storing them in a test set database;
step S3, inputting the data in the test set database into a pre-established prediction model for explaining repeated appeal indexes, and outputting the repeated appeal prediction value of the power user;
step S4, judging whether the repeated appeal prediction value of the power user is greater than a preset probability threshold, and if so, determining that the power user is a potential power repeated appeal user.
2. The prediction method according to claim 1, wherein the user panel data specifically comprises attribute indexes 1-4: electricity consumption property, power supply area, sex of the complainant, and electricity consumption in the last year; the appeal data specifically comprises attribute indexes 5-19: internal routes, external routes, call duration, service class, secondary service class, tertiary service class, first appeal month, time period, "not accepted", "affected", "as soon as possible", "complaining of mood", "severe", "again", "complaint"; the processing condition data specifically comprises attribute indexes 20-27: seat job, seat skill, processing completion time (including subsequent processing), whether subsequent processing is performed, subsequent feedback timeout, acceptors, filing time, and distribution departments; the historical data specifically comprises attribute indexes 28-33: previous appeal amount, previous consultation amount, previous year service handling times, last year service handling number, last year appeal amount, and last year consultation amount; the repeated appeal index 34, "REPEAT", is used as the target variable.
3. The prediction method according to claim 1, wherein the step S2 specifically includes: cleaning the raw data and performing feature processing on it, including converting text information to integers, clustering discrete data whose values fall into one class, unifying the format of identifiers, and removing missing data, and storing the cleaned data in a test set database.
4. The prediction method according to claim 3, wherein converting text information to integers specifically means replacing the actual data value with an integer value, and clustering discrete data whose values fall into one class specifically means dividing such data into different discrete levels, each represented by an integer.
5. The prediction method according to claim 1, wherein the step S3 specifically includes: inputting the data in the test set database into a pre-established prediction model, and using the R language to call the prcomp() command to perform principal component analysis on the data or the factanal() command to perform factor analysis on the data, so as to eliminate insignificant variables in the training set index data set.
6. The prediction method according to claim 1, wherein the step S4 further comprises: according to the repeated appeal probability interval, the potential repeated power appeal users are divided into multiple levels at preset intervals, and the higher the level is, the larger the repeated appeal probability is.
7. The prediction method according to claim 1, wherein the process of building the prediction model for explaining repeated appeal indexes comprises:
respectively acquiring, from historical records, user panel data characterizing the personal information and electricity usage habits of a power user, appeal data characterizing the appeal content of the power user, processing condition data characterizing how the appeal of the power user was processed, and historical data characterizing the historical electricity usage and historical appeals of the power user;
cleaning and feature-processing the user panel data, the appeal data, the processing condition data, and the historical data, and storing them in a training set database;
and training on the data in the training set database to obtain the prediction model for explaining repeated appeal indexes.
8. The prediction method according to claim 7, wherein training on the data in the training set database to obtain the prediction model for explaining repeated appeal indexes comprises:
constructing an observation matrix by arranging the collected variables in matrix form: $X = (x_1, x_2, \dots, x_n)^{T}$;
removing, by principal component analysis or factor analysis, the insignificant attribute indexes from the training set index data set $x_1, x_2, \dots, x_n$, and extracting $m$ explanatory variables $f_1, f_2, \dots, f_m$ that effectively reflect the appeal data, wherein $m$ and $n$ respectively denote the number of extracted explanatory variables and the number of attribute indexes, $m$ is far less than $n$, and $x_n \in \{0, 1\}$ is the repeated appeal index of the training set;
letting $y = x_n$ denote the binary response variable taking values in $\{0, 1\}$ and letting the explanatory variables be $f_1, f_2, \dots, f_m$ with $F = (f_1, f_2, \dots, f_m)^{T}$, the latent variable model is constructed as follows:
$y^{*} = F^{T}\beta + \varepsilon$, where $\beta$ is an $m \times 1$ vector and $\varepsilon$ represents a random disturbance term; letting $\alpha$ represent an unknown threshold parameter, define:
Figure FDA0003278650630000021
if ε obeys a logistic distribution
Figure FDA0003278650630000022
the conditional distribution of $y$ given $F$ is obtained, and each response probability is calculated with the logistic model:
Figure FDA0003278650630000031
which is the prediction model for identifying power repeated appeal users.
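The formulas of this derivation appear only as figure placeholders in this text. The following is one consistent reading, offered as a sketch rather than as the patent's own equations: it writes the latent variable as $y^{*} = F^{T}\beta + \varepsilon$ and assumes that $y = 1$ exactly when $y^{*}$ exceeds the threshold $\alpha$ and that $\varepsilon$ follows the standard logistic distribution.

```latex
\begin{aligned}
y &= \begin{cases} 1, & y^{*} > \alpha \\ 0, & y^{*} \le \alpha \end{cases}
  && \text{(assumed threshold rule)} \\
P(\varepsilon \le t) &= \frac{1}{1 + e^{-t}}
  && \text{(standard logistic distribution)} \\
p = P\{y = 1 \mid F\}
  &= P\{\varepsilon > \alpha - F^{T}\beta\}
   = 1 - \frac{1}{1 + e^{-(\alpha - F^{T}\beta)}}
   = \frac{e^{F^{T}\beta - \alpha}}{1 + e^{F^{T}\beta - \alpha}}
\end{aligned}
```

Under this reading, the intercept of a fitted logit model plays the role of $-\alpha$.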
9. The prediction method according to claim 8, wherein the step S3 is configured to output the predicted value of the repeated appeal of the power consumer according to the following formula:
Figure FDA0003278650630000032
wherein $\beta$ is an $m \times 1$ vector and $\alpha$ represents an unknown threshold parameter; $p = P\{y = 1 \mid F^{T} = (f_1, f_2, \dots, f_m)\}$ represents the probability that $y = 1$ given the explanatory variables $f_1, f_2, \dots, f_m$, and is used as the repeated appeal prediction value of the power user.
10. A device for predicting potential power repeated appeal users, comprising:
a data acquisition unit, configured to respectively acquire user panel data characterizing the personal information and electricity usage habits of a target power user to be predicted, appeal data characterizing the appeal content of the target power user, processing condition data characterizing how the appeal of the target power user was processed, and historical data characterizing the historical electricity usage and historical appeals of the target power user;
a data processing unit, configured to clean and feature-process the user panel data, appeal data, processing condition data, and historical data of the target power user and store them in a test set database;
a calculation unit, configured to input the data in the test set database into a pre-established prediction model for explaining repeated appeal indexes and output the repeated appeal prediction value of the power user;
a prediction unit, configured to judge whether the repeated appeal prediction value of the power user is greater than a preset probability threshold and, if so, determine that the power user is a potential power repeated appeal user.
11. The prediction device according to claim 10, wherein the data processing unit is specifically configured to clean the raw data and perform feature processing on it, including converting text information to integers, clustering discrete data whose values fall into one class, unifying the format of identifiers, and removing missing data, and to store the cleaned data in a test set database.
12. The prediction device according to claim 11, wherein converting text information to integers specifically means replacing the actual data value with an integer value, and clustering discrete data whose values fall into one class specifically means dividing such data into different discrete levels, each represented by an integer.
13. The prediction device according to claim 10, wherein the calculation unit is specifically configured to input the data in the test set database into a pre-established prediction model, and to use the R language to call the prcomp() command to perform principal component analysis on the data or the factanal() command to perform factor analysis on the data, so as to eliminate insignificant variables in the training set index data set.
14. The prediction device of claim 10, wherein the prediction unit is further configured to classify the potential repetitive power appeal users into multiple levels at preset intervals according to a repetitive appeal probability interval, and a higher level indicates a higher repetitive appeal probability.
CN202111125304.7A 2021-09-26 2021-09-26 Prediction method and device for potential power repeated appeal user Pending CN113780677A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111125304.7A CN113780677A (en) 2021-09-26 2021-09-26 Prediction method and device for potential power repeated appeal user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111125304.7A CN113780677A (en) 2021-09-26 2021-09-26 Prediction method and device for potential power repeated appeal user

Publications (1)

Publication Number Publication Date
CN113780677A true CN113780677A (en) 2021-12-10

Family

ID=78853359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111125304.7A Pending CN113780677A (en) 2021-09-26 2021-09-26 Prediction method and device for potential power repeated appeal user

Country Status (1)

Country Link
CN (1) CN113780677A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109242257A (en) * 2018-08-09 2019-01-18 广州瀚信通信科技股份有限公司 A kind of 4G Internet user complaint model based on key index association analysis
CN109447364A (en) * 2018-11-08 2019-03-08 国网湖南省电力有限公司 Power customer based on label complains prediction technique
CN111353792A (en) * 2020-05-25 2020-06-30 广东电网有限责任公司惠州供电局 Client portrait system with visual display and data analysis functions

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李鹏鹏等: "基于随机森林算法的95598投诉预测方法研究", 浙江电力, no. 04 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination