CN113780677A

CN113780677A - Prediction method and device for potential power repeated appeal user

Info

Publication number: CN113780677A
Application number: CN202111125304.7A
Authority: CN
Inventors: 陈薇; 李炳要; 黄令忠; 余梅梅; 刘晓薇; 侯玉; 张昱波; 成坤; 李涛; 许盖伦
Original assignee: Shenzhen Power Supply Bureau Co Ltd
Current assignee: Shenzhen Power Supply Bureau Co Ltd
Priority date: 2021-09-26
Filing date: 2021-09-26
Publication date: 2021-12-10

Abstract

The invention discloses a method and a device for predicting potential power repeated appeal users. The method comprises: step S1, respectively acquiring user panel data, appeal data, processing condition data and historical data of a target power user to be predicted; step S2, cleaning and characterizing the user panel data, the appeal data, the processing condition data and the historical data, and storing them in a test set database; step S3, inputting the data in the test set database into a pre-established prediction model for explaining the repeated appeal index, and outputting a repeated appeal prediction value of the power user; step S4, determining whether the repeated appeal prediction value of the power user is greater than a preset probability threshold and, if so, determining that the power user is a potential power repeated appeal user. The invention makes it possible to perceive potential repeated appeal users in advance, thereby improving the management level, working efficiency and enterprise competitiveness.

Description

Prediction method and device for potential power repeated appeal user

Technical Field

The invention belongs to the technical field of power data application and analysis, and particularly relates to a method and a device for predicting potential power repeated appeal users.

Background

As the demands of new-type energy consumers gradually awaken and continue to rise, energy is no longer an undifferentiated everyday commodity. Customers' requirements for power supply service have shifted from simply having power to having good power, their expectations of the company's customer service are higher, and the pursuit of convenience, individuality, openness and sharing has become the main characteristic of energy consumption. The service attribute of energy products is amplified, and continuously improving customer satisfaction has become a development requirement for the company. Controlling and analyzing external customer appeals promotes the control of internal enterprise risks, improves service capability and can further improve enterprise management. Repeated appeals by a user not only waste manpower and time, but can also generate negative emotions that cause complaints to escalate.

At present, research on repeated complaints or appeals is limited to extracting text content to judge whether an issue is a repeated complaint, so that corresponding follow-up measures can be formulated, the repeated complaint analyzed and a rectification scheme drawn up. Such methods still amount to after-the-fact rectification: they cannot perceive potential repeated appeal users in advance, and therefore cannot improve the management level or working efficiency.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method and a device for predicting potential power repeated appeal users, so as to identify in advance the weak links in user experience and perception, and to improve the management level and working efficiency.

In order to solve the technical problem, the invention provides a method for predicting a potential power repeated appeal user, which comprises the following steps:

step S1, respectively acquiring user panel data characterizing the personal information and electricity consumption habits of a target power user to be predicted, appeal data characterizing the appeal content of the target power user, processing condition data characterizing how the appeals of the target power user were handled, and historical data characterizing the historical electricity consumption and historical appeals of the target power user;

step S2, cleaning and characterizing the user panel data, appeal data, processing condition data and historical data of the target power user, and storing the user panel data, appeal data, processing condition data and historical data into a test set database;

step S3, inputting data in the test set database into a pre-established prediction model for explaining repeated appeal indexes, and outputting repeated appeal prediction values of the power users;

step S4, determining whether the repeated appeal prediction value of the power consumer is greater than a preset probability threshold, and if so, determining that the power consumer is a potential power repeated appeal user.

Further, the user panel data specifically includes attribute indexes 1 to 4: electricity consumption property, power supply area, gender of the complainant and electricity consumption in the last year; the appeal data specifically includes attribute indexes 5 to 19: internal route, external route, call duration, service category, secondary service category, tertiary service category, month of first appeal, time period, "not accepted", "affected", "as soon as possible", "complaining mood", "severe", "again" and "complaint"; the processing condition data specifically includes attribute indexes 19 to 27: seat job, seat skill, processing completion time (including subsequent processing), whether subsequent processing was performed, subsequent feedback timeout, acceptor, filing time and distribution department; the historical data specifically includes attribute indexes 28 to 33: previous-year appeal volume, previous-year consultation volume, previous-year number of service transactions, last-year number of service transactions, last-year appeal volume and last-year consultation volume; and attribute index 34, the repeated appeal index "REPEAT", is used as the target variable.

Further, step S2 specifically includes: cleaning the original data and characterizing it, which includes converting text information into integers, clustering discrete numerical data into classes with a uniformly formatted identifier for each class, and eliminating missing data; and storing the cleaned data in the test set database.

Further, converting text information into integers specifically means replacing the actual data value with an integer value, and clustering discrete numerical data into classes specifically means dividing the values into a number of discrete levels, each identified by an integer.

Further, step S3 specifically includes: inputting the data in the test set database into the pre-established prediction model, and using the R language to call the prcomp() command to perform principal component analysis on the data, or the factanal() command to perform factor analysis on the data, so as to eliminate the insignificant variables in the training set index data set.

Further, the step S4 further includes: according to the repeated appeal probability interval, the potential repeated power appeal users are divided into multiple levels at preset intervals, and the higher the level is, the larger the repeated appeal probability is.

Further, the process of establishing the prediction model for interpreting the repeated appeal index specifically includes:

respectively acquiring, from historical records, user panel data characterizing the personal information and electricity consumption habits of power users, appeal data characterizing the appeal content of the power users, processing condition data characterizing how the appeals of the power users were handled, and historical data characterizing the historical electricity consumption and historical appeals of the power users;

cleaning and characterizing the user panel data, the appeal data, the processing condition data and the historical data, and storing the user panel data, the appeal data, the processing condition data and the historical data into a training set database;

and training on the data in the training set database to obtain a prediction model for explaining the repeated appeal index.

Further, training on the data in the training set database to obtain the prediction model for explaining the repeated appeal index specifically includes:

constructing an observation matrix by arranging the collected variables as a vector: $X = (x_1, x_2, \ldots, x_n)^T$;

by means of principal component analysis or factor analysis, removing the insignificant attribute indexes among $x_1, x_2, \ldots, x_n$ in the training set index data set and extracting $m$ explanatory variables $f_1, f_2, \ldots, f_m$ that effectively reflect the appeal data, wherein $n$ is the number of attribute indexes, $m$ is the number of extracted explanatory variables, $m$ is far less than $n$, and $x_n$ is the repeated appeal index of the training set, taking values in $\{0, 1\}$;

letting $y = x_n$ denote the binary response variable with values in $\{0, 1\}$ and $F = (f_1, f_2, \ldots, f_m)^T$ the vector of explanatory variables, the latent variable model is constructed as follows:

$$y^* = F^T\beta + \varepsilon,$$

where $\beta$ is an $m \times 1$ vector and $\varepsilon$ represents a random disturbance term; letting $\alpha$ represent an unknown threshold parameter, define:

$$y = \begin{cases} 1, & y^* > \alpha \\ 0, & y^* \le \alpha \end{cases}$$

if $\varepsilon$ obeys a logistic distribution, that is,

$$P(\varepsilon \le t) = \frac{1}{1 + e^{-t}},$$

the conditional distribution of $y$ given $F$ is obtained, and each response probability is calculated with the logistic model:

$$P(y = 1 \mid F) = \frac{\exp(F^T\beta - \alpha)}{1 + \exp(F^T\beta - \alpha)}, \qquad P(y = 0 \mid F) = \frac{1}{1 + \exp(F^T\beta - \alpha)},$$

which is the prediction model for judging power repeated appeal users.

Further, step S3 outputs the repeated appeal prediction value of the power user by the following formula:

$$p = P\{y = 1 \mid F\} = \frac{\exp(F^T\beta - \alpha)}{1 + \exp(F^T\beta - \alpha)},$$

wherein $\beta$ is an $m \times 1$ vector and $\alpha$ represents an unknown threshold parameter; $p = P\{y = 1 \mid F^T = (f_1, f_2, \ldots, f_m)\}$ represents the probability of being judged as $y = 1$ under the evaluation indexes $f_1, f_2, \ldots, f_m$, and serves as the repeated appeal prediction value of the power user.
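By way of a non-limiting illustration, the calculation of the repeated appeal prediction value for a single user can be sketched in R as follows. The sketch assumes the convention that y = 1 when the latent variable exceeds the threshold α, so that the linear predictor is F'β − α; the values of beta, alpha and f are hypothetical and chosen only for illustration.

```r
# Hypothetical values, chosen only for illustration (not from the patent).
beta  <- c(0.8, -0.3)          # coefficient vector (m = 2)
alpha <- 0.5                   # threshold parameter
f     <- c(1.2, 0.4)           # explanatory variables f1, f2 for one user

# Linear predictor F'beta - alpha (assumed sign convention).
eta <- sum(f * beta) - alpha

# Logistic response probability P(y = 1 | F).
p <- exp(eta) / (1 + exp(eta))

# plogis() is the logistic distribution function in base R; it gives the same value.
p_check <- plogis(eta)

# Step S4: compare with a preset probability threshold (0.5 assumed here).
is_potential <- p > 0.5
```

In a fitted glm() model the intercept plays the role of −α under this convention.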

The invention also provides a device for predicting potential power repeated appeal users, comprising:

a data acquisition unit, configured to respectively acquire user panel data characterizing the personal information and electricity consumption habits of a target power user to be predicted, appeal data characterizing the appeal content of the target power user, processing condition data characterizing how the appeals of the target power user were handled, and historical data characterizing the historical electricity consumption and historical appeals of the target power user;

the data processing unit is used for cleaning and characterizing the user panel data, the appeal data, the processing condition data and the historical data of the target power user and storing the user panel data, the appeal data, the processing condition data and the historical data into a test set database;

the calculation unit is used for inputting data in the test set database into a pre-established prediction model for explaining repeated appeal indexes and outputting repeated appeal prediction values of the power users;

the prediction unit is used for judging whether the repeated appeal prediction value of the power user is larger than a preset probability threshold value or not, and if yes, the power user is determined to be a potential power repeated appeal user.

Further, the data processing unit is specifically configured to clean the original data and characterize it, which includes converting text information into integers, clustering discrete numerical data into classes with a uniformly formatted identifier for each class, and eliminating missing data, and to store the cleaned data in the test set database.

Further, the computing unit is specifically configured to input the data in the test set database into the pre-established prediction model, and to use the R language to call the prcomp() command to perform principal component analysis on the data, or the factanal() command to perform factor analysis on the data, so as to eliminate the insignificant variables in the training set index data set.

Further, the prediction unit is further configured to divide the potential repeated power appeal users into multiple levels at preset intervals according to the repeated appeal probability interval, wherein a higher level indicates a higher repeated appeal probability.

Implementing the invention has the following beneficial effects: based on actual user appeals, the probability that a power user will appeal repeatedly is calculated with a model algorithm written in the R language, so that potential repeated appeal users are found and the weak links in user experience and perception are identified in advance; this supports differentiated management of users and better communication quality, and further improves the management level, working efficiency and enterprise competitiveness. The method replaces after-the-fact rectification with advance perception, accurately identifies escalating user complaints, customer problems and service risks, helps user appeals to be resolved effectively and properly, and continues to meet the public's ever-growing demand for electricity.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart illustrating a method for predicting a potential power repetitive appeal user according to an embodiment of the present invention.

Fig. 2 is a schematic flowchart of constructing a prediction model according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of a specific application flow of the prediction method for the potential power repetitive appeal user according to the embodiment of the present invention.

Detailed Description

The following description of the embodiments refers to the accompanying drawings, which are included to illustrate specific embodiments in which the invention may be practiced.

The invention screens the data indexes of power appeal users according to the principles of purposefulness, feasibility, focus, typicality and scientific soundness to obtain the training set attribute indexes, uses the R language to build an identification model for potential power repeated appeal users from the corresponding indexes, obtains the model input data for the power appeal user to be identified, and calculates the repeated appeal probability from that input data. This achieves perception of potential repeated appeal users and further improves the differentiation, pertinence, accuracy and effectiveness of customer service work.

Thus, referring to fig. 1, an embodiment of the present invention provides a method for predicting a potential power repeat appeal user, including:

Specifically, in step S1 of this embodiment, the personal information and the electricity usage habits of the electricity consumers are captured from the marketing management system by using SQL statements, and a first type of "user panel data (X1)" is obtained; capturing power user appeal content from a client problem system to obtain second-type appeal data (X2); capturing power user appeal processing conditions from a client problem system to obtain a third type of processing condition data (X3); and capturing the historical electricity utilization condition and the historical appeal condition of the power consumer from the marketing management system to obtain a fourth type of historical data (X4).

As an example, the four types of data comprise 34 attribute indexes (actual operation need not be limited to these 34), including: user panel data (attribute indexes 1-4: electricity consumption property, power supply area, gender of the complainant and last-year electricity consumption); appeal data (attribute indexes 5-19: internal route, external route, call duration, service category, secondary service category, tertiary service category, month of first appeal, time period, "not accepted", "affected", "as soon as possible", "complaining mood", "severe", "again" and "complaint"); processing condition data (attribute indexes 19-27: seat job, seat skill, processing completion time (including subsequent processing), whether subsequent processing was performed, subsequent feedback timeout, acceptor, filing time and distribution department); and historical data (attribute indexes 28-33: previous-year appeal volume, previous-year consultation volume, previous-year number of service transactions, last-year number of service transactions, last-year appeal volume and last-year consultation volume). Because user panel data are essentially identical for users with the same electricity consumption property, power supply area and complainant gender, the historical appeals and electricity consumption are introduced to support the judgment. Attribute index 34 "REPEAT" is used as the target variable, that is, the repeated appeal index.

The original data are cleaned and characterized, that is, converted into feature indexes for model training: for example, text information is converted into integers, discrete numerical data are clustered into classes with a uniformly formatted identifier for each class, missing data are eliminated, and the cleaned data are stored in the test set database. Specifically, integer processing replaces the actual data value with an integer value. For example, the data of attribute index 1 "electricity consumption property" are converted into the integer values 1 to 7 according to the classification residential life, general industrial and commercial and other, large industrial electricity, commercial, other electricity, non-residential and general industry; and attribute index 21 "seat job" is converted into the integer values 1 to 5 according to high-class seat, general, class, quality control seat and other. In the same way, related attribute indexes such as power supply area, customer gender, internal route and service category are converted into integers, and records with missing data are removed. Clustering processing divides discrete numerical data into classes. For example, the exact data of attribute index 7 "call duration" may be divided into 12 discrete levels, namely 0 minutes, 0-2 minutes, 2-5 minutes, 5-8 minutes, 8-10 minutes, 10-12 minutes, 12-15 minutes, 15-20 minutes, 20-25 minutes, 25-30 minutes, and 30 minutes or more, with all values in the same level replaced by the same integer value; these may be converted into the integer values 0, 2, 5, 8, 10, 12, 15, 20, 25, 30, 31 and 32. Attribute index 12 "appeal time period" may be divided into 5 discrete levels, namely 6:00-12:00, 12:00-14:00, 14:00-18:00, 18:00-22:00 and 22:00-6:00, with all values in the same level replaced by the same integer value, converted into the integer values 1 to 5. Similarly, other comparable attribute indexes such as event processing completion time, filing time and last-year electricity consumption are classified and preprocessed according to unified standards, taking the user types fully into account.
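The cleaning steps described above can be sketched in R roughly as follows. This is only an illustrative sketch: the column names, the sample records and the ordering of the category levels are assumptions, and the call duration is coded by the index of its interval rather than by the specific integer values listed in the text.

```r
# Hypothetical raw records; column names are assumptions for illustration.
raw <- data.frame(
  electricity_property = c("residential life", "commercial",
                           "large industrial electricity", NA),
  call_minutes         = c(0, 1.5, 9, 42),
  stringsAsFactors     = FALSE
)

# Eliminate records with missing data.
clean <- raw[complete.cases(raw), ]

# Integer processing: replace each text category with an integer code 1..7
# (the order of the levels is assumed).
property_levels <- c("residential life",
                     "general industrial and commercial and other",
                     "large industrial electricity", "commercial",
                     "other electricity", "non-residential", "general industry")
clean$electricity_property <- match(clean$electricity_property, property_levels)

# Clustering processing: code 0 for a zero-length call, otherwise the index of
# the interval (0,2], (2,5], ..., (30,Inf) that the duration falls into.
breaks <- c(0, 2, 5, 8, 10, 12, 15, 20, 25, 30, Inf)
clean$call_bin <- ifelse(clean$call_minutes == 0, 0L,
                         cut(clean$call_minutes, breaks = breaks, labels = FALSE))
```

The same pattern (match() for text categories, cut() for numeric ranges) would apply to the other attribute indexes, with level lists and break points taken from the unified standard in use.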

The above data cleaning and processing matters for eliminating insignificant variables from the data set under actual business conditions. First, a large number of variables generally cannot be handled and interpreted in a reasonable way; second, some of the 33 variables, in particular classification variables such as seat job and seat skill, secondary and tertiary service categories, or whether subsequent processing was performed and subsequent feedback timeout, depend on one another. Therefore, this embodiment retains only the more important information they contain, in order to reduce the amount of analysis.

Step S2 further examines all appeals to derive the repeated appeal index "REPEAT", encodes the index as 0 or 1, and stores the data in the test set database.

Step S3 inputs the above data into the pre-established prediction model and uses the R language to call the prcomp() command for principal component analysis (Principal Component Analysis), or the factanal() command for factor analysis (Factor Analysis), so as to eliminate the insignificant variables in the test set index data set. The resulting principal components Fj are input into the pre-established prediction model, and the R glm() command is called to perform logistic regression analysis with Fj as the explanatory variables, explaining the repeated appeal index "REPEAT" (that is, the target variable).

Referring to Fig. 2, the prediction model is constructed as follows:

(1) similar to step S1, user panel data for representing personal information and electricity usage habits of each power user, appeal data for representing appeal content of each power user, processing condition data for representing appeal processing conditions of each power user, and historical data for representing historical electricity usage conditions and historical appeal conditions of each power user are respectively obtained; and then, similarly cleaning data, and constructing a new training set database after eliminating invalid data.

(2) Following the basic idea of principal component analysis, the index system represented by the original variables is reduced to an index system represented by principal components. Before principal component analysis is applied, the entire data set is scaled so that every variable has unit variance; the share of the variance of the selected raw variables that each component explains is then a reasonable basis for interpretation. The variance of each component is obtained as a result.

The R code is as follows:

# Input the original data sets
ticdata = read.table("")
testing = read.table("")
testing = as.matrix(testing)

# Use principal component analysis to select significant variables
tic = as.matrix(ticdata[, 1:33])     # the 33 attribute indexes
pca = prcomp(tic, scale. = TRUE)     # scale each variable to unit variance
summary(pca)                         # variance explained by each component
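As an illustration of how the number of retained components might be chosen from such a result, the following sketch uses synthetic data in place of the 33 attribute indexes and keeps the smallest number of components whose cumulative variance share reaches 90%. The data, the variable names and the 90% cut-off applied in this way are assumptions for the example.

```r
set.seed(1)
# Synthetic stand-in for the attribute indexes: 200 users, 6 variables.
tic_demo <- matrix(rnorm(200 * 6), ncol = 6)
tic_demo[, 2] <- tic_demo[, 1] + 0.1 * rnorm(200)   # two strongly correlated columns

pca_demo <- prcomp(tic_demo, scale. = TRUE)

# Share of the total variance explained by each component, and its running total.
var_share <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
cum_share <- cumsum(var_share)

# Smallest number of components whose cumulative share reaches 90%.
m <- which(cum_share >= 0.9)[1]
```

Because the variables are scaled to unit variance, the component variances sum to the number of variables, which is what makes the shares comparable.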

The principle and estimation of the principal-component-based logistic model are as follows:

First, an observation matrix is constructed by arranging the collected variables as a vector: $X = (x_1, x_2, \ldots, x_n)^T$;

where $n$ is the number of attribute indexes. Through principal component analysis (Principal Component Analysis), the insignificant attribute indexes among $x_1, x_2, \ldots, x_n$ in the training set index data set are removed and $m$ (far less than $n$) explanatory variables $f_1, f_2, \ldots, f_m$ that effectively reflect the appeal data are extracted, where $m$ is the number of extracted explanatory variables and $x_n$ is the repeated appeal index of the training set, taking values in $\{0, 1\}$.

Principal component analysis gives:

$$f_i = a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n, \qquad i = 1, 2, \ldots, m,$$

where the coefficients satisfy $a_{i1}^2 + a_{i2}^2 + \cdots + a_{in}^2 = 1$. The principal components have the following characteristics:

(1) $f_i$ and $f_j$ are uncorrelated with each other for $i \ne j$, i.e., $\mathrm{Cov}(f_i, f_j) = 0$;

(2) $f_1$ has the largest variance among all linear combinations of $x_1, x_2, \ldots, x_n$ (with coefficients satisfying the above requirement); $f_2$ has the largest variance among all such linear combinations that are uncorrelated with $f_1$; and so on, so that $f_m$ has the largest variance among all linear combinations of $x_1, x_2, \ldots, x_n$ that are uncorrelated with $f_1, f_2, \ldots, f_{m-1}$.
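These characteristics can be checked numerically on the output of prcomp(). The sketch below uses synthetic data and hypothetical variable names; it is meant only to illustrate the properties, not to reproduce the data of this embodiment.

```r
set.seed(2)
# Synthetic data: 100 observations of 4 variables, one of them dependent on two others.
x_demo <- matrix(rnorm(100 * 4), ncol = 4)
x_demo[, 3] <- x_demo[, 1] - x_demo[, 2] + 0.5 * rnorm(100)

pca_chk <- prcomp(x_demo, scale. = TRUE)

# Characteristic (1): the component scores are mutually uncorrelated.
score_cov   <- cov(pca_chk$x)
max_offdiag <- max(abs(score_cov[upper.tri(score_cov)]))

# Normalisation requirement: each coefficient vector has unit length.
coef_norms <- colSums(pca_chk$rotation^2)

# Characteristic (2): the variances are in decreasing order, f1 being the largest.
variances <- diag(score_cov)
```

Here pca_chk$rotation holds the coefficients of each component in its columns and pca_chk$x the component scores.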

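Let $y = x_n$ denote the binary response variable with values in $\{0, 1\}$ and $F = (f_1, f_2, \ldots, f_m)^T$ the vector of explanatory variables. The latent variable model is constructed as follows:

$$y^* = F^T\beta + \varepsilon, \qquad y = \begin{cases} 1, & y^* > \alpha \\ 0, & y^* \le \alpha \end{cases}$$

where $\beta$ is an $m \times 1$ vector, $\varepsilon$ is a random disturbance term and $\alpha$ is an unknown threshold parameter. If $\varepsilon$ obeys a logistic distribution, that is,

$$P(\varepsilon \le t) = \frac{1}{1 + e^{-t}},$$

the conditional distribution of $y$ given $F$ can be obtained, and each response probability can be calculated with the logistic model:

$$P(y = 1 \mid F) = \frac{\exp(F^T\beta - \alpha)}{1 + \exp(F^T\beta - \alpha)}, \qquad P(y = 0 \mid F) = \frac{1}{1 + \exp(F^T\beta - \alpha)},$$

which is the prediction model used for judging repeated appeal users.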

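(3) Constructing the component weights

The results of the principal component analysis show that the first several principal components explain 90% of the variation in the data. The R code is as follows:

rot = pca$rotation          # loading matrix, one column per component
x = pca$x                   # component scores of the training data
loading = rot[, 1:23]       # keep the first 23 components
loading
plot(pca)                   # plot the variances of the components
pcadata = tic %*% loading   # project the attribute indexes onto the kept components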

(4) Constructing the principal-component-based logistic model

Multiplying the data matrix by the loadings of these principal components yields new variables to which the logistic model can be fitted, and logistic regression analysis is performed with Fj as the explanatory variables. The R code is as follows:

# Construct the logistic model and predict the testing data
logit = glm(ticdata[, 34] ~ as.matrix(pcadata), family = binomial)
summary(logit)

The resulting principal-component logistic model is $y^* = F^T\beta + \varepsilon$, with

$F^T\beta = 0.00786F_1 - 0.26419F_2 + 0.32771F_3 + 0.83742F_4 + 0.07617F_5 + 0.12871F_6 - 0.28977F_7 + 0.664F_8 + 0.02593F_9 - 0.10943F_{10} + 0.1025F_{11} - 0.41541F_{12} - 0.54102F_{13} - 0.16840F_{14} - 0.13043F_{15} - 0.2424F_{16} - 0.27553F_{17} - 0.01689F_{18} + 0.10497F_{19} + 0.31261F_{20} + 0.14404F_{21} + 0.05996F_{22} - 0.2297F_{23}$.

The test set testing is then introduced, and the following equation:

$$p = P\{y = 1 \mid F\} = \frac{\exp(F^T\beta - \alpha)}{1 + \exp(F^T\beta - \alpha)}$$

gives the probability prediction value $p$ for "REPEAT", where $\beta$ is an $m \times 1$ vector and $\alpha$ represents an unknown threshold parameter; $p = P\{y = 1 \mid F^T = (f_1, f_2, \ldots, f_m)\}$ represents the probability of being judged as $y = 1$ under the explanatory variables $f_1, f_2, \ldots, f_m$, and serves as the repeated appeal prediction value of the power user.

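The R code is as follows:

pcatesting = testing %*% loading                 # project the test set onto the kept components
c0 = logit$coefficients[1]                       # intercept of the fitted model
c0 = rep(c0, 1000)                               # repeated once per row for 1000 test records
c = logit$coefficients[2:24]                     # coefficients of the 23 components
gtesting = as.matrix(pcatesting) %*% as.matrix(c)
ytesting = gtesting + c0                         # linear predictor
ptesting = exp(ytesting) / (1 + exp(ytesting))   # predicted probability of REPEAT
ptesting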
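For illustration, the whole chain from principal component analysis to predicted probabilities can also be sketched end to end on synthetic data. Note that this sketch substitutes the predict() methods of prcomp and glm for the manual matrix products, so the test rows are centred and scaled with the training parameters before projection; that is a variation for clarity rather than the procedure of this embodiment. All data, the column names x1 to x5 and the choice m = 3 are assumptions.

```r
set.seed(3)
n <- 300
# Synthetic training data: 5 attribute indexes and a 0/1 repeated appeal index.
train_x <- matrix(rnorm(n * 5), ncol = 5)
test_x  <- matrix(rnorm(10 * 5), ncol = 5)
colnames(train_x) <- colnames(test_x) <- paste0("x", 1:5)
train_y <- rbinom(n, 1, plogis(0.8 * train_x[, 1] - 0.6 * train_x[, 2]))

pca_fit <- prcomp(train_x, scale. = TRUE)
m <- 3                                             # number of components kept (assumed)
train_df <- data.frame(y = train_y, pca_fit$x[, 1:m])

# Logistic regression with the component scores PC1..PCm as explanatory variables.
fit <- glm(y ~ ., data = train_df, family = binomial)

# Project the test rows with the training centring and scaling, then predict.
test_f <- as.data.frame(predict(pca_fit, newdata = test_x)[, 1:m, drop = FALSE])
p_test <- predict(fit, newdata = test_f, type = "response")

# The same probabilities from the explicit logistic formula.
eta      <- coef(fit)[1] + as.matrix(test_f) %*% coef(fit)[-1]
p_manual <- as.vector(exp(eta) / (1 + exp(eta)))
```

The agreement between p_test and p_manual shows that the explicit formula and the model's own prediction are the same calculation.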

It can be understood that if factors such as the attribute fields, the data period or the number of users in the training set differ, the weights calculated in the resulting prediction model formula will differ as well; the model is nevertheless always based on factor analysis or principal component analysis, and the construction method is the same.

Referring again to Fig. 3, in a specific application example, a user first calls the service hotline to raise an appeal. According to the method of this embodiment, the user is identified by electricity consumption address, label or user number, and the user's user panel data, appeal data, processing condition data and historical data are then obtained. These data are input into the pre-established prediction model to obtain the user's repeated appeal prediction value, which is compared with the preset probability threshold; if the prediction value is greater than the probability threshold, the power user is determined to be a potential power repeated appeal user. Once potential power repeated appeal users have been identified, customer service personnel can respond to the customer's appeal in a differentiated way according to the control measures and schemes for repeated appeal users, so as to avoid escalation of the appeal, examine the content of the customer's appeal in depth, carry out a risk assessment of it, and strengthen the enterprise's internal control.

According to the repeated appeals in the training set and the practical use of p, potential power repeated appeal users can be divided into 5 grades at intervals of 0.2 over the repeated appeal probability interval [0, 1], namely A1 [0, 0.2), A2 [0.2, 0.4), A3 [0.4, 0.6), A4 [0.6, 0.8) and A5 [0.8, 1]; a higher grade indicates a greater probability of repeated appeal. Taking p greater than 0.5 as the assumed threshold, the calculation on the test set yields 3 potential repeated appeal customers, all 3 of whom actually turned out to be repeated appeal customers, so the model predicts well. Considering that the attribute indexes in this embodiment are highly correlated and that some values are missing, further effective indexes can be added to train the model further and obtain more accurate predictions.
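The grading and thresholding described above can be sketched in R as follows. The probabilities are hypothetical, and the grade labels and the 0.5 threshold follow the example in this embodiment.

```r
# Hypothetical predicted probabilities for five users.
p <- c(0.05, 0.25, 0.55, 0.79, 1.0)

# Grades A1..A5 over [0, 1] at intervals of 0.2. Intervals are closed on the
# left; include.lowest = TRUE also closes the last one on the right: [0.8, 1].
grade <- cut(p, breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
             labels = paste0("A", 1:5),
             right = FALSE, include.lowest = TRUE)

# Flag users whose prediction value exceeds the preset probability threshold.
threshold <- 0.5
potential_repeat <- p > threshold
```

A different number of grades or a different threshold only requires changing the breaks and threshold values.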

Corresponding to the method for predicting potential power repeated appeal users of the first embodiment, a second embodiment of the invention provides a device for predicting potential power repeated appeal users, comprising:

For the working principle and process of the present embodiment, please refer to the description of the first embodiment of the present invention, which is not repeated herein.

As can be seen from the above description, the present invention provides the following beneficial effects: based on actual user appeals, the probability that a power user will appeal repeatedly is calculated with a model algorithm written in the R language, so that potential repeated appeal users are found and the weak links in user experience and perception are identified in advance; this supports differentiated management of users and better communication quality, and further improves the management level, working efficiency and enterprise competitiveness. The method replaces after-the-fact rectification with advance perception, accurately identifies escalating user complaints, customer problems and service risks, helps user appeals to be resolved effectively and properly, and continues to meet the public's ever-growing demand for electricity.

The above disclosure describes only preferred embodiments of the present invention and is not intended to limit the scope of its claims; equivalent changes made in accordance with the claims of the present invention therefore remain within the scope covered by the invention.

Claims

1. A method for predicting potential power repetitive appeal users is characterized by comprising the following steps:

2. The prediction method according to claim 1, wherein the user panel data specifically includes attribute indexes 1 to 4: electricity consumption property, power supply area, gender of the complainant and electricity consumption in the last year; the appeal data specifically includes attribute indexes 5 to 19: internal route, external route, call duration, service category, secondary service category, tertiary service category, month of first appeal, time period, "not accepted", "affected", "as soon as possible", "complaining mood", "severe", "again" and "complaint"; the processing condition data specifically includes attribute indexes 19 to 27: seat job, seat skill, processing completion time (including subsequent processing), whether subsequent processing was performed, subsequent feedback timeout, acceptor, filing time and distribution department; the historical data specifically includes attribute indexes 28 to 33: previous-year appeal volume, previous-year consultation volume, previous-year number of service transactions, last-year number of service transactions, last-year appeal volume and last-year consultation volume; and attribute index 34, the repeated appeal index "REPEAT", is used as the target variable.

3. The prediction method according to claim 1, wherein step S2 specifically includes: cleaning the original data and characterizing it, which includes converting text information into integers, clustering discrete numerical data into classes with a uniformly formatted identifier for each class, and eliminating missing data; and storing the cleaned data in the test set database.

4. The prediction method according to claim 3, wherein converting text information into integers means replacing the actual data value with an integer value, and clustering discrete numerical data into classes means dividing the values into a number of discrete levels, each identified by an integer.

5. The prediction method according to claim 1, wherein step S3 specifically includes: inputting the data in the test set database into the pre-established prediction model, and using the R language to call the prcomp() command to perform principal component analysis on the data, or the factanal() command to perform factor analysis on the data, so as to eliminate the insignificant variables in the training set index data set.

6. The prediction method according to claim 1, wherein the step S4 further comprises: according to the repeated appeal probability interval, the potential repeated power appeal users are divided into multiple levels at preset intervals, and the higher the level is, the larger the repeated appeal probability is.

7. The prediction method according to claim 1, wherein the process of establishing the prediction model for explaining the repeated appeal index specifically comprises:

8. The prediction method according to claim 7, wherein training on the data in the training set database to obtain the prediction model for explaining the repeated appeal index specifically comprises:

if ε obeys a logistic distribution

namely, the power repeated appeal user prediction model is judged.

9. The prediction method according to claim 8, wherein the step S3 is configured to output the predicted value of the repeated appeal of the power consumer according to the following formula:

10. A device for predicting potential power repeat appeal users, comprising:

11. The prediction device according to claim 10, wherein the data processing unit is specifically configured to clean the original data and characterize it, which includes converting text information into integers, clustering discrete numerical data into classes with a uniformly formatted identifier for each class, and eliminating missing data, and to store the cleaned data in the test set database.

12. The prediction device according to claim 11, wherein converting text information into integers means replacing the actual data value with an integer value, and clustering discrete numerical data into classes means dividing the values into a number of discrete levels, each identified by an integer.

13. The prediction device according to claim 10, wherein the computing unit is specifically configured to input the data in the test set database into the pre-established prediction model, and to use the R language to call the prcomp() command to perform principal component analysis on the data, or the factanal() command to perform factor analysis on the data, so as to eliminate the insignificant variables in the training set index data set.

14. The prediction device of claim 10, wherein the prediction unit is further configured to classify the potential repetitive power appeal users into multiple levels at preset intervals according to a repetitive appeal probability interval, and a higher level indicates a higher repetitive appeal probability.