CN116050631A - User loss prediction method, device, equipment and storage medium - Google Patents

User loss prediction method, device, equipment and storage medium Download PDF

Info

Publication number
CN116050631A
Authority
CN
China
Prior art keywords
prediction
data
target
model
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310082488.6A
Other languages
Chinese (zh)
Inventor
王昌雄
唐昊
白云龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Maywide Technology Co ltd
Original Assignee
Guangzhou Maywide Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Maywide Technology Co ltd filed Critical Guangzhou Maywide Technology Co ltd
Priority to CN202310082488.6A priority Critical patent/CN116050631A/en
Publication of CN116050631A publication Critical patent/CN116050631A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"


Abstract

The embodiments of the present application provide a user churn prediction method, apparatus, device, and storage medium, belonging to the technical field of artificial intelligence. The method comprises the following steps: acquiring target broadcast television data of a target user and a plurality of pieces of sample broadcast television data; inputting the sample broadcast television data into a plurality of trained prediction models and iteratively updating the weight value of each prediction model to determine its target weight value, wherein the model types of any two prediction models are different; inputting the target broadcast television data into the plurality of prediction models to obtain the target churn probability predicted by each prediction model; determining a total churn probability according to the target churn probabilities and the target weight values; and determining a churn prediction result for the target user according to the total churn probability. The embodiments of the present application can improve the accuracy of user churn prediction.

Description

User loss prediction method, device, equipment and storage medium
Technical Field
The present disclosure relates to, but is not limited to, the field of artificial intelligence, and in particular to a user churn prediction method, apparatus, device, and storage medium.
Background
With the development of broadcast television networks, the main public-facing services of broadcast television network operators are live television, video on demand, and broadband internet access, and various kinds of broadcast television data are generated in the course of processing these services. Predicting whether a user of a broadcast television network is a potential pre-churn user is therefore of great importance to the operator.
At present, a single type of prediction model is generally used for user churn prediction. However, a single model type has inherent limitations and cannot effectively analyse all kinds of broadcast television data, so its prediction accuracy is low. Moreover, when several different types of prediction models are used, their prediction results differ from one another, so an accurate overall result cannot be determined and prediction accuracy remains low.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiments of the present application provide a user churn prediction method, apparatus, device, and storage medium, which can improve the accuracy of user churn prediction.
To achieve the above objective, a first aspect of the embodiments of the present application provides a user churn prediction method, comprising: acquiring target broadcast television data of a target user and a plurality of pieces of sample broadcast television data; inputting the sample broadcast television data into a plurality of trained prediction models and iteratively updating the weight value of each prediction model to determine its target weight value, wherein the model types of any two prediction models are different; inputting the target broadcast television data into the plurality of prediction models to obtain the target churn probability predicted by each prediction model; determining a total churn probability according to the target churn probabilities and the target weight values; and determining a churn prediction result for the target user according to the total churn probability.
In some embodiments, the model types of the prediction models include at least: a logistic regression model, a random forest model, and a gradient boosting decision tree model; and inputting the target broadcast television data into the plurality of prediction models to obtain the target churn probability predicted by each prediction model comprises: inputting the target broadcast television data into the logistic regression model to obtain the target churn probability predicted by the logistic regression model; inputting the target broadcast television data into the random forest model to obtain the target churn probability predicted by the random forest model; and inputting the target broadcast television data into the gradient boosting decision tree model to obtain the target churn probability predicted by the gradient boosting decision tree model.
In some embodiments, inputting the sample broadcast television data into the plurality of trained prediction models and iteratively updating the weight value of each prediction model to determine its target weight value comprises: acquiring a total iteration count, the weight value of each prediction model, and the churn labels of the sample broadcast television data; dividing the plurality of pieces of sample broadcast television data into a plurality of sample data sets based on the total iteration count, wherein the number of sample data sets equals the total iteration count; for any sample data set, sequentially inputting the sample broadcast television data in that set into the plurality of trained prediction models to obtain the sample churn probability predicted by each prediction model; determining a sample churn result for the sample broadcast television data according to the sample churn probability; determining the prediction accuracy of each prediction model according to each sample churn result and the corresponding churn label; iteratively updating the weight value of each prediction model according to the prediction accuracies of all the prediction models until every sample data set has been traversed; and counting the number of weight-value updates, and taking the current weight value as the target weight value of the prediction model when the number of updates reaches the total iteration count.
In some embodiments, iteratively updating the weight value of each prediction model according to the prediction accuracies of all the prediction models comprises: summing the prediction accuracies of all the prediction models to obtain an overall accuracy; for any prediction model, dividing its prediction accuracy by the overall accuracy to obtain a model accuracy; determining the number of prediction models; and multiplying the model accuracy, the weight value of the prediction model, and the number of models, and updating the weight value of the prediction model according to the product.
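The update rule just described (divide each model's accuracy by the overall accuracy, then multiply by the current weight value and the number of models) can be sketched as follows; the function and variable names are illustrative and not from the patent:

```python
def update_weights(weights, accuracies):
    """One iteration of the claimed weight update:
    w_i <- w_i * (acc_i / sum(acc)) * K, where K is the number of models."""
    overall = sum(accuracies)            # overall accuracy: sum of model accuracies
    k = len(weights)                     # number of prediction models
    return [w * (acc / overall) * k for w, acc in zip(weights, accuracies)]

# With equal accuracies the weights are unchanged (acc/overall = 1/K);
# a more accurate model gains weight relative to the others.
new_w = update_weights([1.0, 1.0, 1.0], [0.8, 0.6, 0.6])
```

Note the fixed point of this rule: when all models are equally accurate, each weight is multiplied by exactly 1, so weights only move when the models disagree in accuracy.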
In some embodiments, the target broadcast television data includes attribute data and interaction data, the attribute data including at least: profile data, payment data, billing data, and product data, and the interaction data including at least: set-top box interaction data, customer service interaction data, and intelligent gateway interaction data; and inputting the target broadcast television data into the plurality of prediction models to obtain the target churn probability predicted by each prediction model comprises: combining the attribute data and the interaction data based on a preset feature combination strategy to obtain a broadcast television data combination; performing feature extraction on the broadcast television data combination to obtain broadcast television feature data; and inputting the broadcast television feature data into the plurality of prediction models to obtain the target churn probability predicted by each prediction model.
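The patent does not disclose the concrete feature combination strategy, so the sketch below is only one plausible reading: merge the attribute and interaction fields per user and derive a crossed feature. All field names are assumptions for illustration.

```python
# Illustrative per-user records; field names are assumptions, not from the patent.
attribute_data = {"monthly_fee": 30.0, "product": "premium", "tenure_months": 18}
interaction_data = {"stb_hours": 2.1, "service_calls": 5, "gateway_logins": 0}

def combine_features(attrs, inter):
    """Merge the two data sources and add a simple crossed feature.
    The actual preset feature-combination strategy is not specified."""
    combined = {**attrs, **inter}
    # One illustrative combined feature: service contacts relative to spend.
    combined["calls_per_fee"] = inter["service_calls"] / attrs["monthly_fee"]
    return combined

features = combine_features(attribute_data, interaction_data)
```

Feature extraction (e.g. scaling or encoding) would then be applied to such a combined record before it is fed to the prediction models.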
In some embodiments, after the step of determining the churn prediction result for the target user according to the total churn probability, the method further comprises: determining, according to the churn prediction result, whether the target user is a pre-churn user; when the target user is a pre-churn user, determining first push information according to the attribute data and a preset attribute retention strategy, and determining second push information according to the interaction data and a preset interaction retention strategy; and pushing the first push information and the second push information to the pre-churn user.
In some embodiments, after the step of acquiring the target broadcast television data of the target user and the plurality of pieces of sample broadcast television data, the method further comprises: performing anomaly detection on the target broadcast television data to determine first anomalous data; removing the first anomalous data from the target broadcast television data and updating the target broadcast television data; performing anomaly detection on the sample broadcast television data to determine second anomalous data; and removing the second anomalous data from the sample broadcast television data and updating the sample broadcast television data.
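The anomaly-detection method itself is left open by the text; the sketch below uses one common choice, the interquartile-range (IQR) rule, purely as an assumed stand-in:

```python
import numpy as np

def remove_anomalies(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR]. The patent does not
    specify its anomaly-detection method; this IQR rule is one common choice."""
    arr = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(arr, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return arr[(arr >= lo) & (arr <= hi)]

# A grossly out-of-range usage value is removed before training/prediction.
cleaned = remove_anomalies([10, 11, 12, 11, 10, 500])
```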
To achieve the above object, a second aspect of the embodiments of the present application provides a user churn prediction apparatus, comprising: an acquisition unit for acquiring target broadcast television data of a target user and a plurality of pieces of sample broadcast television data; an updating unit for inputting the sample broadcast television data into a plurality of trained prediction models, iteratively updating the weight value of each prediction model, and determining its target weight value, wherein the model types of any two prediction models are different; a prediction unit for inputting the target broadcast television data into the plurality of prediction models to obtain the target churn probability predicted by each prediction model; a calculation unit for determining a total churn probability according to the target churn probabilities and the target weight values; and a determination unit for determining a churn prediction result for the target user according to the total churn probability.
To achieve the above object, a third aspect of the embodiments of the present application provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program and the processor, when executing the computer program, implements the user churn prediction method described in the first aspect.
To achieve the above object, a fourth aspect of the embodiments of the present application provides a storage medium, which is a computer-readable storage medium storing a computer program that, when executed by a processor, implements the user churn prediction method described in the first aspect.
The user churn prediction method, apparatus, device, and storage medium provided by the present application comprise: acquiring target broadcast television data of a target user and a plurality of pieces of sample broadcast television data; inputting the sample broadcast television data into a plurality of trained prediction models and iteratively updating the weight value of each prediction model to determine its target weight value, wherein the model types of any two prediction models are different; inputting the target broadcast television data into the plurality of prediction models to obtain the target churn probability predicted by each prediction model; determining a total churn probability according to the target churn probabilities and the target weight values; and determining a churn prediction result for the target user according to the total churn probability. In the scheme provided by the embodiments of the present application, the sample broadcast television data are input into different prediction models and the weight value of each prediction model is iteratively updated according to the model outputs, thereby determining the target weight value of each prediction model. During prediction, the target broadcast television data of the target user are input into the different prediction models, the target churn probabilities are determined from the model outputs, the total churn probability is then calculated in combination with the target weight values, and the churn prediction result for the target user is determined from it. Because the target churn probabilities of the prediction models are integrated according to their target weight values and prediction is performed with multiple models, the churn prediction result is highly accurate, improving the accuracy of user churn prediction.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
Drawings
The accompanying drawings are included to provide a further understanding of the technical solutions of the present application and are incorporated in and constitute a part of this specification. Together with the embodiments of the present application, they serve to explain the technical solutions of the present application and do not constitute a limitation thereof.
FIG. 1 is a flowchart of a user churn prediction method provided in one embodiment of the present application;
FIG. 2 is a flowchart of a method of determining a target churn probability provided in another embodiment of the present application;
FIG. 3 is a flowchart of another method of determining a target churn probability provided in another embodiment of the present application;
FIG. 4 is a flowchart of a method of updating weight values provided in another embodiment of the present application;
FIG. 5 is a flowchart of a specific method of determining a target churn probability provided in another embodiment of the present application;
FIG. 6 is a flowchart of a method of determining push information provided in another embodiment of the present application;
FIG. 7 is a flowchart of a method of removing anomalous data provided in another embodiment of the present application;
FIG. 8 is a schematic structural diagram of a user churn prediction apparatus provided in another embodiment of the present application;
FIG. 9 is a schematic diagram of the hardware structure of an electronic device provided in another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In the description of the present application, terms such as "several" mean one or more and "a plurality of" means two or more; "greater than", "less than", "exceeding", and the like are understood to exclude the stated number, while "above", "below", "within", and the like are understood to include it.
It should be noted that although functional blocks are divided in the device diagrams and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the block division or the flowchart order. The terms "first", "second", and the like in the description, the claims, and the drawings above are used to distinguish similar objects and do not necessarily describe a particular sequence or chronological order.
At present, a single type of prediction model is generally used for user churn prediction. However, a single model type has inherent limitations and cannot effectively analyse all kinds of broadcast television data, so its prediction accuracy is low; moreover, when several different types of prediction models are used, their results differ and an accurate overall result cannot be determined, so prediction accuracy remains low.
To address this problem of low prediction accuracy, the present application provides a user churn prediction method, apparatus, device, and storage medium. The method comprises: acquiring target broadcast television data of a target user and a plurality of pieces of sample broadcast television data; inputting the sample broadcast television data into a plurality of trained prediction models and iteratively updating the weight value of each prediction model to determine its target weight value, wherein the model types of any two prediction models are different; inputting the target broadcast television data into the plurality of prediction models to obtain the target churn probability predicted by each prediction model; determining a total churn probability according to the target churn probabilities and the target weight values; and determining a churn prediction result for the target user according to the total churn probability. By integrating the target churn probabilities of the prediction models according to their target weight values and predicting with multiple models, the scheme yields a highly accurate churn prediction result and improves the accuracy of user churn prediction.
The user churn prediction method, apparatus, device, and storage medium provided in the embodiments of the present application are described in detail through the following embodiments; the user churn prediction method is described first.
The embodiments of the present application provide a user churn prediction method and relate to the technical field of artificial intelligence. The user churn prediction method provided by the embodiments of the present application may be applied to a terminal, to a server, or to software running on a terminal or server. In some embodiments, the terminal may be a smartphone, tablet, laptop, desktop computer, or the like; the server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms; the software may be an application implementing the user churn prediction method, but is not limited to the above forms.
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In the embodiments of the present application, whenever processing involves user information, user behaviour data, user history data, user location information, or other data related to user identity or characteristics, the user's permission or consent is obtained first, and the collection, use, and processing of such data comply with the relevant laws, regulations, and standards of the relevant countries and regions. In addition, when an embodiment of the present application needs to acquire sensitive personal information of a user, the user's separate permission or separate consent is obtained, for example through a pop-up window or a jump to a confirmation page, and only after that separate permission or consent is explicitly obtained are the user-related data necessary for normal operation of the embodiment collected.
Embodiments of the present application are further described below with reference to the accompanying drawings.
As shown in FIG. 1, which is a flowchart of a user churn prediction method according to an embodiment of the present application, the user churn prediction method includes, but is not limited to, the following steps:
Step S110: acquiring target broadcast television data of a target user and a plurality of pieces of sample broadcast television data;
Step S120: inputting the sample broadcast television data into a plurality of trained prediction models and iteratively updating the weight value of each prediction model to determine its target weight value, wherein the model types of any two prediction models are different;
Step S130: inputting the target broadcast television data into the plurality of prediction models to obtain the target churn probability predicted by each prediction model;
Step S140: determining a total churn probability according to the target churn probabilities and the target weight values;
Step S150: determining a churn prediction result for the target user according to the total churn probability.
It can be understood that the sample broadcast television data are historical broadcast television data, so the actual churn status of the users associated with them is known. Combined with this actual churn status, the target weight values can be determined accurately during the iterative updating of the weight values: adjusting each prediction model's weight value plays to its strengths and mitigates its weaknesses, making the overall prediction more accurate. On this basis, the sample broadcast television data are input into the different prediction models and the weight values are iteratively updated according to the model outputs to determine the target weight values. During prediction, the target broadcast television data of the target user are input into the different prediction models, the target churn probabilities are determined from the model outputs, the total churn probability is calculated in combination with the target weight values, and the churn prediction result for the target user is determined from it. Because the target churn probabilities are integrated according to the target weight values and multiple types of prediction models are used, many kinds of broadcast television data can be analysed effectively, so the churn prediction result is highly accurate and the accuracy of user churn prediction is improved.
It should be noted that determining the total churn probability according to the target churn probabilities and the target weight values means weighting each prediction model's target churn probability by its target weight value and then summing the weighted probabilities to obtain the total churn probability. Determining the churn prediction result for the target user according to the total churn probability means comparing the total churn probability with a preset probability threshold: when the total churn probability is greater than or equal to the threshold, the churn prediction result indicates that the target user will churn, i.e. the target user is a pre-churn user; when the total churn probability is less than the threshold, the churn prediction result indicates that the target user will not churn.
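The weight-and-threshold step just described can be sketched as follows. Dividing by the weight sum to keep the total in [0, 1] is an added assumption (the text only says the weighted probabilities are added), and all names are illustrative:

```python
def churn_prediction(probs, weights, threshold=0.5):
    """Weight each model's churn probability by its target weight value,
    sum, and compare against a preset probability threshold.
    Normalising by the weight sum (an assumption) keeps the total in [0, 1]."""
    total = sum(p * w for p, w in zip(probs, weights)) / sum(weights)
    return total, total >= threshold   # True => pre-churn user

# Three models' churn probabilities combined with their target weight values.
total, is_pre_churn = churn_prediction([0.9, 0.7, 0.8], [1.2, 0.9, 0.9])
```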
It is noted that the prediction models can be obtained by training on historical broadcast television data.
In addition, referring to FIG. 2, in one embodiment the model types of the prediction models include at least: a logistic regression model, a random forest model, and a gradient boosting decision tree model. Step S130 in the embodiment shown in FIG. 1 includes, but is not limited to, the following steps:
Step S210: inputting the target broadcast television data into the logistic regression model to obtain the target churn probability predicted by the logistic regression model;
Step S220: inputting the target broadcast television data into the random forest model to obtain the target churn probability predicted by the random forest model;
Step S230: inputting the target broadcast television data into the gradient boosting decision tree model to obtain the target churn probability predicted by the gradient boosting decision tree model.
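Steps S210-S230 map naturally onto the three scikit-learn classifiers of the same names. The sketch below uses synthetic data and library-default hyperparameters, since the patent discloses neither the training data schema nor any hyperparameters:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Synthetic stand-in for broadcast television feature data.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gbdt": GradientBoostingClassifier(random_state=0),
}
for m in models.values():
    m.fit(X, y)

# Each model's churn probability for one target user (probability of class 1).
target_probs = {name: m.predict_proba(X[:1])[0, 1] for name, m in models.items()}
```

Each entry of `target_probs` corresponds to one of the target churn probabilities that steps S140-S150 then combine via the target weight values.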
It will be appreciated that different prediction models have different strengths and weaknesses, and integrating several types of prediction models can improve prediction accuracy.
It should be noted that, for the logistic regression model, its advantages include, but are not limited to: 1. it can handle both continuous and categorical independent variables; 2. it is easy to use and to interpret; 3. it is robust to small amounts of noise in the data and is not unduly affected by mild multicollinearity. Its drawbacks include, but are not limited to: 1. it is sensitive to multicollinearity among the independent variables in the model; for example, putting two highly correlated independent variables into the model at the same time may cause the regression coefficient of the weaker variable to take an unexpected, even reversed, sign. Representative independent variables can be selected by factor analysis, variable clustering, or similar techniques to reduce the correlation among candidate variables; 2. the prediction curve is S-shaped, so converting log(odds) into a probability is nonlinear: at both ends the probability changes little as log(odds) changes, with marginal effects and slopes that are too small, while in the middle the probability changes greatly and sensitively. This makes it hard to distinguish the influence of variable changes in many intervals on the target probability, and a threshold cannot be determined.
It should be noted that, for the random forest model, its advantages include, but are not limited to: 1. after training, the importance of each feature can be determined; 2. errors can be balanced on imbalanced data sets; 3. accuracy can be maintained even when a significant portion of the features are missing. Its drawbacks include, but are not limited to: 1. random forests have been shown to overfit on some noisy classification or regression problems; 2. for data whose attributes take different numbers of values, attributes with more values have a greater influence on the model, so the attribute weights the random forest produces on such data are not reliable.
It should be noted that, for the gradient boosting decision tree model, its advantages include, but are not limited to: 1. it can flexibly handle various types of data, including continuous and discrete values; 2. compared with an SVM, it can achieve higher prediction accuracy with relatively little hyperparameter tuning; 3. with robust loss functions such as the Huber loss and the quantile loss, it is very robust to outliers. Its drawbacks include, but are not limited to: 1. for data with inconsistent numbers of samples per category, the information gain in the decision trees is biased towards features with more values; 2. it can suffer from overfitting.
In addition, referring to fig. 3, in an embodiment, step S120 in the embodiment shown in fig. 1 includes, but is not limited to, the following steps:
step S310, acquiring total iteration times, weight values of all prediction models and loss labels of sample broadcast and television data;
step S320, dividing the plurality of sample broadcast television data into a plurality of sample data sets based on the total iteration times, wherein the number of the sample data sets is the same as the total iteration times;
step S330, for any sample data set, sequentially inputting sample broadcast and television data in the sample data set into a plurality of trained prediction models to respectively obtain sample loss probabilities predicted by the prediction models;
step S340, determining a sample loss result of the sample broadcast and television data according to the sample loss probability;
step S350, determining the prediction accuracy of each prediction model according to each sample loss result and the corresponding loss label;
step S360, iteratively updating the weight value of each prediction model according to the prediction accuracy of all the prediction models until each sample data set is traversed;
step S370, determining the update times of the weight values, and taking the current weight value as the target weight value of the prediction model when the update times of the weight values reach the total iteration times.
It can be understood that, in order to obtain an accurate target weight value, the weight value needs to be updated iteratively. The total iteration number is set by the user or preset by the system; one sample data set is used in each iteration, and the updated weight values are carried over to the next iteration, so that the finally obtained target weight values are accurately adjusted.
It should be noted that the loss label represents the actual churn situation of the user. When the sample loss result matches the loss label, the sample broadcast and television data is predicted correctly; otherwise it is predicted incorrectly. In one iteration, for a given prediction model, the prediction accuracy of that model is determined by summarizing the prediction results of all sample broadcast and television data; for example, if the prediction model predicts 100 pieces of broadcast and television data and 50 of them are predicted correctly, its prediction accuracy is 50%.
In addition, referring to fig. 4, in an embodiment, step S360 in the embodiment shown in fig. 3 includes, but is not limited to, the following steps:
step S410, calculating the sum of the prediction accuracy of all the prediction models to obtain the overall accuracy;
Step S420, calculating the quotient of the prediction accuracy and the overall accuracy of the prediction model aiming at any prediction model to obtain the model accuracy;
step S430, determining the number of the prediction models;
step S440, multiplying the model accuracy, the weight value of the prediction model and the number of models, and updating the weight value of the prediction model according to the multiplication result.
It can be understood that, in the iterative updating process, updating the weight values according to the calculated prediction accuracy ensures the reliability of the weight values and thus a higher prediction accuracy.
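The iterative updating of steps S310 to S370 together with the update formula of steps S410 to S440 can be sketched as follows; the data structures and the `predict` callback are hypothetical stand-ins for the patent's broadcast and television data and trained models, not part of the original text:

```python
def run_weight_iterations(sample_sets, models, weights, predict):
    """One weight update per sample data set (total iterations = len(sample_sets)).

    sample_sets: list of sample data sets, each a list of (sample, churn_label)
    pairs; predict(model, sample) returns the model's predicted churn result.
    Both are illustrative stand-ins, not the patent's actual data layout.
    """
    for dataset in sample_sets:
        # Prediction accuracy of each model on this sample data set (S350).
        accuracies = [
            sum(1 for x, label in dataset if predict(m, x) == label) / len(dataset)
            for m in models
        ]
        overall = sum(accuracies)            # overall accuracy (S410)
        n = len(models)                      # number of models (S430)
        # S420/S440: new weight = (model accuracy / overall) * old weight * n
        weights = [(a / overall) * w * n for a, w in zip(accuracies, weights)]
    return weights                           # target weight values (S370)
```

For example, with three models, equal initial weights of 1/3 and per-model accuracies of 50%, 60% and 70%, one pass of this rule yields 5/18, 1/3 and 7/18.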
It should be noted that, for example, when the total iteration number is 10, there are 3 types of prediction models, namely prediction model A, prediction model B and prediction model C, and the initial weight values of prediction model A, prediction model B and prediction model C are all 1/3. In the first iteration, the number of sample broadcast and television data in the sample data set is 100, the prediction accuracy of prediction model A is 50%, the prediction accuracy of prediction model B is 60%, and the prediction accuracy of prediction model C is 70%, so the overall accuracy is 50% + 60% + 70% = 180%. The weight value of prediction model A is updated as:

(50% / 180%) × (1/3) × 3 = 5/18

the weight value of prediction model B is updated as:

(60% / 180%) × (1/3) × 3 = 1/3

and the weight value of prediction model C is updated as:

(70% / 180%) × (1/3) × 3 = 7/18

Then the next round of iterative updating is carried out; after 10 rounds of iterative updating, the target weight values are obtained, where each round of iterative updating corresponds to one sample data set.
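The worked example above can be checked numerically with the update rule of steps S410 to S440 (accuracies of 50%, 60% and 70%, three models, initial weights of 1/3):

```python
accuracies = [0.50, 0.60, 0.70]
overall = sum(accuracies)            # overall accuracy = 1.8 (i.e. 180%)
n = len(accuracies)                  # number of models = 3
initial = 1 / 3                      # initial weight value of each model
# new weight = (model accuracy / overall accuracy) * old weight * number of models
updated = [(a / overall) * initial * n for a in accuracies]
# updated == [5/18, 1/3, 7/18], roughly [0.278, 0.333, 0.389]
```

Note that with equal initial weights the updated weights still sum to 1.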
In addition, referring to fig. 5, in an embodiment, the target broadcast and television data includes attribute data and interaction data, the attribute data at least includes: data class data, payment class data, bill class data and product class data, and the interaction data at least includes: set top box interaction data, customer service interaction data and intelligent gateway interaction data; step S130 in the embodiment shown in fig. 1 includes, but is not limited to, the following steps:
step S510, based on a preset feature combination strategy, combining the attribute data and the interaction data to obtain a radio and television data combination;
step S520, carrying out feature extraction processing on the broadcast television data combination to obtain broadcast television feature data;
and step S530, inputting the broadcast television characteristic data into a plurality of prediction models to respectively obtain target loss probabilities predicted by the prediction models.
It can be understood that a plurality of feature combinations can be determined according to the feature combination strategy, and feature data specific to the broadcast and television data is generated from different feature combinations; for example, specific broadcast and television feature data is generated from gender and the number of live channel views, from age interval and the number of complaints and suggestions, from customer contribution degree and broadband usage frequency, or by dividing on-demand viewing time into intervals. By generating new broadcast and television feature data, comprehensive analysis is realized, and the prediction accuracy of the prediction models can be improved.
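A minimal sketch of one possible feature combination strategy is shown below; the feature names and the pairing rule (crossing every attribute feature with every interaction feature) are purely illustrative assumptions, not the patent's fixed strategy:

```python
from itertools import product

# Hypothetical attribute and interaction features for one user.
attribute_features = {"gender": "F", "age_interval": "30-40", "contribution": "high"}
interaction_features = {"live_channel_views": 42, "complaint_count": 1, "broadband_freq": 7}

# Cross every attribute feature with every interaction feature to derive
# new combined broadcast-and-television feature data.
combined = {
    f"{a}_x_{i}": (attribute_features[a], interaction_features[i])
    for a, i in product(attribute_features, interaction_features)
}
```

Each combined key (for example `gender_x_live_channel_views`) would then be turned into a numeric feature by the feature extraction processing of step S520.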
It should be noted that the data class data includes, but is not limited to: user id, the place city to which the user belongs, user account opening time, user status, user service type, user affiliation company, user status change time, client id, client rating, client certificate number, client status, client attribute, client type and client contribution;
payment class data includes, but is not limited to: basic viewing maintenance fees, video value-added service fees, data service fees, bandwidth service fees and payment means;
billing class data includes, but is not limited to: bill amount, offset amount, arrearage amount, and bill type; product class data includes, but is not limited to: sales name, package name, product type, product unit price, time of order, and state of order;
the set top box interaction data includes, but is not limited to: live broadcast viewing time length, live broadcast viewing frequency, live broadcast channel viewing times, on-demand viewing time length, on-demand viewing frequency, on-demand channel viewing times, review viewing time length, review viewing frequency and review channel viewing times;
customer service interaction data includes, but is not limited to: the number of fault declarations, the content of fault declarations, the number of consultations, the content of consultations, the number of complaints advice, the content of complaints advice and the satisfaction of return visit;
Intelligent gateway interaction data includes, but is not limited to: broadband usage traffic and broadband usage frequency.
As shown in fig. 6, in an embodiment, following step S150 in the embodiment shown in fig. 1, the following steps are included, but not limited to:
step S610, judging whether the target user is a pre-churn user according to the churn prediction result of the target user;
step S620, when the target user is a pre-loss user, determining first pushing information according to the attribute data and a preset attribute saving strategy, and determining second pushing information according to the interaction data and a preset interaction saving strategy;
step S630, pushing the first pushing information and the second pushing information to the pre-churn user.
It can be appreciated that when the target user is a pre-churn user, retention work needs to be performed on the pre-churn user, and the push information used to retain the user is determined by combining the attribute data and the interaction data of the pre-churn user. For example, prepaid offers or product packages suited to the user's consumption level are pushed according to the user's consumption situation; more targeted product packages are pushed according to the user's viewing behavior or interests, such as pushing a movie package to movie lovers and an education package to families raising children; or fusion packages better matching the user's demands are recommended according to the product packages the user has ordered, such as recommending an interactive or digital package at a free or preferential price to a broadband user according to preference.
As shown in fig. 7, in an embodiment, following step S110 in the embodiment shown in fig. 1, the following steps are included, but not limited to:
step S710, performing anomaly detection processing on target broadcast television data to determine first anomaly data;
step S720, eliminating the first abnormal data from the target broadcast television data, and updating the target broadcast television data;
step S730, performing anomaly detection processing on the sample broadcast television data to determine second anomaly data;
and step S740, removing the second abnormal data from the sample broadcast television data, and updating the sample broadcast television data.
It can be understood that after the raw target broadcast and television data and sample broadcast and television data are obtained, anomaly detection processing needs to be performed on the raw data, and abnormal data is rejected; for example, a data point that deviates from the expected value by more than three standard deviations is treated as abnormal data and rejected, and when multiple duplicate records are collected, only one record is retained and the redundant data is rejected. In addition, to ensure the integrity of the data, null values may be supplemented by median filling, mean filling or constant filling, where median filling fills a null value with the median of the feature, mean filling fills it with the mean of the feature, and constant filling fills it with a fixed value.
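The pre-processing described above can be sketched in plain Python; the three-standard-deviation cut-off, duplicate handling and median filling follow the examples in the text, while the concrete data layout (a flat list of numbers with `None` for null values) is an assumption for illustration:

```python
import statistics

def clean(values):
    """Deduplicate, reject values beyond 3 standard deviations of the mean,
    and fill null values (None) with the median of the remaining feature
    values. The flat-list layout is an illustrative assumption."""
    # Keep only the first occurrence of each duplicated value.
    seen, deduped = set(), []
    for v in values:
        if v not in seen:
            seen.add(v)
            deduped.append(v)
    # Anomaly detection: drop points more than 3 standard deviations out.
    present = [v for v in deduped if v is not None]
    mu = statistics.mean(present)
    sigma = statistics.stdev(present)
    kept = [v for v in deduped if v is None or abs(v - mu) <= 3 * sigma]
    # Median filling for the remaining null values.
    median = statistics.median([v for v in kept if v is not None])
    return [median if v is None else v for v in kept]
```

Mean filling or constant filling would simply replace the `median` above with `statistics.mean(...)` or a fixed value.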
In addition, referring to fig. 8, the present application further provides a user loss prediction apparatus 800, including:
an acquiring unit 810, configured to acquire target broadcast television data and a plurality of sample broadcast television data of a target user;
an updating unit 820, configured to input sample broadcast and television data into a plurality of trained prediction models, iteratively update weight values of each prediction model, and determine a target weight value of the prediction model, where model types of any two prediction models are different;
the prediction unit 830 is configured to input target broadcast and television data into a plurality of prediction models, so as to obtain target loss probabilities predicted by the respective prediction models;
a calculating unit 840, configured to determine a total loss probability according to the target loss probability and the target weight value;
a determining unit 850, configured to determine a loss prediction result of the target user according to the total loss probability.
It can be appreciated that the specific embodiment of the user loss prediction apparatus 800 is substantially the same as the specific embodiment of the user loss prediction method described above, and will not be described herein. Based on the method, sample broadcast and television data are input into different prediction models, and the weight value of each prediction model is iteratively updated according to the model output results, so as to determine the target weight value of each prediction model. In the prediction process, the target broadcast and television data of the target user are input into the different prediction models, the target loss probability is determined from the model output results, the total loss probability is then calculated by combining the target weight values, and the loss prediction result of the target user is determined accordingly. Because the target loss probabilities of the prediction models are integrated according to their target weight values, and multiple prediction models are used for prediction, multiple types of broadcast and television data can be effectively analyzed, so the loss prediction result has high accuracy, improving the prediction accuracy of user churn.
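As a sketch of how the prediction unit 830, calculation unit 840 and determining unit 850 might combine their inputs, assuming the total loss probability is a weighted sum of the per-model probabilities and that a 0.5 cut-off (an assumption, not fixed by the text) yields the churn result:

```python
def total_churn_probability(target_probs, target_weights):
    """Weighted combination of the per-model target loss probabilities
    using the target weight values (calculation unit)."""
    return sum(p * w for p, w in zip(target_probs, target_weights))

def churn_result(total_prob, threshold=0.5):
    """Derive the loss prediction result from the total loss probability
    (determining unit); the 0.5 threshold is an assumed illustration."""
    return total_prob >= threshold

# Three per-model probabilities combined with target weights from the
# earlier worked example (5/18, 1/3, 7/18).
total = total_churn_probability([0.8, 0.6, 0.7], [5/18, 1/3, 7/18])
```

Here the model with the highest iterated weight contributes most to the total, which is the point of learning the target weight values rather than averaging the models uniformly.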
In addition, referring to fig. 9, fig. 9 illustrates a hardware structure of an electronic device of another embodiment, the electronic device including:
the processor 901 may be implemented by a general-purpose CPU (Central Processing Unit ), a microprocessor, an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, etc. for executing related programs to implement the technical solutions provided by the embodiments of the present application;
the memory 902 may be implemented in the form of a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM). The memory 902 may store an operating system and other application programs; when the technical solutions provided in the embodiments of the present disclosure are implemented by software or firmware, the relevant program codes are stored in the memory 902, and the processor 901 invokes and executes the user churn prediction method of the embodiments of the present disclosure, for example, executing the method steps S110 to S150 in fig. 1, the method steps S210 to S230 in fig. 2, the method steps S310 to S370 in fig. 3, the method steps S410 to S440 in fig. 4, the method steps S510 to S530 in fig. 5, the method steps S610 to S630 in fig. 6, and the method steps S710 to S740 in fig. 7 described above;
An input/output interface 903 for inputting and outputting information;
the communication interface 904 is configured to implement communication interaction between the device and other devices, and may implement communication in a wired manner (e.g. USB, network cable, etc.), or may implement communication in a wireless manner (e.g. mobile network, WIFI, bluetooth, etc.);
a bus 905 that transfers information between the various components of the device (e.g., the processor 901, the memory 902, the input/output interface 903, and the communication interface 904);
wherein the processor 901, the memory 902, the input/output interface 903 and the communication interface 904 are communicatively coupled to each other within the device via a bus 905.
The embodiment of the present application further provides a storage medium, which is a computer readable storage medium. The storage medium stores one or more programs, which may be executed by one or more processors to implement the above-described user churn prediction method, for example, performing the method steps S110 to S150 in fig. 1, the method steps S210 to S230 in fig. 2, the method steps S310 to S370 in fig. 3, the method steps S410 to S440 in fig. 4, the method steps S510 to S530 in fig. 5, the method steps S610 to S630 in fig. 6, and the method steps S710 to S740 in fig. 7 described above.
The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs as well as non-transitory computer executable programs. In addition, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory remotely located relative to the processor, the remote memory being connectable to the processor through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The user loss prediction method, device, equipment and storage medium provided by the embodiments of the present application acquire target broadcast and television data and a plurality of sample broadcast and television data of a target user; input the sample broadcast and television data into a plurality of trained prediction models and iteratively update the weight value of each prediction model to determine the target weight value of each prediction model, where the model types of any two prediction models are different; input the target broadcast and television data into the plurality of prediction models to obtain the target loss probability predicted by each prediction model; determine the total loss probability according to the target loss probabilities and the target weight values; and determine the loss prediction result of the target user according to the total loss probability. Based on the method, sample broadcast and television data are input into different prediction models, and the weight value of each prediction model is iteratively updated according to the model output results to determine its target weight value. In the prediction process, the target broadcast and television data of the target user are input into the different prediction models, the target loss probability is determined from the model outputs, the total loss probability is then calculated by combining the target weight values, and the loss prediction result of the target user is determined accordingly. Because the target loss probabilities of the prediction models are integrated according to their target weight values, and multiple prediction models are used for prediction, multiple types of broadcast and television data can be effectively analyzed, so the loss prediction result has high accuracy, improving the prediction accuracy of user churn.
The embodiments described in the embodiments of the present application are for more clearly describing the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application, and as those skilled in the art can know that, with the evolution of technology and the appearance of new application scenarios, the technical solutions provided by the embodiments of the present application are equally applicable to similar technical problems.
It will be appreciated by those skilled in the art that the solutions shown in fig. 1-7 are not limiting to embodiments of the present application, and may include more or fewer steps than illustrated, or may combine certain steps, or different steps.
The above described apparatus embodiments are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Those of ordinary skill in the art will appreciate that all or some of the steps of the methods, systems, functional modules/units in the devices disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in this application, "at least one" means one or more, and "a plurality" means two or more. "and/or" for describing the association relationship of the association object, the representation may have three relationships, for example, "a and/or B" may represent: only a, only B and both a and B are present, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is merely a logical function division, and there may be another division manner in actual implementation, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including multiple instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing a program.
Preferred embodiments of the present application are described above with reference to the accompanying drawings, and thus do not limit the scope of the claims of the embodiments of the present application. Any modifications, equivalent substitutions and improvements made by those skilled in the art without departing from the scope and spirit of the embodiments of the present application shall fall within the scope of the claims of the embodiments of the present application.

Claims (10)

1. A method for predicting user churn, comprising:
acquiring target broadcast television data and a plurality of sample broadcast television data of a target user;
inputting the sample broadcast and television data into a plurality of trained prediction models, and iteratively updating the weight value of each prediction model to determine the target weight value of the prediction model, wherein the model types of any two prediction models are different;
inputting the target broadcast television data into a plurality of prediction models to respectively obtain target loss probability predicted by each prediction model;
determining total loss probability according to the target loss probability and the target weight value;
and determining a loss prediction result of the target user according to the total loss probability.
2. The method according to claim 1, wherein model types of the predictive model include at least: a logistic regression model, a random forest model and a gradient lifting decision tree model; inputting the target broadcast television data into a plurality of prediction models to respectively obtain target loss probabilities predicted by the prediction models, wherein the target loss probabilities comprise:
inputting the target broadcast television data into the logistic regression model to obtain the target loss probability predicted by the logistic regression model;
Inputting the target broadcast television data into the random forest model to obtain the target loss probability predicted by the random forest model;
and inputting the target broadcast television data into the gradient lifting decision tree model to obtain the target loss probability predicted by the gradient lifting decision tree model.
3. The method of claim 1, wherein the inputting the sample broadcast television data into a plurality of trained predictive models, iteratively updating the weight values of each of the predictive models, determining a target weight value for the predictive model, comprises:
acquiring total iteration times, weight values of all the prediction models and loss labels of the sample broadcast and television data;
dividing the plurality of sample broadcast television data into a plurality of sample data sets based on the total iteration times, wherein the number of the sample data sets is the same as the total iteration times;
sequentially inputting the sample broadcast and television data in the sample data set into a plurality of trained prediction models aiming at any sample data set to respectively obtain sample loss probability predicted by each prediction model;
determining a sample loss result of the sample broadcast and television data according to the sample loss probability;
Determining the prediction accuracy of each prediction model according to each sample loss result and the corresponding loss label;
according to the prediction accuracy of all the prediction models, iteratively updating the weight value of each prediction model until each sample data set is traversed;
and determining the update times of the weight values, and taking the current weight value as a target weight value of the prediction model when the update times of the weight values reach the total iteration times.
4. A method according to claim 3, wherein iteratively updating the weight value of each of the prediction models according to the prediction accuracy of all the prediction models comprises:
calculating the sum of the prediction accuracy of all the prediction models to obtain the overall accuracy;
calculating the quotient of the prediction accuracy of the prediction model and the overall accuracy aiming at any prediction model to obtain model accuracy;
determining the number of models of the prediction model;
multiplying the model accuracy, the weight value of the prediction model and the model quantity, and updating the weight value of the prediction model according to the multiplication result.
5. The method of claim 1, wherein the target broadcast data includes attribute data and interaction data, the attribute data including at least: data class data, payment class data, bill class data and product class data, the interaction data at least comprises: the set top box interaction data, the customer service interaction data and the intelligent gateway interaction data; inputting the target broadcast television data into a plurality of prediction models to respectively obtain target loss probabilities predicted by the prediction models, wherein the target loss probabilities comprise:
based on a preset feature combination strategy, carrying out combination processing on the attribute data and the interaction data to obtain a radio and television data combination;
performing feature extraction processing on the broadcast and television data combination to obtain broadcast and television feature data;
and inputting the broadcast and television characteristic data into a plurality of prediction models to respectively obtain target loss probabilities predicted by the prediction models.
6. The method of claim 5, wherein after the step of determining the attrition prediction results for the target user based on the total attrition probability, the method further comprises:
judging whether the target user is a pre-loss user or not according to the loss prediction result of the target user;
When the target user is the pre-loss user, determining first pushing information according to the attribute data and a preset attribute saving strategy, and determining second pushing information according to the interaction data and a preset interaction saving strategy;
and pushing the first pushing information and the second pushing information to the pre-churn user.
7. The method of claim 1, wherein after the step of obtaining the target broadcast data and the plurality of sample broadcast data for the target user, the method further comprises:
performing anomaly detection processing on the target broadcast television data to determine first anomaly data;
removing the first abnormal data from the target broadcast television data, and updating the target broadcast television data;
performing anomaly detection processing on the sample broadcast and television data to determine second anomaly data;
and eliminating the second abnormal data from the sample broadcast television data, and updating the sample broadcast television data.
8. A user churn prediction apparatus, comprising:
an acquisition unit, configured to acquire target broadcast and television data of a target user and a plurality of pieces of sample broadcast and television data;
an updating unit, configured to input the sample broadcast and television data into a plurality of trained prediction models, iteratively update the weight value of each prediction model, and determine a target weight value for each prediction model, wherein any two of the prediction models are of different model types;
a prediction unit, configured to input the target broadcast and television data into the plurality of prediction models to obtain the target churn probability predicted by each prediction model;
a calculation unit, configured to determine a total churn probability according to the target churn probabilities and the target weight values; and
a determination unit, configured to determine the churn prediction result of the target user according to the total churn probability.
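The units of claim 8 amount to a weighted heterogeneous ensemble. A minimal sketch follows; the model stand-ins, the accuracy-based weight update, and the 0.5 decision threshold are all assumptions, since the claims require only that model types differ and that weights are iteratively updated on sample data.

```python
class WeightedChurnEnsemble:
    """Sketch of claim 8: heterogeneous models vote with weights tuned
    on labelled sample data. The weighting scheme is illustrative, not
    the patent's prescribed method."""

    def __init__(self, models):
        # `models` maps a name to a callable: features -> churn probability.
        self.models = models
        n = len(models)
        self.weights = {name: 1.0 / n for name in models}  # equal start

    def update_weights(self, samples, rounds=10):
        # Iteratively shift weight toward models with lower mean absolute
        # error on the sample data (label 1 = churned), then renormalize.
        for _ in range(rounds):
            for name, model in self.models.items():
                err = sum(abs(model(x) - y) for x, y in samples) / len(samples)
                self.weights[name] *= (1.0 - err) or 1e-9
            total = sum(self.weights.values())
            self.weights = {k: v / total for k, v in self.weights.items()}

    def predict(self, features):
        # Total churn probability = weighted sum of each model's output.
        return sum(self.weights[name] * model(features)
                   for name, model in self.models.items())

# Hypothetical stand-ins for trained models of different types.
models = {
    "logistic": lambda x: min(1.0, 0.1 + 0.02 * x["idle_days"]),
    "tree":     lambda x: 0.9 if x["idle_days"] > 30 else 0.2,
}
ens = WeightedChurnEnsemble(models)
ens.update_weights([({"idle_days": 40}, 1), ({"idle_days": 5}, 0)])
prob = ens.predict({"idle_days": 40})
is_pre_churn = prob >= 0.5  # threshold is an assumed example
```

Real model types (e.g. gradient boosting alongside logistic regression) would replace the lambdas, but the flow from sample data, to target weight values, to total churn probability is the same.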
9. An electronic device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the user churn prediction method according to any one of claims 1 to 7.
10. A storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the user churn prediction method according to any one of claims 1 to 7.
CN202310082488.6A 2023-01-18 2023-01-18 User loss prediction method, device, equipment and storage medium Pending CN116050631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310082488.6A CN116050631A (en) 2023-01-18 2023-01-18 User loss prediction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310082488.6A CN116050631A (en) 2023-01-18 2023-01-18 User loss prediction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116050631A true CN116050631A (en) 2023-05-02

Family

ID=86116248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310082488.6A Pending CN116050631A (en) 2023-01-18 2023-01-18 User loss prediction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116050631A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117391836A (en) * 2023-07-26 2024-01-12 人上融融(江苏)科技有限公司 Method for modeling overdue probability based on heterogeneous integration of different labels


Similar Documents

Publication Publication Date Title
US20200242450A1 (en) User behavior prediction method and apparatus, and behavior prediction model training method and apparatus
US8732015B1 (en) Social media pricing engine
US9607273B2 (en) Optimal time to post for maximum social engagement
CN106649681B (en) Data processing method, device and equipment
CN110008977B (en) Clustering model construction method and device
CN115994226B (en) Clustering model training system and method based on federal learning
CN116541610B (en) Training method and device for recommendation model
US20230153845A1 (en) System and method for generating custom data models for predictive forecasting
CN112508456A (en) Food safety risk assessment method, system, computer equipment and storage medium
CN116050631A (en) User loss prediction method, device, equipment and storage medium
US20220197978A1 (en) Learning ordinal regression model via divide-and-conquer technique
CN114371946B (en) Information push method and information push server based on cloud computing and big data
CN110503507B (en) Insurance product data pushing method and system based on big data and computer equipment
CN115935185A (en) Training method and device for recommendation model
CN111612085A (en) Method and device for detecting abnormal point in peer-to-peer group
CN111177564B (en) Product recommendation method and device
CN113778979A (en) Method and device for determining live broadcast click rate
US20230267062A1 (en) Using machine learning model to make action recommendation to improve performance of client application
CN116186541A (en) Training method and device for recommendation model
CN113837843A (en) Product recommendation method, device, medium and electronic equipment
CN113052509A (en) Model evaluation method, model evaluation apparatus, electronic device, and storage medium
KR20220026167A (en) Method and Device for Recommending Health Supplement by Using Multiple Recommendation Algorithm
CN111522747A (en) Application processing method, device, equipment and medium
CN111918323B (en) Data calibration method, device, equipment and storage medium
CN117237013A (en) Gift package pushing method, device, equipment and storage medium based on member rights and interests

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination