CN111652661B

CN111652661B - Mobile phone client user loss early warning processing method

Info

Publication number: CN111652661B
Application number: CN202010769480.3A
Authority: CN
Inventors: 邵俊; 蔺静茹; 张磊; 曹新建; 支磊
Original assignee: Shenzhen Suoxinda Data Technology Co ltd; Soxinda Beijing Data Technology Co ltd
Current assignee: Shenzhen Suoxinda Data Technology Co ltd; Soxinda Beijing Data Technology Co ltd
Priority date: 2020-08-04
Filing date: 2020-08-04
Publication date: 2020-12-08
Anticipated expiration: 2040-08-04
Also published as: CN111652661A

Abstract

The invention relates to an early-warning processing method for mobile phone client user churn, comprising the following steps: collecting user information at regular intervals to form a first user information set, and digitizing it to form a first user data set; estimating a first probability value for each user in the first user data set; when the first probability value is greater than a first threshold, classifying the user as a first-type user and performing a calculation on that user's data; matching the calculation result with a question bank of the corresponding type; sending alarm information to a management platform and sending a first request to the first-type user; and adopting corresponding countermeasures based on the first request. Compared with the prior art, the method eliminates collinearity by a splitting method while preserving accuracy as far as possible. It avoids the loss of important variables and accuracy that occurs when, to eliminate collinearity, only the single most representative variable in a cluster (for example, the variable most strongly correlated with the principal component) is kept, and it thereby improves the accuracy of early-warning processing.

Description

Mobile phone client user loss early warning processing method

Technical Field

The invention belongs to the field of big data analysis and data mining, relates to a user information classification method, and particularly relates to a mobile phone client user loss early warning processing method.

Background

With the development of the mobile internet, the mobile phone is gradually replacing the operator as the first interface users choose, and handset marketing occupies an increasingly important position in operators' marketing strategies. At present, the three major operators are all increasing their purchase and sale of mobile phones. Operators cannot transform toward the mobile internet without the support of mobile phones: the architecture of the mobile internet comprises three parts, namely the cloud, the pipeline and the mobile phone. Operators need to build intelligent pipelines, and the development of the mobile internet and of 4G/5G services is clearly handset-driven. The main channel for mobile internet applications today is the software application store embedded in the mobile phone, and the quality of handset service directly affects the user's experience.

The mobile phone has become the user's first interface and influences the user's choice of operator. Users change phones every year, and each change can become a key opportunity for the user to reselect an operator. In the 4G/5G era, differences in technical standards bind handsets relatively closely to networks. Cases in which a user chooses a handset because of an application, and then chooses an operator because of the handset, are common, so a chain of linked choices has formed. This means that users often choose the handset first, and the choice of network becomes secondary. For example, a user may choose China Unicom's network because they chose an Apple phone, or may choose a Xiaomi phone because they prefer the MiTalk service and only then choose an operator. As the market for mobile communication subscribers approaches saturation, the focus of competition among operators has gradually shifted to winning subscribers from other networks. Therefore, how to effectively identify potential churned users, find the causes and retain those users by targeted means is a problem that urgently needs to be solved.

In addition, regression analysis is a statistical method for determining the quantitative interdependence between two or more variables, and it is very widely applied. According to the number of variables involved, regression analysis is divided into univariate and multivariate regression analysis; according to the number of dependent variables, into simple and multiple regression analysis; and according to the type of relationship between the independent and dependent variables, into linear and nonlinear regression analysis. If a regression analysis includes only one independent variable and one dependent variable, and the relationship between them can be approximated by a straight line, it is called univariate linear regression analysis. If the regression analysis includes two or more independent variables and the relationship between the dependent variable and the independent variables is linear, it is called multiple linear regression analysis.

Chinese patent ZL201510881058.6 provides an optimization analysis method for eliminating collinearity in the regression data of a complex system; in essence, it screens variables iteratively on the basis of principal component analysis. Each time a principal component is calculated, the method selects the variable most strongly correlated with it, removes the other variables that are highly correlated with that principal component, and then calculates the next principal component. Although this selects variables, it has two possible drawbacks: the selected variables may contribute little to the model; and, when variables are eliminated, the judgment of what counts as highly correlated is strongly subjective, so important variables are easily lost. Because the selected variables are not typical and important variables are lost, the system's data analysis is inaccurate and its credibility is low. Therefore, how to quickly and efficiently classify, organize and model massive amounts of data, and to extract the valuable or relevant information that meets preset conditions, is a technical problem in the field of big data analysis and data mining.

Disclosure of Invention

In view of the above-mentioned drawbacks in the prior art, an object of the present invention is to provide a method and system for effectively predicting users who are potentially lost and providing corresponding solutions in time.

In order to achieve the above object, the present invention provides a method and a system for processing loss early warning of a mobile phone client user, comprising the following steps:

collecting user information in an operator server at regular time to form a first user information set;

carrying out digital processing on the first user information set to form a first user data set;

estimating, using a first estimation module, a first probability value for each user in the first set of user data based on the first set of user data;

when the first probability value is greater than a first threshold, classifying the user as a first type of user;

calculating the user data of the first type of user based on a second data model to obtain a calculation result, inquiring a database, and matching the calculation result with a corresponding type question bank;

sending alarm information to a management platform and sending a first request to the first type of user;

and adopting corresponding countermeasures based on the first request.

Wherein estimating, using a first estimation module, a first probability value for each user in the first set of user data based on the first set of user data comprises:

estimating the first set of user data based on a first data model, wherein the first probability value is a user churn probability value.

Wherein the establishing of the first data model comprises the steps of:

selecting historical user information for modeling, and dividing a historical user information set into a training set and a test set according to a proportion, wherein the training set is used for modeling and model parameter estimation, and the test set is used for model evaluation;

extracting user characteristic data which can be used for modeling, and establishing a data analysis broad table;

and establishing the first data model based on the data analysis broad table.

Wherein the establishing the first data model based on the data analysis broad table specifically comprises:

binning the data set;

performing WOE conversion on each box to obtain a WOE value;

performing variable clustering operation by a splitting method, and screening variables;

further screening the variables by backward elimination: if the VIF of any variable is greater than 10, the variable with the largest p-value is eliminated, and the remaining variables are then modeled by logistic regression;

repeating the screening step until every variable has VIF < 10 and p-value < 0.05.

The variable clustering operation performed by the splitting method specifically comprises the following steps:

computing the covariance matrix of the vector formed by all the variables, and calculating its first and second eigenvalues and the corresponding first and second eigenvectors;

checking the second eigenvalue and, if it is greater than 0.8, dividing the variables into two classes;

and computing the covariance matrix of each of the two classes of variables, calculating the first and second eigenvalues and the corresponding first and second eigenvectors of each, and returning to the checking step until the second eigenvalue of the covariance matrix of every subclass is not greater than 0.8 or the subclass contains only one variable.

Wherein the screening variables specifically include:

retaining, in each class, the variable with the highest IV value and the variable with the lowest $1-R^2_{ratio}$ value, where the $1-R^2_{ratio}$ value of a variable $X$ is given by:

$$1-R^2_{ratio} = \frac{1-R^2}{1-\tilde{R}^2}$$

where $R^2$ represents how representative the variable is within its cluster, obtained by squaring the Pearson correlation coefficient between the variable and the first principal component of the class to which it belongs, and $\tilde{R}^2$ represents the largest of the squared Pearson correlation coefficients between the variable and the first principal component of each class that does not contain it:

$$\tilde{R}^2 = \max_{1 \le j \le k,\; j \ne i} \mathrm{Corr}(X, PC_j)^2$$

where $k$ represents the number of classes, the $k$ classes are numbered from 1 to $k$ in sequence, $PC_j$ represents the first principal component of the $j$-th class, $i$ represents the number of the class in which $X$ is located, and $\mathrm{Corr}$ represents the Pearson correlation coefficient.

Wherein the second data model is a decision tree based multi-classification model.

Adopting corresponding countermeasures based on the first request comprises the following steps:

the first request is to ask the first type user whether to accept a questionnaire survey;

if the first type user agrees, sending a corresponding network link address to the first type user;

receiving a feedback response of the first type user, wherein the feedback response comprises an answer of the first type user to the type question;

and adopting corresponding countermeasures based on the feedback response.
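The request-and-response flow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the question-bank mapping, the link addresses under example.com and the function name `handle_first_request` are all hypothetical.

```python
# Hypothetical mapping from a predicted churn reason to its question-bank link.
QUESTION_BANK_LINKS = {
    "tariff": "https://example.com/survey/tariff",
    "coverage": "https://example.com/survey/coverage",
    "other": "https://example.com/survey/other",
}

def handle_first_request(reason, user_agrees):
    """Return the survey link to send, or None if the user declines the survey."""
    if not user_agrees:
        return None
    # Assumption: fall back to a generic bank when no bank matches the reason.
    return QUESTION_BANK_LINKS.get(reason, QUESTION_BANK_LINKS["other"])

link_sent = handle_first_request("tariff", user_agrees=True)
no_link = handle_first_request("coverage", user_agrees=False)
```

The answers returned through the link would then drive the choice of countermeasure; that part depends on business rules the text does not specify, so it is left out of the sketch.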

The user information comprises user personal information, user behavior related data in a charging system and mobile phone client information of the user.

The mobile phone client information of the user is acquired through a radio resource control connection request (RRC CONNECTION REQUEST) message or a channel request (CHANNEL REQUEST) message.

Compared with the prior art, the early-warning processing system provided by the invention digitizes the user information and converts it into data in the system's specific format. Its modeling module eliminates collinearity by a splitting method while preserving accuracy as far as possible, which avoids the loss of important variables and accuracy that occurs when, to eliminate collinearity, only the single most representative variable in a cluster (for example, the one most strongly correlated with the principal component) is kept. The accuracy of early-warning processing is thereby improved.

Drawings

The above and other objects, features and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar or corresponding parts and in which:

FIG. 1 is a flowchart illustrating a method for processing a mobile phone client user churn early warning according to an embodiment of the present invention;

FIG. 2 is a flow chart illustrating a user churn warning processing method according to one embodiment of the present invention;

FIG. 3 is a flow diagram illustrating the building of a logistic regression model according to one embodiment of the invention;

FIG. 4 is a flow diagram illustrating variable clustering according to an embodiment of the invention;

FIG. 5 is a flow chart illustrating the discovery of the cause of churn according to one embodiment of the present invention; and

FIG. 6 is a block diagram illustrating a mobile phone client user churn early warning processing system according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise, and "a plurality" typically includes at least two.

It should be understood that although the terms first, second, third, etc. may be used to describe … … in embodiments of the present invention, these … … should not be limited to these terms. These terms are used only to distinguish … …. For example, the first … … can also be referred to as the second … … and similarly the second … … can also be referred to as the first … … without departing from the scope of embodiments of the present invention.

It should be understood that the term "and/or" as used herein merely describes an association between associated objects and means that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship.

The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.

It is also noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in the article or device in which the element is included.

Alternative embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Example one

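Referring to FIG. 1, the invention discloses an early-warning processing method for mobile phone client user churn, which comprises the following steps:

collecting user information in an operator server at regular intervals to form a first user information set;

digitizing the first user information set to form a first user data set;

estimating, using a first estimation module, a first probability value for each user in the first user data set based on the first user data set;

when the first probability value is greater than a first threshold, classifying the user as a first-type user;

calculating the user data of the first-type user based on a second data model to obtain a calculation result, querying a database, and matching the calculation result with a question bank of the corresponding type;

sending alarm information to a management platform and sending a first request to the first-type user;

and adopting corresponding countermeasures based on the first request.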

Example two

On the basis of the first embodiment, the present embodiment further includes the following contents:

In the internet field, big data technologies are increasingly being applied. User information classification is a very important aspect of this field, especially for user churn early warning. Acquiring a new user usually costs several times as much as retaining an existing one, and the revenue an existing user brings to the company far exceeds that of a new user. Using big data analysis to give early warning of potential churn and to formulate a retention strategy in time is therefore a critical link in many industries (such as telecommunications and banking). The invention outputs the probability that a user may churn through a logistic regression model, draws up a list of potential churners, attributes causes to the predicted churn and formulates corresponding measures.

Specifically, referring to FIG. 2, generating the potential churn list from the churn probabilities output by the logistic regression model includes the following steps:

step 1, defining user churn behavior through data; for example, a user is determined to be a churned user when 20% of the user's assets have been lost over the last three months;

step 2, selecting historical user information for modeling, and dividing it proportionally into a training set and a test set;

step 3, extracting user characteristic data that can be used for modeling, such as products held, proportions, balances and transaction preferences, and establishing a data analysis wide table;

step 4, establishing a logistic regression model based on the wide table, outputting the probability that each user may churn, and ranking users by churn probability to obtain an early-warning list;

step 5, discovering the possible churn reasons of the users on the list through a multi-classification model;

step 6, sending the early-warning list and the churn reasons to a management platform, which carries out targeted retention strategies according to the churn reasons.
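As an illustration of how step 4 turns model output into an early-warning list, the sketch below filters users whose estimated churn probability exceeds a threshold and ranks them from highest to lowest risk. The threshold of 0.5, the user identifiers and the function name `build_warning_list` are assumptions made for the example; the text only requires "a first threshold".

```python
def build_warning_list(churn_prob, threshold=0.5):
    """Return (user_id, probability) pairs above `threshold`, highest risk first."""
    flagged = [(uid, p) for uid, p in churn_prob.items() if p > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

# Hypothetical probabilities, standing in for the logistic regression output.
probs = {"u1": 0.91, "u2": 0.12, "u3": 0.67, "u4": 0.50}
warning_list = build_warning_list(probs, threshold=0.5)
```

A user whose probability equals the threshold exactly is not flagged here, matching the "greater than a first threshold" wording of the method.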

EXAMPLE III

On the basis of the second embodiment, the present embodiment further includes the following contents:

referring to FIG. 3, step 4 may include the following steps:

step 4.1, variable binning and WOE transformation;

step 4.2, performing a variable clustering operation by a splitting method, and screening the variables;

step 4.3, establishing a logistic regression model, inspecting the regression result and removing variables whose p-value is greater than 0.05; further screening the variables by backward elimination, removing the variable with the largest p-value if any variable has VIF greater than 10; then performing logistic regression modeling on the remaining variables;

step 4.4, repeating step 4.3 until every variable has VIF < 10 and p-value < 0.05.
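The loop formed by steps 4.3 and 4.4 can be sketched as below. This is a simplified reading, not the definitive procedure: whenever any variable violates either limit, the variable with the largest p-value is dropped and the model is refitted. The `fit` callback is a hypothetical placeholder for a real logistic regression fit that returns each variable's VIF and p-value; `toy_fit` returns canned numbers purely so that the sketch runs.

```python
def backward_eliminate(variables, fit, vif_max=10.0, p_max=0.05):
    """Drop the largest-p variable until every VIF < vif_max and p < p_max."""
    kept = list(variables)
    while kept:
        stats = fit(kept)  # {name: (vif, p_value)}
        if all(v < vif_max and p < p_max for v, p in stats.values()):
            break
        worst = max(kept, key=lambda name: stats[name][1])
        kept.remove(worst)
    return kept

def toy_fit(names):
    """Hypothetical canned statistics; 'calls' and 'minutes' are collinear."""
    p_values = {"calls": 0.04, "minutes": 0.03, "tenure": 0.001, "noise": 0.40}
    collinear = {"calls", "minutes"} <= set(names)
    return {
        n: (25.0 if collinear and n in ("calls", "minutes") else 1.5, p_values[n])
        for n in names
    }

selected = backward_eliminate(["calls", "minutes", "tenure", "noise"], toy_fit)
```

With these canned statistics the insignificant variable is removed first, then one of the two collinear variables, after which all remaining variables satisfy both limits.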

Example four

On the basis of the third embodiment, the present embodiment further includes the following contents:

the purposes of binning in step 4.1 are as follows:

1) values of text variables that cannot be computed on are converted into numerical values that can;

2) the stability of the model is increased, preventing small perturbations in the values from causing large changes in the model result.

More specifically, suppose the variable X is divided into three bins x, y and z; the WOE value of bin x is calculated as:

WOE(X=x)=ln((#{Y=1,X=x}/#{Y=1})/(#{Y=0,X=x}/#{Y=0}))…(1)

where Y is the binary target variable, #{A} represents the number of samples satisfying condition A, #{A,B} represents the number of samples satisfying both conditions A and B, and ln() is the natural logarithm function.
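Formula (1) can be computed directly from bin labels and target values, as in the sketch below. The information value (IV) function uses the conventional definition, the sum over bins of the difference between the two class shares multiplied by the WOE; the text refers to IV values but does not define them, so that definition is an assumption here, as are the function names and the sample data.

```python
import math
from collections import Counter

def woe_by_bin(bins, labels):
    """WOE per bin: ln((#{Y=1,X=x}/#{Y=1}) / (#{Y=0,X=x}/#{Y=0}))."""
    pos = Counter(b for b, y in zip(bins, labels) if y == 1)
    neg = Counter(b for b, y in zip(bins, labels) if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    woe = {}
    for b in sorted(set(bins)):
        if pos[b] == 0 or neg[b] == 0:
            raise ValueError(f"bin {b!r} has an empty class; merge or smooth it")
        woe[b] = math.log((pos[b] / n_pos) / (neg[b] / n_neg))
    return woe

def information_value(bins, labels):
    """Conventional IV: sum over bins of (share of Y=1 - share of Y=0) * WOE."""
    pos = Counter(b for b, y in zip(bins, labels) if y == 1)
    neg = Counter(b for b, y in zip(bins, labels) if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    woe = woe_by_bin(bins, labels)
    return sum((pos[b] / n_pos - neg[b] / n_neg) * w for b, w in woe.items())

bins = ["x", "x", "x", "y", "y", "y", "z", "z"]
labels = [1, 1, 0, 1, 0, 0, 0, 1]
woe = woe_by_bin(bins, labels)
iv = information_value(bins, labels)
```

In this sample, bin x holds half of the Y=1 cases and a quarter of the Y=0 cases, so its WOE is ln 2; bin z holds equal shares and has WOE 0.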

EXAMPLE five

On the basis of the fourth embodiment, the present embodiment further includes the following contents:

referring to FIG. 4, the variable clustering operation by the splitting method in step 4.2 may include the following steps:

computing the covariance matrix of the vector composed of all N variables, and calculating its first and second eigenvalues and the corresponding eigenvectors (the first eigenvector and the second eigenvector, respectively).

If the second eigenvalue is greater than 0.8, the N variables are divided into two classes; the classification may include the following steps:

respectively calculating the Pearson correlation coefficient of each variable and the two eigenvectors, and comparing the absolute values of the correlation coefficients; if the absolute value of the correlation coefficient of the variable and the first feature vector is larger than the absolute value of the correlation coefficient of the variable and the second feature vector, the variable belongs to the first class, otherwise, the variable belongs to the second class.

Then compute the covariance matrix of each of the two classes of variables, and calculate the first eigenvalue, the second eigenvalue and the corresponding eigenvectors of each. If the second eigenvalue of a class is greater than 0.8, repeat the classification steps on that class, until the second eigenvalue of the covariance matrix of every subclass is not greater than 0.8 or the subclass contains only one variable.

The advantage of this method is that whether to stop splitting a group is decided by the size of the second eigenvalue, so that weakly correlated variables are not grouped together. By iterating, the invention ensures that the second eigenvalue within each group does not exceed 0.8, so that the first principal component adequately explains the overall variance of the variables in the class.
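The recursive splitting described above can be sketched with NumPy as follows. Several choices here are assumptions rather than statements from the text: the 0.8 test is applied to the second eigenvalue (characteristic root), columns are standardized first so that the covariance matrix is a correlation matrix, "correlation with an eigenvector" is read as correlation with the principal-component scores along that eigenvector, and a split that leaves one side empty is abandoned to guarantee termination. The function name and synthetic data are hypothetical.

```python
import numpy as np

def split_clusters(X, names, threshold=0.8):
    """Recursively split the variables (columns of X) into clusters."""
    if len(names) < 2:
        return [list(names)]
    # Assumption: standardise so the covariance matrix is a correlation matrix.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    if eigvals[order[1]] <= threshold:
        return [list(names)]                   # second eigenvalue small: stop
    pcs = Z @ eigvecs[:, order[:2]]            # scores on the first two components
    first, second = [], []
    for j in range(len(names)):
        r1 = abs(np.corrcoef(Z[:, j], pcs[:, 0])[0, 1])
        r2 = abs(np.corrcoef(Z[:, j], pcs[:, 1])[0, 1])
        (first if r1 > r2 else second).append(j)
    if not first or not second:
        return [list(names)]                   # guard: nothing was separated
    return (split_clusters(X[:, first], [names[j] for j in first], threshold)
            + split_clusters(X[:, second], [names[j] for j in second], threshold))

# Synthetic data: three sine-based columns and two cosine-based columns.
t = np.linspace(0.0, 2.0 * np.pi, 400, endpoint=False)
X = np.column_stack([
    np.sin(t), np.sin(t) + 0.1 * np.sin(3 * t), np.sin(t) + 0.1 * np.sin(5 * t),
    np.cos(t), np.cos(t) + 0.1 * np.cos(7 * t),
])
clusters = split_clusters(X, ["a1", "a2", "a3", "b1", "b2"])
```

On this input the sine-based columns are strongly correlated with one another and uncorrelated with the cosine-based ones, so a single split separates the two groups and neither group is split further.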

In addition, the screening variables in step 4.2 may include the following steps:

retaining, in each class obtained by variable clustering, the variable with the highest IV value and the variable with the lowest $1-R^2_{ratio}$ value. The $1-R^2_{ratio}$ value of a variable $X$ is given by:

$$1-R^2_{ratio} = \frac{1-R^2}{1-\tilde{R}^2}$$

where $R^2$ is the square of the Pearson correlation coefficient between the variable and the first principal component of the class to which it belongs, and $\tilde{R}^2$ represents the largest of the squared Pearson correlation coefficients between the variable and the first principal component of each class that does not contain it, given by:

$$\tilde{R}^2 = \max_{1 \le j \le k,\; j \ne i} \mathrm{Corr}(X, PC_j)^2$$

where $k$ represents the number of classes, the invention numbers the $k$ classes from 1 to $k$ in sequence, $PC_j$ represents the first principal component of the $j$-th class, and $i$ represents the number of the class in which $X$ is located.

To make $1-R^2_{ratio}$ as small as possible, $R^2$ should be as large as possible and $\tilde{R}^2$ as small as possible; that is, the variable should not only be representative within its own group but also be as weakly correlated with the other classes as possible.
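The screening rule can be illustrated with a short sketch. For each variable it forms the ratio of one minus the squared correlation with its own cluster's first principal component to one minus the largest squared correlation with any other cluster's first principal component, then keeps the highest-IV variable and the lowest-ratio variable of the cluster. The correlations and IV values below are made-up inputs, and the dictionary layout and function names are hypothetical.

```python
def one_minus_r2_ratio(own_corr, other_corrs):
    """(1 - R^2 with own cluster) / (1 - largest R^2 with another cluster)."""
    r2_own = own_corr ** 2
    r2_nearest = max(c ** 2 for c in other_corrs)
    return (1.0 - r2_own) / (1.0 - r2_nearest)

def screen_cluster(cluster):
    """Keep the highest-IV variable and the lowest-ratio variable of a cluster."""
    best_iv = max(cluster, key=lambda v: cluster[v]["iv"])
    best_ratio = min(
        cluster,
        key=lambda v: one_minus_r2_ratio(cluster[v]["own"], cluster[v]["others"]),
    )
    return sorted({best_iv, best_ratio})

# Hypothetical inputs: IV, correlation with own PC, correlations with other PCs.
cluster = {
    "tenure": {"iv": 0.30, "own": 0.90, "others": [0.2, 0.1]},
    "arpu": {"iv": 0.45, "own": 0.70, "others": [0.5]},
    "calls": {"iv": 0.10, "own": 0.95, "others": [0.6]},
}
kept = screen_cluster(cluster)
```

When the same variable wins on both criteria, only one variable is kept for that cluster, which is why k clusters yield at most 2k retained variables.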

EXAMPLE six

On the basis of the fifth embodiment, the present embodiment further includes the following contents:

After the variables of the logistic regression model enter the final regression stage, the validity of the model is generally judged by two indicators: the p-value and the VIF (variance inflation factor) value. The p-value reflects the significance of a single variable: the larger the p-value, the lower the significance of the variable, and if the p-value is greater than 0.05 the variable is considered not significant and should be removed from the model. The VIF value reflects the degree of collinearity among the variables: the higher the VIF value, the stronger the collinearity; generally, if the VIF value is greater than 10, collinearity is considered to exist in the model and the variables need to be adjusted.

Here, VIF denotes the collinearity coefficient of a variable in the model, and its formula is

VIF=1/(1-R²)

where R is the multiple correlation coefficient obtained by regressing that independent variable on the remaining independent variables.

The p-value is the significance measure that logistic regression derives from the z-statistic, i.e.,

p = Pr(|s| > |z|), where s follows a standard normal distribution and Pr denotes probability, that is, the probability that |s| > |z|.

If the p-value is greater than 0.05, the variable is considered to be not significant and should be removed from the model.
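The two-sided probability Pr(|s| > |z|) for a standard normal s equals erfc(|z| / sqrt(2)), which the Python standard library can evaluate directly. The sketch below uses that identity; the function names and the 0.05 default are illustrative choices.

```python
import math

def two_sided_p_value(z):
    """p = Pr(|s| > |z|) for s following a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def is_significant(z, alpha=0.05):
    """A variable is kept when its p-value does not exceed alpha."""
    return two_sided_p_value(z) <= alpha

p_strong = two_sided_p_value(3.0)   # far in the tail, small p-value
p_weak = two_sided_p_value(0.5)     # close to zero, large p-value
```

A z-statistic of about 1.96 sits at the familiar 0.05 boundary, so variables with |z| well below that would be removed under the rule stated above.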

To facilitate understanding of the collinearity coefficient, the multiple correlation coefficient and the significance level mentioned above, each is described in detail below.

The invention uses the collinearity coefficient $VIF_i$ of each variable $X_i$. The relationship between the VIF value and the multiple correlation coefficient is as follows:

$$VIF_i = \frac{1}{1-R_i^2}$$

where the multiple correlation coefficient $R_i$ is the square root of $R_i^2$. The larger the multiple correlation coefficient, the larger $R_i^2$ and hence the larger the collinearity coefficient of the variable; that is, $X_i$ is strongly correlated with the other variables, which can make it impossible to obtain stable parameter estimates during model training.

The multiple correlation coefficient of $X_i$ with respect to the other variables has the following specific meaning: among all the independent variables, take $X_i$ as the dependent variable and all the other $X_j$ ($j \ne i$) as independent variables, establish a linear regression model, and take the square root of its coefficient of determination $R^2$. In a linear regression model with dependent variable $y$ and independent variables $X$,

$$R^2 = \frac{\sum_{n}(\hat{y}_n-\bar{y})^2}{\sum_{n}(y_n-\bar{y})^2}$$

where $\bar{y}$ is the sample mean of $y$ and $\hat{y}_n$ is the estimate of $y_n$ given by the linear model. This ratio is the proportion of the total variation in $y$ that the linear model can explain; the remaining unexplained proportion is attributed to random perturbation caused by sampling. The larger the value, the better $y$ is explained by the model and the stronger the correlation between $y$ and the independent variables. In the setting of the invention, $X_i$ is used as the dependent variable $y$ and the other $X_j$ ($j \ne i$) as the independent variables, and the above calculation is carried out.
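To make the calculation concrete, the sketch below regresses one variable on a single other variable, computes R² as explained variation over total variation, and converts it to a VIF using VIF = 1/(1 − R²). Restricting the regression to one predictor keeps the example in plain Python; a real model would regress on all remaining variables. The data and the names `minutes` and `calls` are invented for illustration.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y ~ a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def r_squared(y, y_hat):
    """Explained variation divided by total variation."""
    y_bar = sum(y) / len(y)
    explained = sum((h - y_bar) ** 2 for h in y_hat)
    total = sum((v - y_bar) ** 2 for v in y)
    return explained / total

def vif_from_r2(r2):
    """VIF = 1 / (1 - R^2)."""
    return 1.0 / (1.0 - r2)

# Hypothetical, nearly collinear pair of variables.
minutes = [1.0, 2.0, 3.0, 4.0, 5.0]
calls = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_simple_ols(minutes, calls)
r2 = r_squared(calls, [a + b * m for m in minutes])
vif_calls = vif_from_r2(r2)
```

Because `calls` is almost a linear function of `minutes`, R² is close to 1 and the resulting VIF is far above the limit of 10, so one of the two variables would need to be removed.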

In addition, the significance level mentioned above is an indicator of whether the null hypothesis should be rejected in statistical hypothesis testing. For example:

H0: the coefficient of variable X is 0 and X has no explanatory power for the model result, i.e., X should not enter the model;

H1: the coefficient of variable X is not 0 and X should enter the model;

the p-value is the probability, assuming H0 holds, of observing a statistic at least as extreme as the one obtained. If the p-value is greater than the set significance level of 0.05, there is considered to be insufficient reason to reject the null hypothesis, i.e., X should not enter the model. The larger the p-value, the more likely it is that the variable's apparent contribution to the model is due solely to sampling error, and the stronger the case for removing it from the model.

EXAMPLE seven

On the basis of the sixth embodiment, the present embodiment further includes the following contents:

steps 4.3 and 4.4 may include the following:

performing logistic regression modeling on the at most 2k retained variables, and applying backward elimination iteratively until the VIF values and p-values of all variables meet the specified conditions;

then, starting from the variables remaining after elimination, adding the eliminated variables back one at a time by forward selection: if, after a variable is added, the VIF of every variable is still less than 10 and no p-value exceeds 0.05, the added variable is kept; this step continues until none of the remaining variables can be added.

The reason for applying forward selection after backward elimination is as follows: backward elimination is a greedy algorithm, removing at each step the variable that most deserves removal, so the process may become trapped in a locally optimal rather than a globally optimal selection of variables. The invention therefore follows backward elimination with forward selection, adding variables back so that variables are not eliminated by mistake.
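The forward re-addition pass can be sketched as follows. Each previously eliminated variable is tried in turn; it is kept only if, with it included, every variable still has VIF below 10 and a p-value not above 0.05, and passes repeat until no candidate can be added. The `fit` callback is a hypothetical stand-in for refitting the logistic regression, and `toy_fit` returns canned statistics only so the sketch is runnable.

```python
def forward_readd(kept, rejected, fit, vif_max=10.0, p_max=0.05):
    """Try to add rejected variables back while all limits remain satisfied."""
    kept, candidates = list(kept), list(rejected)
    added = True
    while added and candidates:
        added = False
        for name in list(candidates):
            stats = fit(kept + [name])  # {name: (vif, p_value)}
            if all(v < vif_max and p <= p_max for v, p in stats.values()):
                kept.append(name)
                candidates.remove(name)
                added = True
    return kept

def toy_fit(names):
    """Hypothetical canned statistics; 'calls' and 'minutes' are collinear."""
    p_values = {"tenure": 0.001, "minutes": 0.03, "calls": 0.04, "complaints": 0.02}
    collinear = {"calls", "minutes"} <= set(names)
    return {
        n: (25.0 if collinear and n in ("calls", "minutes") else 1.5, p_values[n])
        for n in names
    }

final_vars = forward_readd(["tenure", "minutes"], ["calls", "complaints"], toy_fit)
```

In this toy run `complaints` is restored because it violates neither limit, while `calls` stays out because adding it would reintroduce collinearity with `minutes`.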

Example eight

On the basis of the seventh embodiment, the present embodiment further includes the following contents:

referring to FIG. 5, step 5 can be broken down into the following steps:

step 5.1, the invention divides the reasons for user churn into 3 types according to business experience: 1. product reasons; 2. customer reasons; 3. external reasons. More specifically, the reasons for user churn can be divided into 6 categories: 1. lack of customer care; 2. mobile phone failure; 3. no network coverage; 4. poor quality of service; 5. no suitable tariff plan; 6. other reasons.

step 5.2, randomly sampling a sufficient number of churned users (more than 5000) and labeling each with one of the above categories.

step 5.3, training a decision-tree-based multi-classification model on the data so obtained, and using the trained model to predict the potential churn reasons of the users in the early-warning list.

Example nine

With reference to fig. 1 to 5, on the basis of the above embodiments, an embodiment of the present invention provides a method for processing a loss early warning of a mobile phone client user, which includes the following steps:

calculating the user data of the first type of user based on a second data model to obtain a calculation result, inquiring a database, and matching the calculation result with a corresponding type question bank; preferably, the second data model is a decision tree based multi-classification model;

and adopting corresponding countermeasures based on the first request.

The early-warning processing system provided by the embodiment of the invention digitizes the user information and converts it into data in the system's specific format. Its modeling module eliminates collinearity by a splitting method while preserving accuracy as far as possible, which avoids the loss of important variables and accuracy that occurs when, to eliminate collinearity, only the single most representative variable in a cluster (for example, the one most strongly correlated with the principal component) is kept, thereby improving the accuracy of early-warning processing.

Further, in the present invention, estimating a first probability value for each user in the first set of user data, using a first estimation module based on the first set of user data, may include:

In a practical application, the establishing of the first data model may include the following steps:

and establishing the first data model based on the data analysis broad table.

Further, to complete the evaluation of the first set of user data, the first data model can be built from the data analysis broad table. Building the first data model based on the data analysis broad table may comprise:

binning the data set;

performing WOE conversion on each box to obtain a WOE value;

repeating the screening step until every variable satisfies VIF < 10 and p-value < 0.05.
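The binning and WOE conversion steps above can be sketched in Python as follows. The cut points, the sample data and the function names are hypothetical, and the sign convention (share of lost users over share of retained users) is an assumption, since the text does not state which convention it uses.

```python
import bisect
import math

def assign_bins(values, cut_points):
    """Map each value to a bin index using sorted cut points (hypothetical)."""
    return [bisect.bisect_right(cut_points, v) for v in values]

def woe_per_bin(bin_ids, churned):
    """Return {bin: WOE}, with WOE = ln(lost share in bin / retained share in bin).

    Assumes every bin holds at least one lost and one retained user;
    otherwise the logarithm is undefined and smoothing would be needed.
    """
    total_pos = sum(churned)
    total_neg = len(churned) - total_pos
    counts = {}
    for b, y in zip(bin_ids, churned):
        pos, neg = counts.get(b, (0, 0))
        counts[b] = (pos + y, neg + (1 - y))
    return {b: math.log((pos / total_pos) / (neg / total_neg))
            for b, (pos, neg) in counts.items()}

# Hypothetical variable: monthly call minutes, with 1 = lost user
monthly_minutes = [5, 8, 9, 12, 30, 35, 40, 55, 60]
churned = [1, 1, 0, 0, 0, 1, 0, 0, 1]

bins = assign_bins(monthly_minutes, [10, 50])
woe = woe_per_bin(bins, churned)
for b in sorted(woe):
    print(b, round(woe[b], 3))
```

Each original value is then replaced by the WOE of its bin before the logistic regression is fitted, which puts all variables on a comparable scale.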

To facilitate understanding of the above steps of establishing the first data model, some of those steps are described in detail below. The variable clustering operation by the splitting method may include:

judging a second eigenvector, and if the second eigenvector is larger than 0.8, dividing the variables into two classes; the classification standard is as follows: respectively calculating the Pearson correlation coefficient of each variable with the two eigenvectors, and comparing the absolute values of the correlation coefficients; if the absolute value of the correlation coefficient of the variable with the first eigenvector is larger than the absolute value of the correlation coefficient of the variable with the second eigenvector, the variable belongs to a first class, otherwise, the variable belongs to a second class;
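The classification standard above can be sketched in Python. The sketch assumes that `comp1` and `comp2` are the per-user scores along the first and second eigenvectors and that they have already been computed elsewhere; the variable names and sample values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def split_variables(variables, comp1, comp2):
    """Assign each variable to the class whose component it correlates with more."""
    first, second = [], []
    for name, values in variables.items():
        r1 = abs(pearson(values, comp1))
        r2 = abs(pearson(values, comp2))
        (first if r1 > r2 else second).append(name)
    return first, second

# Hypothetical component scores (assumed precomputed and uncorrelated)
comp1 = [1, 2, 3, 4, 5]
comp2 = [2, -1, -2, -1, 2]

variables = {
    "call_minutes": [10, 21, 29, 41, 50],
    "data_usage": [9, 19, 31, 40, 52],
    "complaints": [6, 2, 0, 1, 5],
}

first_class, second_class = split_variables(variables, comp1, comp2)
print(first_class, second_class)
```

Applying the same split recursively to each resulting class yields the variable clusters from which variables are then screened.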

Correspondingly, the screening variables may include:

The variable with the lowest IV value; wherein the IV value of the variable X is given by the following formula:
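For reference, the information value (IV) of a binned variable is commonly defined as below. This is the conventional definition and is assumed here; the symbols $n$, $p_i$ and $q_i$ are introduced only for illustration and may differ from the notation of the original formula.

```latex
\mathrm{IV}(X) = \sum_{i=1}^{n} \left( p_i - q_i \right) \ln\frac{p_i}{q_i}
```

Here $n$ is the number of bins of the variable X, $p_i$ is the share of all lost users that fall into bin $i$, $q_i$ is the share of all retained users that fall into bin $i$, and $\ln(p_i / q_i)$ is the WOE value of bin $i$.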

In the present invention, based on the first request, corresponding countermeasures are adopted, which may include:

and adopting corresponding countermeasures based on the feedback response.

In order to facilitate understanding of the user loss early warning processing method, some parameters and terms are explained. The user information may include personal information of the user, data related to user behavior in the billing system, and mobile client information of the user. The user personal information may include: the brand of mobile phone used, age bracket, network access time, model, price, occupation and income; the user behavior related data may include: the user's call and profit-and-loss conditions, the user's service conditions, the stability conditions of the mobile phone, and the like. In addition, the invention can also periodically collect data such as user consumption behavior and payment behavior in the billing system.

According to the invention, corresponding countermeasures are adopted based on the first request. The measures adopted are mainly to avoid user loss, and the user loss can include two aspects: firstly, the user transfers from the terminal to other terminals; secondly, the monthly average call cost of the user is reduced, and the user becomes a low-value user from a high-value user.

In an actual application scenario, the information of the mobile phone client of the user may be obtained through a radio resource control connection request (RRC CONNECTION REQUEST) message or a channel request (CHANNEL REQUEST) message. More specifically, the mobile phone client information of the user may be obtained through a first message sent when the user accesses the network, where the first message may be the RRC CONNECTION REQUEST message defined in 3GPP TS 25.331 and in the CCSA protocol "2 GHz TD-SCDMA UU interface technical requirement layer three technical requirements", or the CHANNEL REQUEST message defined in 3GPP TS 04.08. In order to ensure the compatibility of the protocol, a terminal type cell, a terminal manufacturer cell, a terminal model cell and a version information cell are added in the expandable part of the RRC CONNECTION REQUEST message; together they occupy four bytes, which respectively carry the terminal type information, the terminal manufacturer information, the terminal model information and the version information of the terminal.
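The four-byte extension described above can be sketched in Python. The layout of one unsigned byte per cell, in the order the text lists them, is an assumption, as are the function names and the code values. The sketch does not reproduce the actual ASN.1 encoding of RRC messages.

```python
import struct

# Assumed layout: terminal type, manufacturer, model, version (one byte each)
_TERMINAL_EXT = struct.Struct("BBBB")

def pack_terminal_info(terminal_type, manufacturer, model, version):
    """Encode the four cells into the 4-byte extension payload."""
    return _TERMINAL_EXT.pack(terminal_type, manufacturer, model, version)

def unpack_terminal_info(payload):
    """Decode the first four bytes of the extension payload into named fields."""
    terminal_type, manufacturer, model, version = _TERMINAL_EXT.unpack(payload[:4])
    return {
        "terminal_type": terminal_type,
        "manufacturer": manufacturer,
        "model": model,
        "version": version,
    }

# Hypothetical code values for one handset
payload = pack_terminal_info(1, 17, 42, 3)
info = unpack_terminal_info(payload)
print(payload.hex(), info)
```

On the network side, the decoded codes would be mapped to brand and model names and stored as part of the mobile client information of the user.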

Example ten

Referring to fig. 6, the embodiment further provides a system 600 for processing loss early warning of a mobile phone client user, which includes:

the early warning server 601 is used for collecting user information in an operator server at regular time to form a first user information set;

calculating the user data of the first type of users based on a second data model to obtain a calculation result,

a database 603 for storing a corresponding type question bank matched with the calculation result;

and the management platform 602 is configured to receive alarm information sent by the early warning server, and obtain an answer analysis result of the user for the corresponding type of question bank.

Example eleven

The disclosed embodiments provide a non-volatile computer storage medium having stored thereon computer-executable instructions that may perform the method steps as described in the embodiments above.

It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.

The foregoing describes preferred embodiments of the present invention. It is intended to set out the spirit and scope of the invention clearly and concisely, not to limit the invention; all modifications, substitutions, and alterations falling within the spirit and scope of the invention as defined by the appended claims are included.

Claims

1. A mobile phone client user loss early warning processing method comprises the following steps:

based on the first request, adopting corresponding countermeasures;

estimating the first set of user data based on a first data model, wherein the first probability value is a user churn probability value;

wherein the establishing of the first data model comprises the steps of:

establishing the first data model based on the data analysis broad table;

binning the data set;

performing WOE conversion on each box to obtain a WOE value;

further screening variables by a backward elimination method: if the VIF of any variable is more than 10, eliminating the variable with the maximum p-value, wherein the VIF is the variance inflation factor and the p-value is the hypothesis test p-value, and then performing logistic regression modeling on the remaining variables;

continuously repeating the screening step until every variable satisfies VIF < 10 and p-value < 0.05;

2. The method of claim 1, wherein variable clustering operations are performed by a splitting method and variables are screened, and wherein screening variables specifically comprises:

The variable with the lowest IV value, a high IV value meaning that the contribution of the variable to the model result is high; wherein the IV value of the variable X is given by the following formula:

3. The method of claim 1, wherein the second data model is a decision tree based multi-classification model.

4. The method of claim 1, wherein based on the first request, taking corresponding countermeasures comprises:

receiving a feedback response of the first type user, wherein the feedback response comprises an answer of the first type user to a type question;

and adopting corresponding countermeasures based on the feedback response.

5. The method of claim 1, wherein the user information includes user personal information, user behavior related data in a billing system, and mobile client information of the user.

6. The method of claim 5, wherein the user's mobile client information is obtained through a radio resource control connection request (RRC CONNECTION REQUEST) message or a channel request (CHANNEL REQUEST) message.