CN107609708B - User loss prediction method and system based on mobile game shop - Google Patents

Info

Publication number
CN107609708B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710873746.7A
Other languages
Chinese (zh)
Other versions
CN107609708A (en)
Inventor
刘冶
刘宇琛
彭楠
陈宇恒
杨泽锋
印鉴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Heyan Big Data Technology Co ltd
National Sun Yat Sen University
Original Assignee
Guangzhou Heyan Big Data Technology Co ltd
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Heyan Big Data Technology Co ltd, National Sun Yat Sen University
Priority to CN201710873746.7A
Publication of CN107609708A
Application granted
Publication of CN107609708B
Legal status: Active
Anticipated expiration

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a user loss prediction method and system based on a mobile game store, comprising the following steps: collecting basic information, behavior information and game information of users from a server log, and dividing the users into training set users and prediction set users; establishing lost user labels for the training set users, and preprocessing the original data; performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and the prediction set users; training a gradient boosting decision tree algorithm on the features and lost user labels of the training set users to obtain a user loss prediction model; and identifying lost users of the mobile game store with the user loss prediction model according to the features of the prediction set users. Based on the business scenario of the mobile game store, the method and system can quickly and accurately identify potential lost users and provide decision support for the mobile game store to recall lost users in time.

Description

User loss prediction method and system based on mobile game shop
Technical Field
The invention relates to the technical field of network data mining, in particular to a user loss prediction method and system based on a mobile game store.
Background
In recent years, with the popularization and development of mobile communication devices, the mobile game market has continued to grow steadily and rapidly worldwide. The mobile game store, as the user's entry point to mobile games, has always been a strategic focus for mobile game vendors. Against this background, competition in the mobile game store industry is intense, and every mobile game store faces serious user loss; moreover, retaining an existing user can often generate greater profit than acquiring a new one. Therefore, for the increasingly saturated mobile game store industry, establishing an effective user loss prediction and analysis mechanism can provide decision support for user retention and even for holding and expanding market share, and is of great business significance.
On the other hand, an effective user loss prediction and analysis mechanism must be built on an accurate understanding of the specific business scenario. Existing research on user loss prediction in games is varied and covers game types ranging from large-scale multiplayer battle games to casual games, but it only analyzes a single game at a time. User loss prediction for a mobile game store, by contrast, must study user behavior across many game types; that is, a game dimension is added, which greatly increases the complexity of the business scenario and the difficulty of the corresponding feature engineering and modeling.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an effective user loss prediction method and system based on a mobile game store.
The invention is realized by the following scheme: a user loss prediction method based on a mobile phone game store comprises the following steps:
S1: acquiring basic information, behavior information and game information of training set users and prediction set users from a server log, establishing lost user labels for the training set users, and preprocessing the original data; the lost users are defined as follows: among the users who were online during the first n days, the users who do not meet the active condition during the last m days are labeled as lost users, wherein the active condition is that the user's total number of events is greater than j and the user's active time is greater than k days; wherein n, m, j and k are adjustable parameters;
S2: performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and the prediction set users;
S3: training a gradient boosting decision tree algorithm on the features and lost user labels of the training set users to obtain a user loss prediction model;
S4: identifying lost users of the mobile game store with the user loss prediction model according to the features of the prediction set users;
wherein the step S2 specifically includes:
S21: extracting basic features and game features based on the basic information and game information of the training set users and the prediction set users;
S22: extracting behavior features based on the behavior information of the training set users;
S23: performing feature selection on the behavior features of the training set users according to the lost user labels of the training set users to obtain key behavior features;
S24: extracting the key behavior features of the prediction set users based on the key behavior features of the training set users and the behavior information of the prediction set users;
S25: normalizing the basic features, game features and key behavior features of the training set users and the prediction set users.
The invention provides a user loss prediction method and system based on a mobile game store. The method and the system are based on the definition of lost users, combine the service scenes of the mobile game stores, extract the user data of the server logs as the basic characteristics of the users, the behavior characteristics of the users and the game characteristics of the users, train and establish an optimal gradient boost decision tree algorithm model to identify the lost users in the future period. The invention can automatically adjust according to the actual business scene of the mobile game shop, quickly and accurately identify the potential lost users of the mobile game shop, provide decision support for the mobile game shop to recall the potential lost users in time, and solve the urgent need of predicting lost users in the mobile game shop industry with increasingly saturated markets. Meanwhile, the invention also makes up the vacancy of the user loss prediction technology based on the mobile phone game store in the prior art.
As a further improvement of the present invention, the step S1 specifically includes:
S11: acquiring, from a server log, the basic information, behavior information and game information of training set users and prediction set users in the corresponding time periods according to the time period for which user loss prediction is required, and establishing lost user labels for the training set users;
S12: performing data cleaning on the basic information, behavior information and game information of the training set users and the prediction set users, wherein the data cleaning comprises removing abnormal users and removing users' invalid events.
As a further improvement of the present invention, the step S3 specifically is: obtaining an optimal user loss prediction model by setting assessment indexes and adopting a K-fold cross-validation method.
As a further improvement of the invention, the assessment indexes comprise accuracy rate (precision) and recall rate; the accuracy rate is the proportion of users predicted to be lost who are actually lost, and the recall rate is the proportion of actually lost users who are predicted to be lost.
As a further improvement of the present invention, the step S4 specifically includes: taking the features of the prediction set users as input variables, and outputting each user's loss probability through the user loss prediction model; if the loss probability is greater than the set threshold, the user is labeled as a lost user.
The invention also provides a user loss prediction system based on a mobile game store, which comprises:
a data acquisition and preprocessing module, used for acquiring basic information, behavior information and game information of training set users and prediction set users from a server log, establishing lost user labels for the training set users, and preprocessing the original data; the lost users are defined as follows: among the users who were online during the first n days, the users who do not meet the active condition during the last m days are labeled as lost users, wherein the active condition is that the user's total number of events is greater than j and the user's active time is greater than k days; wherein n, m, j and k are adjustable parameters;
a feature extraction, selection and normalization module, used for performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and the prediction set users;
a training module, used for training a gradient boosting decision tree algorithm on the features and lost user labels of the training set users to obtain a user loss prediction model;
and a prediction module, used for identifying lost users of the mobile game store with the user loss prediction model according to the features of the prediction set users.
Wherein the feature extraction, selection and normalization module specifically comprises:
a basic feature and game feature extraction submodule, used for extracting basic features and game features based on the basic information and game information of the training set users and the prediction set users;
a key behavior feature selection and extraction submodule, used for extracting behavior features based on the behavior information of the training set users; performing feature selection on the behavior features of the training set users according to the lost user labels of the training set users to obtain key behavior features; and extracting the key behavior features of the prediction set users based on the key behavior features of the training set users and the behavior information of the prediction set users;
a feature normalization submodule, used for normalizing the basic features, game features and key behavior features of the training set users and the prediction set users.
As a further improvement of the present invention, the data acquisition and preprocessing module comprises:
the data acquisition submodule is used for acquiring basic information, behavior information and game information of training set users and prediction set users in corresponding time periods from a server log according to the time period in which the user loss prediction is required, and establishing a loss user label for the training set users;
and the preprocessing submodule is used for performing data cleaning on the basic information, behavior information and game information of the training set users and the prediction set users, including removing abnormal users and removing users' invalid events.
As a further improvement of the present invention, the training module is specifically used for: obtaining an optimal user loss prediction model by setting assessment indexes and adopting a K-fold cross-validation method.
As a further improvement of the invention, the assessment indexes comprise accuracy rate (precision) and recall rate; the accuracy rate is the proportion of users predicted to be lost who are actually lost, and the recall rate is the proportion of actually lost users who are predicted to be lost.
As a further improvement of the present invention, the prediction module is specifically used for: taking the features of the prediction set users as input variables, and outputting each user's loss probability through the user loss prediction model; if the loss probability is greater than the set threshold, the user is labeled as a lost user.
In summary, compared with the prior art, the invention has the following effects:
1. The invention extracts user data from the server log as the user's basic features, behavior features and game features, and trains an optimal gradient boosting decision tree model, so that potential lost users of the mobile game store over a future period can be identified quickly and accurately.
2. The invention allows lost users to be defined according to the complex scenario of the mobile game store, so that the model flexibly reflects the current practical situation and achieves higher prediction accuracy.
3. The user loss prediction model based on the mobile game store can adjust itself to the actual business scenario of the mobile game store: user behavior features are selected in real time and an optimal gradient boosting decision tree model is trained in real time, which gives high flexibility and allows lost users to be identified in real time.
For a better understanding and practice, the invention is described in detail below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of the steps of the user loss prediction method based on a mobile game store according to the present invention.
FIG. 2 is a connection block diagram of the user loss prediction system based on a mobile game store according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
To address the lack of technology in the prior art for predicting lost users of mobile game stores, and the urgent need of the increasingly competitive mobile game store industry to identify potential lost users, the invention provides a user loss prediction method and system based on a mobile game store. Based on the definition of lost users, the method and system build a user loss prediction model from selected server-log user data with a gradient boosting decision tree algorithm, identify potential lost users of the mobile game store over a future period, and provide decision support for retaining users. The specific technical solution is described in the following embodiment.
Please refer to fig. 1, which is a flowchart illustrating a method for predicting user churn based on a mobile game store according to the present invention. The invention provides a user loss prediction method based on a mobile phone game store, which specifically comprises the following steps:
S1: acquiring basic information, behavior information and game information of training set users and prediction set users from server logs, establishing lost user labels for the training set users, and preprocessing the original data. Specifically, as a further improvement of the present invention, lost users are defined as follows: among the users who were online during the first n days, the users who do not meet the active condition during the last m days are labeled as lost users, wherein the active condition is that the user's total number of events is greater than j and the user's active time is greater than k days; n, m, j and k are adjustable parameters that can be tuned in real time according to the actual business scenario of the mobile game store.
Specifically, the step S1 includes:
S11: acquiring, from a server log, the basic information, behavior information and game information of the training set users and the prediction set users in the corresponding time periods according to the time period for which user loss prediction is required, and establishing lost user labels for the training set users.
For example, in this embodiment, if n, m, j, and k take 7, and 0.5, respectively, in the definition of lost users, the raw server-log data from 8-14 days before the prediction date and from the 7 days before the prediction date are taken as the basic information, behavior information and game information of the training set users and the prediction set users, respectively. Each training set user is labeled as follows:
(1) If the user meets the active condition during the 7 days before the prediction date, the user is labeled as a retained user.
(2) If the user does not meet the active condition during the 7 days before the prediction date, the user is labeled as a lost user.
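The labeling rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the event layout `(user_id, days_before, active_time)`, the function name `label_training_users`, and the way active time is accumulated per event are all hypothetical, and the parameter values are left to the caller.

```python
from collections import defaultdict

def label_training_users(events, n, m, j, k):
    """Return {user_id: 1 (lost) or 0 (retained)}.

    events: iterable of (user_id, days_before, active_time) tuples, where
    days_before is how many days before the prediction date the event
    happened (1 = yesterday) and active_time is the time, in days, that the
    event adds to the user's active time. This layout is an assumption.
    """
    seen_early = set()                # users online during the first n days
    late_count = defaultdict(int)     # number of events in the last m days
    late_active = defaultdict(float)  # active time (days) in the last m days
    for user_id, days_before, active_time in events:
        if m < days_before <= m + n:
            seen_early.add(user_id)
        elif 1 <= days_before <= m:
            late_count[user_id] += 1
            late_active[user_id] += active_time
    labels = {}
    for user_id in seen_early:
        # Active condition: more than j events AND more than k days active.
        is_active = late_count[user_id] > j and late_active[user_id] > k
        labels[user_id] = 0 if is_active else 1
    return labels
```

Users who appear only in the last m days are not labeled, because the definition only covers users who were online during the first n days.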
S12: performing data cleaning on the basic information, behavior information and game information of the training set users and the prediction set users, wherein the data cleaning comprises removing abnormal users and removing users' invalid events.
Specifically, abnormal users are removed in this step for the following reason: some users of mobile game stores engage in heavy account farming, so the number of accounts associated with a single device can reach tens of thousands. There is no need to predict loss for such users, and their data adds noise that degrades the prediction. Therefore, a device whose number of accounts exceeds a set threshold is defined as an abnormal user and is removed.
Users' invalid events are removed for the following reason: when the server logs a user event, a single user operation may produce multiple repeated records because of unstable mobile network connections, delayed server responses, and similar causes. Therefore, an event that is the same as the previous event under the same account and occurs within a set time threshold of it is defined as an invalid event and is removed.
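The two cleaning rules can be sketched as follows. The record layouts, function names and threshold parameters are hypothetical, and comparing each event with the immediately preceding raw event of the same account (rather than with the last kept event) is one possible reading of the rule.

```python
from collections import defaultdict

def remove_abnormal_users(account_device, max_accounts_per_device):
    """account_device: {account_id: device_id}. Returns the set of accounts kept."""
    per_device = defaultdict(set)
    for account, device in account_device.items():
        per_device[device].add(account)
    # Drop every account on a device that hosts too many accounts.
    return {
        account
        for account, device in account_device.items()
        if len(per_device[device]) <= max_accounts_per_device
    }

def remove_invalid_events(events, min_interval):
    """events: list of (account_id, timestamp_seconds, event_type).

    Drops an event when it has the same type as the previous event of the
    same account and follows it by less than min_interval seconds.
    """
    last_seen = {}  # account_id -> (timestamp, event_type) of the previous event
    kept = []
    for account, ts, event_type in sorted(events, key=lambda e: (e[0], e[1])):
        prev = last_seen.get(account)
        is_repeat = (
            prev is not None
            and prev[1] == event_type
            and ts - prev[0] < min_interval
        )
        if not is_repeat:
            kept.append((account, ts, event_type))
        last_seen[account] = (ts, event_type)
    return kept
```

The cleaned events come back sorted by account and then by time, which is convenient for the per-user feature extraction in the next step.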
S2: performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and the prediction set users.
Specifically, the step S2 includes:
S21: extracting basic features and game features based on the basic information and game information of the training set users and the prediction set users.
Wherein, the user basic features include: the number of registered accounts, the total number of events which have occurred, the number of events which have occurred each day, the number of registered days, the number of days since the last login, the active interval, the number of login days, the registration channel, the mobile phone system of the user, the VIP level and the like.
The user game features include: user game rating, user game classification, user game guild rating, number of days the user game has been online, and the like.
S22: extracting behavior features based on the behavior information of the training set users.
In this step, a user behavior feature is the number of times the user performs each type of behavior in the mobile game store. In this embodiment, there are hundreds of types of user events in the mobile game store, so the user behavior features have hundreds of dimensions in total. However, features of excessively high dimension are not conducive to mathematical modeling, and in fact many of the events are strongly correlated with one another. Therefore, these hundreds of events must first be grouped into categories. In this embodiment, the categorized user behavior features include: number of logins, number of game exits, number of payments, number of clicks on message reminders, number of clicks on the navigation bar, number of account information views, number of group behaviors, number of customer service contacts, number of gift bag clicks, number of strategy guide views, number of VIP behaviors, number of screen recordings, number of welfare behaviors, and the like.
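As a rough sketch of this categorization, raw event types can be mapped to a small set of behavior categories and counted per user. The event names, the `EVENT_CATEGORY` mapping and the function name are invented for illustration; the actual event taxonomy is not given in the text.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from raw event types to behavior categories.
EVENT_CATEGORY = {
    "login_password": "login",
    "login_token": "login",
    "pay_alipay": "payment",
    "pay_wechat": "payment",
    "open_gift_bag": "gift_bag",
}

def behavior_features(events, categories):
    """events: iterable of (user_id, event_type).

    Returns {user_id: [count for each category, in the given order]}.
    """
    counts = defaultdict(Counter)
    for user_id, event_type in events:
        category = EVENT_CATEGORY.get(event_type)
        if category is not None:  # events without a category are ignored
            counts[user_id][category] += 1
    return {user: [c[cat] for cat in categories] for user, c in counts.items()}
```

A user whose events all fall outside the mapping does not appear in the result; a fuller implementation would probably emit an all-zero row for such users.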
S23: performing feature selection on the behavior features of the training set users according to the lost user labels of the training set users to obtain the key behavior features.
Specifically, the Pearson correlation coefficient, mutual information value, classifier importance and the like between each behavior feature of the training set users and the lost user label are calculated, and the behavior features that are strongly correlated with the label are taken as the key behavior features.
S24: extracting the key behavior features of the prediction set users based on the key behavior features of the training set users and the behavior information of the prediction set users.
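One of the criteria named here, the Pearson correlation coefficient, can be sketched with the standard library alone. The cutoff `min_abs_corr` and the column layout are assumptions; the text does not say how "strong correlation" is thresholded, and mutual information or classifier importance would need a similar but separate scoring function.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal; treat as uncorrelated
    return cov / sqrt(var_x * var_y)

def select_key_features(columns, labels, min_abs_corr):
    """columns: {feature_name: [value per user]}; labels: [0/1 per user].

    Keeps the features whose absolute correlation with the label reaches
    the (hypothetical) cutoff min_abs_corr.
    """
    return [
        name
        for name, values in columns.items()
        if abs(pearson(values, labels)) >= min_abs_corr
    ]
```

The selected feature names are then reused unchanged for the prediction set users, which corresponds to step S24.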
S25: the basic characteristics, game characteristics and key behavior characteristics of the training set users and the prediction set users are normalized.
Specifically, the specific processing manner in this step includes performing one-hot encoding processing on the enumerated features. In this embodiment, the enumerated features include: registration channels and user game categories, etc.
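A minimal sketch of this step is shown below. One-hot encoding is what the text describes; the min-max scaling of numeric features is an added assumption, since the text does not name a scaling method. All function names are hypothetical. In both cases the parameters are fitted on the training set users and then applied to the prediction set users.

```python
def fit_one_hot(values):
    """Learn a sorted category list from training-set values."""
    return sorted(set(values))

def one_hot(value, categories):
    """Encode one value; a category unseen in training maps to all zeros."""
    return [1 if value == c else 0 for c in categories]

def fit_min_max(values):
    """Learn the (min, max) bounds of a numeric feature from training data."""
    return min(values), max(values)

def min_max(value, bounds):
    """Scale a value with bounds learned on the training set (assumed method)."""
    low, high = bounds
    if high == low:
        return 0.0  # constant feature: no spread to scale by
    return (value - low) / (high - low)
```

For example, a registration channel column with training values "appstore" and "web" becomes two 0/1 columns, and a prediction-set user from an unseen channel is encoded as two zeros.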
S3: training a gradient boosting decision tree algorithm on the features and lost user labels of the training set users to obtain a user loss prediction model.
The step S3 specifically includes: obtaining an optimal user loss prediction model by setting assessment indexes and adopting a K-fold cross-validation method.
The gradient boosting decision tree model may adopt, but is not limited to, the XGBoost algorithm.
Specifically, XGBoost improves on the traditional gradient boosting decision tree, for example by adding a regularization term to the objective function, by using second-derivative information, and by borrowing column sampling from random forests, which greatly improves prediction accuracy and computational efficiency.
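For reference, the regularization term and the use of second-derivative information can be written out in the notation commonly used for XGBoost. The symbols below are not defined in this document and are introduced only for illustration: l is the loss function, f_t is the tree added at boosting round t, T is its number of leaves, w is its vector of leaf weights, and gamma and lambda are regularization coefficients.

```latex
\mathcal{L}^{(t)} \simeq \sum_{i=1}^{N} \left[ g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t(x_i)^2 \right] + \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2}\, \lambda \lVert w \rVert^2,

g_i = \frac{\partial\, l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}},
\qquad
h_i = \frac{\partial^2 l\big(y_i, \hat{y}_i^{(t-1)}\big)}{\partial\, \big(\hat{y}_i^{(t-1)}\big)^2}.
```

Here g_i and h_i are the first and second derivatives of the loss with respect to the previous round's prediction for sample i, and the term Omega penalizes trees with many leaves or large leaf weights.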
The assessment indexes comprise accuracy rate (precision) and recall rate, which can be given different weights according to the business scenario of the mobile game store. The accuracy rate is the proportion of users predicted to be lost who are actually lost, and the recall rate is the proportion of actually lost users who are predicted to be lost.
In this embodiment, based on the actual business scenario of the mobile game store, the cost of a recall campaign aimed at a user who was predicted to be lost but is actually retained is lower than the cost of missing a user who is actually lost. That is, a high recall rate is more important, so greater weight is given to the recall rate.
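The two indexes and their weighted combination can be sketched as follows. The linear weighting and the parameter name `recall_weight` are assumptions; the text only says that the indexes may be given different weights.

```python
def precision_recall(y_true, y_pred):
    """y_true, y_pred: lists of 0/1 where 1 means lost user."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0  # predicted lost that are lost
    recall = tp / (tp + fn) if tp + fn else 0.0     # lost that were predicted lost
    return precision, recall

def weighted_score(y_true, y_pred, recall_weight):
    """Weighted sum of the two indexes; recall_weight > 0.5 favors recall."""
    precision, recall = precision_recall(y_true, y_pred)
    return (1 - recall_weight) * precision + recall_weight * recall
```

With a recall weight above 0.5, a model that flags a few extra retained users but misses fewer lost users scores higher, which matches the cost argument of this embodiment.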
The K-fold cross-validation method of this embodiment divides the data set of the training module into K mutually exclusive subsets. Each subset is used once as the validation set while the remaining K-1 subsets are used as the training set, so that K models are obtained. The average, over the K models, of the weighted sum of the assessment indexes on each model's validation set is taken as the performance index of the classifier under K-fold cross-validation. Based on this performance index, the optimal user loss prediction model is selected. In this embodiment, K may take 10.
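The procedure can be sketched independently of any particular learner. In the sketch below, `train_fn` and `score_fn` are hypothetical callbacks: `train_fn` would wrap whatever gradient boosting library is used and return a prediction function, and `score_fn` would compute the weighted assessment index. The folds are assigned round-robin without shuffling, which is a simplification.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k mutually exclusive, near-equal folds."""
    folds = [[] for _ in range(k)]
    for i in range(n_samples):
        folds[i % k].append(i)
    return folds

def cross_validate(features, labels, k, train_fn, score_fn):
    """Average validation score over k folds.

    train_fn(X, y) must return a callable predict(x) -> 0/1;
    score_fn(y_true, y_pred) must return a float. Both are assumptions.
    """
    folds = k_fold_indices(len(features), k)
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in range(len(features)) if i not in held]
        # Train on the other k-1 folds, validate on the held-out fold.
        predict = train_fn([features[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        y_pred = [predict(features[i]) for i in held_out]
        scores.append(score_fn([labels[i] for i in held_out], y_pred))
    return sum(scores) / k
```

Running `cross_validate` once per candidate hyperparameter setting and keeping the setting with the highest average score is one way to select the optimal model described above.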
S4: identifying lost users of the mobile game store with the user loss prediction model according to the features of the prediction set users.
The step S4 specifically includes: taking the features of the prediction set users as input variables, and outputting each user's loss probability through the user loss prediction model; if the loss probability is greater than the set threshold, the user is labeled as a lost user. For example, in this embodiment, the threshold may be set to 0.5.
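The thresholding rule amounts to a single comparison per user. In this sketch, the input dictionary of probabilities stands in for the output of the trained model, and the function name is hypothetical.

```python
def identify_lost_users(loss_probability, threshold=0.5):
    """loss_probability: {user_id: probability output by the model}.

    Returns the sorted list of users whose probability exceeds the threshold.
    """
    return sorted(u for u, p in loss_probability.items() if p > threshold)
```

Because the comparison is strict, a user whose probability equals the threshold exactly is not labeled as lost. Lowering the threshold flags more users and trades accuracy rate for recall rate.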
Please refer to fig. 2, which is a connection block diagram of the system for predicting user churn based on mobile game stores according to the present invention. In order to realize the method, the invention also provides a user loss prediction system based on the mobile phone game store, which comprises a data acquisition and preprocessing module 1, a feature extraction, selection and normalization module 2, a training module 3 and a prediction module 4.
The data acquisition and preprocessing module 1 is used for acquiring basic information, behavior information and game information of training set users and prediction set users from server logs, establishing lost user labels for the training set users, and preprocessing the original data. Lost users are defined as follows: among the users who were online during the first n days, the users who do not meet the active condition during the last m days are labeled as lost users, wherein the active condition is that the user's total number of events is greater than j and the user's active time is greater than k days; wherein n, m, j and k are adjustable parameters.
The characteristic extraction, selection and normalization module 2 is used for extracting, selecting and normalizing the characteristics of the basic information, the behavior information and the game information of the training set users and the prediction set users.
The training module 3 is used for training a gradient boosting decision tree algorithm on the features and lost user labels of the training set users to obtain a user loss prediction model.
The prediction module 4 is used for identifying lost users of the mobile game store with the user loss prediction model according to the features of the prediction set users.
Further, the data collecting and preprocessing module 1 includes: a data acquisition sub-module 11 and a pre-processing sub-module 12.
The data acquisition submodule 11 is configured to acquire basic information, behavior information and game information of a training set user and a prediction set user in a corresponding time period from a server log according to the time period in which the user loss prediction is required, and establish a loss user tag for the training set user;
The preprocessing submodule 12 is configured to perform data cleaning on the basic information, behavior information and game information of the training set users and the prediction set users, including removing abnormal users and removing users' invalid events.
Further, the feature extraction, selection and normalization module 2 specifically includes: a basic feature and game feature extraction sub-module 21, a key behavior feature selection and extraction sub-module 22 and a feature normalization sub-module 23.
The basic feature and game feature extraction sub-module 21 is configured to extract basic features and game features based on the basic information and game information of the training set users and the prediction set users.
The key behavior feature selection and extraction submodule 22 is configured to extract behavior features based on the behavior information of the training set user. And meanwhile, selecting the behavior characteristics of the training set users according to the lost user labels of the training set users to acquire the key behavior characteristics. And then, extracting the key behavior characteristics of the prediction set users based on the key behavior characteristics of the training set users and the behavior information of the prediction set users.
The characteristic normalization submodule 23 is configured to normalize the basic characteristics, game characteristics, and key behavior characteristics of the training set users and the prediction set users.
Further, the training module 3 is specifically configured to obtain an optimal user loss prediction model by setting assessment indexes and adopting a K-fold cross-validation method. Specifically, the assessment indexes comprise accuracy rate (precision) and recall rate; the accuracy rate is the proportion of users predicted to be lost who are actually lost, and the recall rate is the proportion of actually lost users who are predicted to be lost.
Further, the prediction module 4 is specifically configured to take the features of the prediction set users as input variables and output each user's loss probability through the user loss prediction model; if the loss probability is greater than the set threshold, the user is labeled as a lost user.
The user loss prediction method and the user loss prediction system provided by the invention are not only suitable for mobile phone game shops, but also suitable for applications and related products for providing services aiming at various mobile phone games.
Compared with the prior art, the invention provides a user loss prediction method and system based on a mobile game store. The method and the system are based on the definition of lost users, combine the service scenes of the mobile game stores, extract the user data of the server logs as the basic characteristics of the users, the behavior characteristics of the users and the game characteristics of the users, train and establish an optimal gradient boost decision tree algorithm model to identify the lost users in the future period. In addition, the invention can automatically adjust according to the actual business scene of the mobile game store, quickly and accurately identify the potential lost users of the mobile game store, provide decision support for the mobile game store to recall the potential lost users in time, and solve the urgent need of predicting the lost users in the mobile game store industry with increasingly saturated markets. Meanwhile, the invention also makes up the vacancy of the user loss prediction technology based on the mobile phone game store in the prior art.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (8)

1. A user loss prediction method based on a mobile game store, characterized by comprising the following steps:
S1: acquiring basic information, behavior information and game information of training set users and prediction set users from a server log, establishing lost user labels for the training set users, and preprocessing the raw data; specifically comprising:
S11: according to the time period for which user loss prediction is required, acquiring from the server log the basic information, behavior information and game information of the training set users and prediction set users in the corresponding time periods, and establishing lost user labels for the training set users; wherein a lost user is defined as follows: among the users who were online during the first n days, those who do not meet the activity condition during the last m days are marked as lost users, the activity condition being that the user's total number of events is greater than j and the user's active time is greater than k days; n, m, j and k are adjustable parameters;
S12: performing data cleaning on the basic information, behavior information and game information of the training set users and prediction set users, in which abnormal users and users' invalid events are removed; all accounts on a device whose number of accounts is greater than a set threshold are defined as abnormal users; an event under the same account that is identical to the previous event and separated from it by a time interval smaller than a set threshold is defined as an invalid event;
S2: performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and prediction set users; specifically comprising:
S21: extracting basic features and game features based on the basic information and game information of the training set users and prediction set users;
S22: extracting behavior features based on the behavior information of the training set users;
S23: performing feature selection on the behavior features of the training set users according to their lost user labels to obtain key behavior features;
S24: extracting the key behavior features of the prediction set users based on the key behavior features of the training set users and the behavior information of the prediction set users;
S25: normalizing the basic features, game features and key behavior features of the training set users and prediction set users;
S3: training a gradient boosting decision tree algorithm on the features and lost user labels of the training set users to obtain a user loss prediction model; wherein the gradient boosting decision tree algorithm is trained as follows: assessment indexes are set and K-fold cross-validation is used to obtain an optimal user loss prediction model; the assessment indexes comprise precision and recall; precision is the proportion of users predicted to be lost who are actually lost, and recall is the proportion of actually lost users who are predicted to be lost; recall is given a greater weight than precision;
S4: identifying the lost users of the mobile game store through the user loss prediction model according to the features of the prediction set users.
2. The user loss prediction method based on a mobile game store according to claim 1, wherein in step S22 the behavior features are extracted as follows: the user behavior information is classified and each resulting category is taken as a behavior feature, the behavior features comprising: the number of login behaviors, the number of game exit behaviors, the number of payment behaviors, the number of message prompt behaviors, the number of navigation bar clicking behaviors, the number of account information viewing behaviors, the number of group behaviors, the number of customer service communication behaviors, the number of gift bag clicking behaviors, the number of strategy guide viewing behaviors, the number of VIP behaviors, the number of screen recording behaviors and the number of welfare behaviors.
3. The user loss prediction method based on a mobile game store according to claim 2, wherein in step S23 the key behavior features are obtained as follows: the Pearson correlation coefficient, the mutual information value and the classifier importance are calculated between each behavior feature of the training set users and the lost user labels, and the behavior features with strong correlation are taken as the key behavior features.
4. The user loss prediction method based on a mobile game store according to claim 3, wherein step S4 specifically comprises: taking the features of the prediction set users as input variables and outputting each user's loss probability through the user loss prediction model; if the loss probability is greater than the set threshold, the user is labeled as a lost user.
5. A user loss prediction system based on a mobile game store, characterized by comprising:
a data acquisition and preprocessing module, used for acquiring basic information, behavior information and game information of training set users and prediction set users from a server log, establishing lost user labels for the training set users, and preprocessing the raw data;
a feature extraction, selection and normalization module, used for performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and prediction set users;
a training module, used for training a gradient boosting decision tree algorithm on the features and lost user labels of the training set users to obtain a user loss prediction model;
a prediction module, used for identifying the lost users of the mobile game store through the user loss prediction model according to the features of the prediction set users;
wherein the data acquisition and preprocessing module comprises:
a data acquisition submodule, used for acquiring from the server log, according to the time period for which user loss prediction is required, the basic information, behavior information and game information of the training set users and prediction set users in the corresponding time periods, and for establishing lost user labels for the training set users; wherein a lost user is defined as follows: among the users who were online during the first n days, those who do not meet the activity condition during the last m days are marked as lost users, the activity condition being that the user's total number of events is greater than j and the user's active time is greater than k days; n, m, j and k are adjustable parameters; and
a preprocessing submodule, used for performing data cleaning on the basic information, behavior information and game information of the training set users and prediction set users, and for removing abnormal users and users' invalid events; all accounts on a device whose number of accounts is greater than a set threshold are defined as abnormal users; an event under the same account that is identical to the previous event and separated from it by a time interval smaller than a set threshold is defined as an invalid event;
the feature extraction, selection and normalization module comprises:
a basic feature and game feature extraction submodule, used for extracting basic features and game features based on the basic information and game information of the training set users and prediction set users;
a key behavior feature selection and extraction submodule, used for extracting behavior features based on the behavior information of the training set users; performing feature selection on the behavior features of the training set users according to their lost user labels to obtain key behavior features; and extracting the key behavior features of the prediction set users based on the key behavior features of the training set users and the behavior information of the prediction set users; and
a feature normalization submodule, used for normalizing the basic features, game features and key behavior features of the training set users and prediction set users;
wherein the gradient boosting decision tree algorithm is trained as follows: assessment indexes are set and K-fold cross-validation is used to obtain an optimal user loss prediction model; the assessment indexes comprise precision and recall; precision is the proportion of users predicted to be lost who are actually lost, and recall is the proportion of actually lost users who are predicted to be lost; recall is given a greater weight than precision.
6. The system of claim 5, wherein in the key behavior feature selection and extraction submodule the behavior features are extracted as follows: the user behavior information is classified and each resulting category is taken as a behavior feature, the behavior features comprising: the number of login behaviors, the number of game exit behaviors, the number of payment behaviors, the number of message prompt behaviors, the number of navigation bar clicking behaviors, the number of account information viewing behaviors, the number of group behaviors, the number of customer service communication behaviors, the number of gift bag clicking behaviors, the number of strategy guide viewing behaviors, the number of VIP behaviors, the number of screen recording behaviors and the number of welfare behaviors.
7. The system of claim 6, wherein in the feature normalization submodule the key behavior features are obtained as follows: the Pearson correlation coefficient, the mutual information value and the classifier importance are calculated between each behavior feature of the training set users and the lost user labels, and the behavior features with strong correlation are taken as the key behavior features.
8. The system of claim 7, wherein the prediction module is specifically configured to take the features of the prediction set users as input variables and output each user's loss probability through the user loss prediction model; if the loss probability is greater than the set threshold, the user is labeled as a lost user.
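The assessment indexes named in claims 1 and 5 can be illustrated with a short sketch: precision is the share of users predicted to be lost who are actually lost, and recall is the share of actually lost users who are predicted to be lost. The claims do not say how recall is weighted more heavily; the F-beta score with beta greater than 1 used below is one common choice and is an assumption here, as are the function names and the sample labels.

```python
from typing import Sequence, Tuple


def precision_recall(actual: Sequence[bool],
                     predicted: Sequence[bool]) -> Tuple[float, float]:
    """Compute precision and recall for the 'lost user' class."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """Combine precision and recall; beta > 1 weights recall more heavily
    (assumed weighting scheme, not specified in the patent)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical validation fold: True means "lost user".
actual = [True, True, True, True, False, False]
predicted = [True, True, True, False, True, True]

precision, recall = precision_recall(actual, predicted)
f2 = f_beta(precision, recall, beta=2.0)
f1 = f_beta(precision, recall, beta=1.0)
print(precision, recall, f2, f1)
```

In a K-fold cross-validation loop, a score of this kind could be averaged over the folds and used to pick the model configuration, so that missing a lost user is penalized more than a false alarm.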
CN201710873746.7A 2017-09-25 2017-09-25 User loss prediction method and system based on mobile game shop Active CN107609708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710873746.7A CN107609708B (en) 2017-09-25 2017-09-25 User loss prediction method and system based on mobile game shop

Publications (2)

Publication Number Publication Date
CN107609708A (en) 2018-01-19
CN107609708B (en) 2021-03-26

Family

ID=61057924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710873746.7A Active CN107609708B (en) 2017-09-25 2017-09-25 User loss prediction method and system based on mobile game shop

Country Status (1)

Country Link
CN (1) CN107609708B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108257675A (en) * 2018-02-07 2018-07-06 平安科技(深圳)有限公司 Chronic obstructive pulmonary disease onset risk Forecasting Methodology, server and computer readable storage medium
CN110147803B (en) * 2018-02-08 2022-02-18 北大方正集团有限公司 User loss early warning processing method and device
CN108733631A (en) * 2018-04-09 2018-11-02 中国平安人寿保险股份有限公司 A kind of data assessment method, apparatus, terminal device and storage medium
CN108712279B (en) * 2018-04-27 2021-08-17 中国联合网络通信集团有限公司 User off-network prediction method and device
CN109034861B (en) * 2018-06-04 2022-06-07 挖财网络技术有限公司 User loss prediction method and device based on mobile terminal log behavior data
JP6573205B2 (en) * 2018-09-10 2019-09-11 澪標アナリティクス株式会社 PROCESSING DEVICE, INFORMATION PROCESSING METHOD, PROGRAM, AND MARKETING INFORMATION PROCESSING DEVICE FOR Calculating Predictive Data Regarding Use Of Application Program By One User
CN109299265B (en) * 2018-10-15 2020-08-21 广州虎牙信息科技有限公司 Potential reflow user screening method and device and electronic equipment
CN109711860A (en) * 2018-11-12 2019-05-03 平安科技(深圳)有限公司 Prediction technique and device, storage medium, the computer equipment of user behavior
CN109543734A (en) * 2018-11-14 2019-03-29 中国联合网络通信集团有限公司 User portrait method and device, storage medium
CN109636443A (en) * 2018-11-17 2019-04-16 南京中数媒介研究有限公司 The deep learning method and device of customer churn prediction
CN109784993B (en) * 2019-01-06 2020-04-14 广州银汉科技有限公司 Intelligent and accurate user track prediction system based on big data
CN109767269B (en) * 2019-01-15 2022-02-22 网易(杭州)网络有限公司 Game data processing method and device
CN109767045A (en) * 2019-01-17 2019-05-17 北京腾云天下科技有限公司 A kind of prediction technique, device, calculating equipment and the medium of loss user
CN109815631A (en) * 2019-02-26 2019-05-28 网易(杭州)网络有限公司 A kind for the treatment of method and apparatus of game data
CN110263326B (en) * 2019-05-21 2022-05-03 平安科技(深圳)有限公司 User behavior prediction method, prediction device, storage medium and terminal equipment
CN110222267B (en) * 2019-06-06 2023-07-25 中山大学 Game platform information pushing method, system, storage medium and equipment
CN110634018A (en) * 2019-08-30 2019-12-31 阿里巴巴集团控股有限公司 Feature depiction method, recognition method and related device for lost user
CN110852780A (en) * 2019-10-08 2020-02-28 百度在线网络技术(北京)有限公司 Data analysis method, device, equipment and computer storage medium
CN110930192A (en) * 2019-11-22 2020-03-27 携程旅游信息技术(上海)有限公司 User loss prediction method, system, device and storage medium
CN111821694B (en) * 2020-07-24 2024-05-21 北京达佳互联信息技术有限公司 Loss prevention method and device for new game user, electronic equipment and storage medium
CN113827981A (en) * 2021-08-17 2021-12-24 杭州电魂网络科技股份有限公司 Game loss user prediction method and system based on naive Bayes
CN114022194A (en) * 2021-10-26 2022-02-08 共享智能铸造产业创新中心有限公司 Prediction method for platform user loss
CN116757750A (en) * 2023-06-05 2023-09-15 广州盈风网络科技有限公司 Operation pushing method, device, equipment and medium based on loss rate prediction

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335752A (en) * 2015-09-18 2016-02-17 国网山东省电力公司菏泽供电公司 Principal component analysis multivariable decision-making tree-based connection manner identification method
CN106203679A (en) * 2016-06-27 2016-12-07 武汉斗鱼网络科技有限公司 A kind of customer loss Forecasting Methodology and system
CN106250403A (en) * 2016-07-19 2016-12-21 北京奇艺世纪科技有限公司 Customer loss Forecasting Methodology and device
CN106997493A (en) * 2017-02-14 2017-08-01 云数信息科技(深圳)有限公司 Lottery user attrition prediction method and its system based on multi-dimensional data

Also Published As

Publication number Publication date
CN107609708A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
CN107609708B (en) User loss prediction method and system based on mobile game shop
CN110910901B (en) Emotion recognition method and device, electronic equipment and readable storage medium
CN111291816B (en) Method and device for carrying out feature processing aiming at user classification model
CN109492772B (en) Method and device for generating information
CN110400215B (en) Method and system for constructing enterprise family-oriented small micro enterprise credit assessment model
CN111695597A (en) Credit fraud group recognition method and system based on improved isolated forest algorithm
CN109978575B (en) Method and device for mining user flow operation scene
CN114154672A (en) Data mining method for customer churn prediction
CN115545886A (en) Overdue risk identification method, overdue risk identification device, overdue risk identification equipment and storage medium
CN112101692B (en) Identification method and device for mobile internet bad quality users
CN109146667B (en) Method for constructing external interface comprehensive application model based on quantitative statistics
CN112070564B (en) Advertisement pulling method, device and system and electronic equipment
CN110555007B (en) Method and device for discriminating theft behavior, computing equipment and storage medium
CN111476657A (en) Information pushing method, device and system
CN114841705B (en) Anti-fraud monitoring method based on scene recognition
CN107545347B (en) Attribute determination method and device for risk prevention and control and server
CN114626940A (en) Data analysis method and device and electronic equipment
CN110852854B (en) Method for generating quantitative gain model and method for evaluating risk control strategy
CN112215386A (en) Personnel activity prediction method and device and computer readable storage medium
CN113420789A (en) Method, device, storage medium and computer equipment for predicting risk account
CN112950392A (en) Information display method, posterior information determination method and device and related equipment
CN111882339A (en) Prediction model training and response rate prediction method, device, equipment and storage medium
CN117035434B (en) Suspicious transaction monitoring method and suspicious transaction monitoring device
CN116151670B (en) Intelligent evaluation method, system and medium for marketing project quality of marketing business
CN113537666B (en) Evaluation model training method, evaluation and business auditing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant