CN107609708B - User loss prediction method and system based on mobile game shop

Info

Publication number: CN107609708B
Application number: CN201710873746.7A
Authority: CN
Inventors: 刘冶; 刘宇琛; 彭楠; 陈宇恒; 杨泽锋; 印鉴
Original assignee: Guangzhou Heyan Big Data Technology Co ltd; National Sun Yat Sen University
Current assignee: Guangzhou Heyan Big Data Technology Co ltd; National Sun Yat Sen University
Priority date: 2017-09-25
Filing date: 2017-09-25
Publication date: 2021-03-26
Anticipated expiration: 2037-09-25
Also published as: CN107609708A

Abstract

The invention provides a user churn prediction method and system based on a mobile game store. The method comprises the following steps: collecting basic information, behavior information and game information of users from server logs, and dividing the users into training set users and prediction set users; establishing churned-user labels for the training set users and preprocessing the raw data; performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and prediction set users; training a gradient boosting decision tree algorithm on the features and churned-user labels of the training set users to obtain a user churn prediction model; and identifying churned users of the mobile game store with the user churn prediction model according to the features of the prediction set users. Based on the business scenario of the mobile game store, the method and system can quickly and accurately identify potential churned users and provide decision support for the store to win back such users in time.

Description

User loss prediction method and system based on mobile game shop

Technical Field

The invention relates to the technical field of network data mining, and in particular to a user churn prediction method and system based on a mobile game store.

Background

In recent years, with the popularization and development of mobile communication devices, the mobile game market has continued to grow steadily and rapidly worldwide. The mobile game store, as the user's entry point to mobile games, has always been a strategic focus for mobile game vendors. Against this background, competition in the mobile game store industry is fierce, and every mobile game store faces serious user churn; moreover, retaining an existing user often generates more profit than acquiring a new one. Therefore, for the increasingly saturated mobile game store industry, establishing an effective user churn prediction and analysis mechanism can provide decision support for user retention and even for holding and expanding market share, and is of great commercial significance.

On the other hand, an effective user churn prediction and analysis mechanism must be built on an accurate understanding of the specific business scenario. Existing research on user churn prediction in game scenarios covers many categories, with target game types ranging from large-scale multiplayer battle games to casual games, but each study analyzes only a single game. User churn prediction for a mobile game store, by contrast, has to study user behavior across many game types; this adds a game dimension and greatly increases the complexity and modeling difficulty of both the business scenario and the corresponding feature engineering.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and to provide an effective user churn prediction method and system based on a mobile game store.

The invention is realized by the following scheme: a user churn prediction method based on a mobile game store, comprising the following steps:

S1: acquiring basic information, behavior information and game information of training set users and prediction set users from server logs, establishing churned-user labels for the training set users, and preprocessing the raw data; a churned user is defined as follows: among the users who were online in the first n days, a user who does not meet the active condition in the last m days is labeled as a churned user, wherein the active condition is that the user's total number of events is greater than j and the user's active time is greater than k days; n, m, j and k are adjustable parameters;

S2: performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and prediction set users;

S3: training a gradient boosting decision tree algorithm on the features and churned-user labels of the training set users to obtain a user churn prediction model;

S4: identifying churned users of the mobile game store with the user churn prediction model according to the features of the prediction set users;

wherein step S2 specifically includes:

S21: extracting basic features and game features based on the basic information and game information of the training set users and prediction set users;

S22: extracting behavior features based on the behavior information of the training set users;

S23: performing feature selection on the behavior features of the training set users according to their churned-user labels to obtain key behavior features;

S24: extracting the key behavior features of the prediction set users based on the key behavior features of the training set users and the behavior information of the prediction set users;

S25: normalizing the basic features, game features and key behavior features of the training set users and prediction set users.

The invention provides a user churn prediction method and system based on a mobile game store. Starting from the definition of a churned user and taking the business scenario of the mobile game store into account, the method and system extract user data from server logs as basic features, behavior features and game features of users, and train an optimal gradient boosting decision tree model to identify the users who will churn in a future period. The invention can adjust itself to the actual business scenario of the mobile game store, quickly and accurately identify potential churned users, provide decision support for the store to win back those users in time, and meet the urgent need for churn prediction in the increasingly saturated mobile game store industry. The invention also fills a gap in the prior art, which lacks a user churn prediction technique for mobile game stores.

As a further improvement of the invention, step S1 specifically includes:

S11: according to the time period for which user churn prediction is required, acquiring basic information, behavior information and game information of the training set users and prediction set users in the corresponding time periods from the server logs, and establishing churned-user labels for the training set users;

S12: performing data cleaning on the basic information, behavior information and game information of the training set users and prediction set users, wherein the data cleaning comprises removing abnormal users and removing users' invalid events.

As a further improvement of the invention, step S3 specifically is: obtaining an optimal user churn prediction model by setting evaluation metrics and using K-fold cross-validation.

As a further improvement of the invention, the evaluation metrics comprise precision and recall; precision is the proportion of users predicted to churn who actually churn, and recall is the proportion of actually churned users who are predicted to churn.

As a further improvement of the invention, step S4 specifically includes: taking the features of the prediction set users as input variables and outputting each user's churn probability through the user churn prediction model; if the churn probability is greater than a set threshold, the user is labeled as a churned user.

The invention also provides a user churn prediction system based on a mobile game store, which comprises:

a data acquisition and preprocessing module, used for acquiring basic information, behavior information and game information of training set users and prediction set users from server logs, establishing churned-user labels for the training set users, and preprocessing the raw data; a churned user is defined as follows: among the users who were online in the first n days, a user who does not meet the active condition in the last m days is labeled as a churned user, wherein the active condition is that the user's total number of events is greater than j and the user's active time is greater than k days; n, m, j and k are adjustable parameters;

a feature extraction, selection and normalization module, used for performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and prediction set users;

a training module, used for training a gradient boosting decision tree algorithm on the features and churned-user labels of the training set users to obtain a user churn prediction model;

and a prediction module, used for identifying churned users of the mobile game store with the user churn prediction model according to the features of the prediction set users;

wherein the feature extraction, selection and normalization module specifically comprises:

a basic feature and game feature extraction submodule, used for extracting basic features and game features based on the basic information and game information of the training set users and prediction set users;

a key behavior feature selection and extraction submodule, used for extracting behavior features based on the behavior information of the training set users; performing feature selection on the behavior features of the training set users according to their churned-user labels to obtain key behavior features; and extracting the key behavior features of the prediction set users based on the key behavior features of the training set users and the behavior information of the prediction set users;

and a feature normalization submodule, used for normalizing the basic features, game features and key behavior features of the training set users and prediction set users.

As a further improvement of the invention, the data acquisition and preprocessing module comprises:

a data acquisition submodule, used for acquiring, according to the time period for which user churn prediction is required, basic information, behavior information and game information of the training set users and prediction set users in the corresponding time periods from the server logs, and for establishing churned-user labels for the training set users;

and a preprocessing submodule, used for performing data cleaning on the basic information, behavior information and game information of the training set users and prediction set users, including removing abnormal users and removing users' invalid events.

As a further improvement of the invention, the training module is specifically used for obtaining an optimal user churn prediction model by setting evaluation metrics and using K-fold cross-validation.

As a further improvement of the invention, the evaluation metrics comprise precision and recall; precision is the proportion of users predicted to churn who actually churn, and recall is the proportion of actually churned users who are predicted to churn.

As a further improvement of the invention, the prediction module is specifically used for taking the features of the prediction set users as input variables and outputting each user's churn probability through the user churn prediction model; if the churn probability is greater than a set threshold, the user is labeled as a churned user.

In summary, compared with the prior art, the invention has the following effects:

1. The invention extracts user data from server logs as basic features, behavior features and game features of users, and trains an optimal gradient boosting decision tree model, so that potential churned users of the mobile game store in a future period can be identified quickly and accurately.

2. The invention allows the churned user to be defined in light of the complex scenario of the mobile game store, so that the model flexibly reflects the current practical situation and achieves higher prediction accuracy.

3. The user churn prediction model can adjust itself to the actual business scenario of the mobile game store, select user behavior features in real time and train an optimal gradient boosting decision tree model in real time; it is therefore highly flexible and can identify churned users in real time.

For better understanding and implementation, the invention is described in detail below with reference to the accompanying drawings.

Drawings

FIG. 1 is a flowchart of the steps of the user churn prediction method based on a mobile game store according to the invention.

FIG. 2 is a block diagram of the module connections of the user churn prediction system based on a mobile game store according to the invention.

Detailed Description

The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.

Aiming at the lack, in the prior art, of techniques for predicting churned users of mobile game stores, and at the urgent need of the increasingly competitive mobile game store industry to identify potential churned users, the invention provides a user churn prediction method and system based on a mobile game store. Based on the definition of a churned user, the method and system build a user churn prediction model from selected server log user data using a gradient boosting decision tree algorithm, identify the users likely to churn from the mobile game store in a future period, and provide decision support for user retention. The specific technical solution is described through the following embodiment.

Please refer to FIG. 1, which is a flowchart of the steps of the user churn prediction method based on a mobile game store according to the invention. The method specifically comprises the following steps:

S1: acquiring basic information, behavior information and game information of training set users and prediction set users from server logs, establishing churned-user labels for the training set users, and preprocessing the raw data. Specifically, as a further improvement of the invention, a churned user is defined as follows: among the users who were online in the first n days, a user who does not meet the active condition in the last m days is labeled as a churned user, wherein the active condition is that the user's total number of events is greater than j and the user's active time is greater than k days; n, m, j and k are adjustable parameters and can be adjusted in real time according to the actual business scenario of the mobile game store.

Specifically, step S1 includes:

S11: according to the time period for which user churn prediction is required, acquiring basic information, behavior information and game information of the training set users and prediction set users in the corresponding time periods from the server logs, and establishing churned-user labels for the training set users.

For example, in this embodiment, if n, m, j, and k in the definition of churned users take 7, and 0.5, respectively, the raw server log data from 8-14 days before the prediction date and from the 7 days before the prediction date are taken as the basic information, behavior information and game information of the training set users and of the prediction set users, respectively. Each training set user is labeled as follows:

(1) If the user meets the active condition in the 7 days before the prediction date, the user is labeled as a retained user.

(2) If the user does not meet the active condition in the 7 days before the prediction date, the user is labeled as a churned user.
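The labeling rule can be sketched as follows. This is a minimal illustration, assuming each user's activity in the last m days has already been summarized as an event count and an active time in days; the names `WindowActivity`, `is_active` and `label_users` are hypothetical, and the value of j in the example call is illustrative rather than taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass
class WindowActivity:
    """Activity of one user in the last m days (hypothetical summary record)."""
    event_count: int     # total number of events in the window
    active_days: float   # active time in the window, in days


def is_active(activity: WindowActivity, j: int, k: float) -> bool:
    # Active condition: more than j events AND more than k days of active time.
    return activity.event_count > j and activity.active_days > k


def label_users(online_first_n_days, last_m_days_activity, j, k):
    """Return {user_id: 1 if churned, 0 if retained} for users online in the first n days."""
    labels = {}
    for user_id in online_first_n_days:
        # A user with no record in the last m days is treated as having no activity.
        activity = last_m_days_activity.get(user_id, WindowActivity(0, 0.0))
        labels[user_id] = 0 if is_active(activity, j, k) else 1
    return labels


labels = label_users(
    online_first_n_days=["u1", "u2", "u3"],
    last_m_days_activity={
        "u1": WindowActivity(event_count=12, active_days=3.0),
        "u2": WindowActivity(event_count=2, active_days=0.2),
    },
    j=5,    # illustrative value only
    k=0.5,
)
```

Because both conditions must hold, a user with many events concentrated in a very short active time is still labeled as churned under this reading of the definition.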

Specifically, in the data cleaning of this step, abnormal users are removed for the following reason: some users of the mobile game store engage in heavy account farming (bulk account registration), and the number of accounts associated with a single device can reach tens of thousands. Such users do not need churn prediction, and they add noise to the data and degrade prediction performance. Therefore, all accounts on a device whose number of accounts exceeds a set threshold are defined as abnormal users and are removed.
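A minimal sketch of this filter is shown below, assuming the log can be reduced to a mapping from account ID to device ID. The function name `remove_abnormal_users` and the threshold value in the example are hypothetical.

```python
from collections import defaultdict


def remove_abnormal_users(account_to_device, max_accounts_per_device):
    """Drop every account that sits on a device holding more accounts than the threshold."""
    accounts_by_device = defaultdict(set)
    for account, device in account_to_device.items():
        accounts_by_device[device].add(account)

    abnormal_devices = {
        device
        for device, accounts in accounts_by_device.items()
        if len(accounts) > max_accounts_per_device
    }
    return {
        account: device
        for account, device in account_to_device.items()
        if device not in abnormal_devices
    }


mapping = {"a1": "d1", "a2": "d1", "a3": "d1", "b1": "d2"}
kept = remove_abnormal_users(mapping, max_accounts_per_device=2)  # illustrative threshold
```

In this toy input, device `d1` holds three accounts and exceeds the threshold, so all three of its accounts are dropped and only `b1` remains.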

Invalid user events are removed for the following reason: when the server logs user events, a single user operation may produce multiple duplicate records because of an unstable mobile network connection, delayed server responses and similar causes. Therefore, an event under the same account that is identical to the previous event and separated from it by a time interval smaller than a set threshold is defined as an invalid event and is removed.
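The de-duplication rule can be sketched as follows, assuming each log record is a tuple of account, event type and timestamp in seconds. The function name and the one-second interval are hypothetical, and comparing against the previous logged record (whether or not it was kept) is one possible reading of "the previous event".

```python
def remove_invalid_events(events, min_interval_seconds):
    """events: iterable of (account, event_type, timestamp_seconds), in any order."""
    cleaned = []
    last_seen = {}  # account -> (event_type, timestamp) of the previous logged record
    for account, event_type, ts in sorted(events, key=lambda e: (e[0], e[2])):
        previous = last_seen.get(account)
        is_duplicate = (
            previous is not None
            and previous[0] == event_type
            and ts - previous[1] < min_interval_seconds
        )
        if not is_duplicate:
            cleaned.append((account, event_type, ts))
        # Assumption: the reference point is the previous record, kept or not.
        last_seen[account] = (event_type, ts)
    return cleaned


raw_events = [
    ("a1", "login", 0.0),
    ("a1", "login", 0.4),   # same event repeated within the interval: invalid
    ("a1", "pay", 0.6),
    ("a1", "login", 5.0),
    ("b1", "login", 0.1),
]
cleaned_events = remove_invalid_events(raw_events, min_interval_seconds=1.0)
```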

S2: performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and prediction set users.

Specifically, step S2 includes:

S21: extracting basic features and game features based on the basic information and game information of the training set users and prediction set users.

The user basic features include: the number of registered accounts, the total number of events, the number of events per day, the number of days since registration, the number of days since the last login, the active interval, the number of login days, the registration channel, the user's mobile phone operating system, the VIP level, and the like.

The user game features include: user game rating, user game classification, user game guild rating, number of days the user game has been online, and the like.

S22: extracting behavior features based on the behavior information of the training set users.

In this step, a user behavior feature is the number of times the user performs each type of behavior in the mobile game store. In this embodiment, users' events in the mobile game store fall into hundreds of types, that is, the raw user behavior features have hundreds of dimensions in total. Feature sets of such high dimensionality are not conducive to mathematical modeling, and in practice most of these events are strongly correlated with one another. These hundreds of events therefore need to be grouped into categories first. In this embodiment, the categorized user behavior features include: the number of login actions, game exit actions, payment actions, message reminder clicks, navigation bar clicks, account information views, group actions, customer service contacts, gift bag clicks, strategy guide views, VIP actions, screen recording actions, welfare actions, and the like.
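Grouping raw events into categories and counting them per user can be sketched as below. The raw event names and the `EVENT_CATEGORY` mapping are hypothetical; a real store would map its own hundreds of event types onto the categories it chooses.

```python
from collections import Counter

# Hypothetical mapping from raw event names to behavior categories.
EVENT_CATEGORY = {
    "login_password": "login",
    "login_token": "login",
    "pay_alipay": "payment",
    "pay_wechat": "payment",
    "open_gift_bag": "gift_bag_click",
}
CATEGORIES = ["login", "payment", "gift_bag_click"]


def behavior_features(user_events):
    """Return per-category event counts in the fixed order given by CATEGORIES."""
    counts = Counter(
        EVENT_CATEGORY[event] for event in user_events if event in EVENT_CATEGORY
    )
    return [counts[category] for category in CATEGORIES]


features = behavior_features(
    ["login_password", "pay_alipay", "login_token", "pay_wechat", "pay_alipay", "unmapped_event"]
)
```

Keeping the category order fixed ensures that the same vector position means the same behavior for every user, in both the training set and the prediction set.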

S23: performing feature selection on the behavior features of the training set users according to their churned-user labels to obtain the key behavior features.

Specifically, the Pearson correlation coefficient, mutual information value, classifier importance and the like between each user behavior feature of the training set and the churned-user label are calculated, and the user behavior features with strong correlation are taken as the key user behavior features.
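As an illustration of the selection step, the sketch below ranks features by the absolute Pearson correlation with the churn label only; mutual information and classifier importance are omitted for brevity. The cut-off `min_abs_corr`, the feature names and the toy data are assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_key_features(feature_columns, labels, min_abs_corr):
    """Keep the features whose |correlation| with the label reaches the cut-off."""
    return [
        name
        for name, column in feature_columns.items()
        if abs(pearson(column, labels)) >= min_abs_corr
    ]


columns = {
    "login_count": [10, 8, 1, 0],
    "payment_count": [3, 2, 0, 0],
    "screen_record_count": [1, 0, 1, 0],
}
churn_labels = [0, 0, 1, 1]  # 1 = churned
key_features = select_key_features(columns, churn_labels, min_abs_corr=0.5)
```

The sign of the correlation is ignored on purpose: a feature that is strongly negatively correlated with churn, such as a login count, is as informative as a positively correlated one.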

S24: extracting the key behavior features of the prediction set users based on the key behavior features of the training set users and the behavior information of the prediction set users.

Specifically, the processing in this step includes one-hot encoding of the enumerated features. In this embodiment, the enumerated features include the registration channel, the user game classification, and the like.
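One-hot encoding and normalization can be sketched as follows. The text does not fix a normalization formula, so min-max scaling is used here as an assumption, with the range taken from the training set and reused for the prediction set; the channel names and helper functions are hypothetical.

```python
def one_hot(value, vocabulary):
    """Encode one enumerated value as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def fit_min_max(column):
    """Learn the scaling range from training set values."""
    return min(column), max(column)


def apply_min_max(value, low, high):
    """Scale a value into [0, 1] using a previously fitted range."""
    if high == low:
        return 0.0
    # Clip so that prediction set values outside the training range stay in [0, 1].
    return min(1.0, max(0.0, (value - low) / (high - low)))


channels = ["app_store", "web", "ad"]   # hypothetical registration channels
train_login_counts = [0, 5, 20]
low, high = fit_min_max(train_login_counts)

# Feature row for one prediction set user: channel "web", 10 logins.
row = one_hot("web", channels) + [apply_min_max(10, low, high)]
```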

S3: training a gradient boosting decision tree algorithm on the features and churned-user labels of the training set users to obtain a user churn prediction model.

Step S3 specifically includes: obtaining an optimal user churn prediction model by setting evaluation metrics and using K-fold cross-validation.

The gradient boosting decision tree model may use, but is not limited to, the XGBoost algorithm.

Specifically, XGBoost improves on the traditional gradient boosting decision tree: for example, it adds a regularization term to the objective function, additionally uses second-derivative information, and borrows column subsampling from random forests, which greatly improves prediction accuracy and computational efficiency.
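For reference, the regularized objective and its second-order approximation, as commonly presented for XGBoost, can be written as follows. The notation is introduced here and is not part of the original text: l is a differentiable loss, f_t the tree added at round t, T the number of leaves of a tree, w its vector of leaf weights, and γ, λ regularization coefficients; constant terms are dropped in the approximation.

```latex
\[
\mathrm{Obj} = \sum_{i} l\bigl(y_i, \hat{y}_i\bigr) + \sum_{t} \Omega(f_t),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^{2}
\]
\[
\mathrm{Obj}^{(t)} \approx \sum_{i} \Bigl[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^{2} \Bigr] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^{2} l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial \bigl(\hat{y}_i^{(t-1)}\bigr)^{2}}
\]
```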

The evaluation metrics comprise precision and recall, and different weights can be assigned to them according to the business scenario of the mobile game store. Precision is the proportion of users predicted to churn who actually churn, and recall is the proportion of actually churned users who are predicted to churn.

In this embodiment, based on the actual business scenario of the mobile game store, the cost of trying to win back a user who is predicted to churn but would actually have been retained is lower than the cost of missing a user who actually churns. That is, high recall is more important, so greater weight is given to recall.
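The two metrics and a weighted combination of them can be written out explicitly. This is a sketch under stated assumptions: TP, FP and FN denote true positives, false positives and false negatives with "churned" as the positive class, and the linear form of the score S and the weight symbols w_P and w_R are illustrative; the text only states that recall receives the greater weight.

```latex
\[
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
S = w_P \, P + w_R \, R, \quad w_R > w_P \ge 0
\]
```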

The K-fold cross-validation of this embodiment means that the data set of the training module is divided into K mutually exclusive subsets; each subset serves once as the validation set while the remaining K-1 subsets serve as the training set, so that K models are obtained, and the average over the K models of the weighted sum of the evaluation metrics on their validation sets is taken as the performance index of the classifier under K-fold cross-validation. Based on this performance index, the optimal user churn prediction model is selected. In this embodiment, K may be 10.
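The cross-validation procedure can be sketched in plain Python as below. To keep the example free of third-party dependencies, `train_stub` is a deliberately trivial stand-in for gradient boosting training and is not the model described in this document; in practice a gradient boosting trainer would be passed as `train_fn`. The weights 0.3 and 0.7, the toy data and K = 5 are illustrative assumptions.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k mutually exclusive folds (interleaved)."""
    return [list(range(i, n_samples, k)) for i in range(k)]


def precision_recall(y_true, y_pred):
    """Precision and recall with 1 (churned) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(X, y, k, train_fn, w_precision, w_recall):
    """Average the weighted precision/recall score over the k validation folds."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = train_fn([X[i] for i in train_idx], [y[i] for i in train_idx])
        y_pred = [predict(X[i]) for i in fold]
        p, r = precision_recall([y[i] for i in fold], y_pred)
        scores.append(w_precision * p + w_recall * r)
    return sum(scores) / len(scores)


def train_stub(X_train, y_train):
    """Stand-in trainer: predicts churn when the login count is below the training mean."""
    mean_logins = sum(x[0] for x in X_train) / len(X_train)
    return lambda x: 1 if x[0] < mean_logins else 0


# Toy data: one feature (login count); 1 = churned, 0 = retained.
X = [[0], [9], [1], [10], [0], [8], [2], [12], [1], [11]]
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
score = cross_validate(X, y, k=5, train_fn=train_stub, w_precision=0.3, w_recall=0.7)
```

Candidate models or hyperparameter settings can then be compared by calling `cross_validate` once per candidate and keeping the one with the highest score.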

S4: identifying churned users of the mobile game store with the user churn prediction model according to the features of the prediction set users.

Step S4 specifically includes: taking the features of the prediction set users as input variables and outputting each user's churn probability through the user churn prediction model; if the churn probability is greater than a set threshold, the user is labeled as a churned user. For example, in this embodiment, the threshold may be set to 0.5.
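The thresholding step can be sketched as follows, assuming the model's output has been collected into a mapping from user ID to churn probability. The function name and the probabilities are hypothetical.

```python
def flag_churned_users(churn_probabilities, threshold=0.5):
    """Return the IDs whose predicted churn probability is strictly greater than the threshold."""
    return [
        user_id
        for user_id, probability in churn_probabilities.items()
        if probability > threshold
    ]


flagged = flag_churned_users({"u1": 0.91, "u2": 0.50, "u3": 0.12})
```

Lowering the threshold flags more users and therefore cannot reduce recall, at the cost of more retained users being flagged.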

Please refer to FIG. 2, which is a block diagram of the module connections of the user churn prediction system based on a mobile game store according to the invention. To implement the above method, the invention also provides a user churn prediction system based on a mobile game store, which comprises a data acquisition and preprocessing module 1, a feature extraction, selection and normalization module 2, a training module 3 and a prediction module 4.

The data acquisition and preprocessing module 1 is used for acquiring basic information, behavior information and game information of training set users and prediction set users from server logs, establishing churned-user labels for the training set users, and preprocessing the raw data. A churned user is defined as follows: among the users who were online in the first n days, a user who does not meet the active condition in the last m days is labeled as a churned user, wherein the active condition is that the user's total number of events is greater than j and the user's active time is greater than k days; n, m, j and k are adjustable parameters.

The feature extraction, selection and normalization module 2 is used for performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and prediction set users.

The training module 3 is used for training a gradient boosting decision tree algorithm on the features and churned-user labels of the training set users to obtain a user churn prediction model.

The prediction module 4 is used for identifying churned users of the mobile game store with the user churn prediction model according to the features of the prediction set users.

Further, the data acquisition and preprocessing module 1 includes a data acquisition submodule 11 and a preprocessing submodule 12.

The data acquisition submodule 11 is used for acquiring, according to the time period for which user churn prediction is required, basic information, behavior information and game information of the training set users and prediction set users in the corresponding time periods from the server logs, and for establishing churned-user labels for the training set users.

The preprocessing submodule 12 is used for performing data cleaning on the basic information, behavior information and game information of the training set users and prediction set users, including removing abnormal users and removing users' invalid events.

Further, the feature extraction, selection and normalization module 2 specifically includes a basic feature and game feature extraction submodule 21, a key behavior feature selection and extraction submodule 22 and a feature normalization submodule 23.

The basic feature and game feature extraction submodule 21 is used for extracting basic features and game features based on the basic information and game information of the training set users and prediction set users.

The key behavior feature selection and extraction submodule 22 is used for extracting behavior features based on the behavior information of the training set users. It then performs feature selection on the behavior features of the training set users according to their churned-user labels to obtain the key behavior features. Finally, it extracts the key behavior features of the prediction set users based on the key behavior features of the training set users and the behavior information of the prediction set users.

The feature normalization submodule 23 is used for normalizing the basic features, game features and key behavior features of the training set users and prediction set users.

Further, the training module 3 is specifically used for obtaining an optimal user churn prediction model by setting evaluation metrics and using K-fold cross-validation. Specifically, the evaluation metrics comprise precision and recall; precision is the proportion of users predicted to churn who actually churn, and recall is the proportion of actually churned users who are predicted to churn.

Further, the prediction module 4 is specifically used for taking the features of the prediction set users as input variables and outputting each user's churn probability through the user churn prediction model; if the churn probability is greater than a set threshold, the user is labeled as a churned user.

The user churn prediction method and system provided by the invention are suitable not only for mobile game stores but also for applications and related products that provide services for various mobile games.

Compared with the prior art, the invention provides a user churn prediction method and system based on a mobile game store. Starting from the definition of a churned user and taking the business scenario of the mobile game store into account, the method and system extract user data from server logs as basic features, behavior features and game features of users, and train an optimal gradient boosting decision tree model to identify the users who will churn in a future period. In addition, the invention can adjust itself to the actual business scenario of the mobile game store, quickly and accurately identify potential churned users, provide decision support for the store to win back those users in time, and meet the urgent need for churn prediction in the increasingly saturated mobile game store industry. The invention also fills a gap in the prior art, which lacks a user churn prediction technique for mobile game stores.

The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims

1. A user loss prediction method based on a mobile phone game store is characterized by comprising the following steps:

s1: acquiring basic information, behavior information and game information of training set users and prediction set users from a server log, establishing lost user labels for the training set users, and preprocessing original data; the method specifically comprises the following steps:

s11: acquiring basic information, behavior information and game information of training set users and prediction set users in corresponding time periods from a server log according to the time period in which user loss prediction is required, and establishing loss user labels for the training set users; wherein the attrition users are defined as: among the users on the line of the first n days, the users which do not reach the active condition in the last m days are marked as lost users, wherein the active condition is that the total number of the events which have occurred of the users is greater than j and the active time is greater than k days; wherein n, m, j and k are adjustable parameters;

s12: carrying out data cleaning on basic information, behavior information and game information of training set users and prediction set users, wherein invalid events of abnormal users and users are eliminated; all accounts in the equipment with the account number larger than a set threshold value are defined as abnormal users; defining an event which is the same as the previous event and has a time interval smaller than a set threshold value under the same account number as an invalid event;

S2: performing feature extraction, selection and normalization on the basic information, behavior information and game information of the training set users and prediction set users; which specifically comprises the following steps:

S25: normalizing the basic features, game features and key behavior features of the training set users and prediction set users;

S3: training a gradient boosting decision tree algorithm on the features of the training set users and the lost user labels to obtain a user loss prediction model; wherein the method for training the gradient boosting decision tree algorithm comprises: setting evaluation metrics and adopting a K-fold cross-validation method to obtain an optimal user loss prediction model; the evaluation metrics comprise precision and recall; precision refers to the proportion of actually lost users among the users predicted to be lost, and recall refers to the proportion of users predicted to be lost among the actually lost users; recall is given greater weight than precision;
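The evaluation metrics named in step S3 can be illustrated with a minimal sketch. The claim does not state how recall is weighted more heavily than precision; the F-beta score with beta greater than 1 used here is one common choice and is an assumption, as are the function names `precision_recall` and `f_beta`.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means 'lost user'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # lost and predicted lost
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # retained but predicted lost
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # lost but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily (assumed weighting scheme)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

In a K-fold setting, a score such as `f_beta` could be averaged over the K validation folds and the parameter setting with the highest average kept as the optimal model.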

2. The mobile game store-based user churn prediction method according to claim 1, wherein in step S22, the behavior feature extraction method comprises: classifying the user behavior information and taking the resulting categories as the behavior features, wherein the behavior features comprise: the number of login behaviors, the number of game exit behaviors, the number of payment behaviors, the number of message prompt behaviors, the number of navigation bar clicking behaviors, the number of account information viewing behaviors, the number of group behaviors, the number of customer service communication behaviors, the number of gift bag clicking behaviors, the number of strategy guide viewing behaviors, the number of VIP behaviors, the number of screen recording behaviors and the number of welfare behaviors.
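The per-category counting recited in claim 2 can be sketched as follows. The category identifiers in `BEHAVIOR_CATEGORIES` and the function `behavior_features` are hypothetical names chosen for illustration; the source only lists the thirteen behavior types and does not specify how events are encoded in the log.

```python
from collections import Counter

# Hypothetical identifiers for the thirteen behavior categories listed in claim 2.
BEHAVIOR_CATEGORIES = [
    "login", "game_exit", "payment", "message_prompt", "navigation_click",
    "account_info_view", "group", "customer_service", "gift_bag_click",
    "strategy_view", "vip", "screen_recording", "welfare",
]


def behavior_features(events):
    """Map one user's already-classified event names to a fixed-length count vector."""
    counts = Counter(e for e in events if e in BEHAVIOR_CATEGORIES)
    # Counter returns 0 for categories the user never triggered.
    return [counts[c] for c in BEHAVIOR_CATEGORIES]
```

For example, `behavior_features(["login", "payment", "login"])` yields a vector whose login entry is 2 and whose payment entry is 1, with all other entries 0.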

3. The mobile game store-based user churn prediction method according to claim 2, wherein in step S23, the method for obtaining the key behavior features comprises: calculating the Pearson correlation coefficient, the mutual information value and the classifier importance between each behavior feature of the training set users and the lost user labels, and taking the behavior features with strong correlation as the key behavior features.
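As a rough illustration of the selection in claim 3, the sketch below ranks behavior features by Pearson correlation with the lost user label only; mutual information and classifier importance are omitted for brevity. The cutoff `min_abs_corr` and the function names are assumptions, since the claim does not say how "strong correlation" is quantified.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_key_features(columns, labels, min_abs_corr=0.3):
    """Keep feature names whose |correlation| with the label meets the assumed cutoff.

    columns: dict mapping feature name -> list of values, one per training user.
    labels:  list of 0/1 lost user labels in the same user order.
    """
    return [
        name for name, values in columns.items()
        if abs(pearson(values, labels)) >= min_abs_corr
    ]
```

The same list of selected names would then be applied to the prediction set users, so that both sets share one feature layout.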

4. The mobile game store-based user churn prediction method according to claim 3, wherein step S4 specifically comprises: taking the features of the prediction set users as input variables, and outputting the loss probability of each user through the user loss prediction model; if the loss probability is greater than a set threshold, the user is labeled as a lost user.
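The thresholding in step S4 can be sketched in a few lines. The default threshold of 0.5 and the name `label_lost_users` are assumptions; the claim only requires that the threshold be settable.

```python
def label_lost_users(loss_probabilities, threshold=0.5):
    """Label a user as lost when the predicted loss probability exceeds the threshold.

    loss_probabilities: dict mapping user id -> probability output by the model.
    The comparison is strict ("greater than"), matching the wording of the claim.
    """
    return {uid: prob > threshold for uid, prob in loss_probabilities.items()}
```

Lowering the threshold labels more users as lost, which tends to raise recall at the cost of precision.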

5. A user loss prediction system based on a mobile game store is characterized by comprising:

the data acquisition and preprocessing module is used for acquiring basic information, behavior information and game information of training set users and prediction set users from a server log, establishing lost user labels for the training set users and preprocessing original data;

the prediction module is used for identifying lost users of the mobile phone game store through the user loss prediction model according to the features of the prediction set users;

wherein, the data acquisition and preprocessing module comprises:

the data acquisition submodule is used for acquiring, from a server log, the basic information, behavior information and game information of training set users and prediction set users in the time periods corresponding to the period for which user loss prediction is required, and establishing lost user labels for the training set users; wherein a lost user is defined as follows: among the users who were online during the first n days, a user who does not meet the active condition during the last m days is marked as a lost user, the active condition being that the user's total number of events is greater than j and the user's active time is greater than k days; wherein n, m, j and k are adjustable parameters; and

the preprocessing submodule is used for performing data cleaning on the basic information, behavior information and game information of the training set users and prediction set users, and removing abnormal users and users' invalid events; all accounts on a device whose number of accounts is greater than a set threshold are defined as abnormal users; an event that is identical to the previous event under the same account and separated from it by a time interval smaller than a set threshold is defined as an invalid event;

the feature extraction, selection and normalization module comprises:

the key behavior feature selection and extraction submodule is used for extracting behavior features based on the behavior information of the training set users; selecting among the behavior features of the training set users according to the lost user labels of the training set users to obtain key behavior features; and extracting the key behavior features of the prediction set users based on the key behavior features of the training set users and the behavior information of the prediction set users; and

the feature normalization submodule is used for normalizing the basic features, game features and key behavior features of the training set users and the prediction set users;

the method for training the gradient boosting decision tree algorithm comprises: setting evaluation metrics and adopting a K-fold cross-validation method to obtain an optimal user loss prediction model; the evaluation metrics comprise precision and recall; precision refers to the proportion of actually lost users among the users predicted to be lost, and recall refers to the proportion of users predicted to be lost among the actually lost users; recall is given greater weight than precision.
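The lost user label and the invalid event rule recited for the data acquisition and preprocessing submodules can be sketched as below. Two readings are assumptions: the active condition is evaluated on events falling in the last m days, and "active time" is counted as the number of distinct days with at least one event. The function names and the data layout are hypothetical.

```python
def is_lost(event_days, n, m, j, k):
    """Label one user over an observation window of n + m days.

    event_days: 0-based day index of each of the user's events.
    Returns None if the user was not online in the first n days (not labeled),
    True if lost, False if the active condition is met in the last m days.
    """
    if not any(day < n for day in event_days):
        return None
    late = [day for day in event_days if n <= day < n + m]
    # Assumed reading: more than j events and more than k distinct active days.
    active = len(late) > j and len(set(late)) > k
    return not active


def drop_invalid_events(events, min_gap_seconds):
    """Remove events that repeat the previous event within min_gap_seconds.

    events: list of (timestamp_seconds, event_name) for one account, sorted by time.
    """
    kept = []
    previous = None
    for timestamp, name in events:
        repeated = (
            previous is not None
            and name == previous[1]
            and timestamp - previous[0] < min_gap_seconds
        )
        if not repeated:
            kept.append((timestamp, name))
        previous = (timestamp, name)  # always compare against the immediately preceding event
    return kept
```

With n = 7, m = 3, j = 2 and k = 1, a user with events on days 0, 1, 7, 8 and 8 meets the active condition, while a user with events only on days 0 and 7 is labeled lost.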

6. The system of claim 5, wherein in the key behavior feature selection and extraction submodule, the behavior feature extraction method comprises: classifying the user behavior information and taking the resulting categories as the behavior features, wherein the behavior features comprise: the number of login behaviors, the number of game exit behaviors, the number of payment behaviors, the number of message prompt behaviors, the number of navigation bar clicking behaviors, the number of account information viewing behaviors, the number of group behaviors, the number of customer service communication behaviors, the number of gift bag clicking behaviors, the number of strategy guide viewing behaviors, the number of VIP behaviors, the number of screen recording behaviors and the number of welfare behaviors.

7. The system of claim 6, wherein in the feature normalization submodule, the method for obtaining the key behavior features comprises: calculating the Pearson correlation coefficient, the mutual information value and the classifier importance between each behavior feature of the training set users and the lost user labels, and taking the behavior features with strong correlation as the key behavior features.

8. The system of claim 7, wherein the prediction module is specifically used for: taking the features of the prediction set users as input variables, and outputting the loss probability of each user through the user loss prediction model; if the loss probability is greater than a set threshold, the user is labeled as a lost user.