CN113827979A - LightGBM-based game churn user prediction method and system - Google Patents

LightGBM-based game churn user prediction method and system

Info

Publication number
CN113827979A
CN113827979A (application CN202110944562.1A)
Authority
CN
China
Prior art keywords
user data
user
time period
lightgbm
game
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110944562.1A
Other languages
Chinese (zh)
Inventor
黄晓鑫 (Huang Xiaoxin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Electronic Soul Network Technology Co Ltd
Original Assignee
Hangzhou Electronic Soul Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Electronic Soul Network Technology Co Ltd filed Critical Hangzhou Electronic Soul Network Technology Co Ltd
Priority to CN202110944562.1A priority Critical patent/CN113827979A/en
Publication of CN113827979A publication Critical patent/CN113827979A/en
Pending legal-status Critical Current

Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/70: Game security or game management aspects
    • A63F13/79: Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • G06N20/20: Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present application relates to a LightGBM-based game churn user prediction method and system. A user data set is acquired, each user in the data set is given a churned user label or a non-churned user label, and the labeled user data set is divided into a training set and a test set; the LightGBM package of python is called, model training is performed according to the training set to obtain a LightGBM model, the accuracy of the LightGBM model on the prediction result of the test set is obtained, and training is stopped when the accuracy is greater than a preset value to obtain a trained LightGBM model; data to be predicted is acquired and its prediction result is output through the trained LightGBM model. This solves the problems of heavy computation, large memory consumption and low accuracy of game churn user prediction methods in the related art.

Description

LightGBM-based game churn user prediction method and system
Technical Field
The application relates to the technical field of network data mining, in particular to a game churn user prediction method and system based on LightGBM.
Background
Nowadays, online games account for a growing share of online entertainment and the scale of game users keeps increasing, so competition in the game market is intensifying day by day. For game operators, if users who are likely to churn can be discovered as early as possible, appropriate intervention can be applied to them promptly, which extends the time those users stay in the game as much as possible and prevents them from leaving. In the related art, game churn user prediction methods suffer from heavy computation, large memory consumption and low accuracy.
At present, no effective solution has been proposed for the problems of heavy computation, large memory consumption and low accuracy of game churn user prediction methods in the related art.
Disclosure of Invention
The embodiments of the present application provide a LightGBM-based game churn user prediction method and system, which at least solve the problems of heavy computation, large memory consumption and low accuracy of game churn user prediction methods in the related art.
In a first aspect, an embodiment of the present application provides a LightGBM-based game churn user prediction method, where the method includes:
acquiring a user data set, adding a churned user label or a non-churned user label to each user in the user data set, and dividing the labeled user data set into a training set and a test set;
calling the LightGBM package of python, performing model training according to the training set to obtain a LightGBM model, obtaining the accuracy of the LightGBM model on the prediction result of the test set, and stopping training when the accuracy is greater than a preset value to obtain a trained LightGBM model;
and acquiring data to be predicted, and outputting a prediction result of the data to be predicted through the trained LightGBM model.
In some embodiments, after outputting the prediction result of the data to be predicted through the trained LightGBM model, the method further includes:
adding the churned users to a user churn prediction result table based on the prediction result of the data to be predicted, and implementing targeted retention measures for the churned users according to the user churn prediction result table.
In some embodiments, calling the lightgbm package of python and performing model training according to the training set comprises:
importing the package: from lightgbm import LGBMClassifier;
initializing the model: clf_lightgbm = LGBMClassifier(max_bin=5, num_leaves=32, max_depth=7);
training the model: clf_lightgbm.fit(x_train, y_train).
In some of these embodiments, prior to obtaining the user data set, the method further comprises:
acquiring user data of multiple dimensions, constructing a user data table, and preprocessing the user data, wherein the preprocessing comprises the following steps:
storing the user data into a preset format, and vectorizing and expressing the user data stored into the preset format;
and supplementing the missing value of each dimension in the user data according to a missing value supplementing rule, correcting the abnormal value of each dimension in the user data according to an abnormal value judging rule, converting the data type in the user data into a preset type, and obtaining the preprocessed user data.
In some embodiments, after obtaining the preprocessed user data, the method further comprises:
acquiring correlation heat maps of all dimensions in the preprocessed user data, obtaining the dimensions whose correlation degree is greater than a preset correlation threshold according to the correlation heat maps, and removing unnecessary dimensions to obtain a user data set.
In some embodiments, the multi-dimensional user data includes a user ID, a user class, a total online duration, the number of login days in a first predetermined time period, the number of logins in the first predetermined time period, the online duration in a second predetermined time period, the number of login days in the second predetermined time period, the number of logins in the second predetermined time period, the number of days from the first login to the current date, the number of days from the last login to the current date, the first payment time, the last payment time, the number of payments in the first predetermined time period, the payment amount in the first predetermined time period, the number of payments in the second predetermined time period, the payment amount in the second predetermined time period, whether the user is a studio user, the number of games in the first predetermined time period, the number of wins in the first predetermined time period, the game duration in the first predetermined time period, the number of game days in the first predetermined time period, the number of games in the second predetermined time period, the game duration in the second predetermined time period, and the number of game days in the second predetermined time period, wherein the first predetermined time period is longer than the second predetermined time period.
In a second aspect, an embodiment of the present application provides a LightGBM-based game churn user prediction system, which includes a label module, a dividing module, a training module and a prediction module, wherein
the label module is used for acquiring a user data set and adding a churned user label or a non-churned user label to each user in the user data set;
the dividing module is used for dividing the labeled user data set into a training set and a test set;
the training module is used for calling the LightGBM package of python, performing model training according to the training set to obtain a LightGBM model, obtaining the accuracy of the LightGBM model on the prediction result of the test set, and stopping training when the accuracy is greater than a preset value to obtain a trained LightGBM model;
the prediction module is used for acquiring data to be predicted and outputting a prediction result of the data to be predicted through the trained LightGBM model.
In some embodiments, the system further includes an adding module; after the prediction module outputs the prediction result of the data to be predicted through the trained LightGBM model,
the adding module adds the churned users to a user churn prediction result table based on the prediction result of the data to be predicted, and targeted retention measures are implemented for the churned users according to the user churn prediction result table.
In some embodiments, the system further includes a database operation module and a preprocessing module, the server-side database stores a user data table, before the user data set is acquired,
the database operation module is used for reading user data in a user data table, the preprocessing module is used for preprocessing the user data, and the preprocessing comprises the following steps:
storing the user data into a preset format, and vectorizing and expressing the user data stored into the preset format;
and supplementing the missing value of each dimension in the user data according to a missing value supplementing rule, correcting the abnormal value of each dimension in the user data according to an abnormal value judging rule, converting the data type in the user data into a preset type, and obtaining the preprocessed user data.
In some embodiments, the system further comprises a culling module configured to, after obtaining the pre-processed user data,
the removing module obtains the correlation heat maps of all dimensions in the preprocessed user data, obtains the dimensions of which the correlation degrees are larger than a preset correlation threshold according to the correlation heat maps, and removes unnecessary dimensions to obtain a user data set.
Compared with the related art, the LightGBM-based game churn user prediction method provided by the embodiments of the present application acquires a user data set, gives each user in the data set a churned user label or a non-churned user label, and divides the labeled user data set into a training set and a test set; calls the LightGBM package of python, performs model training according to the training set to obtain a LightGBM model, obtains the accuracy of the LightGBM model on the prediction result of the test set, and stops training when the accuracy is greater than a preset value to obtain a trained LightGBM model; and acquires data to be predicted and outputs its prediction result through the trained LightGBM model, thereby solving the problems of heavy computation, large memory consumption and low accuracy of game churn user prediction methods in the related art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a LightGBM based game churn user prediction method according to an embodiment of the application;
FIG. 2 is a block diagram of a LightGBM based game churn user prediction system according to an embodiment of the present application;
fig. 3 is a block diagram of another LightGBM-based game churn user prediction system according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The present application is directed to the use of the terms "including," "comprising," "having," and any variations thereof, which are intended to cover non-exclusive inclusions; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference herein to "a plurality" means greater than or equal to two. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
Fig. 1 is a flowchart of a method for predicting game churn users based on a LightGBM according to an embodiment of the present application, and as shown in fig. 1, the method includes the following steps:
step S101, acquiring a user data set, adding a loss user label and a non-loss user label to the user data set, and dividing the user data set added with the labels into a training set and a testing set; illustratively, the recorded data used for model training and testing is up to the nth day, if the login times from the (n + 1) th day to the (n + 7) th day are greater than 0, a user is not lost, the user is marked as 0, a non-lost user label is shown, the user is lost for not 7 days is shown, if the login times from the (n + 1) th day to the (n + 7) th day are 0, the user is marked as 1, the user is lost for 7 days is shown, the user data set after the label is added can be divided by sklern, model _ selection, train _ test _ split, and the user data set is divided into a training set and a testing set according to the ratio of 7: 3.
Step S102, calling the LightGBM package of python, performing model training according to the training set to obtain a LightGBM model, obtaining the accuracy of the LightGBM model on the prediction result of the test set, and stopping training when the accuracy is greater than a preset value to obtain a trained LightGBM model. In this embodiment, the performance of the model can be judged from the prediction result on the test set, and training can be stopped once the performance meets the requirement.
Step S103, acquiring data to be predicted, and outputting a prediction result of the data to be predicted through the trained LightGBM model.
In the related art, the XGBoost algorithm is used for user churn prediction, and XGBoost needs to traverse the entire training data multiple times in each iteration. If the entire training data is loaded into memory, the size of the training data is limited; if it is not loaded into memory, repeatedly reading and writing the training data consumes a very long time. This leads to heavy computation, large memory consumption, a tendency to overfit, and low accuracy.
In the technical scheme of the present application, through steps S101 to S103, the user data set is labeled and divided into a training set and a test set, the LightGBM package of python is called, the training set is used for model training, the prediction result on the test set is used to evaluate the performance of the model and obtain a trained LightGBM model, and the data to be predicted is then predicted by the trained LightGBM model to obtain the predicted churned users.
The LightGBM algorithm is an efficient and highly accurate classification algorithm. Unlike XGBoost, another algorithm framework built on gradient boosting trees, LightGBM finds splits with the histogram method, supports both continuous and categorical features, and offers a higher cache hit rate, lower computational complexity and lower memory consumption. LightGBM also grows trees leaf-wise rather than level-wise as XGBoost does, which avoids many low-gain splits and greatly reduces computation. In addition, LightGBM supports the one-side gradient sampling algorithm, the exclusive feature bundling algorithm and the like, which further reduce computation. The embodiments of the present application therefore solve the problems of heavy computation, large memory consumption and low accuracy in game churn user prediction in the related art.
In some embodiments, after the prediction result of the data to be predicted is output by the trained LightGBM model, the churned users are added to a user churn prediction result table based on the prediction result, and targeted retention measures are implemented for the churned users according to the table. In this embodiment, in order to prolong the game life cycle of a game user, after a user is predicted to be a churned user, retention measures are applied to that user, for example sending props or increasing the drop probability of item equipment for that user, so as to improve the user experience. Targeted retention measures can also be chosen according to the situations of different churned users: for example, the reason for a user's churn is analyzed from the multi-dimensional data recorded in the user data table, and a retention measure is then applied according to that reason.
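A minimal sketch of how the predicted churned users could be collected into a user churn prediction result table is given below; the table layout and the user_id column name are assumptions made for illustration.

    import pandas as pd

    def build_churn_result_table(x_pred: pd.DataFrame, y_pred) -> pd.DataFrame:
        result = x_pred.copy()
        result["churn_pred"] = y_pred
        # Keep only the users predicted as churned (label 1) for the result table.
        churned = result[result["churn_pred"] == 1][["user_id", "churn_pred"]]
        return churned.reset_index(drop=True)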
In some embodiments, calling the lightgbm package of python and performing model training according to the training set includes:
importing the package: from lightgbm import LGBMClassifier;
initializing the model: clf_lightgbm = LGBMClassifier(max_bin=5, num_leaves=32, max_depth=7);
training the model: clf_lightgbm.fit(x_train, y_train).
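A runnable sketch of the training and accuracy check described above is shown below, assuming x_train, y_train, x_test and y_test come from the earlier split; the threshold of 0.9 is an assumed example of the preset value.

    from lightgbm import LGBMClassifier
    from sklearn.metrics import accuracy_score

    clf_lightgbm = LGBMClassifier(max_bin=5, num_leaves=32, max_depth=7)
    clf_lightgbm.fit(x_train, y_train)

    # Evaluate the model on the test set and decide whether training can stop.
    accuracy = accuracy_score(y_test, clf_lightgbm.predict(x_test))
    PRESET_ACCURACY = 0.9  # assumed preset value
    if accuracy > PRESET_ACCURACY:
        print(f"accuracy {accuracy:.3f} exceeds the preset value, model accepted")
    else:
        print(f"accuracy {accuracy:.3f} is below the preset value, adjust parameters and retrain")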
The Light Gradient Boosting Machine (LightGBM) is an algorithm framework implemented on the basis of the gradient boosting decision tree (GBDT), and its principle is as follows:
The principle of the gradient boosting tree algorithm for binary classification is explained first:
Let the data set be D = {(x_i, y_i) | i = 1, 2, …, n, x_i ∈ R^m, y_i ∈ {0, 1}}. Since the problem is a binary classification problem, the loss function l is the logarithmic loss, given by the following Equation 1:

l(y_i, ŷ_i) = -[y_i·ln(ŷ_i) + (1 - y_i)·ln(1 - ŷ_i)]  (Equation 1)

LightGBM is implemented on the basis of the gradient boosting tree, and its objective function in round t is the following Equation 2:

L^(t) = Σ_{i=1..n} l(y_i, ŷ_i^(t-1) + f_t(x_i)) + Ω(f_t)  (Equation 2)

where ŷ_i^(t-1) denotes the prediction for the i-th entry after round t-1, f_t is the t-th tree with f_t(x) = w_{q(x)}, q maps a sample to a leaf of the tree f_t, w is the vector of leaf weights of f_t, with w_j representing the weight of the j-th leaf, and Ω(f_t) = γT + (λ/2)·Σ_j w_j^2 is the regularization term on the complexity of f_t, with T the number of leaves. Taking a second-order approximation of L^(t) at f_t(x_i) gives the following Equation 3:

L^(t) ≈ Σ_{i=1..n} [l(y_i, ŷ_i^(t-1)) + g_i·f_t(x_i) + (1/2)·h_i·f_t^2(x_i)] + Ω(f_t)  (Equation 3)

where g_i = ∂l(y_i, ŷ_i^(t-1))/∂ŷ_i^(t-1) and h_i = ∂^2 l(y_i, ŷ_i^(t-1))/∂(ŷ_i^(t-1))^2. Since Σ_i l(y_i, ŷ_i^(t-1)) is constant during the computation of the t-th tree, minimizing L^(t) is equivalent to minimizing the simplified objective of the following Equation 4:

L̃^(t) = Σ_{i=1..n} [g_i·f_t(x_i) + (1/2)·h_i·f_t^2(x_i)] + Ω(f_t)  (Equation 4)

Define I_j = {i | q(x_i) = j} as the set of samples on leaf j. Writing out Ω explicitly gives the following Equation 5:

L̃^(t) = Σ_{j=1..T} [(Σ_{i∈I_j} g_i)·w_j + (1/2)·(Σ_{i∈I_j} h_i + λ)·w_j^2] + γT  (Equation 5)

Therefore, for a tree with a fixed structure q, the optimal weight w_j* of each leaf j can be calculated from the above formula and is given by the following Equation 6:

w_j* = -(Σ_{i∈I_j} g_i)/(Σ_{i∈I_j} h_i + λ)  (Equation 6)

Substituting w_j* back gives the optimal objective value used to score the tree structure q, the following Equation 7:

L̃^(t)(q) = -(1/2)·Σ_{j=1..T} (Σ_{i∈I_j} g_i)^2/(Σ_{i∈I_j} h_i + λ) + γT  (Equation 7)
Regarding node splitting, LightGBM differs from XGBoost, another algorithm framework implemented on the basis of the gradient boosting tree: LightGBM finds splits with the histogram method, detailed as follows.
Before the histogram algorithm is used, all features must be divided into bins, that is, the value space of each feature is segmented and different values within the same segment are not distinguished, for example [0, 0.3) -> 0, [0.3, 0.7) -> 1.
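The following small sketch illustrates the bin division idea on a single feature; the use of equal-frequency (quantile) bin edges is an assumption, LightGBM builds its bins internally.

    import numpy as np

    def to_bins(values: np.ndarray, max_bin: int = 5) -> np.ndarray:
        # Cut the value space of one feature into max_bin segments and map every
        # value inside a segment to the same bin index, e.g. [0, 0.3) -> 0, [0.3, 0.7) -> 1.
        edges = np.quantile(values, np.linspace(0, 1, max_bin + 1)[1:-1])
        return np.digitize(values, edges)

    # to_bins(np.array([0.05, 0.2, 0.5, 0.8, 0.95])) returns the bin index of each value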
Histogram algorithm pseudo code (reproduced in the original publication as formula images, not shown here): for every leaf to be split and every feature, the samples on that leaf are accumulated into the feature's bins to build histograms of the gradient statistics G and H, and every bin boundary v is evaluated as a candidate split point. Once the best (leaf, feature, v) split is chosen, the records and the G, H statistics under all the resulting leaf nodes are updated. The termination condition of the pseudo code may take various forms, for example setting the maximum number of layers of the tree in the current round to d and terminating when that depth is reached. A sketch of the split search that the pseudo code describes is given below.
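The sketch below re-creates, under the stated assumptions, the histogram split search for one leaf and one already-binned feature: the gradient sums G and H are accumulated per bin and the bins are scanned for the split with the largest gain. It is illustrative only and omits the regularization constant γ and the outer loops over leaves and features.

    import numpy as np

    def best_split_for_feature(bins, grad, hess, n_bins, lam=1.0):
        # Build the histogram: sums of gradients and hessians of the samples in each bin.
        G = np.zeros(n_bins)
        H = np.zeros(n_bins)
        np.add.at(G, bins, grad)
        np.add.at(H, bins, hess)

        G_total, H_total = G.sum(), H.sum()
        best_gain, best_bin = -np.inf, None
        G_left = H_left = 0.0
        for v in range(n_bins - 1):  # candidate split: bin <= v versus bin > v
            G_left += G[v]
            H_left += H[v]
            G_right, H_right = G_total - G_left, H_total - H_left
            gain = (G_left ** 2 / (H_left + lam)
                    + G_right ** 2 / (H_right + lam)
                    - G_total ** 2 / (H_total + lam))
            if gain > best_gain:
                best_gain, best_bin = gain, v
        return best_bin, best_gain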
Compared with the pre-sorting algorithm, the histogram algorithm has a higher cache hit rate, lower computational complexity and lower memory consumption. In addition, LightGBM grows trees leaf-wise, unlike the level-wise growth of XGBoost, which avoids many low-gain node splits and reduces a large amount of computation. Although leaf-wise growth may produce an overly deep tree, this can be mitigated by limiting the tree depth.
LightGBM also supports the one-side gradient sampling (GOSS) algorithm, the exclusive feature bundling (EFB) algorithm, and the like. The main idea of one-side gradient sampling is to discard part of the samples that contribute little information gain, thereby reducing computational complexity: the samples are sorted by gradient, the a x 100% samples with the largest gradients are all kept, a further b x 100% are sampled from the remaining samples, and those sampled instances are multiplied by the weight (1 - a)/b when the gain is computed.
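A hedged sketch of the one-side gradient sampling idea follows; the values a = 0.2 and b = 0.1 are arbitrary examples, and sampling the remaining instances at random follows the standard GOSS formulation rather than any specific choice made in this application.

    import numpy as np

    def goss_sample(grad: np.ndarray, a: float = 0.2, b: float = 0.1, seed: int = 0):
        rng = np.random.default_rng(seed)
        n = len(grad)
        order = np.argsort(-np.abs(grad))  # sort by gradient magnitude, descending
        top_k, rest_k = int(a * n), int(b * n)
        top_idx = order[:top_k]
        rest_idx = rng.choice(order[top_k:], size=rest_k, replace=False)

        weights = np.ones(n)
        weights[rest_idx] = (1.0 - a) / b  # compensate the sampled small-gradient part
        keep_idx = np.concatenate([top_idx, rest_idx])
        return keep_idx, weights[keep_idx]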
The purpose of the exclusive feature bundling algorithm is to reduce the number of features per sample, thereby reducing computation. The starting point is that data in a high-dimensional space is often sparse; if several features are mutually exclusive, that is, they never take non-zero values at the same time, they can be bundled into a single feature. In practice features are rarely perfectly exclusive, so a maximum conflict ratio γ is set, and features are bundled as long as their conflict ratio is smaller than γ.
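A minimal sketch of the exclusivity check behind feature bundling is shown below: two sparse feature columns may share a bundle when the fraction of samples in which both are non-zero stays below the maximum conflict ratio γ. The default of 0.05 is an assumed example.

    import numpy as np

    def can_bundle(f1: np.ndarray, f2: np.ndarray, gamma: float = 0.05) -> bool:
        # Conflict ratio: share of samples where both features take non-zero values.
        conflicts = np.count_nonzero((f1 != 0) & (f2 != 0))
        return conflicts / len(f1) <= gamma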
In some embodiments, before the user data set is acquired, user data of multiple dimensions is acquired, a user data table is constructed, and the user data is preprocessed. The preprocessing includes storing the user data in a preset format and vectorizing the user data stored in that format. The user data is stored in the format required for training the LightGBM model; this can be the dataframe format, which is directly supported by the pandas module of python, so processing is very fast, and vectorization further improves the computation speed.
Because missing dimension values are not supported, the missing value of each dimension in the user data is filled according to a missing-value completion rule; for example, a missing date can be represented by 1970-01-01 10:00:00 so that it is distinguished from normal dates. In addition, values that do not conform to reality or to business logic are regarded as abnormal values, and the abnormal values of each dimension are corrected according to an abnormal-value judgment rule; for example, a cumulative online duration of one hundred years does not conform to reality. The data types in the user data are converted to preset types: integer variables are converted to int64 and real-valued variables to float64, so integers map to integer types and real numbers to floating-point types, which avoids precision loss caused by insufficient data-type width, and 64 bits is suitable for all cases. If the value range of a dimension is small, the width of the data type can be reduced to lower memory usage; for example, age only needs the int8 type. After this processing the preprocessed user data is obtained.
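A minimal preprocessing sketch following the rules above is given below; the column names, the unit of the online duration and the sentinel date are illustrative assumptions.

    import pandas as pd

    def preprocess(user_df: pd.DataFrame) -> pd.DataFrame:
        df = user_df.copy()
        # Missing dates are filled with a sentinel that cannot be a real date.
        df["first_payment_time"] = pd.to_datetime(df["first_payment_time"]).fillna(
            pd.Timestamp("1970-01-01 10:00:00"))
        # Values that violate business logic are treated as abnormal and corrected,
        # e.g. a cumulative online duration longer than one hundred years is clipped.
        max_online_hours = 100 * 365 * 24  # assumed unit: hours
        df["total_online_duration"] = df["total_online_duration"].clip(upper=max_online_hours)
        # Convert data types: integers to int64, real numbers to float64, and narrow
        # small-range columns such as age to int8 to reduce memory usage.
        df["login_days_14d"] = df["login_days_14d"].astype("int64")
        df["pay_amount_14d"] = df["pay_amount_14d"].astype("float64")
        df["age"] = df["age"].astype("int8")
        return df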
In some embodiments, after the preprocessed user data is obtained, correlation heat maps of all dimensions in the preprocessed user data are obtained, the dimensions whose correlation degree is greater than a preset correlation threshold are identified from the heat maps, and the unnecessary ones among them are removed to obtain the user data set. In this embodiment the correlation heat maps of all dimensions can be drawn with the seaborn package of python; feature engineering with too many dimensions is not conducive to mathematical modeling, and removing unnecessary dimensions among those whose correlation exceeds the preset threshold improves the training efficiency of the model.
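A hedged sketch of this correlation analysis is shown below: the correlation heat map is drawn with seaborn and one dimension of each highly correlated pair is dropped. The threshold of 0.9 and the rule of always dropping the second column of a pair are assumptions.

    import seaborn as sns
    import matplotlib.pyplot as plt

    def drop_correlated(df, threshold: float = 0.9):
        corr = df.corr().abs()
        sns.heatmap(corr, cmap="coolwarm")  # correlation heat map of all dimensions
        plt.show()
        to_drop = set()
        cols = corr.columns
        for i in range(len(cols)):
            for j in range(i + 1, len(cols)):
                if corr.iloc[i, j] > threshold:
                    to_drop.add(cols[j])  # keep the first dimension, drop the second
        return df.drop(columns=sorted(to_drop))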
In some embodiments, the multi-dimensional user data includes a user ID, a user class, a total online duration, the number of login days in a first predetermined time period, the number of logins in the first predetermined time period, the online duration in a second predetermined time period, the number of login days in the second predetermined time period, the number of logins in the second predetermined time period, the number of days from the first login to the current date, the number of days from the last login to the current date, the first payment time, the last payment time, the number of payments in the first predetermined time period, the payment amount in the first predetermined time period, the number of payments in the second predetermined time period, the payment amount in the second predetermined time period, whether the user is a studio user, the number of games in the first predetermined time period, the number of wins in the first predetermined time period, the game duration in the first predetermined time period, the number of game days in the first predetermined time period, the number of games in the second predetermined time period, the game duration in the second predetermined time period, and the number of game days in the second predetermined time period, wherein the first predetermined time period is longer than the second predetermined time period. The difference between online duration and game duration is that online duration refers to the accumulated time for which the game client is running, whereas game duration refers to the time the user actually spends playing; in a level-breaking game, for example, the game duration only includes the time spent in levels. The first predetermined time period and the second predetermined time period can be set as required, for example to 14 days and 7 days respectively.
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The embodiment further provides a system for predicting game churn users based on a LightGBM, which is used for implementing the foregoing embodiments and preferred embodiments, and the description of the system is omitted. As used hereinafter, the terms "module," "unit," "subunit," and the like may implement a combination of software and/or hardware for a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 2 is a block diagram of a LightGBM-based game churn user prediction system according to an embodiment of the present application. As shown in Fig. 2, the system includes a label module 21, a dividing module 22, a training module 23 and a prediction module 24. The label module 21 is configured to obtain a user data set and add a churned user label or a non-churned user label to each user in the user data set; the dividing module 22 is configured to divide the labeled user data set into a training set and a test set; the training module 23 is configured to call the LightGBM package of python, perform model training according to the training set to obtain a LightGBM model, obtain the accuracy of the LightGBM model on the prediction result of the test set, and stop training when the accuracy is greater than a preset value to obtain a trained LightGBM model; the prediction module 24 is configured to obtain data to be predicted and output a prediction result of the data to be predicted through the trained LightGBM model, thereby solving the problems of heavy computation, large memory consumption and low accuracy of game churn user prediction methods in the related art.
In some embodiments, Fig. 3 is a block diagram of another LightGBM-based game churn user prediction system according to an embodiment of the present application. As shown in Fig. 3, the system further includes a server-side database 31, a database operation module 32, a preprocessing module 33, a culling module 34 and an adding module 35. The server-side database 31 contains two tables, a user data table and a user churn prediction result table. The database operation module 32 reads the database parameters from an ini configuration file through the configparser package of python and reads the user data from the user data table; the preprocessing module 33 preprocesses the read user data; the culling module 34 removes unnecessary dimensions among those whose correlation exceeds a preset correlation threshold; and the adding module 35 adds the churned users to the user churn prediction result table so that targeted retention measures can be implemented for the churned users.
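As an illustration of how the database operation module might read the database parameters from the ini configuration file with configparser, a minimal sketch follows; the file name, section name and key names are assumptions.

    import configparser

    def read_db_config(path: str = "db.ini") -> dict:
        parser = configparser.ConfigParser()
        parser.read(path)
        section = parser["mysql"]  # assumed section name
        return {
            "host": section.get("host"),
            "port": section.getint("port"),
            "user": section.get("user"),
            "password": section.get("password"),
            "database": section.get("database"),
        }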
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the LightGBM-based game churn user prediction method in the foregoing embodiment, the embodiment of the present application may be implemented by providing a storage medium. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements any of the LightGBM-based game churn user prediction methods described in the embodiments above.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program when executed by a processor implements a LightGBM based game churn user prediction method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware instructions of a computer program, which can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A LightGBM-based game churn user prediction method, the method comprising:
acquiring a user data set, adding a churned user label or a non-churned user label to each user in the user data set, and dividing the labeled user data set into a training set and a test set;
calling the LightGBM package of python, performing model training according to the training set to obtain a LightGBM model, obtaining the accuracy of the LightGBM model on the prediction result of the test set, and stopping training when the accuracy is greater than a preset value to obtain a trained LightGBM model;
and acquiring data to be predicted, and outputting a prediction result of the data to be predicted through the trained LightGBM model.
2. The method as claimed in claim 1, wherein after the prediction result of the data to be predicted is output through the trained LightGBM model, the method further comprises:
adding the churned users to a user churn prediction result table based on the prediction result of the data to be predicted, and implementing targeted retention measures for the churned users according to the user churn prediction result table.
3. The method of claim 1, wherein calling the lightgbm package of python and performing model training according to the training set comprises:
importing the package: from lightgbm import LGBMClassifier;
initializing the model: clf_lightgbm = LGBMClassifier(max_bin=5, num_leaves=32, max_depth=7);
training the model: clf_lightgbm.fit(x_train, y_train).
4. The method of claim 1, wherein prior to obtaining the user data set, the method further comprises:
acquiring user data of multiple dimensions, constructing a user data table, and preprocessing the user data, wherein the preprocessing comprises the following steps:
storing the user data into a preset format, and vectorizing and expressing the user data stored into the preset format;
and supplementing the missing value of each dimension in the user data according to a missing value supplementing rule, correcting the abnormal value of each dimension in the user data according to an abnormal value judging rule, converting the data type in the user data into a preset type, and obtaining the preprocessed user data.
5. The method of claim 4, wherein after obtaining the pre-processed user data, the method further comprises:
acquiring correlation heat maps of all dimensions in the preprocessed user data, obtaining the dimensions whose correlation degree is greater than a preset correlation threshold according to the correlation heat maps, and removing unnecessary dimensions to obtain a user data set.
6. The method of claim 4, wherein the multi-dimensional user data comprises a user ID, a user class, a total online duration, the number of login days in a first predetermined time period, the number of logins in the first predetermined time period, the online duration in a second predetermined time period, the number of login days in the second predetermined time period, the number of logins in the second predetermined time period, the number of days from the first login to the current date, the number of days from the last login to the current date, the first payment time, the last payment time, the number of payments in the first predetermined time period, the payment amount in the first predetermined time period, the number of payments in the second predetermined time period, the payment amount in the second predetermined time period, whether the user is a studio user, the number of games in the first predetermined time period, the number of wins in the first predetermined time period, the game duration in the first predetermined time period, the number of game days in the first predetermined time period, the number of games in the second predetermined time period, the game duration in the second predetermined time period, and the number of game days in the second predetermined time period, wherein the first predetermined time period is longer than the second predetermined time period.
7. A LightGBM-based game churn user prediction system is characterized by comprising a label module, a dividing module, a training module and a prediction module,
the label module is used for acquiring a user data set and adding a churned user label or a non-churned user label to each user in the user data set;
the dividing module is used for dividing the user data set added with the label into a training set and a test set;
the training module is used for calling the LightGBM package of python, performing model training according to the training set to obtain a LightGBM model, obtaining the accuracy of the LightGBM model on the prediction result of the test set, and stopping training when the accuracy is greater than a preset value to obtain a trained LightGBM model;
the prediction module is used for acquiring data to be predicted and outputting a prediction result of the data to be predicted through the trained LightGBM model.
8. The system of claim 7, further comprising an adding module, wherein after the predicting module outputs the prediction result of the data to be predicted through the trained LightGBM model,
the adding module adds the churned users to a user churn prediction result table based on the prediction result of the data to be predicted, and targeted retention measures are implemented for the churned users according to the user churn prediction result table.
9. The system of claim 7, further comprising a database operation module and a preprocessing module, wherein the server database stores a user data table, before the user data set is obtained,
the database operation module is used for reading user data in a user data table, the preprocessing module is used for preprocessing the user data, and the preprocessing comprises the following steps:
storing the user data into a preset format, and vectorizing and expressing the user data stored into the preset format;
and supplementing the missing value of each dimension in the user data according to a missing value supplementing rule, correcting the abnormal value of each dimension in the user data according to an abnormal value judging rule, converting the data type in the user data into a preset type, and obtaining the preprocessed user data.
10. The system of claim 9, further comprising a culling module that, after obtaining the pre-processed user data,
the removing module obtains the correlation heat maps of all dimensions in the preprocessed user data, obtains the dimensions of which the correlation degrees are larger than a preset correlation threshold according to the correlation heat maps, and removes unnecessary dimensions to obtain a user data set.
CN202110944562.1A 2021-08-17 2021-08-17 LightGBM-based game churn user prediction method and system Pending CN113827979A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110944562.1A CN113827979A (en) 2021-08-17 2021-08-17 LightGBM-based game churn user prediction method and system


Publications (1)

Publication Number Publication Date
CN113827979A true CN113827979A (en) 2021-12-24

Family

ID=78960648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110944562.1A Pending CN113827979A (en) 2021-08-17 2021-08-17 LightGBM-based game churn user prediction method and system

Country Status (1)

Country Link
CN (1) CN113827979A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115054925A (en) * 2022-06-29 2022-09-16 上海益世界信息技术集团有限公司 Method, device, server and storage medium for determining lost user
CN116757750A (en) * 2023-06-05 2023-09-15 广州盈风网络科技有限公司 Operation pushing method, device, equipment and medium based on loss rate prediction

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112712383A (en) * 2019-10-24 2021-04-27 上海莉莉丝科技股份有限公司 Potential user prediction method, device, equipment and storage medium of application program
CN111242358A (en) * 2020-01-07 2020-06-05 杭州策知通科技有限公司 Enterprise information loss prediction method with double-layer structure
CN111582577A (en) * 2020-05-07 2020-08-25 北京思特奇信息技术股份有限公司 Method, system, medium and equipment for predicting off-network of telecommunication user
CN111803957A (en) * 2020-07-17 2020-10-23 网易(杭州)网络有限公司 Player prediction method and device for online game, computer equipment and medium
CN111861588A (en) * 2020-08-06 2020-10-30 网易(杭州)网络有限公司 Training method of loss prediction model, player loss reason analysis method and player loss reason analysis device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
佚名 (Anonymous): "实例|教你用python写一个电信客户流失预测模型" [Example: teach you to write a telecom customer churn prediction model in python], pages 3 - 6, Retrieved from the Internet <URL:https://baijiahao.baidu.com/s?id=1677864297815823534&wfr=spider&for=pc> *

Similar Documents

Publication Publication Date Title
CN108491817B (en) Event detection model training method and device and event detection method
EP3748545A1 (en) Sparsity constraints and knowledge distillation based learning of sparser and compressed neural networks
US20230222353A1 (en) Method and system for training a neural network model using adversarial learning and knowledge distillation
CN113827979A (en) LightGBM-based game churn user prediction method and system
CN109840509B (en) Multilayer cooperative identification method and device for bad anchor in network live video
CN113705775A (en) Neural network pruning method, device, equipment and storage medium
CN109543029B (en) Text classification method, device, medium and equipment based on convolutional neural network
CN111870959B (en) Resource recommendation method and device in game
US11694111B2 (en) Learning device and learning method
CN109145107B (en) Theme extraction method, device, medium and equipment based on convolutional neural network
CN112214766A (en) Method and device for detecting mining trojans, electronic device and storage medium
CN113538070A (en) User life value cycle detection method and device and computer equipment
CN113239697B (en) Entity recognition model training method and device, computer equipment and storage medium
CN113435499B (en) Label classification method, device, electronic equipment and storage medium
CN113827977A (en) Game loss user prediction method and system based on BP neural network
CN113827978A (en) Loss user prediction method and device and computer readable storage medium
CN111125543B (en) Training method of book recommendation sequencing model, computing device and storage medium
US11645573B2 (en) Learning device and learning method
US20200143285A1 (en) Learning device and learning method
CN108920492B (en) Webpage classification method, system, terminal and storage medium
CN113378866B (en) Image classification method, system, storage medium and electronic device
CN113342932B (en) Target word vector determining method and device, storage medium and electronic device
CN113827981A (en) Game loss user prediction method and system based on naive Bayes
US11681926B2 (en) Learning device and learning method
CN114118411A (en) Training method of image recognition network, image recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination