CN116757750A - Operation pushing method, device, equipment and medium based on loss rate prediction - Google Patents


Info

Publication number
CN116757750A
Authority
CN
China
Prior art keywords
behavior
behavior change
change
loss
feature
Legal status
Pending
Application number
CN202310653871.2A
Other languages
Chinese (zh)
Inventor
王传鹏
罗谊烽
吴灿杰
蔡挺
曾锦明
李佳新
Current Assignee
Guangzhou Yingfeng Network Technology Co ltd
Original Assignee
Guangzhou Yingfeng Network Technology Co ltd
Application filed by Guangzhou Yingfeng Network Technology Co ltd
Priority to CN202310653871.2A
Publication of CN116757750A
Legal status: Pending (current)

Classifications

    • G06Q30/0255 Targeted advertisements based on user history
    • A63F13/79 Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G06F18/2148 Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the process organisation or structure, e.g. boosting cascade
    • G06F18/24323 Tree-organised classifiers
    • G06F18/253 Fusion techniques of extracted features
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G06Q30/0269 Targeted advertisements based on user profile or attribute

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical field of user churn rate prediction, and in particular to an operation pushing method, device, equipment and medium based on churn rate prediction. The method specifically comprises: obtaining the player attributes, payment behavior changes, login behavior changes and in-game behavior changes of each churned player; calculating, based on a feature selection algorithm, the degree of association of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and obtaining the feature data set with the highest degree of association with the churn behavior; and combining the feature data set and the player attributes into a training data set, and training a GBDT model on the training data set to obtain a player churn rate prediction model. The application selects a feature data set highly correlated with churn behavior through the feature selection algorithm, and obtains a model for predicting the player churn rate by training a GBDT model.

Description

Operation pushing method, device, equipment and medium based on loss rate prediction
Technical Field
The present application relates to the technical field of user churn rate prediction, and in particular to an operation pushing method, apparatus, device and medium based on churn rate prediction.
Background
At present, mainstream player churn rate prediction methods are usually based on statistical models or machine learning, and predict churn by analyzing players' historical data and behavior patterns. However, traditional methods usually build the prediction model from manually selected or experience-based features, so they cannot capture complex feature relationships and change patterns. Traditional methods also usually rely on a single statistical model or machine learning algorithm, so the latent information and nonlinear relationships in the data cannot be fully mined. In addition, their prediction results lack interpretability: the reasons and influencing factors behind a prediction are difficult to understand and explain, which limits their value for guiding operational decisions.
Disclosure of Invention
The application aims to provide an operation pushing method, device, equipment and medium based on churn rate prediction, which select a feature data set highly correlated with churn behavior through a feature selection algorithm and obtain a model for predicting the player churn rate by training a GBDT model, so as to solve at least one of the existing problems.
The application provides an operation pushing method based on churn rate prediction, which specifically comprises the following steps:
obtaining the player attributes, payment behavior changes, login behavior changes and in-game behavior changes of each churned player;
calculating, based on a feature selection algorithm, the degree of association of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and obtaining the feature data set with the highest degree of association with the churn behavior;
combining the feature data set and the player attributes into a training data set, and training a GBDT model on the training data set to obtain a player churn rate prediction model;
and determining the predicted churn rate of each player using the player churn rate prediction model, and performing operation pushing actions for players whose predicted churn rate exceeds a predicted churn rate threshold.
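The final step, selecting the players whose predicted churn rate exceeds the threshold, can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the applicant's implementation: the function name `select_push_targets`, the dictionary of predicted churn rates and the highest-risk-first ordering are hypothetical, and the pushing action itself is not modeled.

```python
from typing import Dict, List


def select_push_targets(predicted_churn: Dict[str, float], threshold: float) -> List[str]:
    """Return IDs of players whose predicted churn rate strictly exceeds the
    threshold, highest-risk first (the ordering is an illustrative choice)."""
    at_risk = [pid for pid, p in predicted_churn.items() if p > threshold]
    return sorted(at_risk, key=lambda pid: predicted_churn[pid], reverse=True)


# Hypothetical predicted churn rates produced by a trained model.
scores = {"p1": 0.82, "p2": 0.35, "p3": 0.67, "p4": 0.50}
print(select_push_targets(scores, threshold=0.5))  # ['p1', 'p3']
```

Player `p4` is excluded because "exceeding" the threshold is read here as a strict comparison.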
Further, calculating, based on a feature selection algorithm, the degrees of association of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and obtaining the feature data set with the highest degree of association with the churn behavior, specifically comprises:
calculating, based on the Pearson correlation coefficient, the correlation coefficient of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and determining a first feature subset;
calculating, based on mutual information, the mutual information value of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and determining a second feature subset;
and merging the first feature subset and the second feature subset to obtain a third feature subset, and taking the third feature subset as the feature data set with the highest degree of association with the churn behavior.
Further, calculating, based on the Pearson correlation coefficient, the correlation coefficients of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and determining the first feature subset, specifically comprises:
calculating the covariance of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior;
calculating the standard deviations of the payment behavior change, the login behavior change, the in-game behavior change and the churn behavior;
multiplying the standard deviation of each of the payment behavior change, the login behavior change and the in-game behavior change by the standard deviation of the churn behavior to obtain a set of standard deviation products;
dividing each covariance by the corresponding standard deviation product in the set to obtain the Pearson correlation coefficient of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior;
and selecting the k features whose Pearson correlation coefficients have the largest absolute values (closest to 1) to determine the first feature subset.
Further, calculating, based on mutual information, the mutual information values of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and determining the second feature subset, specifically comprises:
calculating the joint probability distribution of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior;
calculating the marginal probability distributions of the payment behavior change, the login behavior change, the in-game behavior change and the churn behavior;
obtaining, according to a mutual information formula, the mutual information value of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior from the joint probability distributions and the marginal probability distributions;
and determining the second feature subset according to the mutual information values.
Further, the mutual information formula is $I(X;Y)=\sum_{x}\sum_{y}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$, where I(X;Y) denotes the mutual information between the behavior-change variable X and the churn behavior Y, p(x,y) denotes their joint probability distribution, p(x) denotes the marginal probability distribution of the payment behavior change, the login behavior change or the in-game behavior change, and p(y) denotes the marginal probability distribution of the churn behavior.
Further, merging the first feature subset and the second feature subset to obtain the third feature subset specifically comprises:
when the relationship between the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior is a multivariate linear or multivariate nonlinear relationship, performing a union operation on the first feature subset and the second feature subset to obtain the third feature subset;
or, when the relationship of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior is a univariate linear or univariate nonlinear relationship, performing an intersection operation on the first feature subset and the second feature subset to obtain the third feature subset;
or, determining the proportions of linear-relationship data and nonlinear-relationship data between the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior, determining the weight ratio between the first feature subset and the second feature subset according to these proportions, and performing weighted fusion of the first feature subset and the second feature subset according to the weight ratio to obtain the third feature subset.
Further, training the GBDT model according to the training data set to obtain the player churn rate prediction model specifically comprises:
dividing the training data set into a feature matrix and a target variable, wherein the feature matrix comprises the feature subsets selected from the payment behavior change, the login behavior change and the in-game behavior change together with the player attributes, and the target variable represents the churn behavior;
setting hyperparameters of the GBDT model, wherein the hyperparameters comprise the number of decision trees, the depth of the decision trees, the learning rate, the feature weights and the regularization parameters;
and inputting the training data set into the GBDT model for iterative training and tuning to obtain the player churn rate prediction model.
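The training step above can be illustrated with a small, self-contained gradient-boosted decision tree for binary churn labels, written in pure Python. This is a hedged sketch rather than the model used in the application: `TinyGBDT`, `build_tree` and their parameters are hypothetical names; of the listed hyperparameters it covers only the number of trees, the tree depth and the learning rate (with `min_leaf` as a simple stand-in for regularization and no feature weights); and each leaf stores the mean residual rather than a Newton step. In practice a library implementation such as scikit-learn's `GradientBoostingClassifier` would normally be used.

```python
import math
from typing import List, Optional

Row = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class Node:
    """Regression-tree node: a split on (feature, threshold), or a leaf value."""

    def __init__(self, value: float, feature: Optional[int] = None,
                 threshold: float = 0.0, left: "Optional[Node]" = None,
                 right: "Optional[Node]" = None):
        self.value, self.feature, self.threshold = value, feature, threshold
        self.left, self.right = left, right

    def predict(self, x: Row) -> float:
        if self.feature is None:
            return self.value
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)


def build_tree(X: List[Row], r: List[float], depth: int, min_leaf: int) -> Node:
    """Fit a regression tree to the residuals r by minimizing squared error."""
    mean = sum(r) / len(r)
    if depth == 0 or len(r) < 2 * min_leaf:
        return Node(mean)
    best = None  # (sse, feature, threshold)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X})[:-1]:
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t)
    if best is None:
        return Node(mean)
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return Node(mean, f, t,
                build_tree([X[i] for i in li], [r[i] for i in li], depth - 1, min_leaf),
                build_tree([X[i] for i in ri], [r[i] for i in ri], depth - 1, min_leaf))


class TinyGBDT:
    """Binary GBDT with log loss: each tree fits the negative gradient y - p."""

    def __init__(self, n_trees: int = 50, max_depth: int = 2,
                 learning_rate: float = 0.1, min_leaf: int = 1):
        self.n_trees, self.max_depth = n_trees, max_depth
        self.learning_rate, self.min_leaf = learning_rate, min_leaf
        self.base = 0.0
        self.trees: List[Node] = []

    def fit(self, X: List[Row], y: List[int]) -> "TinyGBDT":
        rate = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.base = math.log(rate / (1 - rate))  # initial log-odds of churn
        F = [self.base] * len(y)
        for _ in range(self.n_trees):
            residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]  # negative gradient
            tree = build_tree(X, residuals, self.max_depth, self.min_leaf)
            self.trees.append(tree)
            F = [fi + self.learning_rate * tree.predict(x) for fi, x in zip(F, X)]
        return self

    def predict_proba(self, x: Row) -> float:
        """Predicted churn probability for a single feature row."""
        F = self.base + sum(self.learning_rate * t.predict(x) for t in self.trees)
        return sigmoid(F)


# Hypothetical toy rows: [payments_last_7d, login_days_last_7d]; label 1 = churned.
X = [[0, 1], [0, 0], [1, 2], [5, 7], [4, 6], [6, 7]]
y = [1, 1, 1, 0, 0, 0]
model = TinyGBDT(n_trees=30, max_depth=2, learning_rate=0.3).fit(X, y)
print(model.predict_proba([0, 1]), model.predict_proba([5, 7]))
```

The probability returned by `predict_proba` plays the role of the predicted churn rate that is compared with the push threshold.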
The application also provides an operation pushing device based on churn rate prediction, which specifically comprises:
a sample collection module, configured to obtain the player attributes, payment behavior change, login behavior change and in-game behavior change of each churned player;
a feature selection module, configured to calculate, based on a feature selection algorithm, the degrees of association of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and to obtain the feature data set with the highest degree of association with the churn behavior;
a prediction model module, configured to combine the feature data set and the player attributes into a training data set and to train the GBDT model on the training data set to obtain a player churn rate prediction model;
and an operation evaluation module, configured to determine the predicted churn rate of each player based on the player churn rate prediction model and to perform operation pushing actions for players whose predicted churn rate exceeds the predicted churn rate threshold.
The present application also provides a computer device, comprising a memory, a processor and a computer program stored in the memory which, when executed by the processor, implements the churn rate prediction-based operation pushing method according to any one of the above.
The application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the churn rate prediction-based operation pushing method according to any one of the above.
Compared with the prior art, the application has at least one of the following technical effects:
1. The scheme combines the Pearson correlation coefficient and mutual information feature selection algorithms to select the feature subset highly correlated with churn behavior; considering both linear and nonlinear correlation improves the accuracy and efficiency of feature selection.
2. Training with the GBDT model, which integrates multiple decision trees, can improve the accuracy and generalization ability of the prediction model.
3. Because the feature subsets capture different association relationships, combining them through union, intersection or weighted fusion adapts feature selection to different scenarios and helps analyze the factors that influence the prediction results.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an operation pushing method based on churn rate prediction according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an operation pushing device based on churn rate prediction according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted, depending on the context, as "when" or "once" or "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as meaning "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Referring to Fig. 1, an embodiment of the present application provides an operation pushing method based on churn rate prediction, the method specifically comprising:
S101: obtaining the player attributes, payment behavior changes, login behavior changes and in-game behavior changes of each churned player;
and calculating, based on a feature selection algorithm, the degrees of association of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and obtaining the feature data set with the highest degree of association with the churn behavior.
In this embodiment, a player's churn develops from potential churn to complete churn: as the intention to churn accumulates, the player gradually moves toward complete churn. Assume the current time is day t. Among the devices that logged in between day t−14 and day t, a device that still logs in during the period up to day t+14 is regarded as not churned, while a device that stops logging in is regarded as a churned player at this stage. The payment behavior, login behavior, in-game behavior and other data of the churned players from day t−14 to day t are collected as features.
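The labeling rule described above can be sketched as follows. The exact window boundaries are an assumption, since the description only names days t−14, t and t+14: here a device is in the sample if it logged in during [t − 14, t], and it is labeled churned (1) if it has no login in (t, t + 14]. `label_churn` and the day-index representation of logins are hypothetical.

```python
from typing import Dict, Set


def label_churn(login_days: Dict[str, Set[int]], t: int, window: int = 14) -> Dict[str, int]:
    """Label devices active in [t - window, t]: 1 (churned) if they do not log in
    again in (t, t + window], else 0 (not churned). Other devices are skipped."""
    labels: Dict[str, int] = {}
    for device, days in login_days.items():
        if not any(t - window <= d <= t for d in days):
            continue  # not active in the observation window, so not in the sample
        returned = any(t < d <= t + window for d in days)
        labels[device] = 0 if returned else 1
    return labels


# Hypothetical login history: device ID -> set of day indices with a login.
history = {"d1": {90, 95, 103}, "d2": {88, 99}, "d3": {50}}
print(label_churn(history, t=100))  # {'d1': 0, 'd2': 1}
```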
Payment behavior changes include, for example, the number of payments in the last seven days, the number of payments in the last thirty days, the number of days since the last stored-value top-up, the cumulative recharge amount over the last seven days, the cumulative recharge amount over the last thirty days, the historical cumulative recharge amount, and whether the user has been recharged by a third party. Login behavior changes include, for example, the number of login days in the last seven days, the number of login days in the last thirty days, the average active time per login day over all cumulative login days, the average active time per login day over the last seven days, and the number of login days. In-game behavior changes include, for example, the number of in-game activity participations, the number of tasks completed, the number of items acquired, the game play duration, and the highest level reached. A feature selection algorithm gives the degree of association of each feature with the churn behavior; by setting an association threshold, the features whose degree of association exceeds the threshold are identified and used to generate the feature data set.
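As an illustration of how such windowed features might be derived, the sketch below computes several of the payment features listed above from a hypothetical list of (day, amount) payment records, plus the threshold-based selection mentioned at the end of the paragraph. The function names, feature keys, the inclusive "last N days" window and the −1 sentinel for players with no payments are all assumptions.

```python
from typing import Dict, List, Tuple


def payment_features(payments: List[Tuple[int, float]], t: int) -> Dict[str, float]:
    """Payment-behavior features as of day t from (day, amount) records.
    'Last N days' is taken to mean days t-N+1 .. t inclusive (an assumption)."""
    past = [(d, a) for d, a in payments if d <= t]

    def last(n: int) -> List[float]:
        return [a for d, a in past if d > t - n]

    last_day = max((d for d, _ in past), default=None)
    return {
        "pay_count_7d": len(last(7)),
        "pay_count_30d": len(last(30)),
        "recharge_7d": sum(last(7)),
        "recharge_30d": sum(last(30)),
        "recharge_total": sum(a for _, a in past),
        # -1 is a hypothetical sentinel for "never paid".
        "days_since_last_pay": t - last_day if last_day is not None else -1,
    }


def select_by_threshold(association: Dict[str, float], threshold: float) -> List[str]:
    """Keep the features whose association with churn exceeds the threshold."""
    return [name for name, score in association.items() if score > threshold]


feats = payment_features([(75, 6.0), (95, 30.0), (99, 6.0)], t=100)
print(feats["pay_count_7d"], feats["pay_count_30d"], feats["recharge_total"])  # 2 3 42.0
```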
In some embodiments, calculating, based on a feature selection algorithm, the degrees of association of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and obtaining the feature data set with the highest degree of association with the churn behavior, specifically comprises:
calculating, based on the Pearson correlation coefficient, the correlation coefficient of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and determining a first feature subset;
calculating, based on mutual information, the mutual information value of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and determining a second feature subset;
and merging the first feature subset and the second feature subset to obtain a third feature subset, and taking the third feature subset as the feature data set with the highest degree of association with the churn behavior.
In this embodiment, the Pearson correlation coefficient and mutual information serve as the feature selection algorithms: each is used to calculate the degree of association of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and the feature subsets selected by the two algorithms are finally merged to obtain the target feature subset.
In some embodiments, calculating, based on the Pearson correlation coefficient, the correlation coefficients of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and determining the first feature subset, specifically comprises:
calculating the covariance of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior;
calculating the standard deviations of the payment behavior change, the login behavior change, the in-game behavior change and the churn behavior;
multiplying the standard deviation of each of the payment behavior change, the login behavior change and the in-game behavior change by the standard deviation of the churn behavior to obtain a set of standard deviation products;
dividing each covariance by the corresponding standard deviation product in the set to obtain the Pearson correlation coefficient of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior;
and selecting the k features whose Pearson correlation coefficients have the largest absolute values (closest to 1) to determine the first feature subset.
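The steps above (covariance, standard deviations, their product, the ratio, and top-k by absolute value) can be sketched in plain Python as below. It assumes the n − 1 sample denominator for both covariance and standard deviation (the ratio is unchanged as long as both use the same denominator), scores a zero-variance feature as 0, and uses hypothetical function and feature names.

```python
import math
from typing import Dict, List


def covariance(x: List[float], y: List[float]) -> float:
    """Sample covariance with the n - 1 denominator."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def std(x: List[float]) -> float:
    """Sample standard deviation with the same n - 1 denominator."""
    return math.sqrt(covariance(x, x))


def pearson_top_k(features: Dict[str, List[float]], churn: List[float], k: int) -> List[str]:
    """Return the k feature names whose |Pearson r| with churn is largest."""
    sigma_churn = std(churn)
    scores = {}
    for name, values in features.items():
        denom = std(values) * sigma_churn  # standard-deviation product
        scores[name] = covariance(values, churn) / denom if denom else 0.0
    return sorted(scores, key=lambda n: abs(scores[n]), reverse=True)[:k]


# Hypothetical per-player features; churn label 1 = churned.
features = {
    "pay_count_7d": [0, 0, 0, 1, 1],
    "region_code": [3, 3, 3, 3, 3],
    "login_days_7d": [1, 2, 3, 4, 5],
}
churn = [1, 1, 1, 0, 0]
print(pearson_top_k(features, churn, k=2))  # ['pay_count_7d', 'login_days_7d']
```

Both selected features are negatively correlated with churn here (r ≈ −1 and r ≈ −0.87); ranking by the absolute value keeps them regardless of sign.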
In this embodiment, the Pearson correlation coefficient is used to measure linear correlation: the correlation coefficient between each feature and the churn behavior is calculated, and the relevance is evaluated from its absolute value.
The Pearson correlation coefficient measures the degree of linear relationship between two variables. Its value ranges from −1 to 1; the closer the absolute value is to 1, the stronger the correlation, and the closer it is to 0, the weaker the correlation.
Covariance describes the extent to which two variables vary together, and is calculated as follows:
$\mathrm{cov}(X,Y)=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\mu_X)(y_i-\mu_Y)$
where $x_i$ and $y_i$ are the observations of the two variables X and Y, $\mu_X$ and $\mu_Y$ are the means of X and Y, and n is the number of samples. For example, given a set of observations $X=[x_1,x_2,x_3,\dots,x_n]$ and $Y=[y_1,y_2,y_3,\dots,y_n]$, the means $\mu_X$ and $\mu_Y$ are calculated first, and the covariance cov(X, Y) is then calculated according to the above formula.
The standard deviation measures the degree to which the values of a variable deviate from its mean, and is calculated as follows:
$\sigma=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\mu)^2}$
where $x_i$ are the observations of the variable X, μ is the mean of the variable, and n is the number of samples. For example, given a set of observations $X=[x_1,x_2,x_3,\dots,x_n]$, the mean μ of X is calculated first, and the standard deviation σ is then calculated according to the above formula.
Covariance and standard deviation together describe how variables relate to each other and how dispersed each variable is. In feature selection and correlation evaluation, the Pearson correlation coefficient is obtained by dividing the covariance by the product of the standard deviations, which normalizes it to the range −1 to 1; on its own, the standard deviation indicates the dispersion of each variable. These calculations help analyze the correlations between variables in the data set and the importance of each variable.
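A small worked example with illustrative numbers, using the n − 1 sample denominators, shows how the covariance and the two standard deviations combine into the Pearson correlation coefficient. Because Y = 2X here, the coefficient comes out as exactly 1.

```latex
X=[1,2,3],\quad Y=[2,4,6],\quad \mu_X=2,\quad \mu_Y=4
\mathrm{cov}(X,Y)=\frac{(-1)(-2)+(0)(0)+(1)(2)}{3-1}=\frac{4}{2}=2
\sigma_X=\sqrt{\frac{(-1)^2+0^2+1^2}{3-1}}=1,\qquad
\sigma_Y=\sqrt{\frac{(-2)^2+0^2+2^2}{3-1}}=2
r=\frac{\mathrm{cov}(X,Y)}{\sigma_X\,\sigma_Y}=\frac{2}{1\cdot 2}=1
```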
In some embodiments, calculating, based on mutual information, the mutual information values of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and determining the second feature subset, specifically comprises:
calculating the joint probability distribution of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior;
calculating the marginal probability distributions of the payment behavior change, the login behavior change, the in-game behavior change and the churn behavior;
obtaining, according to a mutual information formula, the mutual information value of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior from the joint probability distributions and the marginal probability distributions;
and determining the second feature subset according to the mutual information values.
Specifically, the mutual information formula is $I(X;Y)=\sum_{x}\sum_{y}p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)}$, where I(X;Y) denotes the mutual information between the behavior-change variable X and the churn behavior Y, p(x,y) denotes their joint probability distribution, p(x) denotes the marginal probability distribution of the payment behavior change, the login behavior change or the in-game behavior change, and p(y) denotes the marginal probability distribution of the churn behavior.
In this embodiment, mutual information measures the degree of dependence, including nonlinear dependence, between two variables; the larger the value, the stronger the correlation. The steps are: calculate the joint probability distribution of the two variables, calculate the marginal probability distribution of each variable, and compute the mutual information from the joint and marginal distributions. The mutual information value describes the correlation between each behavior change of the player and the churn behavior, with a larger value indicating a stronger correlation. A marginal probability is the probability that one variable takes a given value, regardless of the values taken by the other variables.
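A minimal sketch of the mutual-information step follows, assuming each behavior change has already been discretized into categories (for example "down", "flat", "up"). The patent does not specify how continuous changes are binned, so that step, the function names `mutual_information` and `second_feature_subset`, and the data are hypothetical. Probabilities are estimated from counts and the natural logarithm is used, so values are in nats.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = sum_x sum_y P(x,y) * log(P(x,y) / (P(x) * P(y))), in nats."""
    n = len(x)
    joint = Counter(zip(x, y))  # counts for the joint distribution P(x, y)
    count_x = Counter(x)        # counts for the marginal distribution P(x)
    count_y = Counter(y)        # counts for the marginal distribution P(y)
    mi = 0.0
    # Only observed (x, y) pairs appear in `joint`, so the 0 * log 0 = 0
    # convention is applied implicitly.
    for (xv, yv), c in joint.items():
        p_xy = c / n
        p_x = count_x[xv] / n
        p_y = count_y[yv] / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


def second_feature_subset(
    features: Dict[str, Sequence[Hashable]], churn: Sequence[int], k: int
) -> List[str]:
    """Return the k feature names with the highest mutual information with churn."""
    scores = {name: mutual_information(values, churn) for name, values in features.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical discretized behavior changes for six players.
    features = {
        "payment_trend": ["down", "down", "flat", "up", "up", "flat"],
        "login_trend": ["down", "down", "down", "up", "flat", "flat"],
    }
    churn = [1, 1, 0, 0, 0, 0]
    print(second_feature_subset(features, churn, k=1))  # → ['payment_trend']
```

Here `payment_trend` determines churn exactly in the toy data, so its mutual information equals the entropy of the churn label, while `login_trend` only partly predicts it.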
In some embodiments, merging the first feature subset and the second feature subset to obtain a third feature subset specifically includes:
when the relationship between the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior is a multivariate linear or multivariate nonlinear relationship, performing a union operation on the first feature subset and the second feature subset to obtain the third feature subset;
or, when the relationship between each of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior is a univariate linear or univariate nonlinear relationship, performing an intersection operation on the first feature subset and the second feature subset to obtain the third feature subset;
or determining the proportions of linearly related data and nonlinearly related data between each of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior, determining the weight ratio between the first feature subset and the second feature subset according to these proportions, and performing weighted fusion of the first feature subset and the second feature subset according to the weight ratio to obtain the third feature subset.
In this embodiment, when the player's payment behavior, login behavior, and in-game behavior have complex associations with churn behavior, more feature information can be retained by obtaining the third feature subset through the union operation. In this case, the combined influence of multiple features needs to be considered, and features with different types of correlation are retained to obtain a more comprehensive feature set. Such complex associations may involve various interactions and nonlinearities. For example, a player's payment behavior may interact with factors such as login frequency, in-game activity, and the types of virtual items purchased, and these factors may in turn interact in complex ways with churn behavior. A single feature therefore may not fully explain why a player churns; the combined effect of multiple features must be considered.
When the player's payment behavior, login behavior, and in-game behavior are highly consistent in their relationship with churn behavior, the consistent features can be kept by obtaining the third feature subset through the intersection operation. In this case, the features may follow similar patterns or trends, and selecting features with consistent correlations may give a more stable and reliable feature set. For example, a player's payment behavior, login frequency, and in-game activity may all tend to increase or decrease in relation to churn behavior, i.e., these behaviors change in a relatively consistent direction. Selecting features with such consistent trends provides more stable and reliable information and helps predict churn behavior more accurately.
In addition, because the Pearson correlation coefficient mainly captures linear relationships between features while mutual information also captures nonlinear relationships, the first feature subset (obtained with the Pearson correlation coefficient) and the second feature subset (obtained with mutual information) can be combined by weighted fusion to obtain the third feature subset.
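The three merge options can be sketched as follows. Union and intersection follow directly from the text. The patent does not say how the weighted fusion is computed, so the version below is an assumption: each method's scores are normalized by their maximum, blended with a weight `linear_share` (standing in for the proportion of linearly related data), and the top k features are kept. All function and feature names are hypothetical.

```python
from typing import Dict, List, Set


def union_subset(first: Set[str], second: Set[str]) -> Set[str]:
    """Multivariate / complex relationships: keep every selected feature."""
    return first | second


def intersection_subset(first: Set[str], second: Set[str]) -> Set[str]:
    """Univariate / consistent relationships: keep features both methods agree on."""
    return first & second


def weighted_fusion(
    pearson_abs: Dict[str, float],
    mutual_info: Dict[str, float],
    linear_share: float,
    k: int,
) -> List[str]:
    """Blend normalized |r| and MI scores; linear_share is the weight given to |r|."""
    if not 0.0 <= linear_share <= 1.0:
        raise ValueError("linear_share must be in [0, 1]")

    def normalise(scores: Dict[str, float]) -> Dict[str, float]:
        # Put both score types on a common [0, 1] scale before blending.
        top = max(scores.values(), default=0.0)
        return {f: (s / top if top > 0 else 0.0) for f, s in scores.items()}

    p = normalise(pearson_abs)
    m = normalise(mutual_info)
    fused = {
        f: linear_share * p.get(f, 0.0) + (1.0 - linear_share) * m.get(f, 0.0)
        for f in set(p) | set(m)
    }
    # Sort by fused score (descending), breaking ties by name for determinism.
    return sorted(fused, key=lambda f: (-fused[f], f))[:k]


if __name__ == "__main__":
    first = {"payment", "login"}
    second = {"login", "ingame"}
    print(sorted(union_subset(first, second)))
    print(sorted(intersection_subset(first, second)))
    r_abs = {"payment": 0.9, "login": 0.6, "ingame": 0.1}
    mi = {"payment": 0.1, "login": 0.3, "ingame": 0.6}
    print(weighted_fusion(r_abs, mi, linear_share=0.8, k=2))
```

Shifting `linear_share` toward 0 lets features with mostly nonlinear association (high mutual information, low |r|) move up the ranking.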
S102: combining the feature data set and the player attributes into a training data set, and training the GBDT model on the training data set to obtain a player churn rate prediction model;
and determining the predicted churn rate of each player based on the player churn rate prediction model, and performing operational push actions for players whose predicted churn rate exceeds the predicted churn rate threshold.
In this embodiment, to evaluate the effect of the operational push, the target player group can be randomly divided into an experiment group, which receives the operational push, and a control group, which does not. The effect of the push is evaluated with indicators such as churn rate, user activity, and payment behavior; an A/B test checks whether the push has a significant effect on the predicted churn rate, and the push is adjusted or optimized according to the experiment results.
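The push-and-evaluate loop can be sketched as below, under stated assumptions: players above a hypothetical threshold are selected, split at random into an experiment group and a control group, and the observed churn rates of the two groups are compared with a two-sided two-proportion z-test. The patent only says that significance is tested with an A/B experiment; the choice of z-test, the function names, the device IDs, and the counts are illustrative.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def select_push_targets(predicted_churn: Dict[str, float], threshold: float) -> List[str]:
    """Players whose predicted churn rate exceeds the threshold."""
    return sorted(pid for pid, rate in predicted_churn.items() if rate > threshold)


def split_groups(players: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Randomly split players into (experiment, control) halves."""
    shuffled = list(players)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def two_proportion_z_test(
    churned_exp: int, n_exp: int, churned_ctrl: int, n_ctrl: int
) -> Tuple[float, float]:
    """Two-sided z-test comparing observed churn rates; returns (z, p_value)."""
    p_exp = churned_exp / n_exp
    p_ctrl = churned_ctrl / n_ctrl
    pooled = (churned_exp + churned_ctrl) / (n_exp + n_ctrl)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n_exp + 1.0 / n_ctrl))
    if se == 0.0:
        return 0.0, 1.0  # all-0 or all-1 outcomes: no evidence of a difference
    z = (p_exp - p_ctrl) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    predicted = {"dev-001": 0.82, "dev-002": 0.15, "dev-003": 0.91, "dev-004": 0.77}
    targets = select_push_targets(predicted, threshold=0.7)
    experiment, control = split_groups(targets)
    print("push to:", experiment, "hold out:", control)
    # Hypothetical outcome counts observed after the push period.
    z, p = two_proportion_z_test(churned_exp=20, n_exp=200, churned_ctrl=40, n_ctrl=200)
    print(f"z = {z:.2f}, p = {p:.4f}")
```

A negative z with a small p-value would indicate that the pushed group churned significantly less than the control group.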
In some embodiments, training the GBDT model on the training data set to obtain the player churn rate prediction model specifically includes:
dividing the training data set into a feature matrix and a target variable, where the feature matrix contains the feature subsets selected for the payment behavior change, the login behavior change and the in-game behavior change together with the player attributes, and the target variable represents churn behavior;
setting hyperparameters of the GBDT model, including the number of decision trees, the depth of the decision trees, the learning rate, feature weights and regularization parameters;
and inputting the training data set into the GBDT model for iterative training and tuning to obtain the player churn rate prediction model.
In this embodiment, the GBDT (gradient boosting decision tree) model is an ensemble learning model based on decision trees that improves prediction performance by iteratively training weak learners. The combined training data set is divided into a feature matrix X and a target variable y. The feature matrix X contains the selected feature subset and the player attribute data, and the target variable y represents churn behavior (0 for no churn, 1 for churn). The GBDT model is trained as follows:
First, initialize the model: set hyperparameters such as the number of trees, the tree depth, and the learning rate, and initialize an empty GBDT model.
Second, compute the initial prediction: initialize the model's prediction with a constant initial value (e.g., the mean of the target variable).
Third, iterative training: for each tree, perform the following steps:
a. Compute residuals: compute the residual between the current prediction and the actual value, which lets the model correct its errors step by step.
b. Train a decision tree: train a decision tree on the feature subset with the residuals as its targets. Under gradient boosting, each tree fits the residuals so as to reduce the loss function.
c. Update the prediction: add the output of the newly trained tree, scaled by the learning rate, to the previous prediction.
Fourth, repeat: repeat the third step until the preset number of trees is reached.
Fifth, complete training: once trained, the GBDT model can be used for churn prediction. It calculates each player's churn probability from the feature data and returns a prediction result containing the player's device ID, the churn probability, and the contributing factors (player attributes, payment behavior change, login behavior change and in-game behavior change).
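The five training steps can be illustrated with a deliberately simplified, from-scratch sketch. Assumptions: squared-error loss, so the residual in step a is literally y minus the current prediction; depth-1 trees (stumps) in place of deeper trees; and no feature weights or regularization. Production GBDT libraries usually optimize log loss for a 0/1 churn target, so treat this as an illustration of the residual-fitting loop rather than the applicant's model. `Stump`, `fit_stump`, and `SimpleGBDT` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one split, one constant value per side."""
    feature: int
    threshold: float
    left_value: float
    right_value: float

    def predict(self, row: Sequence[float]) -> float:
        return self.left_value if row[self.feature] <= self.threshold else self.right_value


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def fit_stump(X: Sequence[Sequence[float]], residuals: Sequence[float]) -> Stump:
    """Choose the split that minimizes the squared error of the residuals."""
    best: Optional[Stump] = None
    best_sse = float("inf")
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[j] > threshold]
            if not left or not right:
                continue
            lm, rm = _mean(left), _mean(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if sse < best_sse:
                best, best_sse = Stump(j, threshold, lm, rm), sse
    if best is None:  # no valid split (all rows identical): constant stump
        m = _mean(residuals)
        best = Stump(0, float("inf"), m, m)
    return best


@dataclass
class SimpleGBDT:
    n_trees: int = 50           # "number of decision trees"
    learning_rate: float = 0.1  # shrinkage applied to each tree's output
    base: float = 0.0
    trees: List[Stump] = field(default_factory=list)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[float]) -> "SimpleGBDT":
        self.base = _mean(y)  # step 2: initial prediction = mean of the target
        pred = [self.base] * len(y)
        self.trees = []
        for _ in range(self.n_trees):  # steps 3 and 4: one tree per iteration
            residuals = [yi - pi for yi, pi in zip(y, pred)]  # step 3a
            stump = fit_stump(X, residuals)                    # step 3b
            self.trees.append(stump)
            pred = [pi + self.learning_rate * stump.predict(row)
                    for pi, row in zip(pred, X)]               # step 3c
        return self

    def predict_churn(self, row: Sequence[float]) -> float:
        score = self.base + sum(self.learning_rate * t.predict(row) for t in self.trees)
        return min(1.0, max(0.0, score))  # clip to [0, 1] so it reads as a churn score


if __name__ == "__main__":
    # Hypothetical rows: [payment_change, login_change]; 1 = churned.
    X = [[-50.0, -5.0], [-30.0, -4.0], [0.0, 1.0], [10.0, 0.0], [20.0, 2.0], [40.0, 1.0]]
    y = [1, 1, 0, 0, 0, 0]
    model = SimpleGBDT(n_trees=50, learning_rate=0.1).fit(X, y)
    print(round(model.predict_churn([-40.0, -5.0]), 3), round(model.predict_churn([30.0, 1.0]), 3))
```

In practice the returned score would be paired with the player's device ID and the contributing features to form the prediction result described in the fifth step.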
Referring to fig. 2, the embodiment of the present application further provides an operation pushing device 2 based on churn rate prediction, where the device 2 specifically includes:
the sample collection module 201, configured to obtain the player attributes, payment behavior change, login behavior change and in-game behavior change of each churned player;
the feature selection module 202, configured to calculate, based on a feature selection algorithm, the degree of association of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and to obtain the feature data set with the highest degree of association with the churn behavior;
the prediction model module 203 is configured to combine the feature data set and the player attribute into a training data set, train the GBDT model according to the training data set, and obtain a player churn rate prediction model;
an operation evaluation module 204, configured to determine the predicted churn rate of each player based on the player churn rate prediction model, and to perform operational push actions for players whose predicted churn rate exceeds the predicted churn rate threshold.
It can be understood that the content of the embodiment of the operation pushing method based on churn rate prediction shown in fig. 1 applies to this embodiment of the operation pushing device based on churn rate prediction; the functions implemented by the device embodiment are the same as those of the method embodiment shown in fig. 1, and the beneficial effects achieved are also the same.
It should be noted that, because the information exchange and execution processes between the above devices are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiment section and are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
Referring to fig. 3, an embodiment of the present application further provides a computer device 3, including: a processor 301, a memory 302, and a computer program 303 stored in the memory 302; when executed by the processor 301, the computer program 303 implements the operation pushing method based on churn rate prediction according to any of the above methods.
The computer device 3 may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or the like. The computer device 3 may include, but is not limited to, the processor 301 and the memory 302. Those skilled in the art will appreciate that fig. 3 is merely an example of the computer device 3 and does not constitute a limitation on it; the computer device 3 may include more or fewer components than shown, combine certain components, or use different components, and may, for example, also include input/output devices, network access devices, and the like.
The processor 301 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 302 may in some embodiments be an internal storage unit of the computer device 3, such as a hard disk or a memory of the computer device 3. The memory 302 may in other embodiments also be an external storage device of the computer device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the computer device 3. Further, the memory 302 may also include both an internal storage unit and an external storage device of the computer device 3. The memory 302 is used to store an operating system, application programs, boot loader (BootLoader), data, and other programs, such as program code for the computer program. The memory 302 may also be used to temporarily store data that has been output or is to be output.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, the computer program, when being executed by a processor, realizes the operation pushing method based on the churn rate prediction according to any one of the above methods.
In this embodiment, if the integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to a photographing device/terminal apparatus, a recording medium, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, and a software distribution medium, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk. In some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
Each of the foregoing embodiments is described with its own emphasis; for parts not described or illustrated in detail in one embodiment, refer to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the disclosed embodiments of the application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Claims (10)

1. An operation pushing method based on loss rate prediction is characterized by comprising the following steps:
obtaining the player attributes, payment behavior change, login behavior change and in-game behavior change of each churned player;
calculating, based on a feature selection algorithm, the degree of association of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and obtaining a feature data set with the highest degree of association with the churn behavior;
combining the feature data set and the player attributes into a training data set, and training the GBDT model on the training data set to obtain a player churn rate prediction model;
and determining the predicted churn rate of each player based on the player churn rate prediction model, and performing operational push actions for players whose predicted churn rate exceeds the predicted churn rate threshold.
2. The method according to claim 1, wherein the calculating, based on a feature selection algorithm, the association degree of the payment behavior change, the login behavior change, and the intra-game behavior change with the churn behavior, respectively, and obtaining the feature data set with the highest association degree with the churn behavior, specifically includes:
calculating, based on the Pearson correlation coefficient, the correlation coefficient between each of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior, and determining a first feature subset;
calculating, based on mutual information, the mutual information value between each of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior, and determining a second feature subset;
and merging the first feature subset and the second feature subset to obtain a third feature subset, and taking the third feature subset as a feature data set with highest association degree with the churn behavior.
3. The method according to claim 2, wherein calculating correlation coefficients of the payment behavior change, the login behavior change, and the in-game behavior change with churn behavior, respectively, based on Pearson correlation coefficients, determines a first feature subset, comprising:
calculating the covariance between each of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior;
calculating the standard deviations of the payment behavior change, the login behavior change, the in-game behavior change and the churn behavior, respectively;
multiplying the standard deviation of each of the payment behavior change, the login behavior change and the in-game behavior change by the standard deviation of the churn behavior to obtain a set of standard deviation products;
dividing each covariance by the corresponding standard deviation product in the set to obtain the Pearson correlation coefficient between each of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior;
selecting the k features whose Pearson correlation coefficients have the largest absolute values, i.e., closest to 1, to determine the first feature subset.
4. The method according to claim 2, wherein calculating mutual information values of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior, respectively, based on mutual information, determines a second feature subset, in particular comprising:
calculating the joint probability distribution between each of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior;
calculating the marginal probability distributions of the payment behavior change, the login behavior change, the in-game behavior change and the churn behavior, respectively;
obtaining, according to a mutual information formula, the mutual information value between each of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior from the joint probability distribution and the marginal probability distributions;
and determining a second feature subset according to the mutual information value.
5. The method of claim 4, wherein the mutual information formula is $I(X;Y) = \sum_{x}\sum_{y} P(x,y)\,\log\frac{P(x,y)}{P(x)\,P(y)}$, wherein I(X;Y) represents the mutual information between X and Y computed from their joint and marginal probability distributions, P(x,y) represents the joint probability distribution, P(x) represents the marginal probability distribution of the payment behavior change, the login behavior change or the in-game behavior change, and P(y) represents the marginal probability distribution of the churn behavior.
6. The method according to claim 2, wherein the merging the first feature subset and the second feature subset to obtain a third feature subset, in particular comprises:
when the relationship between the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior is a multivariate linear or multivariate nonlinear relationship, performing a union operation on the first feature subset and the second feature subset to obtain the third feature subset;
or, when the relationship between each of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior is a univariate linear or univariate nonlinear relationship, performing an intersection operation on the first feature subset and the second feature subset to obtain the third feature subset;
or determining the proportions of linearly related data and nonlinearly related data between each of the payment behavior change, the login behavior change and the in-game behavior change and the churn behavior, determining the weight ratio between the first feature subset and the second feature subset according to these proportions, and performing weighted fusion of the first feature subset and the second feature subset according to the weight ratio to obtain the third feature subset.
7. The method according to claim 1, wherein training the GBDT model according to the training data set to obtain a player churn rate prediction model specifically comprises:
dividing the training data set into a feature matrix and a target variable, wherein the feature matrix comprises the feature subsets selected for the payment behavior change, the login behavior change and the in-game behavior change together with the player attributes, and the target variable represents churn behavior;
setting hyperparameters of the GBDT model, wherein the hyperparameters comprise the number of decision trees, the depth of the decision trees, the learning rate, feature weights and regularization parameters;
and inputting the training data set into the GBDT model for iterative training and tuning to obtain the player churn rate prediction model.
8. An operation pushing device based on churn rate prediction, which is characterized in that the device specifically comprises:
the sample collection module, used for obtaining the player attributes, payment behavior change, login behavior change and in-game behavior change of each churned player;
the feature selection module, used for calculating, based on a feature selection algorithm, the degree of association of each of the payment behavior change, the login behavior change and the in-game behavior change with the churn behavior, and obtaining the feature data set with the highest degree of association with the churn behavior;
the prediction model module, used for combining the feature data set and the player attributes into a training data set and training the GBDT model on the training data set to obtain a player churn rate prediction model;
and the operation evaluation module, used for determining the predicted churn rate of each player based on the player churn rate prediction model and performing operational push actions for players whose predicted churn rate exceeds the predicted churn rate threshold.
9. A computer device, comprising: memory and processor and computer program stored on the memory, which when executed on the processor, implements the churn rate prediction based operational push method according to any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the churn rate prediction based operational push method according to any one of claims 1 to 7.
CN202310653871.2A 2023-06-05 2023-06-05 Operation pushing method, device, equipment and medium based on loss rate prediction Pending CN116757750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310653871.2A CN116757750A (en) 2023-06-05 2023-06-05 Operation pushing method, device, equipment and medium based on loss rate prediction

Publications (1)

Publication Number Publication Date
CN116757750A (en) 2023-09-15

Family

ID=87959996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310653871.2A Pending CN116757750A (en) 2023-06-05 2023-06-05 Operation pushing method, device, equipment and medium based on loss rate prediction

Country Status (1)

Country Link
CN (1) CN116757750A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609708A (en) * 2017-09-25 2018-01-19 广州赫炎大数据科技有限公司 A kind of customer loss Forecasting Methodology and system based on mobile phone games shop
CN110222267A (en) * 2019-06-06 2019-09-10 中山大学 A kind of gaming platform information-pushing method, system, storage medium and equipment
CN111861588A (en) * 2020-08-06 2020-10-30 网易(杭州)网络有限公司 Training method of loss prediction model, player loss reason analysis method and player loss reason analysis device
CN113827979A (en) * 2021-08-17 2021-12-24 杭州电魂网络科技股份有限公司 LightGBM-based game churn user prediction method and system
CN114358854A (en) * 2022-01-12 2022-04-15 平安普惠企业管理有限公司 Customer loss early warning method, device, equipment and storage medium
US20230080056A1 (en) * 2021-09-13 2023-03-16 Vignav Ramesh Systems and methods for evaluating game elements

Similar Documents

Publication Publication Date Title
CN103020978B (en) SAR (synthetic aperture radar) image change detection method combining multi-threshold segmentation with fuzzy clustering
US20090043715A1 (en) Method to Continuously Diagnose and Model Changes of Real-Valued Streaming Variables
CN116306323B (en) Determination method and device of digital twin model, terminal equipment and medium
CN105023165A (en) Method, device and system for controlling release tasks in social networking platform
CN110020866B (en) Training method and device for recognition model and electronic equipment
CN111773732A (en) Target game user detection method, device and equipment
CN113298152B (en) Model training method, device, terminal equipment and computer readable storage medium
CN115510042A (en) Power system load data filling method and device based on generation countermeasure network
CN113807469A (en) Multi-energy user value prediction method, device, storage medium and equipment
CN108197795A (en) The account recognition methods of malice group, device, terminal and storage medium
CN113256335B (en) Data screening method, multimedia data delivery effect prediction method and device
CN116757750A (en) Operation pushing method, device, equipment and medium based on loss rate prediction
CN116662904A (en) Method, device, computer equipment and medium for detecting variation of data type
CN109934352B (en) Automatic evolution method of intelligent model
CN114417942B (en) Clutter recognition method, system, device and medium
CN115563568A (en) Abnormal data detection method and device, electronic device and storage medium
CN111984842B (en) Bank customer data processing method and device
CN114581086A (en) Phishing account detection method and system based on dynamic time sequence network
CN110354501B (en) Behavior prediction method and device and electronic equipment
KR102184655B1 (en) Improvement Of Regression Performance Using Asymmetric tanh Activation Function
CN113313582A (en) Guest refusing and reflashing model training method and device and electronic equipment
US20230013574A1 (en) Distributed Representations of Computing Processes and Events
CN117648585B (en) Intelligent decision model generalization method and device based on task similarity
CN117786478B (en) Multi-model-based user activity prediction method, system, equipment and medium
CN108510071A (en) Feature extracting method, device and the computer readable storage medium of data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination