CN112712383A - Potential user prediction method, device, equipment and storage medium of application program


Info

Publication number
CN112712383A
Authority
CN
China
Legal status
Pending
Application number
CN201911016389.8A
Other languages
Chinese (zh)
Inventor
杨格蒙
江锐
Current Assignee
Shanghai Lilith Technology Corp
Original Assignee
Shanghai Lilith Technology Corp
Application filed by Shanghai Lilith Technology Corp
Priority to CN201911016389.8A
Publication of CN112712383A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/02: Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201: Market modelling; Market analysis; Collecting market data
    • G06Q30/0202: Market predictions or forecasting for commercial activities

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method for predicting potential users of an application program. The method comprises: obtaining and classifying user data corresponding to the application program; training a plurality of prediction models with a training data set to obtain a plurality of standard prediction models; verifying each standard prediction model with a verification data set; inputting a test data set into each verified standard prediction model to predict potential users in the test data set; and selecting the optimal prediction result from the prediction results of the standard prediction models. Because the optimal result is selected from the predictions of several standard prediction models, the scheme ensures the accuracy of potential-user prediction, so that high-quality service can be provided to potential users and their loss avoided. The invention further discloses a corresponding apparatus, device, and storage medium for predicting potential users.

Description

Potential user prediction method, device, equipment and storage medium of application program
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, and a storage medium for predicting a potential user of an application.
Background
With the rapid development of artificial intelligence, smartphones have become increasingly widespread. To meet users' everyday needs, manufacturers have developed many kinds of application programs, including game applications, lifestyle applications, and learning applications.
After downloading an application program from an application store, a user registers personal information in the application and continues to use it once registration is complete. Taking a game application as an example, registered users can be classified by how long they stay logged in to the game and by their consumption (payment) behavior in it. The registered users include many potential users. How to predict these potential users scientifically, so as to provide them with high-quality service and avoid losing them, is a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The invention aims to solve the prior-art problem that potential users cannot be predicted scientifically and therefore cannot be offered high-quality service, so that they are lost. To this end, the invention provides a method, an apparatus, a device, and a storage medium for predicting potential users of an application program, which can predict potential users scientifically, thereby providing them with high-quality service and avoiding user loss.
To solve the above problems, an embodiment of the present invention discloses a method for predicting potential users of an application program, comprising: obtaining user data corresponding to the application program;
classifying the user data to obtain multiple types of user data, and labeling each type, wherein the types of user data comprise data of users who have historically performed value data exchange and data of users who have not;
dividing the data set marked with various user data into a training data set, a verification data set and a test data set;
respectively training a plurality of prediction models by using the training data set to obtain a plurality of standard prediction models;
verifying each standard prediction model by using the verification data set;
inputting the test data set into each of the validated standard predictive models to predict potential users in the test data set;
and selecting the optimal prediction result from the prediction results of the standard prediction models.
With this technical solution, after the user data corresponding to the application program is obtained and classified, it is divided into a training data set, a verification data set, and a test data set. The training data set is used to train a plurality of prediction models separately, yielding a plurality of standard prediction models; each standard prediction model is verified with the verification data set; the test data set is then input into each verified standard prediction model to predict the potential users in it; and the optimal prediction result is selected from the prediction results of the standard prediction models. Because several standard prediction models are used and the best of their results is selected, the accuracy of potential-user prediction is ensured, so that high-quality service can be provided to potential users and their loss avoided.
Optionally, after classifying the user data to obtain multiple types of user data and marking the various types of user data, the method further includes:
and reducing the dimension of the data set obtained after marking various user data from a high-dimensional space to a low-dimensional space based on a principal component analysis method.
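The PCA step can be sketched in plain Python as follows, assuming the labeled data set is held as a list of numeric feature rows. Instead of a full eigendecomposition, this minimal sketch finds the top components by power iteration with deflation; the function name `pca_reduce` and its parameters are hypothetical, and a production pipeline would more likely use a library PCA.

```python
import math
import random


def pca_reduce(rows, k, iters=500, seed=0):
    """Project rows onto their top-k principal components (hypothetical helper).

    rows: list of equal-length lists of floats, one row per labeled sample.
    Components come from power iteration with deflation on the sample
    covariance matrix, which is adequate for a small illustrative example.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # remaining variance is (numerically) zero
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured by this component.
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflation: remove this component before searching for the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
            for c in centered]
```

For example, `pca_reduce(rows, k=1)` on points lying on a line returns one coordinate per row, measured along that line.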
Optionally, after selecting an optimal prediction result from the prediction results of each of the standard prediction models, the method further includes:
performing behavior analysis on the potential users based on their data collected at the tracking points (event instrumentation) embedded in the application program, to obtain the value data exchange behavior features of the potential users;
providing the potential user with a service policy corresponding to the value data exchange behavior feature.
Optionally, the user data includes: first data of old users who have been logging in to the application program for longer than a first preset time, and second data of new users who registered with the application program no more than a second preset time ago;
the first data and the second data each include: login ID, last-login data for the application program, the user's role ID in the application program, behavior feature data, level data, value data used for value data exchange (payment data), server ID, and browsing path data within the application program.
Optionally, the prediction model includes: an XGBoost prediction model, a LightGBM prediction model, and a CatBoost prediction model.
Optionally, the construction process of the standard XGBoost prediction model includes:
taking, as the objective function, maximization of the probability that a user of the application program performs value data exchange behavior in the application program;
constructing the XGBoost prediction model using the objective function;
inputting the training data set into the XGBoost prediction model and training it so as to continuously adjust its parameters;
and when the training precision of the XGBoost prediction model reaches an ideal value, taking the weights of the XGBoost prediction model corresponding to the ideal value as the optimal weights, wherein the XGBoost prediction model with the optimal weights is the standard XGBoost prediction model.
Optionally, the construction process of the standard LightGBM prediction model includes:
extracting features from the training data set and analyzing the feature values of each feature, wherein the features comprise the time at which the user logs out of the application program, the duration for which the user stays continuously logged in to the application program, the user's value data exchange behavior features while logged in to the application program, and the time at which the user logs in to the application program;
taking, as the objective function, maximization of the probability that a user of the application program performs value data exchange behavior in the application program;
constructing the LightGBM prediction model using the objective function;
and bucketing the feature values, and inputting the features corresponding to the bucketed feature values into the LightGBM prediction model for training to obtain the standard LightGBM prediction model.
Optionally, the construction process of the standard CatBoost prediction model includes:
randomly rearranging the data in the training data set;
converting the label values in the training data set into integer data;
traversing the training data set and converting its categorical features into numerical values;
and inputting the integer data and the numerically encoded categorical features into the CatBoost prediction model, and training it to obtain the standard CatBoost prediction model.
Further, the embodiment of the invention discloses a device for predicting potential users of an application program, which comprises:
the acquisition module is used for acquiring user data corresponding to the application program;
the classification module is used for classifying the user data to obtain multiple types of user data and marking the various types of user data, wherein the types of the user data comprise user data subjected to historical value data exchange and user data not subjected to historical value data exchange;
the dividing module is used for dividing the data set marked with various user data into a training data set, a verification data set and a test data set;
the training module is used for respectively training a plurality of prediction models by utilizing the training data set to obtain a plurality of standard prediction models;
the verification module is used for verifying each standard prediction model by utilizing the verification data set;
a prediction module for inputting the test data set to each of the validated standard prediction models to predict potential users in the test data set;
and the selection module is used for selecting the optimal prediction result from the prediction results of the standard prediction models.
Further, the embodiment of the invention discloses a potential user prediction device of an application program, which comprises:
a memory for storing a computer program;
a processor for executing a computer program stored in the memory to implement the steps of any of the above potential user prediction methods for an application.
Further, an embodiment of the present invention discloses a computer readable storage medium having a prediction program stored thereon, the prediction program being executed by a processor to implement the steps of the method for predicting a potential user of an application program as described in any one of the above.
Additional features and corresponding advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
Fig. 1 is a schematic flowchart of a method for predicting potential users of an application program according to Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of another method for predicting potential users of an application program according to Embodiment 1 of the present invention;
Fig. 3 is a schematic flowchart of a method for predicting potential users of an application program according to Embodiment 2 of the present invention;
Fig. 4 is a schematic structural diagram of an apparatus for predicting potential users of an application program according to Embodiment 3 of the present invention;
Fig. 5 is a schematic structural diagram of a device for predicting potential users of an application program according to Embodiment 4 of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided for illustrative purposes, and other advantages and effects of the present invention will become apparent to those skilled in the art from this disclosure. Although the invention is described in conjunction with preferred embodiments, its features are not limited to these embodiments; rather, the embodiments are described in order to cover alternatives or modifications that may be derived from the claims of the present invention. In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention, although the invention may be practiced without them. Some specific details are omitted to avoid obscuring the focus of the present invention. The embodiments and their features may be combined with each other where they do not conflict.
It should be noted that in this specification, like reference numerals and letters refer to like items in the following drawings, and thus, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly: for example, a connection may be fixed, removable, or integral; mechanical or electrical; direct, or indirect through an intermediate medium; or an internal communication between two elements. Those skilled in the art can understand the specific meanings of these terms in the present invention according to the specific circumstances.
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
Example 1
A method for predicting potential users of an application program disclosed in Embodiment 1 of the present invention is described below with reference to Fig. 1 and Fig. 2, where Fig. 1 is a schematic flowchart of the method and Fig. 2 is a schematic flowchart of another method for predicting potential users of an application program disclosed in Embodiment 1.
As shown in fig. 1, the method includes:
S10: User data corresponding to the application program is acquired.
Specifically, the application program may be a game application, a lifestyle application, a learning application, or the like. Taking a game application as an example, the user data includes old-player data and new-user data. An old player is a player who has created a character and has been logging in to the application for longer than a predetermined time, which may be one week; new-user data is organized by day. The new-user data and the old-player data share the same features (hereinafter simply "features"), which are constructed according to objective facts, practical conventions, business requirements, and game logic, as follows:
the old-player data is a table built from snapshot data, where the snapshot is the player's data at the last logout of the current day. It mainly includes:
  • the created character ID, the game ID of the game application, and the server ID;
  • the character's combat power, level, total paid amount, region ID, gold-coin stock, gem stock, and alliance;
  • device information of the player's terminal (device type, device model, and device manufacturer);
  • item information (item acquisition count, item consumption count, item acquisition quantity, and item consumption quantity);
  • the total acquired quantity of each resource and the number of upgrades;
  • the number of acquisitions of each resource (gift-pack openings, gold-coin acquisitions, gem collections and totals, stone, grain, and wood collections, and the like);
  • task completion data (main-quest acceptance and completion, side-quest acceptance and completion, daily-quest acceptance and completion, and the total daily-quest score);
  • mail data (number of mails sent and number of system mails received), and the like.
As an alternative embodiment of the present invention, the user data includes: first data of old users who have been logging in to the application program for longer than a first predetermined time period, and second data of new users who registered with the application program no more than a second predetermined time period ago.
The first data and the second data each include: login ID, last-login data for the application program, the user's role ID in the application program, behavior feature data, level data, value data used for value data exchange (payment data), server ID, and browsing path data within the application program.
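A hypothetical record type for the listed fields, with a helper that separates old users from new users. The field names, the `registered_on` field, and the one-week defaults (taken from the one-week example in step S10) are assumptions for illustration, and account age is used as a stand-in for the time spent logging in to the application program.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class UserRecord:
    """One user's data; field names are hypothetical stand-ins for the listed items."""
    login_id: str
    registered_on: date          # assumed field, used to separate old and new users
    last_login: date
    role_id: str
    level: int
    paid_value: float            # value data used for value data exchange (payment)
    server_id: str
    behavior_features: dict = field(default_factory=dict)
    browse_path: list = field(default_factory=list)


def split_old_and_new(records, today, first_preset_days=7, second_preset_days=7):
    """Old users: account older than the first preset time.
    New users: registered no more than the second preset time ago."""
    old_users, new_users = [], []
    for record in records:
        age_days = (today - record.registered_on).days
        if age_days > first_preset_days:
            old_users.append(record)
        elif age_days <= second_preset_days:
            new_users.append(record)
    return old_users, new_users
```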
S11: classifying the user data to obtain multiple types of user data, and marking the various types of user data, wherein the types of the user data comprise historical user data subjected to value data exchange and historical user data not subjected to value data exchange.
Specifically, value data exchange refers to payment behavior by a player. The user data can be classified into old-player data and new-user data, collectively called player data. A first time point is selected, and the player data is divided into paying and non-paying players as of that point. A second time point is then selected: a non-paying player who makes a first payment by the second time point is marked with a first identifier, and a non-paying player who still has not paid is marked with a second identifier. The first identifier may be 1 and the second identifier may be 0.
After the marking of the user data is completed, the marked user data is cleaned, for example, the total number of online payment actions performed by the player is counted, and if no online payment is performed by the player, the missing value is filled with the number "0".
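The two-time-point labeling and the missing-value cleaning described above can be sketched as follows. This assumes payment history is available as a mapping from player ID to payment dates; `label_players` and `payment_counts` are hypothetical names.

```python
from datetime import date


def label_players(players, payments, t1, t2):
    """Split players at t1, then label the non-paying ones at t2.

    players: iterable of player IDs; payments: dict mapping player ID to a
    list of payment dates (players who never paid may be absent).
    Returns (set of players who had paid by t1, dict of labels for the rest),
    where 1 is the first identifier (first payment in (t1, t2]) and 0 is the
    second identifier (still no payment by t2).
    """
    paid_by_t1, labels = set(), {}
    for pid in players:
        dates = payments.get(pid, [])
        if any(d <= t1 for d in dates):
            paid_by_t1.add(pid)
        else:
            labels[pid] = 1 if any(t1 < d <= t2 for d in dates) else 0
    return paid_by_t1, labels


def payment_counts(players, payments):
    """Cleaning step: a player with no payment record gets 0, not a missing value."""
    return {pid: len(payments.get(pid, [])) for pid in players}
```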
S12: and dividing the data marked with various user data into a training data set, a verification data set and a test data set.
Specifically, the training data set is a sample data set used for training and learning the prediction model.
The validation dataset is a sample dataset used to adjust parameters of the predictive model. After a plurality of prediction models are trained through the training data set, in order to find out the prediction model with the best effect, each prediction model is used for predicting the verification data set, and indexes such as model accuracy are recorded.
The test data set is a sample data set used to test the classification ability of the standard prediction models after training and verification.
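A minimal sketch of the three-way split, assuming a single random shuffle followed by contiguous cuts. The 60/20/20 proportions and the function name are assumptions, since the proportions are not specified.

```python
import random


def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle once, then cut into training / verification / test sets.

    The default 60/20/20 proportions are assumed for illustration.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

A fixed seed keeps the split reproducible, so every prediction model is trained, verified, and tested on the same partitions.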
S13: and training the plurality of prediction models respectively by utilizing the training data set to obtain a plurality of standard prediction models.
Specifically, as an alternative embodiment of the present invention, the prediction model includes: an XGBoost prediction model, a LightGBM prediction model, and a CatBoost prediction model.
The construction processes of the XGBoost prediction model, the LightGBM prediction model and the CatBoost prediction model are described below respectively:
for the XGBoost prediction model, the training process may specifically be as follows:
Maximization of the probability that a user of the application program performs value data exchange behavior in the application program is taken as the objective function.
In particular, the value data exchange behavior characteristic refers to the probability that the user pays online.
An XGBoost prediction model is constructed using the objective function.
Specifically, the constructed objective function is as follows:
$$\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) + \mathrm{const}$$
$$\mathrm{Obj}^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i,\ \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t) + \mathrm{const}$$
where
$$g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}}, \qquad h_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}$$
$\hat{y}_i^{(t-1)}$ is the predicted value at iteration $t-1$, $x_i$ is a training sample in the training data set, const is a constant term, $y_i$ is the true value, $l\left(y_i, \hat{y}_i^{(t-1)}\right)$ denotes the training error (loss), $\Omega(f_t)$ is the regularization term, $g_i$ and $h_i$ are the first and second derivatives of the loss, and $f_t(x_i)$ is the new function (tree) added in the $t$-th round of training.
After the above objective function is constructed, a regularization term is introduced, as follows:
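$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} \omega_j^2$$
where $\omega_j$ is the weight (score) of leaf $j$, i.e., the parameters of the XGBoost prediction model; $T$ is the number of leaves, each of which contributes one independent quadratic function of $\omega_j$ after regrouping; $\gamma$ is the complexity penalty for each additional leaf; and $\lambda$ is the L2 regularization coefficient on the leaf weights.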
Specifically, after the regularization term is added, the objective function, which was summed over the samples of the training data set, is regrouped as a sum over the leaves of the tree model (terms that do not depend on $f_t$ are dropped):
$$\mathrm{Obj}^{(t)} \approx \sum_{j=1}^{T} \left[ \left(\sum_{i \in I_j} g_i\right) \omega_j + \frac{1}{2} \left(\sum_{i \in I_j} h_i + \lambda\right) \omega_j^2 \right] + \gamma T$$
Minimizing the regularized objective with respect to each leaf weight gives the optimal leaf weight and the corresponding minimum value of the objective function:
$$\omega_j^* = -\frac{G_j}{H_j + \lambda}$$
$$\mathrm{Obj}^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T$$
where
$$G_j = \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i, \qquad I_j = \{\, i \mid q(x_i) = j \,\}$$
$I_j$ is the set of samples assigned to leaf node $j$, and $q(x_i)$ maps sample $x_i$ to the index of its leaf.
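To make the leaf-weight formulas concrete, the sketch below assumes the binary logistic (log) loss, whose minimization corresponds to maximizing the likelihood of the observed payment labels; the text does not name the loss, so this choice and the helper names are assumptions. With $\hat{y}_i^{(t-1)} = 0$, every sample has $p = 0.5$, so $g_i = p - y_i$ and $h_i = p(1 - p) = 0.25$.

```python
import math


def logloss_grad_hess(y_true, raw_score):
    """g_i and h_i of the binary log loss w.r.t. the raw score (before the sigmoid)."""
    p = 1.0 / (1.0 + math.exp(-raw_score))
    return p - y_true, p * (1.0 - p)


def optimal_leaf(gs, hs, lam=1.0):
    """Return (w_j*, -1/2 * G_j^2 / (H_j + lambda)) for one leaf."""
    G, H = sum(gs), sum(hs)
    return -G / (H + lam), -0.5 * G * G / (H + lam)


def structure_score(leaves, lam=1.0, gamma=0.0):
    """Obj* = -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T for a list of (gs, hs) leaves."""
    return sum(optimal_leaf(gs, hs, lam)[1] for gs, hs in leaves) + gamma * len(leaves)
```

For two paying users in one leaf ($G_j = -1$, $H_j = 0.5$, $\lambda = 1$), the optimal weight is $1/1.5 \approx 0.667$, pushing their raw score toward the paying class.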
The training data set is input into the XGBoost prediction model, and the model is trained so as to continuously adjust its parameters.
When the training precision of the XGBoost prediction model reaches an ideal value, the parameters corresponding to the ideal value are taken as the optimal parameters, and the XGBoost prediction model with the optimal parameters is the standard XGBoost prediction model.
Specifically, the training precision of the XGBoost prediction model depends on the splitting gains of its nodes. The splitting gain of a node is the value of the objective function before splitting minus its value after splitting, and can be calculated with the following formula:
$$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda}\right] - \gamma$$
where $\frac{G_L^2}{H_L + \lambda}$ is the score of the left child node, $\frac{G_R^2}{H_R + \lambda}$ is the score of the right child node, and $\frac{(G_L + G_R)^2}{H_L + H_R + \lambda}$ is the score of the node without splitting.
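The splitting-gain formula can be applied in an exact greedy search over one feature: sort the samples by feature value, accumulate $G_L$ and $H_L$ from the left, and evaluate the gain at each boundary between distinct values. This is a minimal sketch (hypothetical function names, single feature, no missing-value handling), not XGBoost's own implementation.

```python
def split_gain(GL, HL, GR, HR, lam=1.0, gamma=0.0):
    """Gain = 1/2 [GL^2/(HL+lam) + GR^2/(HR+lam) - (GL+GR)^2/(HL+HR+lam)] - gamma."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


def best_split(xs, gs, hs, lam=1.0, gamma=0.0):
    """Exact greedy search on one feature: sort samples by value, scan thresholds."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    G, H = sum(gs), sum(hs)
    GL = HL = 0.0
    best_gain, best_threshold = float("-inf"), None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        GL += gs[i]
        HL += hs[i]
        if xs[i] == xs[nxt]:
            continue  # no threshold separates equal feature values
        gain = split_gain(GL, HL, G - GL, H - HL, lam, gamma)
        if gain > best_gain:
            best_gain, best_threshold = gain, (xs[i] + xs[nxt]) / 2.0
    return best_gain, best_threshold
```

With gradients $[-0.5, -0.5, 0.5, 0.5]$ on feature values $1..4$, the search picks the threshold 2.5, which separates payers from non-payers.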
For the LightGBM prediction model, the training process is specifically as follows:
Feature extraction is performed on the training data set, and the feature value of each feature is analyzed, wherein the features comprise: the time at which the user logs out of the application program, the duration for which the user stays continuously logged in to the application program, the user's value data exchange behavior features while logged in to the application program, and the time at which the user logs in to the application program.
Maximization of the probability that a user of the application program performs value data exchange behavior in the application program is taken as the objective function.
And constructing a LightGBM prediction model by using the objective function.
And dividing the characteristic values into buckets, and inputting the characteristics corresponding to the characteristic values after the buckets are divided into the LightGBM prediction model for training to obtain the standard LightGBM prediction model.
Specifically, the value data exchange behavior is the behavior of a user paying in the application program. After the feature values are obtained, an index is built in order of the feature values, and each threshold is then traversed along this index to calculate the splitting gain. In the embodiment of the invention, the LightGBM prediction model uses a histogram approach: the feature values are first divided into buckets, and bucketing also reduces the risk of overfitting the data.
The splitting gain is calculated as follows: after a leaf of the LightGBM prediction model is split, the histograms of the two child nodes must be recalculated. Each bucket of a histogram stores the number of training samples that fall into it, the sum of their first derivatives, and the sum of their second derivatives. With this scheme, only the samples on the smaller child leaf need to be traversed, since the histogram of the other leaf can be obtained by subtracting the smaller leaf's histogram from the parent's; this accelerates the computation.
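The histogram scheme can be sketched as follows: feature values are bucketed, each bucket accumulates the sample count and the sums of first and second derivatives, a sibling histogram is obtained by subtraction from the parent, and split points are searched only at bucket boundaries. Equal-width buckets are an assumption made for brevity (LightGBM derives its bin boundaries from the data, bounded by its `max_bin` parameter), the function names are hypothetical, and the gain reused here is the second-order split gain without the $\gamma$ term.

```python
def bucketize(values, num_bins):
    """Equal-width bucketing of one feature into bin indices 0..num_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0   # guard against a constant feature
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def build_histogram(bins, gs, hs, idx, num_bins):
    """Each bucket holds [sample count, sum of g, sum of h] for the samples in idx."""
    hist = [[0, 0.0, 0.0] for _ in range(num_bins)]
    for i in idx:
        bucket = hist[bins[i]]
        bucket[0] += 1
        bucket[1] += gs[i]
        bucket[2] += hs[i]
    return hist


def subtract_histogram(parent, child):
    """Histogram of the sibling leaf = parent histogram - child histogram."""
    return [[p - c for p, c in zip(pb, cb)] for pb, cb in zip(parent, child)]


def best_bin_split(hist, lam=1.0):
    """Scan bucket boundaries instead of raw values; bins <= best_bin go left."""
    G = sum(b[1] for b in hist)
    H = sum(b[2] for b in hist)
    GL = HL = 0.0
    best_gain, best_bin = float("-inf"), None
    for j in range(len(hist) - 1):
        GL += hist[j][1]
        HL += hist[j][2]
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - G ** 2 / (H + lam))
        if gain > best_gain:
            best_gain, best_bin = gain, j
    return best_gain, best_bin
```

The search cost now scales with the number of buckets rather than the number of distinct feature values.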
The LightGBM prediction model grows trees leaf-wise, whereas traditional XGBoost grows them level-wise. In leaf-wise growth, the splitting gains of all leaves of the current tree are computed and the leaf with the largest splitting gain is split. For the same number of splits, leaf-wise growth can therefore reduce the loss more than level-wise growth, so the LightGBM prediction model fits the data faster.
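Leaf-wise growth can be sketched with a priority queue keyed on each leaf's best splitting gain: at every step the leaf with the largest gain is split, until a leaf budget is reached or no split has positive gain. Here `max_leaves` is a hypothetical parameter playing the role of LightGBM's `num_leaves`; the single-feature data and the reuse of the second-order gain are simplifying assumptions.

```python
import heapq


def _best_split(idx, xs, gs, hs, lam):
    """Best threshold split of the samples in idx: (gain, left_idx, right_idx) or None."""
    order = sorted(idx, key=lambda i: xs[i])
    G = sum(gs[i] for i in idx)
    H = sum(hs[i] for i in idx)
    GL = HL = 0.0
    best = None
    for pos in range(len(order) - 1):
        i, nxt = order[pos], order[pos + 1]
        GL += gs[i]
        HL += hs[i]
        if xs[i] == xs[nxt]:
            continue
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - G ** 2 / (H + lam))
        if best is None or gain > best[0]:
            best = (gain, order[:pos + 1], order[pos + 1:])
    return best


def grow_leaf_wise(xs, gs, hs, max_leaves, lam=1.0):
    """Leaf-wise growth: always split the current leaf with the largest gain."""
    final_leaves = []        # leaves that cannot be split profitably
    heap = []                # entries: (-gain, tie_breaker, idx, split)
    tie_breaker = 0

    def add_leaf(idx):
        nonlocal tie_breaker
        split = _best_split(idx, xs, gs, hs, lam)
        if split is None or split[0] <= 0:
            final_leaves.append(idx)
        else:
            heapq.heappush(heap, (-split[0], tie_breaker, idx, split))
            tie_breaker += 1

    add_leaf(list(range(len(xs))))
    # Each split removes one leaf and adds two, so stop before exceeding the budget.
    while heap and len(final_leaves) + len(heap) < max_leaves:
        _, _, _, (_, left, right) = heapq.heappop(heap)
        add_leaf(left)
        add_leaf(right)
    leaves = final_leaves + [entry[2] for entry in heap]
    return sorted(sorted(leaf) for leaf in leaves)
```

A level-wise grower would instead split every leaf at the current depth, including those whose gain is small.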
Furthermore, one-hot encoding the features of the training data set would occupy a large amount of storage space. The LightGBM prediction model does not require this processing, so it avoids the extra storage and improves storage utilization.
For the Catboost prediction model, the training process is specifically as follows:
the data in the training data set is randomly rearranged.
The labeled values in the training data set are converted to integer data.
The training data set is traversed, and its categorical features are converted into numerical values.
The integer label data and the numerically encoded categorical features are input into the CatBoost prediction model, which is trained to obtain the standard CatBoost prediction model.
Specifically, the CatBoost prediction model is a gradient boosting decision tree framework; its most notable feature is native support for categorical features, including string-valued features.
The label values in the training data set are converted into integer data as follows:
In this embodiment of the invention, the classification result has two classes: paying players and non-paying players. Paying users in the training data set are labeled 1, and non-paying users are labeled 0.
When traversing the training data set, the categorical features can be converted into numerical values with the following formula:
$$value_{new} = \frac{countInClass + prior}{totalCount + 1}$$
where $value_{new}$ is the numerical value of the converted categorical feature, $prior$ is a smoothing factor, $countInClass$ is the number of samples traversed so far in the training data set that share the current sample's categorical value and carry the same label value, and $totalCount$ is the total number of samples traversed so far that share the current sample's categorical value.
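The shuffle-then-traverse encoding can be sketched in plain Python. This is an illustrative sketch, not CatBoost's code: it assumes a binary target where `countInClass` counts preceding samples with label 1 (paying users), as CatBoost documents for binary classification, and it uses a hypothetical `prior=0.5`. Statistics are updated only after a sample is encoded, so a sample's own label never enters its value.

```python
import random
from typing import Hashable, List, Sequence


def ordered_target_statistics(categories: Sequence[Hashable], labels: Sequence[int],
                              prior: float = 0.5, shuffle: bool = True, seed: int = 0) -> List[float]:
    """value_new = (countInClass + prior) / (totalCount + 1), using only preceding samples."""
    order = list(range(len(categories)))
    if shuffle:
        random.Random(seed).shuffle(order)  # random rearrangement of the training data
    count_in_class = {}  # category -> preceding samples with label 1
    total_count = {}     # category -> all preceding samples
    encoded = [0.0] * len(categories)
    for idx in order:
        cat = categories[idx]
        encoded[idx] = (count_in_class.get(cat, 0) + prior) / (total_count.get(cat, 0) + 1)
        total_count[cat] = total_count.get(cat, 0) + 1
        count_in_class[cat] = count_in_class.get(cat, 0) + labels[idx]
    return encoded
```

For example, with `shuffle=False`, categories `["a", "a", "b", "a"]` and labels `[1, 0, 1, 1]` encode to `[0.5, 0.75, 0.5, 0.5]`.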
S14: Verify each standard prediction model with the verification data set.
Specifically, the verification data set is a sample set used to tune the parameters of the prediction models. After multiple prediction models have been trained on the training data set, each model predicts the verification data set, and metrics such as accuracy and recall are recorded in order to find the best-performing model.
S15: Input the test data set into each verified standard prediction model to predict potential users in the test data set.
S16: Select the optimal prediction result from the prediction results of the standard prediction models.
Specifically, the potential users predicted in the test data set are the users in the test data set predicted to exhibit payment behavior. The optimal prediction result is determined by evaluating each standard prediction model with its accuracy, its recall and its confusion matrix, and the prediction result of the standard prediction model with the highest accuracy and recall is selected as the optimal prediction result.
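The evaluation and selection step can be sketched as follows, with label 1 for paying users and 0 for non-paying users. The text does not say what happens when one model has the best accuracy and another the best recall, so this sketch assumes a lexicographic ranking (accuracy first, recall as tie-breaker); the model names in the usage are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def accuracy_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, recall


def select_best(y_true: Sequence[int], predictions: Dict[str, Sequence[int]]) -> str:
    """Pick the model whose (accuracy, recall) tuple is largest."""
    scores = {name: accuracy_recall(y_true, pred) for name, pred in predictions.items()}
    return max(scores, key=lambda name: scores[name])
```

Usage: `select_best([1, 0, 1, 0, 1], {"xgboost": [1, 0, 0, 0, 1], "lightgbm": [1, 1, 1, 0, 1]})` returns `"lightgbm"`, since both models reach accuracy 0.8 but LightGBM has recall 1.0.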
Further, to reduce the storage space required by the data, speed up computation, and avoid overfitting, Embodiment 1 of the present invention provides another method for predicting potential users of an application program. On the basis of FIG. 1, as shown in FIG. 2, the method further includes, after step S11:
S20: Based on principal component analysis (PCA), reduce the data set obtained after labeling the various types of user data from a high-dimensional space to a low-dimensional space.
Specifically, the dimension of the target low-dimensional space may be preset by the user, and the principal component analysis may include the following steps:
First, input the training data set and the dimension d' of the low-dimensional space.
Then center the samples in the training data set as follows:
$$x_i \leftarrow x_i - \frac{1}{m}\sum_{i=1}^{m} x_i$$
where m is the number of samples in the training data set and $x_i$ is a sample point of the user data set.
Next, compute the covariance matrix $XX^T$ of the training data set.
Then perform eigenvalue decomposition on the covariance matrix.
Finally, take the eigenvectors corresponding to the d' largest eigenvalues as the projection matrix, and output the data set projected into the low-dimensional space.
It should be noted that the embodiment of the present invention does not modify principal component analysis itself; for details, refer to the prior art.
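The PCA steps above can be sketched with NumPy. This is a minimal sketch under one stated convention change: rows of `X` are samples (shape m × d), whereas the derivation stacks samples as columns, so `Xc.T @ Xc` here plays the role of $XX^T$. The 1/m scaling of the covariance is omitted because it does not change the eigenvectors. The function name `pca` is illustrative.

```python
import numpy as np


def pca(X: np.ndarray, d_prime: int):
    """Return (W, Z, sorted eigenvalues) for samples stored as rows of X."""
    Xc = X - X.mean(axis=0)                  # center the samples
    cov = Xc.T @ Xc                          # unnormalized covariance matrix, d x d
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1]        # lambda_1 >= lambda_2 >= ... >= lambda_d
    W = eigvecs[:, order[:d_prime]]          # top-d' eigenvectors as projection matrix, d x d'
    Z = Xc @ W                               # low-dimensional coordinates z_i = W^T x_i
    return W, Z, eigvals[order]
```

`np.linalg.eigh` is used rather than `np.linalg.eig` because the covariance matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.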
In particular, assume that the labeled user data sets (training data set, verification data set and test data set) have been centered, i.e., $\sum_i x_i = 0$.
Then assume that the new coordinate system obtained after the projection transformation is $\{w_1, w_2, \ldots, w_d\}$, where each $w_i$ is an orthonormal basis vector, $i = 1, 2, \ldots, d$, with $\|w_i\|_2 = 1$ and $w_i^T w_j = 0$ for $i \neq j$.
Discarding some of the coordinates of the new coordinate system reduces the dimensionality of the data from d to d' (d' < d). The projection of a sample point $x_i$ of the user data set into the low-dimensional coordinate system is $z_i = (z_{i1}; z_{i2}; \ldots; z_{id'})$, where $z_{ij} = w_j^T x_i$ is the coordinate of $x_i$ in the j-th dimension of the low-dimensional coordinate system. If $x_i$ is reconstructed from $z_i$, the result is
$$\hat{x}_i = \sum_{j=1}^{d'} z_{ij} w_j$$
Over the entire user data set, the distance between each original sample point $x_i$ and its projection-based reconstruction $\hat{x}_i$ can be computed as:
$$\sum_{i=1}^{m} \Big\| \sum_{j=1}^{d'} z_{ij} w_j - x_i \Big\|_2^2 = \sum_{i=1}^{m} z_i^T z_i - 2 \sum_{i=1}^{m} z_i^T W^T x_i + \text{const} = -\operatorname{tr}\Big(W^T \Big(\sum_{i=1}^{m} x_i x_i^T\Big) W\Big) + \text{const}$$
where $W = (w_1, w_2, \ldots, w_{d'})$. By the nearest-reconstruction property, this expression should be minimized. Since the $w_j$ form an orthonormal basis and $\sum_i x_i x_i^T = XX^T$ is the covariance matrix, the optimization objective is:
$$\min_W \; -\operatorname{tr}(W^T X X^T W) \quad \text{s.t.} \quad W^T W = I$$
This is the optimization objective of principal component analysis.
The projection of a sample point $x_i$ onto the hyperplane in the new space is $W^T x_i$. If the projected sample points are to be separated as much as possible, the variance of the projected sample points should be maximized. The variance of the projected sample points is
$$\sum_i W^T x_i x_i^T W$$
so the optimization objective can be written as:
$$\max_W \; \operatorname{tr}(W^T X X^T W) \quad \text{s.t.} \quad W^T W = I$$
Applying the Lagrange multiplier method to either of the two equivalent objectives above yields:
$$X X^T w_i = \lambda_i w_i$$
Therefore, it suffices to perform eigenvalue decomposition on the covariance matrix $XX^T$, sort the eigenvalues as $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$, and take the eigenvectors corresponding to the first d' eigenvalues to form $W^* = (w_1, w_2, \ldots, w_{d'})$, which is the solution of principal component analysis.
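The equivalence of the two objectives can be checked numerically. For centered data and orthonormal $W$, the total reconstruction error equals $\operatorname{tr}(XX^T) - \operatorname{tr}(W^T XX^T W)$, so minimizing it and maximizing the projected variance select the same $W^*$; at the eigenvector solution the error equals the sum of the discarded eigenvalues. The sketch below, with a hypothetical function name and rows-as-samples convention, computes both sides on random data.

```python
import numpy as np


def reconstruction_vs_eigenvalues(X: np.ndarray, d_prime: int):
    """Return (reconstruction error, sum of discarded eigenvalues,
    retained variance tr(W^T C W), sum of top-d' eigenvalues)."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W = eigvecs[:, :d_prime]
    X_hat = Xc @ W @ W.T                 # x_hat_i = sum_j z_ij w_j with z_ij = w_j^T x_i
    error = float(np.sum((Xc - X_hat) ** 2))
    retained = float(np.trace(W.T @ cov @ W))
    return error, float(eigvals[d_prime:].sum()), retained, float(eigvals[:d_prime].sum())
```

On any data set the first two returned values agree, as do the last two, up to floating-point error.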
It should be noted that, for the principles of the XGBoost, LightGBM and CatBoost prediction models, reference may also be made to the prior art.
With the above technical solution, after the user data corresponding to an application program is acquired and classified, it is divided into a training data set, a verification data set and a test data set. Multiple prediction models are trained on the training data set to obtain multiple standard prediction models; each standard prediction model is verified with the verification data set; and the test data set is then input into each verified standard prediction model to predict potential users in the test data set, so that the optimal prediction result can be selected from the prediction results of the standard prediction models. The scheme not only predicts potential users but does so with multiple standard prediction models and selects the optimal result among them, thereby ensuring the accuracy of potential-user prediction. High-quality services can then be provided to potential users, avoiding their loss.
Example 2
After the payment probability of potential users has been predicted, the present invention provides Embodiment 2 in order to offer potential users the services they need. FIG. 3 is a flowchart of the method for predicting potential users of an application program disclosed in Embodiment 2 of the present invention.
As shown in fig. 3, the method includes:
S30: Acquire the user data corresponding to the application program.
S11: Classify the user data to obtain multiple types of user data, and label each type of user data.
S20: Based on principal component analysis, reduce the data set obtained after labeling the various types of user data from a high-dimensional space to a low-dimensional space of preset dimension.
S12: Divide the labeled data set into a training data set, a verification data set and a test data set.
S13: Train multiple prediction models on the training data set to obtain multiple standard prediction models.
S14: Verify each standard prediction model with the verification data set.
S15: Based on the verification result of each standard prediction model, select the optimal standard prediction model from the multiple standard prediction models as the target prediction model.
S16: Input the test data set into the target prediction model to predict potential users in the test data set.
S31: Perform behavior analysis on the potential users based on their data collected at the event-tracking points (buried points) of the application program, to obtain the value data exchange behavior features of the potential users.
S32: Provide the potential users with a service strategy corresponding to the value data exchange behavior features.
Specifically, the value data exchange behavior features of a user are the payment transactions the user makes after logging in to the application program. The service strategy consists of pushing to the user, for the user to choose from, service items related to the items the user has paid for.
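Steps S31 and S32 can be sketched as an aggregation over event-tracking records followed by a lookup of related items. Everything concrete here is hypothetical: the event schema (`user_id`, `event`, `item`), the event name `"pay"`, and the `related` mapping from a paid item to related service items are illustrative assumptions, not part of the described method.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def payment_features(events: Iterable[dict]) -> Dict[str, Counter]:
    """Count paid items per user from tracking events (hypothetical schema)."""
    paid: Dict[str, Counter] = defaultdict(Counter)
    for event in events:
        if event["event"] == "pay":
            paid[event["user_id"]][event["item"]] += 1
    return paid


def service_strategy(paid_items: Counter, related: Dict[str, List[str]], top_k: int = 3) -> List[str]:
    """Recommend items related to what the user paid for most, skipping items already bought."""
    recommendations: List[str] = []
    for item, _ in paid_items.most_common():
        for candidate in related.get(item, []):
            if candidate not in paid_items and candidate not in recommendations:
                recommendations.append(candidate)
    return recommendations[:top_k]
```

Ordering candidates by the user's most frequent purchases is one simple design choice; any ranking of related items could be substituted.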
With the above technical solution, after the user data corresponding to the application program is acquired and classified, it is divided into a training data set, a verification data set and a test data set. Multiple prediction models are trained on the training data set to obtain multiple standard prediction models; each standard prediction model is verified with the verification data set; and the test data set is then input into each verified standard prediction model to predict potential users in the test data set, so that the optimal prediction result can be selected from the prediction results of the standard prediction models. The scheme not only predicts potential users but does so with multiple standard prediction models and selects the optimal result among them, thereby ensuring the accuracy of potential-user prediction. High-quality services can then be provided to potential users, avoiding their loss.
In addition, service strategies matched to the users' value data exchange behavior features can be provided to potential users, further avoiding their loss.
Example 3
Next, a device for predicting potential users of an application disclosed in embodiment 3 of the present invention is described with reference to fig. 4, and fig. 4 is a schematic structural diagram of the device for predicting potential users of an application disclosed in embodiment 3 of the present invention.
As shown in fig. 4, the apparatus includes:
an obtaining module 40, configured to obtain user data corresponding to an application;
a classifying module 41, configured to classify the user data to obtain multiple types of user data and to label the user data, where the types of user data include user data with a history of value data exchange and user data without a history of value data exchange;
A dividing module 42, configured to divide the data set labeled with various types of user data into a training data set, a verification data set, and a test data set;
a training module 43, configured to train the multiple prediction models respectively by using a training data set, so as to obtain multiple standard prediction models;
a verification module 44, configured to verify each standard prediction model by using a verification data set;
a selecting module 45, configured to select an optimal standard prediction model from the multiple standard prediction models as a target prediction model based on a verification result of each standard prediction model;
a prediction module 46 for inputting the test data set to each validated standard prediction model to predict potential users in the test data set;
and the selecting module 47 is used for selecting the optimal prediction result from the prediction results of the standard prediction models.
Further, as an optional embodiment of the present invention, the apparatus further includes:
a dimension reduction module, configured to reduce, based on principal component analysis, the data set obtained after labeling the various types of user data from a high-dimensional space to a low-dimensional space.
Further, as an optional embodiment of the present invention, the apparatus further includes:
an analysis module, configured to perform behavior analysis on the potential users based on their data collected at the event-tracking points of the application program, to obtain the value data exchange behavior features of the potential users;
and a providing module, configured to provide the potential users with a service strategy corresponding to the value data exchange behavior features.
In the device for predicting potential users of an application program disclosed in Embodiment 3 of the present invention, after the obtaining module acquires the user data corresponding to the application program and the classifying module classifies it, the user data is divided into a training data set, a verification data set and a test data set. The training module trains multiple prediction models on the training data set to obtain multiple standard prediction models; the verification module verifies each standard prediction model with the verification data set; and the test data set is then input into each verified standard prediction model to predict potential users in the test data set, so that the optimal prediction result can be selected from the prediction results of the standard prediction models. The scheme not only predicts potential users but does so with multiple standard prediction models and selects the optimal result among them, thereby ensuring the accuracy of potential-user prediction. High-quality services can then be provided to potential users, avoiding their loss.
Example 4
Next, a potential user prediction device for an application disclosed in embodiment 4 of the present invention is described with reference to fig. 5, and fig. 5 is a schematic structural diagram of the potential user prediction device for an application disclosed in embodiment 4 of the present invention.
As shown in fig. 5, the apparatus includes:
a memory 50 for storing a computer program;
a processor 51 for executing a computer program stored in a memory for implementing the steps of the potential user prediction method of an application as mentioned in any of the above embodiments.
In the device for predicting potential users of an application program disclosed in Embodiment 4 of the present invention, executing the computer program stored in the memory with the processor achieves the following beneficial effects. After the user data corresponding to an application program is acquired and classified, it is divided into a training data set, a verification data set and a test data set. Multiple prediction models are trained on the training data set to obtain multiple standard prediction models; each standard prediction model is verified with the verification data set; and the test data set is then input into each verified standard prediction model to predict potential users in the test data set, so that the optimal prediction result can be selected from the prediction results of the standard prediction models. The scheme not only predicts potential users but does so with multiple standard prediction models and selects the optimal result among them, thereby ensuring the accuracy of potential-user prediction. High-quality services can then be provided to potential users, avoiding their loss.
Example 5
Next, the computer-readable storage medium disclosed in Embodiment 5 of the present invention is described. A prediction program is stored on the computer-readable storage medium, and when executed by a processor, the prediction program implements the steps of the method for predicting potential users of an application program according to any one of the above embodiments.
In the computer-readable storage medium disclosed in Embodiment 5 of the present invention, executing the stored computer program with the processor achieves the following beneficial effects. After the user data corresponding to an application program is acquired and classified, it is divided into a training data set, a verification data set and a test data set. Multiple prediction models are trained on the training data set to obtain multiple standard prediction models; each standard prediction model is verified with the verification data set; and the test data set is then input into each verified standard prediction model to predict potential users in the test data set, so that the optimal prediction result can be selected from the prediction results of the standard prediction models. The scheme not only predicts potential users but does so with multiple standard prediction models and selects the optimal result among them, thereby ensuring the accuracy of potential-user prediction. High-quality services can then be provided to potential users, avoiding their loss.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, those skilled in the art will understand that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for predicting potential users of an application, the method comprising:
acquiring user data corresponding to the application program;
classifying the user data to obtain multiple types of user data, and marking the various types of user data, wherein the types of the user data comprise historical user data subjected to value data exchange and historical user data not subjected to value data exchange;
dividing the data set marked with various user data into a training data set, a verification data set and a test data set;
respectively training a plurality of prediction models by using the training data set to obtain a plurality of standard prediction models;
verifying each standard prediction model by using the verification data set;
inputting the test data set into each of the validated standard predictive models to predict potential users in the test data set;
and selecting the optimal prediction result from the prediction results of the standard prediction models.
2. The method of claim 1, wherein after classifying the user data to obtain a plurality of classes of user data and marking the classes of user data, the method further comprises:
reducing, based on a principal component analysis method, the data set obtained after marking the various types of user data from a high-dimensional space to a low-dimensional space.
3. The method of claim 2, wherein after selecting the optimal prediction result from the prediction results of each of the standard prediction models, the method further comprises:
performing behavior analysis on the potential user based on the data of the potential user collected at the event-tracking points of the application program to obtain value data exchange behavior features of the potential user;
providing the potential user with a service policy corresponding to the value data exchange behavior feature.
4. The method of claim 3, wherein the user data comprises: first data of old users whose time logged in to the application program exceeds a first preset duration, and second data of new users whose time since registering with the application program does not exceed a second preset duration;
the first data and the second data each include: login ID, data of the last login to the application program, role ID of the user in the application program, behavior feature data, level data, value data used for value data exchange, server ID, and browsing path data of the application program.
5. The method for predicting potential users of an application program according to any one of claims 1 to 4, wherein the prediction models comprise: an XGBoost prediction model, a LightGBM prediction model, and a CatBoost prediction model.
6. The method of claim 5, wherein the construction process of the standard XGBoost prediction model comprises:
taking, as the objective function, maximization of the probability that a user of the application program exhibits value data exchange behavior in the application program;
constructing the XGBoost prediction model using the objective function;
inputting the training data set into the XGBoost prediction model and training it so as to continuously adjust the parameters of the XGBoost prediction model;
and when the training accuracy of the XGBoost prediction model reaches a target value, taking the parameters corresponding to that value as the optimal parameters, wherein the XGBoost prediction model with the optimal parameters is the standard XGBoost prediction model.
7. The method of claim 5, wherein the construction process of the standard LightGBM prediction model comprises:
extracting features of the training data set and analyzing the feature values of the features, wherein the features comprise the time at which the user logs out of the application program, the duration for which the user stays continuously logged in to the application program, the value data exchange behavior features of the user while logged in to the application program, and the time at which the user logs in to the application program;
taking, as the objective function, maximization of the probability that a user of the application program exhibits value data exchange behavior in the application program;
constructing the LightGBM prediction model using the objective function;
and binning the feature values, and inputting the features corresponding to the binned feature values into the LightGBM prediction model for training to obtain the standard LightGBM prediction model.
8. The method of claim 5, wherein the construction process of the standard CatBoost prediction model comprises:
randomly rearranging the data in the training data set;
converting the label values in the training data set into integer data;
traversing the training data set and converting the categorical features of the training data set into numerical values;
and inputting the integer data and the numerical categorical features into the CatBoost prediction model, and training it to obtain the standard CatBoost prediction model.
9. An apparatus for predicting potential users of an application, the apparatus comprising:
the acquisition module is used for acquiring user data corresponding to the application program;
the classification module is used for classifying the user data to obtain multiple types of user data and marking the user data, wherein the types of the user data comprise user data subjected to historical value data exchange and user data not subjected to historical value data exchange;
the dividing module is used for dividing the data set marked with various user data into a training data set, a verification data set and a test data set;
the training module is used for respectively training a plurality of prediction models by utilizing the training data set to obtain a plurality of standard prediction models;
the verification module is used for verifying each standard prediction model by utilizing the verification data set;
a prediction module for inputting the test data set to each of the validated standard prediction models to predict potential users in the test data set;
and the selection module is used for selecting the optimal prediction result from the prediction results of the standard prediction models.
10. A potential user prediction device for an application, comprising:
a memory for storing a computer program;
a processor for executing a computer program stored in the memory to implement the steps of the potential user prediction method of an application program according to any of claims 1 to 8.
11. A computer readable storage medium having a prediction program stored thereon, the prediction program being executable by a processor to implement the steps of the method for potential user prediction of an application program according to any one of claims 1 to 8.
CN201911016389.8A 2019-10-24 2019-10-24 Potential user prediction method, device, equipment and storage medium of application program Pending CN112712383A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911016389.8A CN112712383A (en) 2019-10-24 2019-10-24 Potential user prediction method, device, equipment and storage medium of application program

Publications (1)

Publication Number Publication Date
CN112712383A true CN112712383A (en) 2021-04-27

Family

ID=75540578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911016389.8A Pending CN112712383A (en) 2019-10-24 2019-10-24 Potential user prediction method, device, equipment and storage medium of application program

Country Status (1)

Country Link
CN (1) CN112712383A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822356A (en) * 2021-09-22 2021-12-21 广东电网有限责任公司 Method and device for classifying electricity users, electronic equipment and storage medium
CN113827978A (en) * 2021-08-17 2021-12-24 杭州电魂网络科技股份有限公司 Loss user prediction method and device and computer readable storage medium
CN113827979A (en) * 2021-08-17 2021-12-24 杭州电魂网络科技股份有限公司 LightGBM-based game churn user prediction method and system
CN113827980A (en) * 2021-08-17 2021-12-24 杭州电魂网络科技股份有限公司 Loss user prediction method and device and computer readable storage medium
CN114584601A (en) * 2022-01-26 2022-06-03 上海钧正网络科技有限公司 User loss identification and intervention method, system, terminal and medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845731A (en) * 2017-02-20 2017-06-13 重庆邮电大学 Method for discovering potential renewal users based on multi-model fusion
CN107657267A (en) * 2017-08-11 2018-02-02 百度在线网络技术(北京)有限公司 Method and device for mining potential users of a product
CN109034903A (en) * 2018-07-27 2018-12-18 广州视源电子科技股份有限公司 User conversion rate prediction method and device, and computer-readable storage medium
CN109325640A (en) * 2018-12-07 2019-02-12 中山大学 User value prediction method, device, storage medium and equipment
CN109767259A (en) * 2018-12-15 2019-05-17 深圳壹账通智能科技有限公司 Operation event promotion method, apparatus, equipment and medium based on event-tracking (buried point) data
CN109785002A (en) * 2019-01-17 2019-05-21 东华大学 Method for predicting in-game payment by users

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陈宗海 (Chen Zonghai), University of Science and Technology of China Press *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination