CN109767269B - Game data processing method and device - Google Patents

Publication number
CN109767269B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910037504.3A
Other languages
Chinese (zh)
Other versions
CN109767269A (en)
Inventor
陶建容
钟倩
巩琳霞
冯潞潞
沈乔治
范长杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Netease Hangzhou Network Co Ltd
Original Assignee
Netease Hangzhou Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Netease Hangzhou Network Co Ltd filed Critical Netease Hangzhou Network Co Ltd
Priority to CN201910037504.3A priority Critical patent/CN109767269B/en
Publication of CN109767269A publication Critical patent/CN109767269A/en
Application granted granted Critical
Publication of CN109767269B publication Critical patent/CN109767269B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

An embodiment of the invention provides a game data processing method, comprising: obtaining game log data; dividing game users according to the game log data to obtain at least one user group information; extracting feature dimension data corresponding to the user group information from the game log data; inputting the user group information and the corresponding feature dimension data into a decision tree model to obtain a trained decision tree model and a corresponding model file; and extracting decision path data from the model file. Because the decision tree algorithm preferentially selects the most discriminative features from the multi-dimensional features and places them close to the root node, it helps researchers analyze the relative importance of the reasons for user loss. Compared with existing methods, the method greatly reduces labor cost, improves efficiency and enhances reliability.

Description

Game data processing method and device
Technical Field
The present invention relates to the field of game technologies, and in particular, to a method and an apparatus for processing game data.
Background
In a game company, user loss (churn) is one of the problems of greatest concern to the development, design and operations departments, and the number of users and their spending are important factors shaping the direction of game development, the operating strategy and subsequent promotion budgets. For the currently mainstream online games, which depend heavily on charging for additional content, the cost of retaining an existing user is about 1/5 of the cost of acquiring a new user, and the profit gap widens further once the possibility of losing high-spending users and the cost of developing a new user into a high-value user are taken into account. Therefore, analyzing the reasons users are lost, understanding their game experience and proposing targeted improvements can raise user retention, enhance playability and increase commercial value.
Existing methods for analyzing the reasons for user loss mainly comprise user research and statistics-based numerical analysis. In user research, a sample of lost users is selected at random and surveyed; the survey can take various forms, commonly questionnaires and telephone interviews, and it yields the loss reasons directly. In statistics-based numerical analysis, the operations department statistically analyzes users' game log data, extracts information such as loss rate, retention rate, online time and number of completed tasks from a database, and infers the reasons for the loss. Common methods include regression analysis, funnel analysis and feedback investigation.
However, user research is inefficient and labor-intensive, and its results do not generalize. Statistics-based numerical analysis yields highly subjective conclusions, depends heavily on the experience of the operations staff performing the analysis, cannot judge the relative importance of multiple features, and is likewise inefficient and labor-intensive.
Disclosure of Invention
The embodiment of the invention provides a game data processing method and a corresponding game data processing device.
In order to solve the above problem, an embodiment of the present invention discloses a method for processing game data, including:
obtaining game log data;
dividing game users according to the game log data to obtain at least one user group information;
extracting feature dimension data corresponding to the user group information from the game log data;
inputting the user group information and the corresponding characteristic dimension data into a decision tree model to obtain a trained decision tree model and a corresponding model file;
and extracting decision path data in the model file.
Preferably, the method further comprises the following steps:
and visualizing the trained decision tree model to obtain a model visualization result.
Preferably, the step of dividing game users according to the game log data to obtain at least one user group information includes:
extracting the accumulated online time within the preset time period of the game log data;
and dividing game users according to the accumulated online time to obtain the user group information.
Preferably, the step of extracting feature dimension data corresponding to the user group information from the game log data includes:
extracting initial dimension data corresponding to the user group information from the game log data;
and extracting characteristic dimension data in the initial dimension data.
Preferably, the step of inputting the user group information and the corresponding feature dimension data into a decision tree model to obtain a trained decision tree model and a corresponding model file includes:
inputting the user group information and the corresponding feature dimension data into a decision tree model to obtain a trained decision tree model;
and analyzing the trained decision tree model to obtain the model file.
Preferably, before the step of extracting the decision path data from the model file, the method further includes:
and screening the decision path data in the model file to obtain the screened decision path data.
Preferably, the step of extracting the decision path data in the model file includes:
and extracting the screened decision path data.
The embodiment of the invention also discloses a game data processing device, which comprises:
the game log data acquisition module is used for acquiring game log data;
the user group information acquisition module is used for dividing game users according to the game log data to acquire at least one user group information;
the characteristic dimension data extraction module is used for extracting characteristic dimension data corresponding to the user group information from the game log data;
the training module is used for inputting the user group information and the corresponding characteristic dimension data into a decision tree model to obtain a trained decision tree model and a corresponding model file;
and the decision path data extraction module is used for extracting the decision path data in the model file.
The embodiment of the invention also discloses electronic equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the step of processing the game data when executing the program.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and the computer program realizes the steps of processing the game data when being executed by a processor.
The embodiment of the invention has the following advantages:
in the embodiment of the invention, game log data are obtained; dividing game users according to the game log data to obtain at least one user group information; extracting feature dimension data corresponding to the user group information from the game log data; inputting the user group information and the corresponding characteristic dimension data into a decision tree model to obtain a trained decision tree model and a corresponding model file; extracting decision path data in the model file; outputting the decision path data; important performance behaviors of user loss are analyzed from a behavior log of a game user, and importance sequencing is carried out on the behaviors, so that the high efficiency and the expandability of the scheme are embodied; the decision tree algorithm can preferentially select the feature with the strongest judging performance from the multi-dimensional features and place the feature close to the root node, so that researchers are helped to analyze the importance of the loss reason; compared with the existing method, the method greatly reduces the labor cost, improves the efficiency and enhances the reliability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts;
FIG. 1 is a flow chart of the steps of a first embodiment of a method for processing game data according to an embodiment of the present invention;
FIG. 2 is a flow chart of steps of a second embodiment of a method for processing game data according to the present invention;
fig. 3 is a block diagram of an embodiment of a game data processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects solved by the embodiments of the present invention more clearly apparent, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, a flowchart of the steps of a first embodiment of a game data processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 101, obtaining game log data;
in a specific implementation, the embodiments of the present invention may be applied to a mobile terminal, such as a mobile phone, a tablet computer, a personal digital assistant, a wearable device (such as glasses, a watch, etc.), a desktop computer, and so on.
In the embodiment of the present invention, the operating system of the mobile terminal may include Android, iOS, Windows Phone, Windows, and the like.
In another preferred embodiment of the present invention, the embodiment of the present invention may also be applied to a server, where the server may include a PC (Personal Computer) server, a mainframe, a minicomputer, and a cloud server, and the embodiment of the present invention does not specifically limit the type and number of the server.
In particular, the game log data may include data regarding game user behavior while the game application is running; the game log data may include various log fields, timestamps, matching information, transaction information, game information, and the like, which is not limited in this embodiment of the present invention.
Further applied to the embodiment of the present invention, the mobile terminal may obtain the game log data from the game servers, that is, the game log data may be stored in one or more game servers, and the mobile terminal may be connected to the game servers through a network and obtain the game log data through the network.
When the embodiment of the present invention is applied to the server, the server may include a game server itself, and the game server may call a preset process to obtain game log data stored in a memory thereof, and execute a related data processing process described below; the following description will be given taking a mobile terminal as an example.
Step 102, dividing game users according to the game log data to obtain at least one user group information;
specifically, in the embodiment of the present invention, the mobile terminal may extract accumulated online duration data from the game log data and divide game users according to that data to obtain at least one user group information.
It should be noted that the accumulated online time length data may be an accumulated time length within a preset time period after the user creates the game account, for example, the accumulated online time length data may be an accumulated time length within 24 hours after the user creates the game account.
For example, when a game user's accumulated online duration is not less than a preset threshold, the game user may be marked as a retained user; otherwise, the game user is marked as a lost user. The threshold may be 5 minutes, 10 minutes or 30 minutes; accordingly, the user group information may be divided into a 5-minute retained user group, a 5-minute lost user group, a 10-minute retained user group, a 10-minute lost user group, a 30-minute retained user group, a 30-minute lost user group, and so on.
Further, if a game user logs in on the day after creating the game account, the mobile terminal may assign the game user to the next-day retained user group; otherwise, the game user is assigned to the next-day lost user group.
In a preferred embodiment of the present invention, the game log data may further include the number of props purchased, the amount spent on props, the time taken by the game user to level up, and the like, and the mobile terminal may divide game users into different user groups according to these data.
In another specific example of the embodiment of the present invention, the number of props purchased in the game log data may be 0, not less than 2, not less than 5, not less than 8, not less than 10, not less than 15, and the like; game users are divided into user groups according to the number of props purchased, so as to obtain different retained user groups and lost user groups.
In another specific example of the embodiment of the present invention, the time taken by a game user to level up in the game log data may be not less than 1 hour, not less than 6 hours, not less than 12 hours, not less than 24 hours, not less than 48 hours, not less than 96 hours, and the like; game users are divided into user groups according to this time, so as to obtain different retained user groups and lost user groups.
The user group division is only a few examples of the embodiment of the present invention, and different retained user groups and lost user groups may also be obtained by dividing the game user into user groups according to other data in the game log data, which is not limited in the embodiment of the present invention.
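The division described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `"retained"`/`"lost"` labels and the sample data are assumptions, and the accumulated online time is assumed to be already aggregated per user in seconds.

```python
from datetime import date

def label_by_online_time(online_seconds, threshold_minutes):
    """Map user_id -> 'retained' or 'lost' for one online-time threshold."""
    limit = threshold_minutes * 60
    return {
        uid: "retained" if secs >= limit else "lost"
        for uid, secs in online_seconds.items()
    }

def label_next_day(created_on, login_dates):
    """Mark a user 'retained' if a login exists on the day after account creation."""
    labels = {}
    for uid, created in created_on.items():
        next_day = date.fromordinal(created.toordinal() + 1)
        labels[uid] = "retained" if next_day in login_dates.get(uid, set()) else "lost"
    return labels

# Hypothetical data: accumulated online seconds within 24 hours of account creation.
online = {"u1": 120, "u2": 400, "u3": 2000}
groups = {minutes: label_by_online_time(online, minutes) for minutes in (5, 10, 30)}
```

Each threshold produces its own pair of groups, so the same user (here `u2`) can be retained for the 5-minute problem and lost for the 10-minute problem.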
Step 103, extracting feature dimension data corresponding to the user group information from the game log data;
in practical application to the embodiment of the present invention, the mobile terminal may extract feature dimension data corresponding to the user group information from the game log data; it should be noted that the game log data may include a plurality of initial dimension data, and the feature dimension data is extracted from the initial dimension data; for example, the initial dimension data may include login times, online duration, total amount of virtual items, and the like, which is not limited in this embodiment of the present invention.
It should be noted that each user group information may include Identifications (IDs) corresponding to a plurality of game users; the game user's ID is associated with the game log data, which may include a plurality of initial dimension data; the initial dimension data is the statistic or time characteristic of each game user; the feature dimension data is the statistics or time features of each game user after being filtered.
Because the game user's ID is associated with the game log data, the user group information may have a mapping relationship with the initial dimension data, i.e., the user group information may also have a mapping relationship with the feature dimension data.
For example, the user group information may be a 5-minute retained user group, and the feature dimension data corresponding to the 5-minute retained user group may include: number of task completions per day, highest experience value on the day, duration of novice guidance, etc.
The embodiment of the present invention does not specifically limit the types of the feature dimension data in the game log data.
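The mapping between user group information and feature dimension data through the user ID can be illustrated with a short sketch. The function name, the dictionary layout and the feature names are hypothetical; the patent does not prescribe a data structure.

```python
def build_samples(labels, features, feature_names):
    """Join group labels and per-user features on user ID.

    Returns (ids, X, y), where y is 1 for a lost user and 0 for a retained user.
    Users without any feature record are skipped.
    """
    ids, X, y = [], [], []
    for uid in sorted(labels):
        row = features.get(uid)
        if row is None:
            continue
        ids.append(uid)
        X.append([row.get(name, 0) for name in feature_names])
        y.append(1 if labels[uid] == "lost" else 0)
    return ids, X, y

# Hypothetical group labels and per-user statistics.
labels = {"u1": "lost", "u2": "retained", "u3": "lost"}
features = {
    "u1": {"tasks_completed": 0, "tutorial_seconds": 30},
    "u2": {"tasks_completed": 5, "tutorial_seconds": 200},
}
ids, X, y = build_samples(labels, features, ["tasks_completed", "tutorial_seconds"])
```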
Step 104, inputting the user group information and the corresponding characteristic dimension data into a decision tree model to obtain a trained decision tree model and a corresponding model file;
further applied to the embodiment of the invention, after the user group information and the corresponding feature dimension data are obtained, the mobile terminal can input the user group information and the corresponding feature dimension data into the decision tree model to obtain the trained decision tree model and the corresponding model file.
It should be noted that the types of the decision tree model may include an ID3 (Iterative Dichotomiser 3) model, a C4.5 model, a CART (Classification and Regression Tree) model, and the like, and the embodiment of the present invention is not limited thereto.
And training the decision tree model by taking the user group information and the corresponding feature dimension data as samples to obtain the trained decision tree model.
For example, the mobile terminal may input the above-mentioned 5-minute retained user group, 5-minute lost user group, 10-minute retained user group, 10-minute lost user group, 30-minute retained user group, and 30-minute lost user group and their corresponding feature dimension data into the above-mentioned CART model to obtain a trained CART model.
After the trained decision tree model is obtained, it is parsed and stored as a file in a specific format, for example a JSON file; this file is the model file.
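Training and storing the model can be sketched in pure Python. The following is a deliberately small CART-style classifier written for illustration only; it is not the patent's implementation, and the node fields (`samples`, `lost`, `retained`, `gini`, `feature`, `threshold`, `left`, `right`) and the sample data are assumptions.

```python
import json

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def best_split(X, y):
    """Return (feature_index, threshold, weighted_gini) of the best binary split, or None."""
    best = None
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= thr]
            right = [label for row, label in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[2]:
                best = (f, thr, score)
    return best

def build_tree(X, y, names, depth=0, max_depth=3):
    """Recursively grow a tree; every node is a JSON-serializable dict."""
    node = {"samples": len(y), "lost": sum(y), "retained": len(y) - sum(y),
            "gini": round(gini(y), 4)}
    split = best_split(X, y) if depth < max_depth and gini(y) > 0 else None
    if split is None:
        return node  # leaf: pure node, depth limit reached, or no usable split
    f, thr, _ = split
    node["feature"], node["threshold"] = names[f], thr
    left = [(r, l) for r, l in zip(X, y) if r[f] <= thr]
    right = [(r, l) for r, l in zip(X, y) if r[f] > thr]
    node["left"] = build_tree([r for r, _ in left], [l for _, l in left],
                              names, depth + 1, max_depth)
    node["right"] = build_tree([r for r, _ in right], [l for _, l in right],
                               names, depth + 1, max_depth)
    return node

# Hypothetical samples: [tasks_completed, tutorial_seconds]; label 1 = lost user.
X = [[0, 30], [1, 45], [5, 200], [6, 180], [0, 20], [7, 220]]
y = [1, 1, 0, 0, 1, 0]
tree = build_tree(X, y, ["tasks_completed", "tutorial_seconds"])
model_file = json.dumps(tree, indent=2)  # text content of the JSON model file
```

In practice a library implementation of CART would normally be used; the sketch only shows how a trained tree can be held as nested records that serialize directly to JSON.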
Step 105, extracting decision path data in the model file.
In practical application of the embodiment of the invention, the mobile terminal can extract decision path data from the model file; specifically, the model file may be parsed to obtain a large amount of decision path data, and screening conditions may be set to screen or prune that data.
It should be noted that each piece of decision path data includes a plurality of nodes, and each node represents a certain decision rule, that is, each piece of decision path data is composed of a plurality of decision rules.
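As an illustration of a decision path being a sequence of decision rules, the sketch below walks a nested JSON model from the root to every leaf. The JSON layout and the function name are assumptions made for the example; the patent does not specify the schema of the model file.

```python
import json

def extract_paths(node, rules=()):
    """Yield (rules, leaf) for every root-to-leaf path of a nested tree dict."""
    if "feature" not in node:  # a node without a split is a leaf
        yield list(rules), node
        return
    name, thr = node["feature"], node["threshold"]
    yield from extract_paths(node["left"], rules + (f"{name} <= {thr}",))
    yield from extract_paths(node["right"], rules + (f"{name} > {thr}",))

# Hypothetical model file content.
model_file = """
{"feature": "tasks_completed", "threshold": 3.0, "lost": 50, "retained": 50,
 "left": {"feature": "tutorial_seconds", "threshold": 60.0, "lost": 45, "retained": 10,
          "left": {"lost": 40, "retained": 2},
          "right": {"lost": 5, "retained": 8}},
 "right": {"lost": 5, "retained": 40}}
"""
paths = list(extract_paths(json.loads(model_file)))
```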
Preferably, the method may further comprise: and outputting the decision path data.
Specifically, once the decision path data is obtained, the mobile terminal may output it as a file in any of several formats; for example, the decision path data may be output in a table format.
In the embodiment of the invention, game log data are obtained; dividing game users according to the game log data to obtain at least one user group information; extracting feature dimension data corresponding to the user group information from the game log data; inputting the user group information and the corresponding characteristic dimension data into a decision tree model to obtain a trained decision tree model and a corresponding model file; extracting decision path data in the model file; outputting the decision path data; important performance behaviors of user loss are analyzed from a behavior log of a game user, and importance sequencing is carried out on the behaviors, so that the high efficiency and the expandability of the scheme are embodied; the decision tree algorithm can preferentially select the feature with the strongest judging performance from the multi-dimensional features and place the feature close to the root node, so that researchers are helped to analyze the importance of the loss reason; compared with the existing method, the method greatly reduces the labor cost, improves the efficiency and enhances the reliability.
Referring to fig. 2, a flowchart illustrating steps of a second game data processing method embodiment of the present invention is shown, which may specifically include the following steps:
step 201, obtaining game log data;
the embodiment of the invention can be applied to a mobile terminal or a server, the game log data can comprise various log fields, timestamps, matching information, transaction information, game information and the like, the mobile terminal can acquire the game log data from the game server, namely the game log data can be stored in the game server, and the mobile terminal can acquire the game log data through a network.
When the embodiment of the present invention is applied to the server, the server may include a game server itself, and the game server may call a preset process to obtain game log data stored in a memory of the game server itself, and a mobile terminal is taken as an example for description below.
Step 202, extracting the accumulated online time of the game log data within a preset time period;
specifically, the mobile terminal may determine the accumulated online duration within a preset time period from the game log data; for example, the preset time period may be 24 hours or 48 hours, in which case the accumulated online time of the game user within 24 hours or 48 hours is calculated.
Step 203, dividing game users according to the accumulated online time to obtain the user group information;
further, the mobile terminal can divide game users according to accumulated online time to obtain the user group information;
for example, the accumulated online time data may be not less than 5 minutes, not less than 10 minutes, not less than 30 minutes, and the like, and the user group information may be divided into a 5-minute retained user group, a 5-minute attrition user group, a 10-minute retained user group, a 10-minute attrition user group, a 30-minute retained user group, and a 30-minute attrition user group.
Step 204, extracting initial dimension data corresponding to the user group information from the game log data;
the user group information and the initial dimension data have a mapping relation, and the mobile terminal can extract the initial dimension data corresponding to the user group information from the game log data according to the mapping relation.
Step 205, extracting feature dimension data in the initial dimension data;
further applied to the embodiment of the invention, the mobile terminal can extract the characteristic dimension data from the initial dimension data; specifically, the mobile terminal receives some preset threshold values, and extracts feature dimension data in the initial dimension data according to the threshold values.
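The patent does not say what the preset thresholds are applied to. One plausible reading, shown here purely as an assumption, is a variance threshold that drops dimensions that barely differ between users; the function name and sample rows are hypothetical.

```python
from statistics import pvariance

def select_features(rows, min_variance):
    """Keep the dimensions whose population variance across users reaches min_variance."""
    names = list(rows[0])
    return [n for n in names if pvariance([row[n] for row in rows]) >= min_variance]

# Hypothetical initial dimension data, one dict per user.
rows = [
    {"login_count": 1, "online_minutes": 3, "server_id": 7},
    {"login_count": 4, "online_minutes": 45, "server_id": 7},
    {"login_count": 2, "online_minutes": 12, "server_id": 7},
]
kept = select_features(rows, min_variance=0.5)
```

Here `server_id` is constant across users, so it carries no information for separating lost from retained users and is filtered out.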
Step 206, inputting the user group information and the corresponding feature dimension data into a decision tree model to obtain a trained decision tree model;
it should be noted that the types of the decision tree model may include an ID3 (Iterative Dichotomiser 3) model, a C4.5 model, a CART (Classification and Regression Tree) model, and the like, and the embodiment of the present invention is not limited thereto.
And training the decision tree model by taking the user group information and the corresponding feature dimension data as samples to obtain the trained decision tree model.
Step 207, analyzing the trained decision tree model to obtain the model file;
in a specific example of the embodiment of the present invention, after obtaining the trained decision tree model, the trained decision tree model is analyzed and stored as a file with some specific format as the model file, for example, the file with the specific format may be a json format file.
In a preferred embodiment of the embodiments of the present invention, the method further includes: and visualizing the trained decision tree model to obtain a model visualization result.
Step 208, screening the decision path data in the model file to obtain screened decision path data;
specifically, the mobile terminal may screen the large amount of decision path data according to preset screening conditions; for example, the screening conditions may include that the proportion of lost users or the proportion of retained users at a certain node reaches a first threshold (e.g., 0.85); or that the number of lost users at a certain node increases sharply compared with the previous node while the total number of lost users at the node is larger than a second threshold.
It should be noted that the above-mentioned screening conditions are merely examples of the present invention, and the screening conditions are not specifically limited in the present invention.
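The two example screening conditions can be sketched as a predicate over one decision path. The first threshold of 0.85 comes from the text; everything else is an assumption: the "sharp increase" is interpreted here as a jump in the lost-user proportion relative to the previous node, and the `jump` and `second_threshold` defaults are arbitrary illustrative values.

```python
def keep_path(path, first_threshold=0.85, jump=0.3, second_threshold=20):
    """Decide whether a root-to-leaf path passes the screening conditions.

    path is a list of node dicts, each holding 'lost' and 'retained' counts.
    """
    previous = None
    for node in path:
        total = node["lost"] + node["retained"]
        if total == 0:
            continue  # an empty node carries no evidence
        ratio = node["lost"] / total
        # Condition 1: lost or retained proportion reaches the first threshold.
        if max(ratio, 1 - ratio) >= first_threshold:
            return True
        # Condition 2 (assumed reading): the lost proportion jumps sharply
        # and the node still holds more lost users than the second threshold.
        if previous is not None and ratio - previous >= jump and node["lost"] > second_threshold:
            return True
        previous = ratio
    return False

# Hypothetical paths, each from root to leaf.
path_a = [{"lost": 50, "retained": 50}, {"lost": 45, "retained": 10}]
path_b = [{"lost": 50, "retained": 50}, {"lost": 12, "retained": 10}]
path_c = [{"lost": 50, "retained": 50}, {"lost": 5, "retained": 40}]
```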
In a preferred embodiment of the present invention, pruning operations may also be performed on the decision path data.
Step 209, outputting the decision path data.
In the embodiment of the present invention, after obtaining the decision path data, the mobile terminal may output it as a file in any of several formats; for example, the decision path data may be output as a table in xls or xlsx format.
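Tabular output can be sketched with the standard library. Writing xls or xlsx files requires a third-party package, so this example substitutes CSV, which spreadsheet applications can open; the column names and path tuples are hypothetical.

```python
import csv
import io

def paths_to_csv(paths):
    """Render decision paths as CSV text, one path per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["path_id", "rules", "lost", "retained", "lost_ratio"])
    for i, (rules, lost, retained) in enumerate(paths, start=1):
        total = lost + retained
        ratio = round(lost / total, 3) if total else 0.0
        writer.writerow([i, " AND ".join(rules), lost, retained, ratio])
    return buffer.getvalue()

# Hypothetical screened decision paths: (rules, lost count, retained count).
paths = [
    (["tasks_completed <= 3.0", "tutorial_seconds <= 60.0"], 40, 2),
    (["tasks_completed > 3.0"], 5, 40),
]
table = paths_to_csv(paths)
```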
In the embodiment of the invention, game log data are obtained; extracting the accumulated online time within the preset time period of the game log data; dividing game users according to the accumulated online time to obtain the user group information; extracting initial dimension data corresponding to the user group information from the game log data; extracting characteristic dimension data in the initial dimension data; inputting the user group information and the corresponding feature dimension data into a decision tree model to obtain a trained decision tree model; analyzing the trained decision tree model to obtain the model file; screening the decision path data in the model file to obtain screened decision path data; outputting the decision path data; important performance behaviors of user loss are analyzed from a behavior log of a game user, and importance sequencing is carried out on the behaviors, so that the high efficiency and the expandability of the scheme are embodied; the decision tree algorithm can preferentially select the feature with the strongest judging performance from the multi-dimensional features and place the feature close to the root node, so that researchers are helped to analyze the importance of the loss reason; compared with the existing method, the method greatly reduces the labor cost, improves the efficiency and enhances the reliability; the behavior of the game user can be known and analyzed in time, and the loss reason can be generated, so that a game developer can know the shortage in the game in time, a game development part can adjust the content of the game in time, and the attraction and the commercial value of the game to the user are improved.
In order to enable those skilled in the art to better understand the embodiments of the present invention, a specific example is illustrated.
Step one: data acquisition
Extract the game log data of each game user starting from the creation of the game account; the game log data includes various log fields, timestamps and related detailed log information, which facilitates further data processing.
Step two: group partitioning
Taking different churn problems as examples, users can be divided into a lost group and a retained group for each problem. For example, for the 5-minute churn problem, the accumulated online time within 24 hours of creating the game account is counted from the user log; if it is not less than 5 minutes, the user is marked as a retained user, otherwise as a lost user. The 10-minute lost user group, 10-minute retained user group, 30-minute lost user group and 30-minute retained user group are obtained in the same way. For the next-day churn problem, a newly created user who logs in on the day after creating the game account is assigned to the next-day retained group; otherwise the user is assigned to the next-day lost group.
Step three: feature engineering
For the 5-minute, 10-minute and 30-minute churn problems, all game log data of the corresponding lost user group within 24 hours of account creation, and the game log data of the retained users within the time ranges of 5 minutes, 10 minutes and 30 minutes respectively, can be extracted, and feature engineering is performed, that is, feature dimension data is extracted from the initial dimension data in the game log data. For the next-day churn problem, all game log data from the day of account creation is extracted for both the lost user group and the retained user group. After the game log data is obtained, the feature dimension data is extracted from it; the feature dimension data may include statistics and time features of the user's matching information, transaction information, game information and the like, totaling 60 dimensions, as shown in Table 1.
[Table 1 is provided as images in the original publication (BDA0001946407070000121 through BDA0001946407070000171) and is not reproduced here.]
Table 1: characteristic dimension data table of the embodiment of the invention
Step four: attrition prediction
To analyze the reasons for user loss, loss prediction must be performed first: a CART model analyzes the information carried in a user's game log data to predict whether the user will be lost. CART is a decision tree algorithm for classification and regression that consists of feature selection, tree generation and pruning. When solving a classification problem, it selects the optimal feature by computing the Gini index, determines the optimal binary split point for that feature, and recursively partitions on each feature, dividing the feature space into a finite number of cells and determining a predicted probability distribution on each cell, that is, the conditional probability distribution of the output given the input. The Gini index reflects the impurity of a data set: the larger the Gini index, the more mixed the current data and the less pure the node, so CART chooses as its splitting scheme the attribute that makes the Gini index of the child nodes smallest. When the algorithm is applied to a regression problem, the feature selection criterion changes slightly: the split that minimizes the sum of squared errors of the partitioned samples is chosen.
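For reference, the splitting criteria mentioned above are usually written as follows. These are the standard textbook forms of the CART criteria, stated in notation introduced here (D, p_k, A, s, R_1, R_2, c_1, c_2), not formulas taken from the patent.

```latex
% Gini index of a sample set D with K classes, where p_k is the share of class k
\mathrm{Gini}(D) = 1 - \sum_{k=1}^{K} p_k^{2}

% Weighted Gini index after a binary split of D into D_1 and D_2 on feature A at point s
\mathrm{Gini}(D, A, s) = \frac{|D_1|}{|D|}\,\mathrm{Gini}(D_1) + \frac{|D_2|}{|D|}\,\mathrm{Gini}(D_2)

% Regression criterion: choose feature j and split point s minimizing the squared error,
% where c_1 and c_2 are the mean target values in regions R_1 and R_2
\min_{j,\,s} \left[ \sum_{x_i \in R_1(j,s)} (y_i - c_1)^2 + \sum_{x_i \in R_2(j,s)} (y_i - c_2)^2 \right]
```

For the two-class problem of lost versus retained users with lost-user share p, the first expression reduces to 2p(1 - p), which is largest at p = 0.5 and zero for a pure node.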
And dividing a data set with balanced positive and negative samples into a training set and a testing set, fitting the training set to obtain the decision tree model, and finishing the training of the loss model by adopting precision, call and f1 core as evaluation indexes, namely inputting the user group information and the corresponding characteristic dimension data into the decision tree model to finish the training of the decision tree model.
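The three evaluation indexes can be computed directly from the predicted and true labels. The sketch below uses the standard definitions, treating the churned class as positive; the function name and sample labels are illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0]  # 1 = churned, 0 = retained
y_pred = [1, 1, 0, 1, 0, 0]
print(precision_recall_f1(y_true, y_pred))
```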
Step five: model visualization
Binary classification with a decision tree model in effect decomposes the prediction process into a number of sub-problems according to the features or attributes of the data, carrying out a 'divide and conquer' reasoning task. Because the rule-splitting mechanism of the whole decision tree model is very large, the internal structure of the model is visualized after the churn prediction is completed, to facilitate analysis, understanding and communication. The resulting model visualization is used to analyze the decision process the model follows during prediction.
Step six: generation of churn reasons
After the decision tree model is obtained, it is parsed by traversing its leaf nodes, so as to obtain the data and related decision information of each decision path from the root node to a leaf node. However, as the depth of the decision tree model increases, the number of paths grows exponentially, and the whole decision system becomes large and complex, which hinders further analysis. Therefore, the decision tree model needs to be parsed and screening conditions need to be added to the decision paths to limit the number of paths output, so that efficient and practical classification feature information is extracted.
After the training of the churn prediction model is completed, the maximum depth of the decision tree model needs to be limited, and the model file is saved as a JSON structure. The model file stores the information of each node in the decision tree model, such as node number, numbers of positive and negative samples, and Gini index. Each data structure stores the information of one node together with the information of its parent and child nodes. The text information in the model file is extracted and stored, and the decision tree model is restored from a text perspective, which completes the text reading and parsing of the decision tree model.
After the parsing of the decision tree model is completed, one or more screening conditions can be preset to screen and prune the decision paths, to address the huge volume of decision path data and the complexity of the rule system. The simpler and more effective screening conditions at present are as follows:
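The patent does not specify a visualization tool. As one simple possibility, a tree can be rendered as indented text, one node per line. The nested-dictionary layout and the field names (`rule`, `retained`, `churned`, `children`) in this sketch are assumptions made for illustration.

```python
def render_tree(node, depth=0):
    """Return the tree as a list of text lines, indented by node depth."""
    line = "  " * depth + "{} (retained={}, churned={})".format(
        node["rule"], node["retained"], node["churned"]
    )
    lines = [line]
    for child in node.get("children", []):
        lines.extend(render_tree(child, depth + 1))
    return lines

tree = {
    "rule": "root", "retained": 50, "churned": 50,
    "children": [
        {"rule": "match_count <= 2.5", "retained": 10, "churned": 40},
        {"rule": "match_count > 2.5", "retained": 40, "churned": 10},
    ],
}
print("\n".join(render_tree(tree)))
```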
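A minimal sketch of such a model file follows. It flattens a nested tree into one record per node, each holding the node number, the sample counts, the Gini index and the ids of its parent and child nodes, and then serializes the records as JSON. The record layout and field names are assumptions; the patent states only which kinds of information are stored.

```python
import json

def flatten_tree(tree):
    """Flatten a nested tree into a list of node records linked by ids."""
    records = []

    def visit(node, parent_id):
        node_id = len(records)
        total = node["retained"] + node["churned"]
        p_churn = node["churned"] / total
        record = {
            "id": node_id,
            "parent": parent_id,
            "children": [],
            "rule": node.get("rule"),
            "retained": node["retained"],
            "churned": node["churned"],
            "gini": round(1.0 - p_churn ** 2 - (1.0 - p_churn) ** 2, 4),
        }
        records.append(record)
        for child in node.get("children", []):
            record["children"].append(visit(child, node_id))
        return node_id

    visit(tree, None)
    return records

tree = {
    "retained": 50, "churned": 50,
    "children": [
        {"rule": "match_count <= 2.5", "retained": 10, "churned": 40},
        {"rule": "match_count > 2.5", "retained": 40, "churned": 10},
    ],
}
records = flatten_tree(tree)
model_file_text = json.dumps(records, indent=2)  # content of the model file
restored = json.loads(model_file_text)           # tree restored from text
print(restored[0])
```

Because every record carries both parent and child ids, the tree can be walked in either direction after it is read back from the file.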
1. If the proportion of churned users or the proportion of retained users on a node reaches a first threshold (such as 0.85), the series of decision conditions examined from the root node to that node is extracted and recorded as a decision path. This indicates that, after a series of binary choices, the user group has moved from the initial proportion of 0.5 to a node at which churned users and retained users are well separated.
2. If the churned-user ratio on a node rises steeply compared with that on the previous node, and the total number on the node is greater than a second threshold, the series of decision conditions examined from the root node to that node is extracted and recorded as a decision path. This indicates that the node classifies the group meeting these conditions well, that is, the node has a large influence on the churn or retention of this type of user.
3. If the proportion of churned samples on a node is greater than the proportion of retained samples, while on the previous node the proportion of retained samples is greater than the proportion of churned samples, the series of decision conditions examined from the root node to that node is extracted and recorded as a decision path. This indicates that the decision condition of the node has a large influence on the churn and retention of the user group meeting these conditions.
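The three screening conditions can be sketched as a single predicate on a node and its parent. This is one possible reading, not the patented implementation: the "steep increase" of condition 2 is interpreted as the churn ratio exceeding the parent's churn ratio by at least `min_jump`, and the names `min_jump` and `second_threshold` and their default values are assumptions; only the example first threshold of 0.85 comes from the text.

```python
def churn_ratio(node):
    """Fraction of churned users among all users on the node."""
    total = node["retained"] + node["churned"]
    return node["churned"] / total if total else 0.0

def passes_screen(node, parent, first_threshold=0.85,
                  min_jump=0.2, second_threshold=100):
    """Return True if the node satisfies any of the three screening conditions."""
    total = node["retained"] + node["churned"]
    if total == 0:
        return False
    ratio = churn_ratio(node)
    # Condition 1: churned or retained proportion reaches the first threshold.
    if max(ratio, 1.0 - ratio) >= first_threshold:
        return True
    if parent is None:
        return False
    parent_ratio = churn_ratio(parent)
    # Condition 2 (interpreted): churn ratio jumps relative to the parent
    # and the node still holds more users than the second threshold.
    if ratio - parent_ratio >= min_jump and total > second_threshold:
        return True
    # Condition 3: majority flips from retained (parent) to churned (node).
    return ratio > 0.5 and parent_ratio < 0.5

parent = {"retained": 500, "churned": 500}
print(passes_screen({"retained": 30, "churned": 270}, parent))   # condition 1
print(passes_screen({"retained": 80, "churned": 220}, parent))   # condition 2
print(passes_screen({"retained": 260, "churned": 240}, parent))  # none apply
```

For each node that passes, the decision conditions from the root node to that node would then be recorded as a decision path.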
After the decision tree model is pruned by the screening conditions, a new decision tree model is reconstructed. Starting from the root node $r$ of the new model, a series of new leaf nodes is found and denoted $L = \{l_1, l_2, \ldots, l_i, \ldots, l_n\}$. Starting from each leaf node $l_i$ and backtracking to the root node $r$, a path from $l_i$ to the root node $r$ is obtained and stored, with the nodes in the path arranged from shallow to deep according to their depth. This finally yields a series of decision paths $W = \{w_1, w_2, \ldots, w_i, \ldots, w_n\}$, that is, the decision path data of the pruned decision tree.
The reconstructed model information, including the decision rule represented by each node, the number of users on each node and other information, is output to an xlsx file. Each file contains the data of multiple decision paths, and the output form of each path in the xlsx file is shown in Table 2.
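The backtracking step can be sketched as follows, assuming each node record stores the id of its parent (the dictionary layout is an assumption made for illustration). Leaves are the nodes that are nobody's parent; each path is collected from leaf to root and then reversed so that it runs from shallow to deep.

```python
def decision_paths(nodes):
    """Return one root-to-leaf path of node ids for every leaf node.

    nodes: dict mapping node id -> {"parent": parent id or None}.
    """
    parents = {n["parent"] for n in nodes.values() if n["parent"] is not None}
    leaves = [node_id for node_id in nodes if node_id not in parents]
    paths = []
    for leaf in leaves:
        path = []
        current = leaf
        while current is not None:  # backtrack until the root is passed
            path.append(current)
            current = nodes[current]["parent"]
        path.reverse()  # order nodes from shallow to deep
        paths.append(path)
    return paths

nodes = {
    0: {"parent": None},
    1: {"parent": 0},
    2: {"parent": 0},
    3: {"parent": 1},
    4: {"parent": 1},
}
print(decision_paths(nodes))
```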
Table 2: Example of decision path data in an embodiment of the present invention (reproduced as an image in the original publication)
Table 2 shows the decision rules of one decision path in the decision tree model. Each row corresponds to one decision rule, that is, one node of the decision tree model other than a leaf node. The first column of each row is the decision condition, followed in order by the retained count, the churned count, the retained population fraction, the churned population fraction, the retained sample population fraction, the churned sample population fraction, the feature's fraction of the full retained sample, and the feature's fraction of the full churned sample. Based on the screened and simplified split conditions and on the automatic calculation and comparative analysis of these statistics, churn reasons can be generated automatically from the decision tree, which greatly simplifies the workflow and reduces the effort of manual analysis, making the approach efficient and practical.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the acts involved are not necessarily required by the embodiments of the present invention.
Referring to fig. 3, a block diagram of a game data processing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
a game log data acquisition module 301, configured to acquire game log data;
a user group information obtaining module 302, configured to divide game users according to the game log data to obtain at least one user group information;
a feature dimension data extraction module 303, configured to extract feature dimension data corresponding to the user group information from the game log data;
a training module 304, configured to input the user group information and the corresponding feature dimension data into a decision tree model, and obtain a trained decision tree model and a corresponding model file;
a decision path data extraction module 305, configured to extract decision path data in the model file.
Preferably, the apparatus further includes:
a model visualization result obtaining module, configured to visualize the trained decision tree model to obtain a model visualization result.
Preferably, the user group information obtaining module includes:
a cumulative online time extraction submodule, configured to extract the cumulative online time within a preset time period from the game log data;
a user group information obtaining submodule, configured to divide game users according to the cumulative online time to obtain the user group information.
Preferably, the feature dimension data extraction module includes:
an initial dimension data extraction submodule, configured to extract initial dimension data corresponding to the user group information from the game log data;
a feature dimension data extraction submodule, configured to extract the feature dimension data from the initial dimension data.
Preferably, the training module includes:
a decision tree model obtaining submodule, configured to input the user group information and the corresponding feature dimension data into a decision tree model to obtain a trained decision tree model;
a model file obtaining submodule, configured to parse the trained decision tree model to obtain the model file.
Preferably, the apparatus further includes the following module connected to the decision path data extraction module:
an obtaining module, configured to screen the decision path data in the model file to obtain screened decision path data.
Preferably, the decision path data extraction module includes:
an extraction submodule, configured to extract the screened decision path data.
The embodiment of the invention also discloses an electronic device, which includes a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the game data processing method are implemented when the processor executes the program.
The embodiment of the invention also discloses a computer-readable storage medium on which a computer program is stored, wherein the computer program implements the steps of the game data processing method when executed by a processor.
Since the device embodiment is basically similar to the method embodiment, it is described briefly; for relevant details, refer to the corresponding description of the method embodiment.
The embodiments in this specification are described in a progressive manner, each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may be referred to one another.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The game data processing method and the game data processing device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help in understanding the method and core idea of the invention. Meanwhile, for a person skilled in the art, the specific embodiments and the scope of application may vary according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.

Claims (8)

1. A method for processing game data, comprising:
obtaining game log data;
dividing game users according to the game log data to obtain at least one user group information;
extracting feature dimension data corresponding to the user group information from the game log data;
inputting the user group information and the corresponding characteristic dimension data into a decision tree model to obtain a trained decision tree model and a corresponding model file;
screening the decision path data in the model file according to screening conditions to obtain screened decision path data, and extracting the screened decision path data, wherein the screened decision path data comprises decision rules represented by each node and quantity information of users on each node; the screening condition comprises that the lost user number ratio or the reserved user number ratio on a node reaches a first threshold value, or the lost user number ratio on the node is larger than the specified number compared with the lost user number of the previous node and the total number on the node is larger than a second threshold value, or the node lost sample ratio is larger than the reserved sample ratio and the reserved sample ratio of the previous node is larger than the lost sample ratio;
and determining the loss reason according to the decision rule and the corresponding number information of the users.
2. The method of claim 1, further comprising:
and visualizing the trained decision tree model to obtain a model visualization result.
3. The method according to claim 1 or 2, wherein the step of dividing game users according to the game log data to obtain at least one user group information comprises:
extracting the accumulated online time within the preset time period of the game log data;
and dividing game users according to the accumulated online time to obtain the user group information.
4. The method according to claim 1 or 2, wherein the step of extracting feature dimension data corresponding to the user group information from the game log data includes:
extracting initial dimension data corresponding to the user group information from the game log data;
and extracting characteristic dimension data in the initial dimension data.
5. The method according to claim 1 or 2, wherein the step of inputting the user group information and the corresponding feature dimension data into a decision tree model to obtain a trained decision tree model and a corresponding model file comprises:
inputting the user group information and the corresponding feature dimension data into a decision tree model to obtain a trained decision tree model;
and analyzing the trained decision tree model to obtain the model file.
6. A game data processing apparatus, comprising:
the game log data acquisition module is used for acquiring game log data;
the user group information acquisition module is used for dividing game users according to the game log data to acquire at least one user group information;
the characteristic dimension data extraction module is used for extracting characteristic dimension data corresponding to the user group information from the game log data;
the training module is used for inputting the user group information and the corresponding characteristic dimension data into a decision tree model to obtain a trained decision tree model and a corresponding model file;
the decision path data extraction module is used for screening the decision path data in the model file according to screening conditions to obtain screened decision path data and extracting the screened decision path data, and the screened decision path data comprises decision rules represented by each node and quantity information of users on each node; and determining the loss reason according to the decision rule and the corresponding user quantity information; the screening condition comprises that the lost user number ratio or the reserved user number ratio on the node reaches a first threshold value, or the lost user number ratio on the node is larger than the specified number compared with the lost user number of the previous node and the total number on the node is larger than a second threshold value, or the node lost sample ratio is larger than the reserved sample ratio and the reserved sample ratio of the previous node is larger than the lost sample ratio.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of processing of game data according to any of claims 1 to 5 when executing the program.
8. A computer-readable storage medium, characterized in that a computer program is stored thereon, which, when being executed by a processor, carries out the steps of the processing of game data according to any one of claims 1 to 5.
CN201910037504.3A 2019-01-15 2019-01-15 Game data processing method and device Active CN109767269B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910037504.3A CN109767269B (en) 2019-01-15 2019-01-15 Game data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910037504.3A CN109767269B (en) 2019-01-15 2019-01-15 Game data processing method and device

Publications (2)

Publication Number Publication Date
CN109767269A CN109767269A (en) 2019-05-17
CN109767269B true CN109767269B (en) 2022-02-22

Family

ID=66452946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910037504.3A Active CN109767269B (en) 2019-01-15 2019-01-15 Game data processing method and device

Country Status (1)

Country Link
CN (1) CN109767269B (en)

Also Published As

Publication number Publication date
CN109767269A (en) 2019-05-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant