CN111773732A

CN111773732A - Target game user detection method, device and equipment

Info

Publication number: CN111773732A
Application number: CN202010918617.7A
Authority: CN
Inventors: 唐昊阳; 刘雨林; 郭松林; 阙志伟; 吴超杰; 赵海明
Original assignee: Perfect World Beijing Software Technology Development Co Ltd
Current assignee: Perfect World Beijing Software Technology Development Co Ltd
Priority date: 2020-09-04
Filing date: 2020-09-04
Publication date: 2020-10-16
Anticipated expiration: 2040-09-04
Also published as: CN111773732B; CN112494952B; CN112494952A

Abstract

The application discloses a method, a device and equipment for detecting a target game user, and relates to the technical field of data processing. The method comprises the following steps: firstly, carrying out numerical preprocessing on role behavior characteristics and role attribute characteristics of a user to be identified in a game to obtain first preprocessing data; performing feature extraction on the first preprocessed data by using a random forest algorithm to obtain second preprocessed data; and inputting the second preprocessing data into a classification model, and judging whether the user to be identified is the game studio user or not by referring to a classification result output by the classification model, wherein the classification model is obtained by training based on the character behavior characteristics and the character attribute characteristics of the game studio user in the game. By applying the scheme, the detection accuracy of the game studio user can be improved, and the game experience of other normal players cannot be influenced. The manual dependence is reduced, and the detection of the game studio user can be automatically completed.

Description

Target game user detection method, device and equipment

Technical Field

The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, and a device for detecting a target game user.

Background

Illegal game studios are currently prevalent across online games. By writing program scripts, they create large numbers of 'abnormal robot' accounts and characters, and thereby obtain large amounts of in-game property by illegitimate means. This disrupts the normal operation of the in-game ecosystem and infringes the legitimate rights and interests of game makers and game players.

At present, robot detection can be performed through Turing tests such as verification codes. For example, the game system may issue verification codes for the player to recognize at login in order to detect whether the player is a robot, or it may insert verification codes during gameplay to perform anti-cheat robot detection.

However, Turing test methods such as verification codes depend not only on the state of the tested object but also on the quality of the detection means, so the detection accuracy for game studio users is low. For example, when a verification code is very difficult to recognize, a verification error can occur even if the tested object is a normal player; moreover, with current techniques, some verification codes can be cracked by means of image recognition and the collection of verification code libraries.

Disclosure of Invention

In view of this, the present application provides a method, an apparatus, and a device for detecting a target game user, and mainly aims to solve the technical problem that the detection accuracy for game studio users is low when they are detected by Turing test methods such as verification codes.

According to an aspect of the present application, there is provided a method for detecting a target game user, the method including:

acquiring role behavior characteristics and role attribute characteristics of a user to be identified in a game;

carrying out numerical preprocessing on the role behavior characteristics and the role attribute characteristics to obtain first preprocessing data;

performing feature extraction on the first preprocessed data by using a random forest algorithm to obtain second preprocessed data;

and inputting the second preprocessing data into a classification model, and judging whether the user to be identified is a game studio user or not by referring to a classification result output by the classification model, wherein the classification model is obtained by training based on the role behavior characteristics and the role attribute characteristics of the game studio user in the game, and the classification model is a decision tree model.

According to another aspect of the present application, there is provided a target game user detection apparatus, including:

the acquisition module is used for acquiring the role behavior characteristics and the role attribute characteristics of the user to be identified in the game;

the first preprocessing module is used for carrying out numerical preprocessing on the role behavior characteristics and the role attribute characteristics to obtain first preprocessing data;

the second preprocessing module is used for extracting the characteristics of the first preprocessed data by using a random forest algorithm to obtain second preprocessed data;

and the judging module is used for inputting the second preprocessing data into a classification model, and judging whether the user to be identified is a game studio user or not by referring to a classification result output by the classification model, wherein the classification model is obtained by training based on the role behavior characteristics and the role attribute characteristics of the game studio user in the game, and the classification model is a decision tree model.

According to yet another aspect of the present application, there is provided a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described target game user detection method.

According to still another aspect of the present application, there is provided a target game user detection device, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the target game user detection method when executing the program.

By means of the technical scheme, the method, the device and the equipment for detecting the target game user can accurately judge whether the user to be identified is the game studio user or not by utilizing the classification model of the decision tree according to the role behavior characteristic and the role attribute characteristic of the user to be identified in the game and by combining the role behavior characteristic and the role attribute characteristic of the user judged to be the game studio user in the game. Compared with the prior Turing test modes such as verification codes, the method and the device do not need to issue the verification codes for verification, cannot be cracked easily, can improve the detection accuracy of the game studio users, and cannot influence the game experience of other normal players. The manual dependence is reduced, the detection of the game studio users can be automatically completed, and the detection efficiency of the game studio users is improved. And the reference basis of the judgment is the relevant characteristic data of the role of the game player in the game, and the privacy data of the player in the real life is not related, so that the user privacy is protected.

The foregoing description is only an overview of the technical solutions of the present application. So that the technical means of the present application can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the present application are more readily apparent, specific embodiments of the present application are set out below.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a flow chart illustrating a method for detecting a target game user according to an embodiment of the present application;

FIG. 2 is a flow chart illustrating another method for detecting a target game user according to an embodiment of the present disclosure;

FIG. 3 shows a schematic structural diagram of a detection device for a target game user according to an embodiment of the present application.

Detailed Description

The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.

This embodiment addresses the technical problem that the detection accuracy for game studio users is low when they are detected by Turing test methods such as verification codes. The embodiment provides a method for detecting a target game user; as shown in FIG. 1, the method includes:

101. Acquiring the character behavior characteristics and the character attribute characteristics of the user to be identified in the game.

Wherein the user to be identified is a game player user and currently needs to determine whether it is a game studio user.

The character behavior features mainly describe the behavior of the player character in the game and the behavior of the account in the game platform. Character attribute features primarily describe some of the player's inherent attributes within the game. For example, the role behavior characteristics may specifically include one or more of an online duration of a role period, an online and/or offline time within the role period, and historical recharging information of an account where the role is located, and the role attribute characteristics may specifically include one or more of role grade information, a server ID where the role is located, role race information, role occupation information, and a number of roles under the account where the role is located. For example, as shown in table 1, the online duration of the role cycle may be the online duration of the role every day, the online time in the role cycle may be the online time of the role every day, the offline time in the role cycle may be the offline time of the role every day, and the historical recharge information of the account of the role may include the historical accumulated recharge amount of the account of the role and the annual recharge amount of the account of the role.

TABLE 1

The features are explained as follows.

Daily online duration of the character: the number of minutes the character is online each day. To maximize daily profit, studio characters tend to stay online for very long periods, and their daily online duration is quite constant.

Standard deviation of the online/offline time within the character period: based on the times at which the character logs into and out of the game each day. Because game studio characters are typically controlled automatically by scripts, their login and logout times are very mechanical. The smaller the standard deviation, the more regular the character's logins and logouts, and the more likely the character is to be judged an abnormal character (robot).

Historical accumulated recharge amount of the character's account: the total amount recharged over the history of the account to which the character belongs. Because game studios pursue profit at very low cost, they rarely recharge their accounts.

Recharge amount of the character's account in the current year: the total amount recharged in the current year by the account to which the character belongs. For the same reason, studio accounts show little recharging in the current year.

Character level: the level number of the character in the game. Because raising a character's level is difficult, a studio may raise a character only to a level suitable for profit-making.

Character server ID: the ID number of the game server on which the character is located. Because different servers have different environments and offer different profits, studios prefer certain servers for profit-making.

Character race: the race type the character has selected in the game. Studios are more inclined to select certain races because certain profit-making activities suit those races better.

Character occupation: the occupation type the character has selected in the game. Studios are more inclined to select certain occupations because certain profit-making activities are performed better by those occupations.

Number of characters under the character's account: the number of characters that exist under the account to which the character belongs. Studios often create a large number of characters under one account in order to make a large profit at low cost.

It should be noted that, in the method of this embodiment, detection of a game studio user is performed based on character feature data of the user in a game, instead of features of a user account, for example, privacy information in the user account (such as user age, gender, province, city, occupation, and the like). Compared with the detection of the game studio users based on the characteristics of the user accounts in the prior art, the reference basis for the judgment of the game studio users by the method is the relevant characteristic data of the game players in the game, and the privacy data of the individual players in real life are not involved, so that the user privacy is protected.

The execution main body of the embodiment can be a device or equipment for detecting whether the game user is a game studio user, and can be configured on the server side or the client side, so that the detection accuracy of the game studio user can be improved.

102. And carrying out numerical preprocessing on the character behavior characteristics and the character attribute characteristics of the user to be recognized in the game to obtain first preprocessing data.

Because some of the character behavior features and the character attribute features are not numerical data, in order to facilitate inputting the data into the classification model for classification, the present embodiment needs to perform numerical preprocessing on the character behavior features and the character attribute features of the user to be recognized in the game.

103. And performing feature extraction on the first preprocessed data by using a random forest algorithm to obtain second preprocessed data.

Because some data which is relatively not important for model classification also exist in the role behavior characteristics and the role attribute characteristics, in order to improve classification accuracy and classification efficiency, the importance of a single characteristic variable can be calculated by using a random forest algorithm, and then the character behavior characteristics and the role attribute characteristics after numerical processing are subjected to characteristic extraction according to the importance of each characteristic.

104. And inputting the second preprocessing data into the classification model, and judging whether the user to be identified is the game studio user or not according to the classification result output by the classification model.

The classification model is trained on the character behavior characteristics and the character attribute characteristics of game studio users in the game, and may be a decision tree model. A game studio user here is likewise a game player user, but one that has already been determined to be a game studio user.

The character behavior characteristics of a user determined to be a game studio user describe the behavior of that user's character in the game and of the account on the game platform. The character attribute characteristics of such a user describe some inherent attributes of that user within the game. Both kinds of characteristics contain the same content as the character behavior characteristics and the character attribute characteristics in step 101; the only difference is that they belong to users already determined to be game studio users, so the details are not repeated here.

In this embodiment, the ban records of accounts/characters that have historically been determined to belong to game studios can be queried, so as to obtain the character behavior characteristics and the character attribute characteristics of game studio users in the game.

In addition to the method of determining whether the user to be identified is the game studio user by using the classification model, the method of the embodiment may also determine whether the user is the game studio user by using a similarity calculation method, for example, by comparing the similarities between the features, if the similarities between the corresponding features (character behavior features and character attribute features) of the user to be identified in the game and the corresponding features (character behavior features and character attribute features) of the user determined as the game studio user in the game are greater than or equal to a certain threshold, it is indicated that the behavior features and the attribute features of the user to be identified in the game are very close to the game studio user, and thus it may be determined that the user to be identified is likely to be the game studio user.
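The similarity-based alternative described above can be sketched as follows. The embodiment does not name a particular similarity measure, so cosine similarity is used here purely as an assumption, and the function names and the threshold value 0.95 are hypothetical. The feature vectors are assumed to be the numerically preprocessed features, ideally scaled so that no single feature dominates.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def looks_like_studio(candidate, known_studio_vectors, threshold=0.95):
    """Flag the candidate if it is at least `threshold`-similar to any known studio vector.

    `threshold` is a hypothetical value; the embodiment only says "a certain threshold".
    """
    return any(
        cosine_similarity(candidate, known) >= threshold
        for known in known_studio_vectors
    )
```

A positive result here only indicates that the user is likely to be a game studio user, in line with the wording of the embodiment.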

According to the detection method for the game studio, whether the user to be identified is the game studio user can be accurately judged according to the role behavior characteristic and the role attribute characteristic of the user to be identified in the game and by combining the role behavior characteristic and the role attribute characteristic of the user judged to be the game studio user in the game. Compared with the prior Turing test modes such as verification codes, the embodiment does not need to issue the verification codes for verification, can not be cracked easily, can improve the detection accuracy of the game studio users, and can not influence the game experience of other normal players. The manual dependence is reduced, the detection of the game studio users can be automatically completed, and the detection efficiency of the game studio users is improved. And the reference basis of the judgment is the relevant characteristic data of the role of the game player in the game, and the privacy data of the player in the real life is not related, so that the user privacy is ensured.

Further, as a refinement and an extension of the specific implementation of the foregoing embodiment, in order to fully describe the implementation of this embodiment, this embodiment further provides another target game user detection method, as shown in fig. 2, where the method includes:

201. Acquiring the character behavior characteristics and the character attribute characteristics of game studio users in the game.

For the embodiment, in order to reduce the dependence on the manual work and realize the automatic detection of the users in the game studio, the classification model of machine learning can be used for intelligent classification, and then the users in the game studio can be quickly and accurately identified, so that the labor is saved and the manual labor value is improved. For example, taking the classification model of the decision tree as an example, the processes shown in steps 202 to 208 may be specifically performed.

202. And carrying out numerical preprocessing on the character behavior characteristics and the character attribute characteristics of the game studio users in the game.

To obtain a classification model that classifies accurately, an accurate model training process is needed first. In this embodiment, the in-game character behavior characteristics and character attribute characteristics of users determined to be game studio users can be used as sample data. Because some of these features are not numerical data, numerical preprocessing is required before model training.

Optionally, the numerical preprocessing specifically includes: calculating the average of the character's per-period online duration; calculating the standard deviation of the character's online or offline time within the period; calculating the accumulated recharge amount and/or the recharge amount within a statistical period according to the historical recharge information of the account to which the character belongs; obtaining the character's level number in the game according to the character level information; obtaining the character's race type number in the game according to the character race information; and obtaining the character's occupation type number in the game according to the character occupation information.

For example, the per-period online duration may be the character's daily online duration, and the average daily online duration of the character, $\bar{t}$, in minutes, is computed as shown in formula one; the value is an integer.

$$\bar{t} = \frac{1}{n}\sum_{i=1}^{n} t_i$$ (formula one)

where $t_i$ is the online duration of the character on the i-th day, and n is the total number of days over which the features are collected; n is an integer.

Standard deviation of the online/offline time within the character period: for example, the standard deviation of the character's daily online (login) time, $\sigma_{a}$ (as shown in formula two), and the standard deviation of the character's daily offline (logout) time, $\sigma_{b}$ (as shown in formula three), are taken; both are floating-point values.

$$\sigma_{a} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(a_i - \bar{a}\right)^2}$$ (formula two)

$$\sigma_{b} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(b_i - \bar{b}\right)^2}$$ (formula three)

where $a_i$ is the time at which the character comes online on the i-th day, $b_i$ is the time at which the character goes offline on the i-th day, and n is the total number of days over which the features are collected.

The mean of the character's online time within the period, $\bar{a}$, is shown in formula four, and the mean of the character's offline time within the period, $\bar{b}$, is shown in formula five:

$$\bar{a} = \frac{1}{n}\sum_{i=1}^{n} a_i$$ (formula four)

$$\bar{b} = \frac{1}{n}\sum_{i=1}^{n} b_i$$ (formula five)

According to the historical recharge information of the account to which the character belongs, the total historical recharge amount of the account, total, is obtained as an integer value, and the total recharge amount of the account in the current year, cyear, is obtained as an integer value. Character level: the character's level number, level, is taken as an integer value. Character server ID: the number of the server on which the character is located is taken as an integer value. Character race: the race number corresponding to the character, raceid, is taken as an integer value. Character occupation: the occupation number corresponding to the character, occid, is taken as an integer value.
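As an illustration of this numerical preprocessing, the following minimal sketch turns one character's raw data into a numeric feature record. The names total, cyear, level, raceid and occid follow the text; DailyRecord, server_id, role_count and the dictionary keys are hypothetical, and dividing by n in the standard deviation is an assumption of the sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class DailyRecord:
    online_minutes: int  # minutes the character was online that day
    login_minute: int    # time of coming online, in minutes after midnight
    logout_minute: int   # time of going offline, in minutes after midnight


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Standard deviation dividing by n (an assumption of this sketch)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def preprocess(records, total, cyear, level, server_id, raceid, occid, role_count):
    """Build the numeric feature record for one character."""
    if not records:
        raise ValueError("at least one day of records is required")
    return {
        # average daily online duration, rounded to an integer number of minutes
        "avg_online_minutes": int(round(mean([r.online_minutes for r in records]))),
        # floating-point standard deviations of the online and offline times
        "login_std": std([r.login_minute for r in records]),
        "logout_std": std([r.logout_minute for r in records]),
        # integer-valued account and attribute features
        "total": int(total),
        "cyear": int(cyear),
        "level": int(level),
        "server_id": int(server_id),
        "raceid": int(raceid),
        "occid": int(occid),
        "role_count": int(role_count),
    }
```

A scripted character that logs in at exactly the same minute every day yields a login_std of 0.0, which is the kind of mechanical regularity the standard-deviation features are meant to expose.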

Through this preprocessing, the feature data is converted into numerical data that a machine can read, which facilitates model training. However, as the gaming environment changes, more different models may be added to the method, and old features may need to be deleted and new features added. Since the model should not depend on too many features, this embodiment uses a random forest to select features, that is, it performs the process shown in step 203.

203. Performing feature extraction on the numerically preprocessed sample feature data by using a random forest algorithm.

The random forest refers to a classifier which trains and predicts a sample by using a plurality of trees and can calculate the importance of a single characteristic variable. Based on such features, it can be used to select features in the dataset with importance. Correspondingly, optionally, step 203 may specifically include: firstly, calculating the importance of each feature in feature data after numerical preprocessing by using a random forest algorithm; and then, according to the importance of each feature, performing feature extraction on the feature data subjected to the numerical preprocessing. Through the method, the sample data used by the classification model training is more simplified and accurate, the influence on the accuracy of model classification caused by the fact that too much invalid sample data is used for model training is avoided, and the detection efficiency can be improved through more accurate feature selection.

Illustratively, calculating the importance of each feature in the numerically preprocessed feature data by using a random forest algorithm may specifically include: first, generating a plurality of feature subsets from the numerically preprocessed feature data; then constructing a plurality of decision trees from the plurality of feature subsets; then calculating a first error on the out-of-bag data corresponding to each decision tree, where the out-of-bag data is the data that did not participate in building that decision tree; then randomly selecting a target feature in the out-of-bag data and, after adding random noise interference to the target feature, calculating a second error on the out-of-bag data corresponding to each decision tree; then calculating, for each decision tree, the difference between the first error and the second error on its out-of-bag data; and finally summing the differences over all decision trees and dividing by the number of decision trees to obtain the importance of the target feature. In this optional manner, the importance of each feature in the numerically preprocessed feature data can be calculated accurately, so that important features can subsequently be extracted with reference to the importance.

For example, taking the numerically preprocessed feature data as the current feature set, M feature subsets are first generated from the current feature set; the number and size of the subsets can be adjusted according to the actual scenario, and the decision trees are constructed from them. For each decision tree, out-of-bag (OOB) data of a suitable scale is selected, and the corresponding out-of-bag error is calculated and recorded as $\mathrm{errOOB}_1$. Out-of-bag data refers to data that does not participate in building a decision tree when that tree is built. A feature F in the OOB data is then randomly selected, random noise interference is added to it, and the out-of-bag error is calculated again and recorded as $\mathrm{errOOB}_2$. The resulting importance I of feature F is shown in formula six:

$$I = \frac{1}{N}\sum_{k=1}^{N}\left(\mathrm{errOOB}_{2,k} - \mathrm{errOOB}_{1,k}\right)$$ (formula six)

where the subscript k denotes the k-th decision tree and N is the number of decision trees in the forest.
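The out-of-bag importance calculation can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the embodiment's implementation: each "tree" is a one-split stump rather than a full decision tree, the random noise interference is realized by shuffling the target feature's values among the out-of-bag rows, labels are 0/1, and all function and parameter names are hypothetical.

```python
import random


def train_stump(rows, labels, features):
    """Fit a one-split tree: pick the feature, threshold and left-side label with the fewest training errors."""
    best = None
    for f in features:
        for thr in sorted({r[f] for r in rows}):
            for left_label in (0, 1):
                errors = sum(
                    (left_label if r[f] <= thr else 1 - left_label) != y
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, thr, left_label)
    return best[1:]


def predict(stump, row):
    f, thr, left_label = stump
    return left_label if row[f] <= thr else 1 - left_label


def error_rate(stump, rows, labels):
    return sum(predict(stump, r) != y for r, y in zip(rows, labels)) / len(rows)


def oob_importance(rows, labels, target, n_trees=30, subset_size=2, seed=0):
    """Average increase in out-of-bag error after perturbing feature `target`."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    diffs = []
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]        # bootstrap sample
        in_bag = set(bag)
        oob = [i for i in range(n) if i not in in_bag]    # rows this tree never saw
        if not oob:
            continue
        features = rng.sample(range(n_features), subset_size)  # feature subset for this tree
        stump = train_stump([rows[i] for i in bag], [labels[i] for i in bag], features)
        oob_rows = [rows[i] for i in oob]
        oob_labels = [labels[i] for i in oob]
        first_error = error_rate(stump, oob_rows, oob_labels)
        shuffled = [r[target] for r in oob_rows]
        rng.shuffle(shuffled)                             # noise: permute the target feature
        noisy = [r[:target] + [v] + r[target + 1:] for r, v in zip(oob_rows, shuffled)]
        second_error = error_rate(stump, noisy, oob_labels)
        diffs.append(second_error - first_error)
    if not diffs:
        raise ValueError("no tree had out-of-bag data")
    return sum(diffs) / len(diffs)
```

A feature that the trees rely on shows a clear rise in error once it is perturbed, while a feature the trees ignore shows a difference near zero.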

For example, performing feature extraction on the numerically preprocessed feature data according to the importance of each feature may specifically include: first, sorting the features in the feature data by importance; then deleting the lowest-ranked features according to a preset deletion proportion to obtain new feature data; then calculating the importance of each feature in the new feature data by using the random forest algorithm; repeating the processes of feature sorting, feature deletion, and importance calculation according to the importance of each feature in the new feature data until the most recently obtained feature data meets a preset quality condition; and finally, determining the extracted feature data from the feature data that meets the preset quality condition.

The preset quality condition may be preset according to an actual requirement, for example, when all the new features obtained by screening are applicable to the game environment of the latest version, or the number of the features obtained by screening is less than or equal to a certain threshold, or the importance of the features obtained by screening is greater than or equal to a certain threshold, and the like, it is determined that the feature data obtained by screening meets the preset quality condition. Through the optional mode, the finally extracted feature data can be ensured to meet the requirements.

Further optionally, determining the extracted feature data according to the feature data meeting the preset quality condition may specifically include: generating a plurality of feature subsets according to the feature data meeting the preset quality condition; constructing a plurality of decision trees according to the plurality of feature subsets; calculating a third error of the corresponding out-of-bag data of each decision tree; and selecting the feature subset corresponding to the decision tree with the lowest third error as the extracted feature data. Through the optional mode, the importance of each feature can be accurately utilized, the features in the data set are selected, the finally extracted feature data are all feature data capable of improving the accuracy of actual classification, and the model training efficiency can be improved.

For example, the process of feature extraction may be as follows (a) to (e):

(a) Sort the feature variables to be screened by their importance I.

(b) Determine the proportion of features to delete according to specific requirements, and delete them to obtain a new feature set.

(c) Build a random forest with the newly obtained feature set, and recalculate the importance I of each feature in the set.

(d) Repeat steps (a) to (c) until the feature set meets the requirement.

(e) Build the corresponding random forests from the finally obtained feature sets and calculate the corresponding out-of-bag errors, and select the feature set with the lowest out-of-bag error as the final feature set.
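Steps (a) to (e) amount to an iterative elimination loop, which might be sketched as follows. The importance and out-of-bag error calculations are passed in as callables so that the sketch stays independent of any particular random forest implementation; select_features, pick_best, delete_ratio and max_features are hypothetical names, and "the number of features is at most max_features" is used as one example of the preset quality condition.

```python
def select_features(features, importance_fn, delete_ratio=0.2, max_features=5):
    """Repeat: rank by importance, drop the lowest-ranked share, recompute.

    importance_fn(feature_list) must return {feature: importance} for the
    given features, e.g. computed with a random forest rebuilt on that list.
    """
    current = list(features)
    while len(current) > max_features:                 # quality condition not yet met
        scores = importance_fn(current)                # step (c): recompute importance
        ranked = sorted(current, key=lambda f: scores[f], reverse=True)  # step (a)
        n_drop = max(1, int(len(ranked) * delete_ratio))                 # step (b)
        n_drop = min(n_drop, len(ranked) - max_features)
        current = ranked[:len(ranked) - n_drop]
    return current


def pick_best(candidate_sets, oob_error_fn):
    """Step (e): keep the candidate feature set with the lowest out-of-bag error."""
    return min(candidate_sets, key=oob_error_fn)
```

Recomputing the importance on every pass matters because the importance of a feature can change once correlated features have been removed.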

204. And establishing a training set according to the extracted characteristic data, and training by utilizing a decision tree algorithm to obtain a classification model.

Various decision tree algorithms can be used, such as ID3, C4.5, and CART. Optionally, the game mentioned in this embodiment may be of the Massively Multiplayer Online Role Playing Game (MMORPG) type. For this type of game, in order to reduce the performance impact on the game server during game studio user detection and to improve detection efficiency, this embodiment may further optionally select the C4.5 algorithm for sample classification; that is, the trained classification model (decision tree model) is a C4.5 model. C4.5 is a supervised learning algorithm used for classification problems in machine learning and data mining. In a given data set, each tuple is described by a set of attribute values and belongs to exactly one of a set of mutually exclusive classes. The objective of the algorithm is to learn, through supervised learning, a mapping from attribute values to classes, and to construct a decision tree from that mapping so that new, unseen data can be classified.

The C4.5 algorithm has major advantages including: a. the operation efficiency is high; b. the model is simple and easy to understand and prune; c. richness with respect to processable data types; d. insensitive to missing values.
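For readers unfamiliar with C4.5, its characteristic splitting criterion is the gain ratio: the information gain of an attribute divided by the entropy of the attribute's own value distribution. The sketch below computes it for one categorical attribute (for example a race number); it illustrates the criterion only and is not the embodiment's training code, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy, in bits, of the value distribution in `items`."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio obtained by splitting `labels` on the categorical attribute `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # expected entropy of the labels after the split
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)  # penalizes attributes with many distinct values
    return info_gain / split_info if split_info > 0 else 0.0
```

At each node, the attribute with the highest gain ratio is the preferred split, which is part of why the resulting tree stays small and readable.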

The match between the application requirements of this embodiment and the C4.5 algorithm mainly covers the following aspects:

(1) Match between operating efficiency and the application scenario. This embodiment is mainly applied to player character behavior verification on the server side, so it places certain requirements on server operating efficiency and must not occupy excessive platform resources, which reduces cost. The C4.5 algorithm is also friendlier to reuse, and the pruning cost of modifying the model is lower than with other algorithms.

(2) The main operators of the method. The method of this embodiment is mainly used by business personnel involved in game planning. Because such personnel often lack specialized knowledge of algorithms, their learning and operating costs need to be reduced. A decision tree built by the C4.5 algorithm is easy for them to understand, the learning cost is low, and they can prune the decision tree directly to meet actual production requirements. Operating efficiency is also not sacrificed much in exchange for this lower cost of understanding.

(3) The main application scenario of this embodiment. The embodiment is mainly applicable to MMORPG-type games, in which players have many natural behavior types, so more data preprocessing is needed. C4.5 is better suited to such requirements. Here, "many natural behavior types" means that there are many features, such as movement, skill release, daily behavior sequences, time-of-day behavior sequences, and interactions with other players, Non-Player Characters (NPCs), and natural scenes.

(4) The extent of data preprocessing. In this embodiment, feature selection is performed after data preprocessing, and in practice various complex data environments may arise in different application scenarios. This embodiment therefore requires a scheme that is insensitive to missing data values. A complex environment here refers to a situation with many behavior types and many selectable data sets, in which players on new servers and on old servers generate different behavior data, especially in the interactions among players within a community (community interactions). Insensitivity to missing data values means that, supposing the effective features are a, b, c, d, and e, the decision tree can still run with little performance impact even if only a, b, and c are present. By contrast, some algorithms, such as the Support Vector Machine (SVM), need the complete feature set a, b, c, d, and e to run well.

Therefore, based on the above descriptions of (1), (2), (3), and (4), C4.5 is an algorithm more suitable for the present embodiment.

The parameters involved in the C4.5 algorithm are explained as follows:

Information entropy: a measure in information theory of the uncertainty of a random variable. The larger the information entropy, the greater the uncertainty of the random variable. Let X be a random variable that takes a finite number of values, with the probability distribution shown in formula seven:

(formula seven)

The entropy H (X) of X is defined as shown in formula eight:

(formula eight)

Conditional entropy: the uncertainty of the random variable Y given the random variable X. Let (X, Y) be random variables whose joint probability distribution is shown in formula nine:

(formula nine)

Then the conditional entropy of Y given X is shown in formula ten:

(formula ten)

Information gain: the degree to which the uncertainty in classifying data set D is reduced once feature a is known. It is calculated as shown in formula eleven:

(formula eleven)
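As a minimal sketch of the quantities defined above, the following assumes the standard textbook definitions of entropy, conditional entropy, and information gain, and assumes that the information gain rate is the information gain divided by the entropy of the feature's own value distribution. The function names and the toy data are illustrative only and are not taken from this application.

```python
import math
from collections import Counter


def entropy(items):
    """Empirical entropy: -sum(p * log2(p)) over the observed value frequencies."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def conditional_entropy(values, labels):
    """Entropy of the labels inside each feature value, weighted by subset size."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(values, labels):
    """Reduction in label uncertainty once the feature value is known."""
    return entropy(labels) - conditional_entropy(values, labels)


def gain_rate(values, labels):
    """Information gain normalized by the entropy of the feature itself (assumed definition)."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Toy data: one feature value per character, one label per character.
online = ["long", "long", "short", "short"]
labels = ["studio", "studio", "normal", "normal"]
print(information_gain(online, labels), gain_rate(online, labels))
```

In this toy data the feature separates the two classes completely, so knowing it removes all uncertainty about the label.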

Based on the C4.5 algorithm, step 204 may specifically include: first, configuring a sample label corresponding to the extracted feature data; adding the extracted feature data and the sample label corresponding to the feature data to a training set; if the sample labels of all sample data in the training set belong to the first category, the decision tree is a single-node tree, the class of the node in the decision tree is marked according to the target category, and the classification model of the decision tree is returned; if the sample data corresponding to the extracted feature data in the training set is empty, the decision tree is a single-node tree, the class of the node in the decision tree is marked according to the second class, which has the largest number of samples in the training set, and the classification model of the decision tree is returned; if the sample data corresponding to the extracted feature data in the training set is not empty, calculating the feature with the largest information gain rate in the sample data corresponding to the extracted feature data; when the information gain rate of that feature is smaller than a preset threshold, determining that the decision tree is a single-node tree, marking the class of the node in the decision tree according to the second class, which has the largest number of samples in the training set, and returning the classification model of the decision tree; when the information gain rate of that feature is greater than or equal to the preset threshold, dividing the training set into a plurality of non-empty subsets according to all possible values of that feature, taking the third class, which has the largest number of samples in each non-empty subset, as the mark, constructing child nodes of the decision tree to complete the construction of the decision tree, and returning the classification model of the decision tree.

For example, the decision tree construction algorithm is as follows:

Input: a training data set D (which may include positive and/or negative example data), a feature set A (created from the feature data obtained in the feature extraction step described above), and a threshold (the preset threshold);

Output: a classification model of the decision tree T;

Step 1: If all instances in D belong to the same class, T is a single-node tree. Mark that class as the class of the node, and return T;

Step 2: If the feature set A is empty, T is a single-node tree. Mark the class with the largest number of instances in D as the class of the node, and return T;

Step 3: Otherwise, calculate the feature in A with the largest information gain rate;

Step 4: If the information gain rate of that feature is less than the threshold, T is a single-node tree. Mark the class with the largest number of instances in D as the class of the node, and return T;

Step 5: Otherwise, for all possible values of that feature, divide D into a number of non-empty subsets according to those values. Take the class with the largest number of instances in each subset as its mark, construct the child nodes, form the tree T from the node and its child nodes, and return T.
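The construction steps above can be sketched in Python as follows. This is a simplified illustration rather than a full C4.5 implementation: it handles only categorical features, omits pruning and missing-value handling, and recurses on each subset with the chosen feature removed, which is the usual formulation but is an assumption here. The names `build_tree`, `classify`, and `gain_rate`, and the toy feature names, are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_rate(rows, labels, feature):
    """Information gain of `feature`, divided by the entropy of the feature's values."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    cond = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([row[feature] for row in rows])
    return (entropy(labels) - cond) / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, features, threshold):
    majority = Counter(labels).most_common(1)[0][0]
    # Step 1: every instance has the same class -> single-node tree.
    if len(set(labels)) == 1:
        return {"label": labels[0]}
    # Step 2: no feature left to split on -> single node with the majority class.
    if not features:
        return {"label": majority}
    # Step 3: pick the feature with the largest information gain rate.
    best = max(features, key=lambda f: gain_rate(rows, labels, f))
    # Step 4: the best split is too weak -> single node with the majority class.
    if gain_rate(rows, labels, best) < threshold:
        return {"label": majority}
    # Step 5: split D by each observed value of the feature and build child nodes.
    node = {"feature": best, "default": majority, "children": {}}
    remaining = [f for f in features if f != best]
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining, threshold
        )
    return node


def classify(tree, row):
    """Walk the tree; fall back to the node's majority class for unseen values."""
    while "label" not in tree:
        child = tree["children"].get(row[tree["feature"]])
        if child is None:
            return tree["default"]
        tree = child
    return tree["label"]


rows = [
    {"online": "long", "recharge": "none"},
    {"online": "long", "recharge": "none"},
    {"online": "short", "recharge": "some"},
    {"online": "short", "recharge": "none"},
]
labels = ["studio", "studio", "normal", "normal"]
tree = build_tree(rows, labels, ["online", "recharge"], threshold=0.01)
print(classify(tree, {"online": "long", "recharge": "some"}))
```

Representing the tree as nested dictionaries keeps the sketch short and makes the result easy to inspect or prune by hand.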

In this optional manner, an accurate decision tree classification model can be created. Ten-fold cross-validation may be performed during classification model training, and indexes such as the True Positive Rate (TPR), False Positive Rate (FPR), Precision, Recall, and F1 value are selected as reference indexes for determining whether model training meets the standard. The true positive rate is the ratio of the number of correctly predicted positive samples to the total number of positive samples. The false positive rate is the ratio of the number of negative samples incorrectly predicted as positive to the total number of negative samples. The precision is the proportion of truly positive samples among the samples predicted as positive. The recall is the ratio of the number of correctly predicted positive samples to the total number of positive samples. The F1 value is the harmonic mean of precision and recall.
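As an illustration of these indexes, the following sketch computes them from true and predicted labels. It assumes that the game studio class is treated as the positive class; the function name and the label strings are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive="studio"):
    """Compute TPR, FPR, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    tpr = tp / (tp + fn) if tp + fn else 0.0          # correctly found positives
    fpr = fp / (fp + tn) if fp + tn else 0.0          # negatives wrongly flagged
    precision = tp / (tp + fp) if tp + fp else 0.0    # flagged samples that are positive
    recall = tpr                                      # recall equals the true positive rate
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)             # harmonic mean
    return {"tpr": tpr, "fpr": fpr, "precision": precision,
            "recall": recall, "f1": f1}


y_true = ["studio"] * 3 + ["normal"] * 5
y_pred = ["studio", "studio", "normal", "studio",
          "normal", "normal", "normal", "normal"]
print(classification_metrics(y_true, y_pred))
```

In ten-fold cross-validation these values would be computed on each held-out fold and then averaged.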

For example, using the method of this embodiment, ten-fold cross-validation is performed with the game studio account-ban records from a certain period of a year as the data set. The data set contains a total of 34072 characters and a total of 272576 features. The results of the experiment are shown in Table 2 below:

TABLE 2

205. When game studio user detection needs to be performed on a user to be identified, obtain the character behavior characteristics and character attribute characteristics of the user to be identified in the game.

206. Perform numerical preprocessing on the character behavior characteristics and the character attribute characteristics of the user to be identified in the game to obtain first preprocessed data.

The role behavior characteristics comprise one or more of role cycle online time, online and/or offline time in the role cycle and historical recharging information of an account where the role is located, and the role attribute characteristics comprise one or more of role grade information, a server ID where the role is located, role race information, role occupation information and role number under the account where the role is located.

Optionally, step 206 may specifically include: calculating the average value of the periodic online time of the roles; calculating a standard deviation of the online or offline time in the role period; calculating the accumulated recharging amount and/or the recharging amount in the statistical period according to the historical recharging information of the account of the role; acquiring the grade number of the role in the game according to the role grade information; acquiring race type numbers of the characters in the game according to the race information of the characters; and acquiring the occupation type number of the role in the game according to the occupation information of the role.
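A minimal sketch of this numerical preprocessing is given below. The record layout, the field names, and the code tables `RACE_CODES` and `OCCUPATION_CODES` are hypothetical, and the population standard deviation is used only as one possible choice, since the text does not specify which form of standard deviation is meant.

```python
from statistics import mean, pstdev

# Hypothetical lookup tables mapping in-game names to type numbers.
RACE_CODES = {"human": 0, "elf": 1}
OCCUPATION_CODES = {"warrior": 0, "mage": 1}


def preprocess_character(record):
    """Turn one character's raw behavior and attribute data into numeric features."""
    return {
        # Average of the character's online time per period (hours per day here).
        "mean_online_hours": mean(record["daily_online_hours"]),
        # Spread of the login times within the period.
        "login_hour_std": pstdev(record["login_hours"]),
        # Accumulated recharge amount of the account the character belongs to.
        "total_recharge": sum(record["recharges"]),
        # Level, race, and occupation expressed as numbers.
        "level": int(record["level"]),
        "race_code": RACE_CODES[record["race"]],
        "occupation_code": OCCUPATION_CODES[record["occupation"]],
    }


record = {
    "daily_online_hours": [20, 22, 24],
    "login_hours": [8, 8, 8],
    "recharges": [],
    "level": "35",
    "race": "elf",
    "occupation": "mage",
}
print(preprocess_character(record))
```

A character that is online almost all day, logs in at the same hour every day, and has never recharged produces a high mean, a standard deviation near zero, and a zero recharge total.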

207. Perform feature extraction on the first preprocessed data by using a random forest algorithm to obtain second preprocessed data.

Optionally, step 207 may specifically include: calculating the importance of each feature in the first preprocessed data by using a random forest algorithm; and according to the importance of each feature, performing feature extraction on the first preprocessing data to obtain second preprocessing data.

Optionally, calculating the importance of each feature in the first preprocessed data by using a random forest algorithm specifically includes: generating a plurality of feature subsets from the first preprocessed data; constructing a plurality of decision trees according to the plurality of feature subsets; calculating a first error on the out-of-bag data corresponding to each decision tree, where the out-of-bag data is the data that did not participate in constructing that decision tree; randomly selecting a target feature in the out-of-bag data and, after adding random noise interference to the target feature, calculating again a second error on the out-of-bag data corresponding to each decision tree; calculating, for each decision tree, the difference between the first error and the second error on its out-of-bag data; and summing the differences over all decision trees and dividing the sum by the number of decision trees to obtain the importance of the target feature.
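The importance calculation can be sketched as follows, under several assumptions: each decision tree is represented simply as a prediction function, its out-of-bag data is supplied alongside it, the "random noise interference" is implemented as a random shuffle of the target feature's values, and the difference is taken as the second error minus the first so that a larger value means a more important feature. All names are hypothetical.

```python
import random


def error_rate(predict, rows, labels):
    """Fraction of rows the tree classifies incorrectly."""
    wrong = sum(predict(row) != y for row, y in zip(rows, labels))
    return wrong / len(rows)


def feature_importance(trees, oob_sets, feature, seed=0):
    """Average increase in out-of-bag error after perturbing one feature.

    trees    -- list of prediction functions, one per decision tree
    oob_sets -- list of (rows, labels) pairs, the out-of-bag data of each tree
    """
    rng = random.Random(seed)
    total = 0.0
    for predict, (rows, labels) in zip(trees, oob_sets):
        first_error = error_rate(predict, rows, labels)
        # Perturb the target feature by shuffling its values across the rows.
        shuffled = [row[feature] for row in rows]
        rng.shuffle(shuffled)
        noisy_rows = [{**row, feature: v} for row, v in zip(rows, shuffled)]
        second_error = error_rate(predict, noisy_rows, labels)
        total += second_error - first_error
    return total / len(trees)


# Stand-in for a trained tree: it only looks at the "online" feature.
def toy_tree(row):
    return "studio" if row["online"] > 18 else "normal"


oob_rows = [{"online": 20, "level": 5}, {"online": 3, "level": 40},
            {"online": 22, "level": 7}, {"online": 2, "level": 30}]
oob_labels = ["studio", "normal", "studio", "normal"]
print(feature_importance([toy_tree], [(oob_rows, oob_labels)], "level"))
```

A feature that no tree relies on, such as `level` in this toy setup, has an importance of zero because perturbing it cannot change any prediction.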

Optionally, performing feature extraction on the first preprocessed data according to the importance of each feature to obtain second preprocessed data specifically includes: sorting the features in the feature data according to the importance of each feature; deleting the features ranked lowest in importance according to a preset deletion proportion to obtain new feature data; calculating the importance of each feature in the new feature data by using a random forest algorithm; repeating the feature sorting, feature deletion, and importance calculation according to the importance of each feature in the new feature data until the most recently obtained feature data meets a preset quality condition; and determining the second preprocessed data according to the feature data that meets the preset quality condition.

Optionally, determining the second preprocessing data according to the characteristic data meeting the preset quality condition specifically includes: generating a plurality of feature subsets according to the feature data meeting the preset quality conditions; constructing a plurality of decision trees according to the plurality of feature subsets; calculating a third error of the corresponding out-of-bag data of each decision tree; and selecting the feature subset corresponding to the decision tree with the lowest third error as the second preprocessing data.
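The sort, delete, and recompute loop, followed by selection on out-of-bag error, might be sketched as below. The importance calculation and the out-of-bag error are injected as callables (`importance_fn` and `oob_error_fn`, both hypothetical), and the preset quality condition is assumed, for illustration only, to be a minimum number of remaining features.

```python
def select_features(features, importance_fn, oob_error_fn,
                    drop_ratio=0.2, min_features=2):
    """Repeatedly drop the least important features, then keep the best candidate set.

    importance_fn(features) -> dict mapping each feature to its importance
    oob_error_fn(features)  -> out-of-bag error of a forest built on those features
    """
    candidates = []
    current = list(features)
    while True:
        scores = importance_fn(current)
        # Sort from most to least important.
        ranked = sorted(current, key=lambda f: scores[f], reverse=True)
        candidates.append((oob_error_fn(ranked), ranked))
        # Delete the preset proportion of lowest-ranked features (at least one).
        n_drop = max(1, int(len(ranked) * drop_ratio))
        if len(ranked) - n_drop < min_features:
            break
        current = ranked[:len(ranked) - n_drop]
    # Keep the candidate feature set with the lowest out-of-bag error.
    return min(candidates, key=lambda c: c[0])[1]


# Toy stand-ins: fixed importances, and an error that is lowest with three features.
toy_importance = {"a": 6, "b": 5, "c": 4, "d": 3, "e": 2, "f": 1}
chosen = select_features(
    list("fedcba"),
    importance_fn=lambda fs: {f: toy_importance[f] for f in fs},
    oob_error_fn=lambda fs: abs(len(fs) - 3) * 0.1,
)
print(chosen)
```

In practice the two callables would wrap a real random forest; keeping them as parameters lets the selection loop be tested independently of the model.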

In this embodiment, the specific implementation of steps 206 and 207 is similar to the feature processing in model training; for details, refer to the processes shown in steps 202 and 203, which are not repeated here.

208. Input the second preprocessed data into the classification model, and determine whether the user to be identified is a game studio user according to the classification result output by the classification model.

Optionally, step 208 may specifically include: if the classification model outputs that the character is abnormal, determining that the user to be identified is a game studio user; and then restricting the game account corresponding to the user to be identified, or the abnormal characters under that game account. For example, login is restricted for the game account or for a character under the game account (one or more of the characters under the account); that is, the account or character is banned and can log in only after the ban is lifted.

For example, the character behavior features and the character attribute features of the user to be identified in the game are subjected in turn to the numerical preprocessing shown in step 206 and the feature extraction shown in step 207; the extracted features are then input into the decision tree classification model obtained in step 204, and whether the user to be identified is a game studio user is determined with reference to the classification result output by the classification model. In this way, game studio users can be detected automatically and classified intelligently by a machine-learned classification model, so that they are identified quickly and accurately, which saves labor and frees staff for higher-value work.

Further optionally, if it is determined that the user to be identified is a game studio user, the method of this embodiment may further include: expanding the training set according to the character behavior characteristics and the character attribute characteristics of the user to be identified, so as to update and retrain the classification model using the expanded training set. With this automatic machine updating and learning, the classification model can be updated accurately, which improves classification accuracy and therefore the accuracy of detecting game studio users.

The method of this embodiment can be deployed on an operations server, where the required feature data can be acquired automatically using the accompanying scripts. The required number of features can also be preset, and the training set can be updated automatically to retrain the model, so as to achieve higher timeliness.

Further, if it is determined that the user to be identified is a game studio user, the method of this embodiment may further include: outputting corresponding alarm information; and/or adding the user to be identified to a blacklist, where users in the blacklist are restricted from logging in to the game. For example, after a game studio user is discovered, an alarm is issued in the form of text, audio, video, or the like, so as to inform game managers and other staff in time. Alternatively, the user is added to a blacklist to restrict the user from logging in to the game, which provides automatic protection.
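The handling of a detection result described above (restriction, alarm output, blacklisting, and training set expansion) can be sketched as follows. The classifier is passed in as a plain function, and the in-memory set and lists stand in for whatever storage and alerting systems are actually used; all names, including the "abnormal" label string, are hypothetical.

```python
def handle_detection(user_id, features, classify, blacklist, training_set, alerts):
    """Classify one user and apply the follow-up actions if flagged.

    Returns True if the user was judged to be a game studio user.
    """
    if classify(features) != "abnormal":
        return False
    # Output alarm information so that game managers are informed in time.
    alerts.append(f"game studio user detected: {user_id}")
    # Blacklisted users are restricted from logging in to the game.
    blacklist.add(user_id)
    # Expand the training set so the model can later be retrained on it.
    training_set.append((features, "abnormal"))
    return True


def may_log_in(user_id, blacklist):
    return user_id not in blacklist


# Stand-in classifier: flags characters that are online nearly all day.
def toy_classifier(features):
    return "abnormal" if features["mean_online_hours"] > 20 else "normal"


blacklist, training_set, alerts = set(), [], []
handle_detection("u1001", {"mean_online_hours": 23.5}, toy_classifier,
                 blacklist, training_set, alerts)
handle_detection("u1002", {"mean_online_hours": 2.0}, toy_classifier,
                 blacklist, training_set, alerts)
print(may_log_in("u1001", blacklist), may_log_in("u1002", blacklist))
```

Keeping the follow-up actions in one function makes it straightforward to enable only some of them, matching the "and/or" options in the text.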

The game studio user detection method provided by this embodiment can improve detection accuracy and precision by quickly constructing a decision tree, thereby improving the F1 value and the recall rate. The embodiment greatly improves the precision of non-manual detection and thus the detection hit rate. More automated operation reduces manual dependence and improves automatic detection efficiency, which saves labor. Detection timeliness is also improved: compared with traditional detection methods, the embodiment can improve timeliness and reduce the damage that bots cause to the game environment. Only information related to game characters is collected, and unnecessary sensitive information is not collected, which protects player privacy. More accurate feature selection also improves the efficiency of detecting game studio users.

Further, as a specific implementation of the method shown in fig. 1 and fig. 2, the embodiment provides a detection apparatus for a target game user, as shown in fig. 3, the apparatus includes: the device comprises an acquisition module 31, a first preprocessing module 32, a second preprocessing module 33 and a judgment module 34.

The acquiring module 31 is configured to acquire a role behavior characteristic and a role attribute characteristic of a user to be identified in a game;

a first preprocessing module 32, configured to perform numerical preprocessing on the role behavior characteristics and the role attribute characteristics to obtain first preprocessed data;

the second preprocessing module 33 is configured to perform feature extraction on the first preprocessed data by using a random forest algorithm to obtain second preprocessed data;

the determining module 34 is configured to input the second preprocessing data into a classification model, and determine whether the user to be identified is a game studio user by referring to a classification result output by the classification model, where the classification model is obtained by training based on a role behavior feature and a role attribute feature of the game studio user in the game, and the classification model is a decision tree model.

In a specific application scenario, the second preprocessing module 33 is specifically configured to calculate the importance of each feature in the first preprocessed data by using a random forest algorithm; and according to the importance of each feature, performing feature extraction on the first preprocessing data to obtain second preprocessing data.

In a specific application scenario, the second preprocessing module 33 is further configured to generate a plurality of feature subsets from the first preprocessed data; construct a plurality of decision trees according to the plurality of feature subsets; calculate a first error on the out-of-bag data corresponding to each decision tree, where the out-of-bag data is the data that did not participate in constructing that decision tree; randomly select a target feature in the out-of-bag data and, after adding random noise interference to the target feature, calculate again a second error on the out-of-bag data corresponding to each decision tree; calculate, for each decision tree, the difference between the first error and the second error on its out-of-bag data; and sum the differences over all decision trees and divide the sum by the number of decision trees to obtain the importance of the target feature.

In a specific application scenario, the second preprocessing module 33 is further configured to sort the features in the feature data according to the importance of each feature; delete the features ranked lowest in importance according to a preset deletion proportion to obtain new feature data; calculate the importance of each feature in the new feature data by using a random forest algorithm; repeat the feature sorting, feature deletion, and importance calculation according to the importance of each feature in the new feature data until the most recently obtained feature data meets a preset quality condition; and determine the second preprocessed data according to the feature data that meets the preset quality condition.

In a specific application scenario, the second preprocessing module 33 is further configured to generate a plurality of feature subsets according to the feature data meeting the preset quality condition; constructing a plurality of decision trees according to the plurality of feature subsets; calculating a third error of the corresponding out-of-bag data of each decision tree; and selecting the feature subset corresponding to the decision tree with the lowest third error as the second preprocessing data.

In a specific application scenario, optionally, the role behavior characteristics include one or more of a role cycle online time, a role cycle online and/or offline time, and role account historical recharging information, and the role attribute characteristics include one or more of role grade information, a role server ID, role race information, role occupation information, and role number under a role account;

the first preprocessing module 32 is specifically configured to calculate an average value of the online time of the role period; calculating a standard deviation of the online or offline time in the role period; calculating the accumulated recharging amount and/or the recharging amount in the statistical period according to the historical recharging information of the account of the role; acquiring the grade number of the role in the game according to the role grade information; acquiring race type numbers of the characters in the game according to the race information of the characters; and acquiring the occupation type number of the role in the game according to the occupation information of the role.

In a specific application scenario, the apparatus further comprises: a training module;

the first preprocessing module 32 is further configured to perform numerical preprocessing on the character behavior characteristics and the character attribute characteristics of the game studio user in the game before the second preprocessed data is input into the classification model;

the second preprocessing module 33 is further configured to perform feature extraction on the sample feature data subjected to the numerical preprocessing by using a random forest algorithm;

and the training module is used for creating a training set according to the extracted characteristic data and training by utilizing a decision tree algorithm to obtain the classification model.

In a specific application scenario, the training module is specifically configured to configure a sample label corresponding to the extracted feature data; add the extracted feature data and the sample label corresponding to the feature data to a training set; if the sample labels of all sample data in the training set belong to the first category, the decision tree is a single-node tree, the class of the node in the decision tree is marked according to the target category, and the classification model of the decision tree is returned; if the sample data corresponding to the extracted feature data in the training set is empty, the decision tree is a single-node tree, the class of the node in the decision tree is marked according to the second class, which has the largest number of samples in the training set, and the classification model of the decision tree is returned; if the sample data corresponding to the extracted feature data in the training set is not empty, calculate the feature with the largest information gain rate in the sample data corresponding to the extracted feature data; when the information gain rate of that feature is smaller than a preset threshold, determine that the decision tree is a single-node tree, mark the class of the node in the decision tree according to the second class, which has the largest number of samples in the training set, and return the classification model of the decision tree; and when the information gain rate of that feature is greater than or equal to the preset threshold, divide the training set into a plurality of non-empty subsets according to all possible values of that feature, take the third class, which has the largest number of samples in each non-empty subset, as the mark, construct child nodes of the decision tree to complete the construction of the decision tree, and return the classification model of the decision tree.

In a specific application scenario, the training module is further configured to, if it is determined that the user to be identified is a game studio user, expand the training set according to the role behavior characteristics and the role attribute characteristics of the user to be identified, so as to update and train the classification model using the expanded training set.

In a specific application scenario, the judgment module is specifically configured to determine that the user to be identified is a game studio user if the classification model outputs that the character is abnormal; and to restrict the game account corresponding to the user to be identified, or the abnormal character under that game account.

In a specific application scenario, optionally, the game is of the Massively Multiplayer Online Role-Playing Game (MMORPG) type, and the decision tree algorithm is the C4.5 algorithm.

It should be noted that other corresponding descriptions of the functional units related to the detection device for the target game user provided in this embodiment may refer to the corresponding descriptions in fig. 1 and fig. 2, and are not repeated herein.

Based on the above-mentioned methods shown in fig. 1 and fig. 2, correspondingly, the present embodiment further provides a storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the above-mentioned method for detecting a target game user shown in fig. 1 and fig. 2.

Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) and includes several instructions for enabling a computer device (such as a personal computer, a server, or a network device) to execute the methods of the embodiments of the present application.

Based on the method shown in fig. 1 and fig. 2 and the virtual device embodiment shown in fig. 3, in order to achieve the above object, an embodiment of the present application further provides a detection device for a target game user, which may specifically be a personal computer, a server, a smart phone, or other network devices, and the device includes a storage medium and a processor; a storage medium for storing a computer program; a processor for executing a computer program to implement the above-described target game user detection method as shown in fig. 1 and 2.

Optionally, the entity device may further include a user interface, a network interface, a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WI-FI module, and the like. The user interface may include a Display screen (Display), an input unit such as a keypad (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), etc.

It will be understood by those skilled in the art that the physical device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine some components, or use a different arrangement of components.

The storage medium may further include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the above-described physical devices, and supports the operation of the information processing program as well as other software and/or programs. The network communication module is used for realizing communication among components in the storage medium and communication with other hardware and software in the information processing entity device.

Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus a necessary general hardware platform, and can also be implemented by hardware. By applying the scheme of the embodiment, whether the user to be identified is the game studio user can be accurately judged according to the role behavior characteristics and the role attribute characteristics of the user to be identified in the game and by combining the role behavior characteristics and the role attribute characteristics of the user judged to be the game studio user in the game. Compared with the prior Turing test modes such as verification codes, the embodiment does not need to issue the verification codes for verification, can not be cracked easily, can improve the detection accuracy of the game studio users, and can not influence the game experience of other normal players. The manual dependence is reduced, the detection of the game studio users can be automatically completed, and the detection efficiency of the game studio users is improved. And the reference basis of the judgment is the relevant characteristic data of the role of the game player in the game, and the privacy data of the player in the real life is not related, so that the user privacy is ensured.

Those skilled in the art will appreciate that the figures are merely schematic representations of one preferred implementation scenario and that the blocks or flow diagrams in the figures are not necessarily required to practice the present application. Those skilled in the art will appreciate that the modules in the devices in the implementation scenario may be distributed in the devices in the implementation scenario according to the description of the implementation scenario, or may be located in one or more devices different from the present implementation scenario with corresponding changes. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.

The serial numbers used in the present application are for description only and do not represent the superiority or inferiority of the implementation scenarios. The above disclosure covers only a few specific implementation scenarios of the present application, but the present application is not limited thereto, and any variation that can be conceived by those skilled in the art is intended to fall within the scope of the present application.

Claims

1. A method for detecting a target game user, comprising:

2. The method according to claim 1, wherein the performing feature extraction on the first preprocessed data by using a random forest algorithm to obtain second preprocessed data specifically comprises:

calculating the importance of each feature in the first preprocessed data by using a random forest algorithm;

and according to the importance of each feature, performing feature extraction on the first preprocessing data to obtain second preprocessing data.

3. The method according to claim 2, wherein the calculating the importance of each feature in the first preprocessed data by using a random forest algorithm specifically comprises:

generating a plurality of feature subsets from the first preprocessed data;

constructing a plurality of decision trees according to the plurality of feature subsets;

calculating a first error of the out-of-bag data corresponding to each decision tree, wherein the out-of-bag data is data which does not participate in the decision tree construction when the decision tree is constructed;

randomly selecting a target feature in the out-of-bag data, and, after adding random noise interference to the target feature, calculating again a second error of the out-of-bag data corresponding to each decision tree;

respectively calculating the difference value between the first error and the second error of the corresponding out-of-bag data of each decision tree;

and summing the difference values corresponding to the decision trees, and dividing the sum by the number of the decision trees to obtain the importance of the target feature.
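The steps of claim 3 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: each "decision tree" is simplified to a one-split stump, the random noise is assumed to be a shuffle of the target feature within the out-of-bag rows, and the names `fit_stump`, `error`, `oob_importance`, `n_trees` and `subset_size` are hypothetical.

```python
import random

def fit_stump(rows, labels, features):
    """Fit a one-split tree: pick the (feature, threshold) with the fewest training errors."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            for left, right in ((0, 1), (1, 0)):
                errs = sum((left if r[f] <= t else right) != y
                           for r, y in zip(rows, labels))
                if best is None or errs < best[0]:
                    best = (errs, f, t, left, right)
    _, f, t, left, right = best
    return lambda r: left if r[f] <= t else right

def error(tree, rows, labels):
    """Fraction of rows the tree misclassifies."""
    return sum(tree(r) != y for r, y in zip(rows, labels)) / len(rows)

def oob_importance(rows, labels, target, n_trees=25, subset_size=2, seed=0):
    """Average increase in out-of-bag error after perturbing feature `target`."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    total = 0.0
    for _ in range(n_trees):
        in_bag = [rng.randrange(n) for _ in range(n)]      # bootstrap sample
        bag = set(in_bag)
        oob = [i for i in range(n) if i not in bag]         # rows not used to build this tree
        if not oob:
            continue                                        # assumption: such a tree contributes 0
        features = rng.sample(range(n_features), subset_size)  # random feature subset
        tree = fit_stump([rows[i] for i in in_bag],
                         [labels[i] for i in in_bag], features)
        oob_rows = [rows[i] for i in oob]
        oob_labels = [labels[i] for i in oob]
        first = error(tree, oob_rows, oob_labels)           # first error
        column = [r[target] for r in oob_rows]
        rng.shuffle(column)                                 # assumed form of the random noise
        noisy = [r[:target] + [v] + r[target + 1:]
                 for r, v in zip(oob_rows, column)]
        second = error(tree, noisy, oob_labels)             # second error
        total += second - first                             # per-tree difference
    return total / n_trees                                  # sum divided by number of trees
```

Rows are plain lists of numbers and labels are 0 or 1. Calling `oob_importance(rows, labels, target=k)` once per feature index `k` gives the importance values that the ranking in claim 4 would consume.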

4. The method according to claim 2, wherein the extracting features from the first preprocessed data according to the importance of each feature to obtain second preprocessed data specifically comprises:

sorting each feature in the feature data according to the importance of each feature;

deleting features from the importance ranking according to a preset deletion proportion to obtain new feature data;

calculating the importance of each feature in the new feature data by using a random forest algorithm;

according to the importance of each feature in the new feature data, repeatedly executing the processes of feature sorting, feature deleting and importance calculating until the latest obtained feature data meets the preset quality condition;

and determining the second preprocessing data according to the characteristic data meeting the preset quality condition.
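The loop described in claim 4 can be sketched as follows. The claim does not state which ranked features are deleted or what the preset quality condition is, so this sketch assumes the lowest-ranked features are dropped and takes the quality check as a caller-supplied function. The names `eliminate_features`, `importance_fn`, `quality_ok` and `drop_ratio` are hypothetical.

```python
def eliminate_features(features, importance_fn, quality_ok, drop_ratio=0.2):
    """Repeat sort -> delete -> re-score until the remaining features pass quality_ok.

    features:      list of feature names
    importance_fn: callable(list of names) -> dict mapping name to importance
    quality_ok:    callable(list of names, dict of scores) -> bool (assumed condition)
    drop_ratio:    preset deletion proportion
    """
    current = list(features)
    scores = importance_fn(current)
    while True:
        # Sort features from most to least important.
        ranked = sorted(current, key=lambda f: scores[f], reverse=True)
        # Assumption: the lowest-ranked share is deleted, and at least one feature is kept.
        n_drop = min(len(ranked) - 1, max(1, int(len(ranked) * drop_ratio)))
        current = ranked[:len(ranked) - n_drop]
        # Recompute importance on the new feature data.
        scores = importance_fn(current)
        if quality_ok(current, scores) or len(current) <= 1:
            return current
```

In practice `importance_fn` would wrap a random forest importance calculation such as the one in claim 3; here it is left abstract so the control flow stays visible.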

5. The method according to claim 4, wherein the determining the second preprocessing data according to the feature data meeting the preset quality condition specifically includes:

generating a plurality of feature subsets according to the feature data meeting the preset quality conditions;

calculating a third error of the corresponding out-of-bag data of each decision tree;

and selecting the feature subset corresponding to the decision tree with the lowest third error as the second preprocessing data.

6. The method of claim 1, wherein the role behavior characteristics include one or more of a periodic online duration of a role, a periodic online and/or offline time of a role, and historical recharging information of an account where a role is located, and the role attribute characteristics include one or more of role grade information, a role server ID, role race information, role occupation information, and the number of roles under an account where a role is located;

the performing numerical preprocessing on the role behavior characteristics and the role attribute characteristics to obtain first preprocessing data specifically includes:

calculating the average value of the periodic online duration of the role;

calculating the standard deviation of the periodic online or offline time of the role;

calculating the accumulated recharging amount and/or the recharging amount in the statistical period according to the historical recharging information of the account of the role;

acquiring the grade number of the role in the game according to the role grade information;

acquiring race type numbers of the characters in the game according to the race information of the characters;

and acquiring the occupation type number of the role in the game according to the occupation information of the role.
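The numerical preprocessing of claim 6 can be sketched in Python as below. The record layout, the field names (`online_minutes_per_period`, `login_hours`, `recharges`, and so on) and the lookup tables are hypothetical, and the population standard deviation is an assumption, since the claim does not say which form is used.

```python
from statistics import mean, pstdev

def preprocess(role, race_ids, occupation_ids):
    """Turn one role record into a numeric feature vector (first preprocessing data)."""
    return [
        mean(role["online_minutes_per_period"]),   # average periodic online duration
        pstdev(role["login_hours"]),               # spread of online (login) times
        sum(role["recharges"]),                    # accumulated recharge amount
        role["level"],                             # grade number of the role
        race_ids[role["race"]],                    # race type number
        occupation_ids[role["occupation"]],        # occupation type number
    ]
```

A role that logs in at the same hour every period yields a standard deviation of zero in the second position, which is the kind of regularity the behavior features are meant to expose.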

7. The method of claim 1, wherein prior to said inputting said second preprocessed data into a classification model, said method further comprises:

performing numerical preprocessing on the role behavior characteristics and the role attribute characteristics of a game studio user in the game;

carrying out feature extraction on the sample feature data subjected to numerical preprocessing by using a random forest algorithm;

and establishing a training set according to the extracted feature data, and training by utilizing a decision tree algorithm to obtain the classification model.

8. The method according to claim 7, wherein the creating a training set according to the extracted feature data and training by using a decision tree algorithm to obtain the classification model specifically comprises:

configuring a sample label corresponding to the extracted feature data;

adding the extracted feature data and a sample label corresponding to the feature data into a training set;

if the sample labels of all sample data in the training set belong to the same first category, the decision tree is a single-node tree, the category of the node in the decision tree is marked as that first category, and the classification model of the decision tree is returned;

if the sample data corresponding to the extracted feature data in the training set is empty, the decision tree is a single-node tree, the class of the node in the decision tree is marked according to the second class with the largest number of samples in the training set, and the classification model of the decision tree is returned;

if the sample data corresponding to the extracted feature data in the training set is not empty, calculating the feature with the largest information gain rate in the sample data corresponding to the extracted feature data;

when the information gain rate of the feature with the maximum information gain rate is smaller than a preset threshold, determining that the decision tree is a single-node tree, marking the class of the node in the decision tree according to the second class with the largest number of samples in the training set, and returning the classification model of the decision tree;

and when the information gain rate of the feature with the maximum information gain rate is greater than or equal to the preset threshold, dividing the training set into a plurality of non-empty subsets according to all possible values of that feature, marking each non-empty subset with the third class having the largest number of samples in it, constructing the child nodes of the decision tree so as to build the decision tree, and returning the classification model of the decision tree.
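The tree construction of claim 8 follows the general shape of a gain-ratio decision tree, which can be sketched in Python as below. The sketch assumes discrete feature values, reads the "empty" case as no remaining candidate features, and applies the splitting recursively to each subset. The names `entropy`, `gain_ratio`, `build_tree`, `predict` and `threshold` are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (base 2) of a list of discrete values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, labels, feature):
    """Information gain of `feature` divided by its split information."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([row[feature] for row in rows])
    return (entropy(labels) - conditional) / split_info if split_info > 0 else 0.0

def build_tree(rows, labels, features, threshold=1e-3):
    """Return a class label (leaf) or a dict node with one child per feature value."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority                      # single-node tree
    best = max(features, key=lambda f: gain_ratio(rows, labels, f))
    if gain_ratio(rows, labels, best) < threshold:
        return majority                      # gain ratio below the preset threshold
    node = {"feature": best, "default": majority, "children": {}}
    rest = [f for f in features if f != best]
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest, threshold)
    return node

def predict(tree, row):
    """Walk the tree; unseen feature values fall back to the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree
```

Rows are dictionaries mapping feature names to discrete values. Continuous features such as an average online duration would first need to be binned or handled with threshold splits, which this sketch omits.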

9. The method of claim 7, wherein if it is determined that the user to be identified is a game studio user, the method further comprises:

and expanding the training set according to the role behavior characteristics and the role attribute characteristics of the user to be recognized so as to update and train the classification model by using the expanded training set.

10. The method according to claim 1, wherein the determining whether the user to be identified is a game studio user with reference to the classification result output by the classification model specifically comprises:

if the classification result output by the classification model indicates that the role is abnormal, determining that the user to be identified is a game studio user;

and limiting the game account corresponding to the user to be identified or the abnormal role under the game account.

11. The method of any one of claims 1 to 10, wherein the game is of a massively multiplayer online role-playing game type and the decision tree model is a C4.5 algorithm model.

12. A target game user detection apparatus, comprising:

13. A storage medium on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 11.

14. A target game user detection device comprising a storage medium, a processor and a computer program stored on the storage medium and executable on the processor, wherein the processor implements the method of any one of claims 1 to 11 when executing the program.