CN107281755B

CN107281755B - Detection model construction method and device, storage medium and terminal

Info

Publication number: CN107281755B
Application number: CN201710576568.1A
Authority: CN
Inventors: 范长杰; 胡志鹏; 程龙; 刘柏
Original assignee: Netease Hangzhou Network Co Ltd
Current assignee: Netease Hangzhou Network Co Ltd
Priority date: 2017-07-14
Filing date: 2017-07-14
Publication date: 2020-05-05
Anticipated expiration: 2037-07-14
Also published as: CN107281755A

Abstract

The invention discloses a method and a device for constructing a detection model, a storage medium, a processor, and a terminal. The method comprises: acquiring a plurality of information classes from data to be verified, wherein the data to be verified is extracted from newly added log data within a preset range, the plurality of information classes are used to determine the detection model to be used, and the detection model is used to detect abnormal behavior in a game; and constructing the detection model by comparing the information gain of each of the plurality of information classes. The invention solves the technical problems that game plug-in detection schemes in the related art need to acquire complete game data, involve a large amount of computation, and cannot cope with incremental changes in the game data.

Description

Detection model construction method and device, storage medium and terminal

Technical Field

The invention relates to the field of computers, in particular to a method and a device for constructing a detection model, a storage medium and a terminal.

Background

Detecting cheating in games is generally treated as an anomaly detection problem, that is, extracting anomalous data from a series of log data. Anomaly detection has attracted considerable attention in data mining and plays an important role in many practical applications, such as credit card fraud detection, network intrusion detection, and other abnormal behavior detection. A number of techniques have already been developed for anomaly detection, for example graph-based and tensor-based anomaly detection methods.

User behavior analysis is a method commonly used in the industry today for detecting game script robots. It applies data mining to game player log data extracted from the game server in order to detect unconventional, carefully designed game cheats. Unlike traditional plug-ins, which can be identified by simple client-side detection, unconventional plug-ins are better concealed and harder to discover. Plug-in designers also know how to react promptly when their plug-ins are forced out of use, for example by changing the specific content of the plug-in to evade the game company's plug-in detection mechanism. For this reason, the related art has been improved by analyzing log data from different angles, for example social activity, economic activity, or movement trajectories on the virtual map of the game environment. Common data mining techniques, such as classification (including but not limited to support vector machines and linear regression) or clustering (including but not limited to k-means clustering and hierarchical clustering), are then used to mine player feature data selected on the basis of experience or domain knowledge.

However, the solutions proposed in the related art require collecting complete game data and do not consider that the game data itself may change over time. In particular, when the total amount of game data is large, more hardware resources are consumed, processing takes longer, and the processing complexity is high.

In view of the above problems, no effective solution has been proposed.

Disclosure of Invention

At least one embodiment of the invention provides a method, a device, a storage medium, and a terminal for constructing a detection model, so as to at least solve the technical problems that game plug-in detection schemes in the related art need to acquire complete game data, involve a large amount of computation, and cannot cope with incremental changes in the game data.

According to an embodiment of the present invention, a method for constructing a detection model is provided, including:

acquiring a plurality of information classes from data to be verified, wherein the data to be verified is extracted from newly added log data within a preset range, the plurality of information classes are used to determine the detection model to be used, and the detection model is used to detect abnormal behavior in a game; and constructing the detection model by comparing the information gain of each of the plurality of information classes.

Optionally, acquiring the plurality of information classes from the data to be verified comprises: extracting log data from a game server; and selecting the data to be verified from the log data according to a preset condition, wherein the data to be verified comprises multiple types of action data performed, within the same time dimension, by a plurality of currently collected game characters, and each type of action data is set as one information class.

Optionally, constructing the detection model by comparing the information gain of each of the plurality of information classes comprises: a comparison step of comparing the information gain of each of the plurality of information classes and selecting a first information class and a second information class, wherein the information gain of the first information class is the largest of the information gains of the plurality of information classes, and the information gain of the second information class is the second largest; a processing step of setting the action data corresponding to the first information class as the current construction element of the detection model when the difference between the information gain of the first information class and the information gain of the second information class is larger than a preset threshold; and a splitting step of determining the next construction element to be generated according to the splitting condition corresponding to the first information class, and returning to the comparison step.

Optionally, before constructing the detection model by comparing the information gain of each of the plurality of information classes, the method further comprises: calculating the information entropy corresponding to each information class according to the proportion of that information class among the plurality of information classes; calculating the conditional entropy corresponding to each information class, taking whether each information class is characterized as abnormal behavior in the game as the judgment condition; and calculating the information gain of each information class from its information entropy and its conditional entropy.

Optionally, the processing step further comprises: creating a substitute model associated with the detection model if it is determined at the current construction element that the distribution of the action data contained in the plurality of information classes has changed, wherein the substitute model comprises all construction elements already generated in the detection model.

Optionally, determining at the current construction element that the distribution of the action data contained in the plurality of information classes has changed comprises: adding an action attribute value corresponding to each of the plurality of game characters to a sliding window, wherein the action attribute value indicates whether the action performed by that game character is abnormal; dividing the sliding window into a first sub-window and a second sub-window; and, when the absolute value of the difference between a first parameter value and a second parameter value is greater than or equal to the preset threshold, determining that the distribution of the action data has changed and continuously discarding data newly added to the sliding window until the absolute value is less than the preset threshold, wherein the first parameter value is the average of the data in the first sub-window, and the second parameter value is the average of the data in the second sub-window.

Optionally, after creating the substitute model, the method further comprises: comparing the detection model with the substitute model using a preset judgment index, wherein the preset judgment index comprises at least one of recall and accuracy; and replacing the detection model with the substitute model when the judgment result shows that the substitute model is better than the detection model.

Optionally, the splitting step further comprises: after the next construction element to be generated is determined, acquiring a control command, wherein the control command instructs that splitting stop at the next construction element to be generated.

According to an embodiment of the present invention, there is also provided a device for constructing a detection model, including:

an acquisition module, configured to acquire a plurality of information classes from data to be verified, wherein the data to be verified is extracted from newly added log data within a preset range, the plurality of information classes are used to determine the detection model to be used, and the detection model is used to detect abnormal behavior in a game; and a construction module, configured to construct the detection model by comparing the information gain of each of the plurality of information classes.

Optionally, the acquisition module comprises: an extraction unit, configured to extract log data from a game server; and an acquisition unit, configured to select the data to be verified from the log data according to a preset condition, wherein the data to be verified comprises multiple types of action data performed, within the same time dimension, by a plurality of currently collected game characters, and each type of action data is set as one information class.

Optionally, the construction module comprises: a first comparison unit, configured to compare the information gain of each of the plurality of information classes and select a first information class and a second information class, wherein the information gain of the first information class is the largest of the information gains of the plurality of information classes, and the information gain of the second information class is the second largest; a processing unit, configured to set the action data corresponding to the first information class as the current construction element of the detection model when the difference between the information gain of the first information class and the information gain of the second information class is larger than a preset threshold; and a splitting unit, configured to determine the next construction element to be generated according to the splitting condition corresponding to the first information class and return to the first comparison unit.

Optionally, the apparatus further comprises: a first calculation module, configured to calculate the information entropy corresponding to each information class according to the proportion of that information class among the plurality of information classes; a second calculation module, configured to calculate the conditional entropy corresponding to each information class, taking whether each information class is characterized as abnormal behavior in the game as the judgment condition; and a third calculation module, configured to calculate the information gain of each information class from its information entropy and its conditional entropy.

Optionally, the processing unit is further configured to create a substitute model associated with the detection model if it is determined at the current construction element that the distribution of the action data contained in the plurality of information classes has changed, wherein the substitute model comprises all construction elements already generated in the detection model.

Optionally, the processing unit comprises: an adding subunit, configured to add an action attribute value corresponding to each of the plurality of game characters to a sliding window, wherein the action attribute value indicates whether the action performed by that game character is abnormal; a dividing subunit, configured to divide the sliding window into a first sub-window and a second sub-window; and a processing subunit, configured to, when it is determined that the absolute value of the difference between a first parameter value and a second parameter value is greater than or equal to the preset threshold, determine that the distribution of the action data has changed and continuously discard data newly added to the sliding window until the absolute value is less than the preset threshold, wherein the first parameter value is the average of the data in the first sub-window, and the second parameter value is the average of the data in the second sub-window.

Optionally, the construction module further comprises: a second comparison unit, configured to compare the detection model with the substitute model using a preset judgment index, wherein the preset judgment index comprises at least one of recall and accuracy; and a replacement unit, configured to replace the detection model with the substitute model when the judgment result shows that the substitute model is better than the detection model.

Optionally, the splitting unit is further configured to acquire a control command after the next construction element to be generated is determined, wherein the control command instructs that splitting stop at the next construction element to be generated.

According to an embodiment of the present invention, a storage medium is further provided. The storage medium comprises a stored program, wherein, when the program runs, a device on which the storage medium is located is controlled to perform the above method for constructing the detection model.

According to an embodiment of the present invention, a processor is further provided. The processor is configured to run a program, wherein the program, when running, performs the above method for constructing the detection model.

According to an embodiment of the present invention, a terminal is also provided, including: one or more processors, a memory, a display device, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and the one or more programs are used to perform the above method for constructing the detection model.

In at least one embodiment of the invention, a plurality of information classes are acquired from data to be verified, and a detection model is built by comparing the information gain of each of these information classes. Because the data to be verified is extracted from newly added log data within a preset range, the detection model for detecting game plug-ins can be continuously rebuilt as time passes. This reduces the amount of computation needed to build the detection model and accommodates incremental changes in game data, thereby solving the technical problems that game plug-in detection schemes in the related art need to acquire complete game data, involve a large amount of computation, and cannot cope with incremental changes in game data.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flowchart of a method for constructing a detection model according to an embodiment of the present invention;

FIG. 2 is a block diagram of an apparatus for constructing a detection model according to an embodiment of the present invention;

FIG. 3 is a block diagram of an apparatus for constructing a detection model according to a preferred embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to one embodiment of the present invention, an embodiment of a method for constructing a detection model is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from that shown here.

FIG. 1 is a flowchart of a method for constructing a detection model according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:

step S12, obtaining a plurality of information classes from the data to be verified, wherein the data to be verified is extracted from newly added log data in a preset range, the plurality of information classes are used for determining a detection model to be used, and the detection model is used for detecting abnormal behaviors in the game (for example, a plug-in is used in the game);

step S16, constructing the detection model by comparing the information gain of each of the plurality of information classes.

Through the above steps, a plurality of information classes are acquired from the data to be verified, and a detection model is built by comparing the information gain of each information class. Because the data to be verified is extracted from newly added log data within a preset range, the detection model for detecting game plug-ins is continuously rebuilt as time passes. This reduces the amount of computation required to build the detection model and accommodates incremental changes in game data, thereby solving the technical problems that game plug-in detection schemes in the related art need to acquire complete game data, involve a large amount of computation, and cannot cope with incremental changes in game data.

Optionally, in step S12, the obtaining of the plurality of information classes from the data to be verified may include the following steps:

step S121, extracting log data from the game server;

step S122, selecting the data to be verified from the log data according to a preset condition, wherein the data to be verified comprises multiple types of action data performed, within the same time dimension, by a plurality of currently collected game characters, and each type of action data is set as one information class.

The log data may be log data stored on the game server. The data to be verified may generally include, but is not limited to, records of: logging in to the game, exiting the game, accepting a task, completing a task, entering a map, exiting a map, claiming multiple-experience bonuses, releasing skills, game character A killing game character B, a game character killing a monster, using an item, dropping an item, dropping equipment, trading goods, entering an instance dungeon, exiting an instance dungeon, obtaining money, spending money, level-up records, and experience change records.

Each character controlled by a game player may generate a variety of actions. To capture the characteristics of these actions, the actions generated by a game character in each time period are first represented as a feature vector. Many statistical methods may be used for this (corresponding to the preset condition); the simplest is to count the occurrence frequency of each action. In addition to the absolute frequency, a relative value may be used (for example, the frequency of an action of a specific game character divided by the sum of the frequencies of the actions of all game characters).

The time period can be defined by the user. It may follow the natural time axis (year, month, day, hour, minute, second) or virtual time in the game; for example, the character level can be used as the time axis, and the frequency of the game character's actions is counted at each level or over several consecutive levels, such as levels 1-5, 6-10, 11-15, and so on. When selecting data, the time span covered by the selected game characters usually needs to be consistent to facilitate subsequent processing. For example, one may select game characters that generated game data within the 30 days from January 1 to January 30, 2017 and count indicators such as action frequency over periods of one day, or select game characters that generated game data at levels 1-45.
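The feature construction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the log record format (character_id, level, action_name), the function names level_bucket and action_features, and the level-range width of 5 are hypothetical, and the relative value is read here as a character's count of an action divided by the total count of that action over all characters in the same level range.

```python
from collections import Counter, defaultdict


def level_bucket(level, width=5):
    """Map a character level to a level range: 1-5 -> (1, 5), 6-10 -> (6, 10), ..."""
    start = (level - 1) // width * width + 1
    return (start, start + width - 1)


def action_features(records, actions, width=5):
    """Build per-character, per-level-range action frequency vectors.

    records: iterable of (character_id, level, action_name) tuples, a hypothetical
    simplified log format. Returns {(character_id, level_range): (absolute, relative)}.
    """
    # Absolute frequency of each action for each (character, level range)
    counts = defaultdict(Counter)
    for char_id, level, action in records:
        counts[(char_id, level_bucket(level, width))][action] += 1

    # Sum of each action's frequency over all characters in the same level range
    totals = defaultdict(Counter)
    for (_, bucket), per_char in counts.items():
        totals[bucket].update(per_char)

    features = {}
    for (char_id, bucket), per_char in counts.items():
        absolute = [per_char[a] for a in actions]
        relative = [per_char[a] / totals[bucket][a] if totals[bucket][a] else 0.0
                    for a in actions]
        features[(char_id, bucket)] = (absolute, relative)
    return features


logs = [("p1", 3, "kill_monster"), ("p1", 4, "kill_monster"),
        ("p2", 2, "kill_monster"), ("p2", 7, "trade")]
feats = action_features(logs, ["kill_monster", "trade"])
```

Swapping level_bucket for a function that maps timestamps to calendar days would give the natural-time variant (one-day periods) mentioned above.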

It should be noted that, since plug-in game characters are usually far fewer than normal game characters, the classes may first be balanced by sampling. For example, in the field of data mining, a preset sample set is divided into two classes: plug-in game characters and normal game characters. If the two classes are highly imbalanced, say only 5 of 200 game characters are plug-in game characters and the other 195 are normal game characters, the plug-in game characters can be added to the preset sample set repeatedly, for example by putting the 5 plug-in game characters into the preset sample set 39 times, so that the two classes become roughly equal in size before subsequent training is performed.
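A minimal sketch of this oversampling step, assuming each plug-in sample should appear len(majority) // len(minority) times in the balanced set (39 copies each in the 5-versus-195 example). The helper name oversample_minority is hypothetical, and the text does not specify whether the original samples count toward the 39 repetitions.

```python
def oversample_minority(samples, labels, minority_label=1):
    """Balance a two-class sample set by repeating every minority-class sample.

    Each minority sample appears len(majority) // len(minority) times in the
    output, e.g. 195 // 5 = 39 copies of each of 5 plug-in characters.
    """
    majority = [(s, y) for s, y in zip(samples, labels) if y != minority_label]
    minority = [(s, y) for s, y in zip(samples, labels) if y == minority_label]
    if not minority or not majority:
        return list(samples), list(labels)  # nothing to balance
    copies = max(1, len(majority) // len(minority))
    balanced = majority + minority * copies
    return [s for s, _ in balanced], [y for _, y in balanced]
```

Repeating samples (rather than synthesizing new ones) keeps the sketch faithful to the text; shuffling the balanced set before training is a common extra step.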

Optionally, step S16 of constructing the detection model by comparing the information gain of each of the plurality of information classes may include the following steps:

step S161, comparing the information gain of each of the plurality of information classes, and selecting a first information class and a second information class, where the information gain of the first information class is the maximum value of the information gains of the plurality of information classes, and the information gain of the second information class is the second largest value of the information gains of the plurality of information classes;

step S162, when the difference value between the information gain of the first information class and the information gain of the second information class is larger than a preset threshold value, setting the action data corresponding to the first information class as a current construction element of the detection model;

step S163, determining the next construction element to be generated according to the splitting condition corresponding to the first information class, and returning to step S161.
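Steps S161 and S162 amount to a top-two comparison of information gains. The sketch below assumes the gains have already been computed per action; choose_split and its arguments are hypothetical names, and returning None stands for "no construction element yet, keep collecting data".

```python
def choose_split(info_gains, threshold):
    """Steps S161-S162 as a function (a sketch).

    info_gains: {action_name: information gain}. Returns the action whose gain
    is largest, provided it exceeds the second-largest gain by more than
    `threshold`; otherwise returns None.
    """
    if len(info_gains) < 2:
        return None  # S161 needs at least two information classes to compare
    ranked = sorted(info_gains.items(), key=lambda kv: kv[1], reverse=True)
    (best_action, best_gain), (_, second_gain) = ranked[0], ranked[1]
    if best_gain - second_gain > threshold:
        return best_action  # S162: becomes the current construction element
    return None
```

Step S163 would then partition the data on the returned action and call choose_split again for each resulting branch.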

Taking an incremental binary decision tree as an example of the detection model, a root node (which is likewise obtained by comparing the information gain of each of the plurality of information classes) is first established, and a statistic, denoted a_ijk, is initialized at the root node. For each instance (x, y), x is the feature vector of the actions of a particular game character over a particular time period, and y is the label indicating whether the game character is a plug-in game character: 0 may denote a normal game character and 1 a plug-in game character. The instance is first tested with the decision tree in its initialized state, and then the following Hoeffding adaptive tree growing step is executed. The test yields results such as accuracy and recall, which are output as the judgment effect.

First, the instance (x, y) is passed along a decision path of the existing decision tree until it reaches a specific leaf node, and the estimators a_ijk of all nodes on that path, including the leaf node, are updated. If the current leaf node has an alternative tree T_alt, the alternative tree also performs the corresponding Hoeffding adaptive tree growing step. The information gain G of each action is then calculated; it is used to evaluate whether the current state of the leaf node is suitable for a split. Calculating the information gain requires a discretization step. Discretization is a common operation in decision trees, both traditional and incremental, and is applied mainly to attributes (game character actions) whose values are not discrete. Discrete values are values drawn from a finite set of labels, such as male and female, or primary school, junior high school, high school, and college. When the decision tree splits, continuous-valued attributes must be discretized to obtain different child nodes.
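The routing-and-update step can be sketched as below, under stated assumptions: action values are already discretized, internal nodes use the binary test x[i] <= split_point, and labels are 0 (normal) or 1 (plug-in). Node and sort_and_update are hypothetical names; the alternative-tree handling and the split decision are omitted.

```python
from collections import defaultdict


class Node:
    """Binary decision-tree node; it is a leaf while `action` is None.

    a[(i, j)] holds [count of label 0, count of label 1] for instances whose
    action i had (discretized) value j, i.e. the statistic a_ijk with k in {0, 1}.
    """

    def __init__(self):
        self.action = None        # index i of the action this node splits on
        self.split_point = None   # x[i] <= split_point goes left, otherwise right
        self.left = None
        self.right = None
        self.a = defaultdict(lambda: [0, 0])


def sort_and_update(root, x, y):
    """Pass instance (x, y) down the tree, updating a_ijk at every node on the path.

    x: tuple of discretized action values; y: 0 = normal, 1 = plug-in.
    Returns the leaf node the instance reaches.
    """
    node = root
    while True:
        for i, value in enumerate(x):
            node.a[(i, value)][y] += 1
        if node.action is None:
            return node
        node = node.left if x[node.action] <= node.split_point else node.right
```

The leaf returned here is where the information gains would be recomputed to decide whether to split.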

If the values of an action are continuous rather than discrete, a robust incremental Gaussian discretization method can be used; after it is applied, the continuous values are discretized. Let G_a be the information gain of the attribute (the number of times the game character performs an action) with the largest information gain, and G_b the second largest information gain. If

$\Delta G = G_a - G_b > \epsilon, \quad \epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$

where ε is the Hoeffding bound, R is the range of the information gain, δ is the allowed probability of error, and n is the number of instances observed at the leaf node, then the action with the largest information gain is split, and an estimator is initialized for each branch produced by the discretization. A binary split is used: the continuous value interval is divided into two parts, values greater than the split point and values less than or equal to the split point.
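The split test can be illustrated with the standard Hoeffding-tree bound. This is a sketch assuming the threshold in the split condition is the Hoeffding bound with R = log2(number of classes) and a hypothetical δ = 1e-7; the function names are illustrative, and the robust incremental Gaussian discretization itself is not reproduced here.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)), the standard Hoeffding bound."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, n_classes=2, delta=1e-7):
    """Split when the best action beats the runner-up by more than epsilon.

    For information gain in bits, R = log2(number of classes); with the two
    classes normal (0) and plug-in (1), R = 1.
    """
    r = math.log2(n_classes)
    return best_gain - second_gain > hoeffding_bound(r, delta, n)


def binary_partition(values, split_point):
    """Binary split of a continuous range: <= split_point vs. > split_point."""
    left = [v for v in values if v <= split_point]
    right = [v for v in values if v > split_point]
    return left, right
```

Because ε shrinks as 1/sqrt(n), a leaf that sees more instances can commit to a split even when the gap between the two best actions is small.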

Optionally, before step S16 of constructing the detection model by comparing the information gain of each of the plurality of information classes, the method may further include the following steps:

step S13, calculating the information entropy corresponding to each information class according to the proportion of that information class among the plurality of information classes;

step S14, calculating the conditional entropy corresponding to each information class, taking whether each information class is characterized as abnormal behavior in the game as the judgment condition;

step S15, calculating the information gain of each information class from the information entropy and the conditional entropy corresponding to that information class.

In a preferred embodiment, the information gain is calculated as follows. Consider a distribution in which a particular data set includes three information classes, A, B, and C, with proportions p(A) = 0.2, p(B) = 0.3, and p(C) = 1 - 0.2 - 0.3 = 0.5. The information entropy is $H = -\sum_i p_i \log p_i = -(0.2\log 0.2 + 0.3\log 0.3 + 0.5\log 0.5)$, which is approximately 1.485 with base-2 logarithms (approximately 0.447 with base-10 logarithms). In general, the smaller the entropy, the more ordered the data set. When the data set contains only one class, the entropy reaches its minimum value of 0, indicating that the data set is completely ordered. On this basis, the conditional entropy $H(Y|X) = \sum_{x \in X} p(x) H(Y|X=x)$ is calculated, where Y indicates whether the game character is a plug-in game character and X is the particular action on which the data is divided (for example, the number of times the character enters an instance dungeon in the game). The difference between the entropy before and after the division on that action, $H(Y) - H(Y|X)$, is the information gain.
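The entropy, conditional entropy, and information gain of steps S13 to S15 can be checked numerically with the sketch below. It assumes base-2 logarithms by default; the function names and the toy action values are hypothetical.

```python
import math
from collections import Counter


def entropy_from_proportions(ps, base=2):
    """H = -sum_i p_i log p_i for an explicit distribution."""
    return -sum(p * math.log(p, base) for p in ps if p > 0)


def entropy(labels, base=2):
    """H(Y) over the empirical distribution of `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n, base) for c in Counter(labels).values())


def conditional_entropy(xs, ys, base=2):
    """H(Y|X) = sum_x p(x) H(Y | X = x)."""
    n = len(xs)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    return sum(len(g) / n * entropy(g, base) for g in groups.values())


def information_gain(xs, ys, base=2):
    """Entropy before the division minus entropy after dividing on action X."""
    return entropy(ys, base) - conditional_entropy(xs, ys, base)
```

With the proportions 0.2, 0.3, and 0.5, entropy_from_proportions returns about 1.485 bits, matching the worked example. An action whose values separate plug-in from normal characters perfectly drives H(Y|X) to 0, so its information gain equals H(Y).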

Optionally, in step S162, after the action data corresponding to the first information class is set as the current building element of the detection model, the following steps may be further performed:

step S164, if it is determined at the current construction element that the distribution of the action data included in the plurality of information classes has changed, creating a substitute model associated with the detection model, wherein the substitute model includes all construction elements already generated in the detection model.

Suppose the change detector determines that the data distribution has changed. For example, assume that actions A, B, and C currently exist. A game character controlled by a plug-in does not perform all of them; instead, it selects the action (for example, action A) that yields high profit over a long period. However, if the game operator finds that the character repeatedly performs action A, the account is likely to be banned. The plug-in therefore starts controlling the game character to repeatedly perform actions B and C instead, so as to avoid the penalty. This shift from repeatedly performing action A to repeatedly performing actions B and C changes the data distribution.

If the leaf node has no alternative tree, an alternative tree T_alt is created at the leaf node; if an alternative tree already exists and is more accurate, the leaf node l is replaced with the alternative tree T_alt.

It should be noted that, up to the leaf node at which the alternative tree is created, the original tree and the alternative tree contain nodes with identical contents. From the creation of the alternative tree onward, the two trees grow independently, although both may still contain nodes with the same contents.
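The comparison between a tree and its alternative, using the recall and accuracy named as judgment indexes, might look as follows. The decision rule in should_replace (no worse on both metrics and strictly better on at least one) is a hypothetical choice; the text only states that the alternative replaces the original when it is better.

```python
def accuracy_and_recall(predictions, labels):
    """Accuracy over all instances; recall over plug-in instances (label 1)."""
    pairs = list(zip(predictions, labels))
    accuracy = sum(p == y for p, y in pairs) / len(pairs) if pairs else 0.0
    positives = sum(y == 1 for _, y in pairs)
    true_pos = sum(p == 1 and y == 1 for p, y in pairs)
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall


def should_replace(current_preds, alternative_preds, labels):
    """Hypothetical rule: replace when the alternative tree is no worse on both
    metrics and strictly better on at least one."""
    cur = accuracy_and_recall(current_preds, labels)
    alt = accuracy_and_recall(alternative_preds, labels)
    return all(a >= c for a, c in zip(alt, cur)) and alt != cur
```

Evaluating both trees on the same recent labeled instances keeps the comparison fair, since the alternative tree has only seen data from after the detected change.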

Optionally, in step S164, determining that the distribution of the action data included in the plurality of information classes changes at the current construction element may include performing the steps of:

step S1641, add an action attribute value corresponding to each game character of the plurality of game characters to the sliding window, where the action attribute value indicates whether the action performed by each game character is abnormal (e.g., is manipulated by a game plug-in);

step S1642, divide the sliding window into a first sub-window and a second sub-window;

step S1643, when it is determined that the absolute value of the difference between a first parameter value and a second parameter value is greater than or equal to the preset threshold, repeatedly discard the data newly added to the sliding window and determine that the action data distribution has changed, until the absolute value is smaller than the preset threshold, where the first parameter value is the average of the data in the first sub-window and the second parameter value is the average of the data in the second sub-window.

In a specific implementation, a sliding window W is first initialized. The sliding window holds a sequence of 0s and 1s, where 1 indicates that the corresponding feature vector is a plug-in feature vector and 0 indicates that it is a normal feature vector. Each time a new feature vector is acquired, a 0 or a 1 is inserted into the sliding window, and the total and the variance of the 0s and 1s in the window are computed.
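This window bookkeeping can be kept cheap. The following minimal sketch assumes the window is a Python `deque`, and `BinaryWindow` is a hypothetical name. Because every entry is 0 or 1, the total is simply the number of ones and the population variance is p(1 - p), where p is the share of ones, so both stay up to date in constant time per insertion or removal.

```python
from collections import deque


class BinaryWindow:
    """Sliding window of 0/1 action attribute values (1 = plug-in, 0 = normal)."""

    def __init__(self):
        self.values = deque()
        self.ones = 0                  # running total of the window

    def add(self, is_plugin) -> None:
        bit = 1 if is_plugin else 0
        self.values.append(bit)
        self.ones += bit

    def drop_oldest(self) -> int:
        bit = self.values.popleft()
        self.ones -= bit
        return bit

    def total(self) -> int:
        return self.ones

    def mean(self) -> float:
        return self.ones / len(self.values) if self.values else 0.0

    def variance(self) -> float:
        # For 0/1 data the population variance is p * (1 - p).
        p = self.mean()
        return p * (1 - p)
```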

Then, the sliding window W is divided arbitrarily into two sub-windows so that W = W0 + W1. If the condition $|\hat{\mu}_{W_0} - \hat{\mu}_{W_1}| < \epsilon$ cannot be satisfied, the action attribute value corresponding to the last feature vector in the sliding window needs to be discarded until the condition is satisfied. Here $\hat{\mu}_{W_0} = \frac{1}{n_0}\sum_{x \in W_0} x$ and $\hat{\mu}_{W_1} = \frac{1}{n_1}\sum_{x \in W_1} x$ are the averages of the data within sub-windows W0 and W1, $n_0$ and $n_1$ are the lengths of the two sub-windows, and $\epsilon$ is the preset threshold.

The statistics are updated each time a new feature vector is acquired, and the size of the sliding window is adjusted dynamically so that the window stays as large as possible while the data distribution within it remains consistent. In addition, whenever data are discarded, the change detector can be notified that the data distribution has changed.
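The cut test and discard loop can be sketched as a simplified form of the ADWIN adaptive-windowing algorithm. Several choices are assumptions and are not taken from the source. `threshold` is a fixed preset ε, whereas ADWIN proper derives the cut value from n0, n1 and a confidence parameter. Every split point W = W0 + W1 with both parts at least `min_sub` long is tested. Elements are dropped from the oldest end of the window, as ADWIN does. `WindowChangeDetector` is a hypothetical name.

```python
from collections import deque


class WindowChangeDetector:
    """Simplified sliding-window change detector over 0/1 values."""

    def __init__(self, threshold: float, min_sub: int = 5):
        self.threshold = threshold     # preset epsilon for |mean(W0) - mean(W1)|
        self.min_sub = min_sub         # shortest sub-window length considered
        self.window = deque()

    def _cut_exists(self) -> bool:
        """True if some split W = W0 + W1 has sub-window means too far apart."""
        w = list(self.window)
        n = len(w)
        total = sum(w)
        head = 0                       # sum of W0 = w[:n0]
        for n0 in range(1, n):
            head += w[n0 - 1]
            n1 = n - n0
            if n0 < self.min_sub or n1 < self.min_sub:
                continue
            mean0 = head / n0          # older part of the window
            mean1 = (total - head) / n1  # newer part of the window
            if abs(mean0 - mean1) >= self.threshold:
                return True
        return False

    def add(self, value: int) -> bool:
        """Insert a 0/1 value; return True if a distribution change was detected."""
        self.window.append(value)
        changed = False
        while self._cut_exists():
            self.window.popleft()      # discard until the condition holds again
            changed = True
        return changed
```

Feeding 30 zeros followed by a run of ones makes `add` return True shortly after the ones begin, and the window shrinks so that it holds mostly recent, consistent data; a True return is the point at which the change detector would be notified.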

Optionally, after creating the surrogate model in step S164, the following steps may be further included:

step S165, comparing the detection model with the substitution model by adopting a preset judgment index, wherein the preset judgment index comprises at least one of the following indexes: recall rate, accuracy;

and step S166, when the judgment result shows that the substitution model is better than the detection model, replacing the detection model by using the substitution model.

The recall rate is the proportion of correctly detected plug-ins among all plug-ins that should be detected. Suppose there are 500 feature vectors to be detected, of which 420 are normal feature vectors and 80 are plug-in feature vectors. If 100 feature vectors are finally flagged as suspected plug-ins, of which 60 are plug-in feature vectors and 40 are normal feature vectors, the recall rate is 60/80, i.e. 75%.

The accuracy rate (commonly called precision) is the proportion of correctly detected plug-ins among all feature vectors actually flagged as plug-ins. Again suppose there are 500 feature vectors to be detected, of which 420 are normal feature vectors and 80 are plug-in feature vectors. If 100 feature vectors are finally flagged as suspected plug-ins and all 80 plug-in feature vectors are among them, the accuracy rate is 80/100, i.e. 80%.
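Both indices can be computed directly from per-vector labels. The following minimal sketch uses a hypothetical function name and the convention that 1 marks a plug-in feature vector. The worked data reproduce the recall example: 500 vectors, 80 of them plug-ins, and 100 flagged, of which 60 are true plug-ins. This gives a recall of 0.75 and, for that same scenario, an accuracy (precision) of 0.6.

```python
def recall_and_accuracy(y_true, y_pred):
    """Return (recall, accuracy) where 1 marks a plug-in feature vector.

    recall   = correctly flagged plug-ins / all plug-ins
    accuracy = correctly flagged plug-ins / all flagged vectors (precision)
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual = sum(1 for t in y_true if t == 1)
    flagged = sum(1 for p in y_pred if p == 1)
    recall = tp / actual if actual else 0.0
    accuracy = tp / flagged if flagged else 0.0
    return recall, accuracy


# 80 plug-ins followed by 420 normal vectors; the detector flags 60 of the
# plug-ins plus 40 normal vectors (100 flagged in total).
y_true = [1] * 80 + [0] * 420
y_pred = [1] * 60 + [0] * 20 + [1] * 40 + [0] * 380
print(recall_and_accuracy(y_true, y_pred))  # (0.75, 0.6)
```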

At least one of the recall rate and the accuracy rate can be used to judge which of the substitution model and the detection model performs better, and thus to determine whether the substitution model should replace the detection model.
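The source leaves open how the two indices are combined when judging which model is better. The following sketch assumes one possible rule: the substitution model wins only if it is no worse on either index and strictly better on at least one. A weighted score or the F1 measure would be equally plausible choices. The function names are hypothetical.

```python
from typing import Tuple

# (recall rate, accuracy rate) measured on the same validation data
Metrics = Tuple[float, float]


def substitution_is_better(detection: Metrics, substitution: Metrics) -> bool:
    """Assumed rule: no worse on either index, strictly better on at least one."""
    no_worse = all(s >= d for s, d in zip(substitution, detection))
    strictly_better = any(s > d for s, d in zip(substitution, detection))
    return no_worse and strictly_better


def select_model(detection_model, substitution_model,
                 detection_metrics: Metrics, substitution_metrics: Metrics):
    """Steps S165-S166: keep the detection model unless the substitution wins."""
    if substitution_is_better(detection_metrics, substitution_metrics):
        return substitution_model
    return detection_model
```

Requiring "no worse on both" avoids swapping in a model that raises recall by flagging many more normal players.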

Optionally, in step S163, after the next construction element to be generated is determined, the following step is further included:

step S167, obtain a control command, where the control command instructs that splitting stop at the next construction element to be generated.

To stop a node (i.e. the next construction element to be generated) from splitting, it is only necessary to place a specific condition at the split decision point that always evaluates to false (which is equivalent to the control command), so that the node is never split. For example, if game character a normally enters a dungeon 5 times a day but this number suddenly rises to 50 during some period, it is highly likely that the dungeon entries of character a are being performed by a game plug-in. However, if the game offers special rewards during a particular festival (e.g. the Spring Festival or National Day), such as a higher probability of obtaining the best equipment in the dungeon, double experience points in the dungeon, or a rare boss placed in the dungeon, then a sudden rise from 5 to 50 dungeon entries during that period is normal behavior, and no further splitting should be performed.
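The split decision and its override can be sketched together. The rule from the comparison step is that the node splits on the information class with the largest gain only if that gain exceeds the second largest by more than a preset threshold. The `stop_split` callable is a hypothetical stand-in for the control command, and the festival calendar is invented for illustration.

```python
import datetime
from typing import Callable, Dict, Optional


def choose_split(gains: Dict[str, float], threshold: float,
                 stop_split: Callable[[], bool] = lambda: False) -> Optional[str]:
    """Return the information class to split on, or None to leave the node as is.

    gains: information gain of each information class (at least two entries).
    threshold: preset margin the largest gain must exceed over the second largest.
    stop_split: hypothetical hook playing the role of the control command; when
        it returns True the node is not split, whatever the gains are.
    """
    if stop_split():
        return None
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    (best, first_gain), (_, second_gain) = ranked[0], ranked[1]
    return best if first_gain - second_gain > threshold else None


# Hypothetical festival calendar: on these days a surge in dungeon entries is
# expected, so the control command suppresses the split.
FESTIVAL_DAYS = {datetime.date(2017, 10, 1)}

gains = {"dungeon_entries": 0.30, "trades": 0.10}
print(choose_split(gains, 0.05))  # dungeon_entries
print(choose_split(gains, 0.05,
                   lambda: datetime.date(2017, 10, 1) in FESTIVAL_DAYS))  # None
```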

According to an embodiment of the present invention, an embodiment of a device for constructing a detection model is further provided, and fig. 2 is a structural block diagram of the device for constructing a detection model according to an embodiment of the present invention. As shown in fig. 2, the apparatus may include: an acquisition module 10, configured to acquire a plurality of information classes from data to be verified, where the data to be verified is extracted from newly added log data within a preset range, the information classes are used to determine the detection model to be used, and the detection model is used to detect abnormal behaviors in a game; and a construction module 20, configured to construct the detection model by comparing the information gain of each of the plurality of information classes.

Optionally, the acquisition module 10 includes: an extracting unit (not shown in the figure), configured to extract log data from the game server; and an obtaining unit (not shown in the figure), configured to select the data to be verified from the log data according to a preset condition, where the data to be verified includes a plurality of types of action data performed by a plurality of currently collected game characters within the same time dimension, each type of action data being set as one information class.

Optionally, the construction module 20 may include: a first comparing unit (not shown in the figure), configured to compare the information gain of each of the plurality of information classes to select a first information class and a second information class, where the information gain of the first information class is the largest of the information gains of the plurality of information classes and the information gain of the second information class is the second largest; a processing unit (not shown in the figure), configured to set the action data corresponding to the first information class as the current construction element of the detection model when the difference between the information gain of the first information class and the information gain of the second information class is greater than a preset threshold; and a splitting unit (not shown in the figure), configured to determine the next construction element to be generated according to the splitting condition corresponding to the first information class and to return to the first comparing unit.

Alternatively, fig. 3 is a block diagram of a device for constructing a detection model according to a preferred embodiment of the present invention. As shown in fig. 3, the apparatus may further include: the first calculating module 30 is configured to calculate an information entropy corresponding to each information class according to an occupation ratio of each information class in the plurality of information classes; the second calculating module 40 is configured to set, as a judgment condition, whether each information class is characterized as an abnormal behavior in the game, and calculate a conditional entropy corresponding to each information class respectively; and a third calculating module 50, configured to calculate an information gain of each information class by using the information entropy corresponding to each information class and the conditional entropy corresponding to each information class.

Optionally, the processing unit may include: an adding subunit (not shown in the figure), configured to add, to the sliding window, an action attribute value corresponding to each of the plurality of game characters, where the action attribute value indicates whether the action performed by each game character is abnormal; a dividing subunit (not shown in the figure), configured to divide the sliding window into a first sub-window and a second sub-window; and a processing subunit (not shown in the figure), configured to, when it is determined that the absolute value of the difference between a first parameter value and a second parameter value is greater than or equal to the preset threshold, repeatedly discard the data newly added to the sliding window and determine that the action data distribution has changed, until the absolute value is smaller than the preset threshold, where the first parameter value is the average of the data in the first sub-window and the second parameter value is the average of the data in the second sub-window.

Optionally, the building module 20 may further include: a second comparing unit (not shown in the figure) for comparing the detection model and the substitution model by using a preset judgment index, wherein the preset judgment index comprises at least one of the following: recall rate, accuracy; and a replacing unit (not shown in the figure) for replacing the detection model with the substitution model when the determination result shows that the substitution model is better than the detection model.

According to an embodiment of the present invention, there is further provided a storage medium including a stored program, where, when the program runs, the device on which the storage medium is located is controlled to execute the method for constructing the detection model. The storage medium may include, but is not limited to, various media capable of storing program code, such as a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

According to an embodiment of the present invention, there is further provided a processor, where the processor is configured to execute a program, and the program, when executed, performs the method for constructing the detection model. The processor may include, but is not limited to, a microcontroller unit (MCU) or a programmable logic device such as an FPGA.

According to an embodiment of the present invention, there is also provided a terminal, including: one or more processors, a memory, a display device, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the programs comprising instructions for performing the above-described method of constructing a detection model. In some embodiments, the terminal may be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, and a Mobile Internet Device (MID), a PAD, and the like. The display device may be a touch screen type Liquid Crystal Display (LCD) that enables a user to interact with a user interface of the terminal. In addition, the terminal may further include: an input/output interface (I/O interface), a Universal Serial Bus (USB) port, a network interface, a power source, and/or a camera.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A construction method of a detection model is characterized by comprising the following steps:

acquiring a plurality of information classes from data to be verified, wherein the data to be verified is extracted from newly added log data in a preset range, the information classes are used for determining a detection model to be used, and the detection model is used for detecting abnormal behaviors in a game;

constructing the detection model by comparing the information gain of each of the plurality of information classes;

wherein the constructing the detection model by comparing the information gain of each of the plurality of information classes comprises: a comparison step: comparing the information gain of each of the plurality of information classes, and selecting a first information class and a second information class, wherein the information gain of the first information class is the maximum of the information gains of the plurality of information classes, and the information gain of the second information class is the second largest of the information gains of the plurality of information classes; a processing step: when the difference between the information gain of the first information class and the information gain of the second information class is greater than a preset threshold, setting the action data corresponding to the first information class as a current construction element of the detection model; and a splitting step: determining the next construction element to be generated according to the splitting condition corresponding to the first information class, and returning to the comparison step.

2. The method of claim 1, wherein obtaining the plurality of information classes from the data to be verified comprises:

extracting the log data from a game server;

selecting the data to be verified from the log data according to a preset condition, wherein the data to be verified comprises a plurality of types of action data performed by a plurality of currently collected game characters within the same time dimension, each type of action data being set as one information class.

3. The method of claim 1, further comprising, prior to constructing the detection model by comparing information gains for each of the plurality of information classes:

respectively calculating the information entropy corresponding to each information class according to the occupation proportion of each information class in the plurality of information classes;

setting whether each information class is characterized as abnormal behavior in the game as a judgment condition, and respectively calculating a condition entropy corresponding to each information class;

and calculating the information gain of each information class by adopting the information entropy corresponding to each information class and the conditional entropy corresponding to each information class.

4. The method of claim 1, wherein the processing step further comprises:

creating a surrogate model associated with the detection model if a change in the distribution of action data contained in the plurality of information classes is determined at the current build element, wherein the surrogate model comprises all construction elements already generated in the detection model.

5. The method of claim 4, wherein determining, at the current build element, that a change has occurred in a distribution of action data contained in the plurality of information classes comprises:

adding an action attribute value corresponding to each game role in a plurality of currently collected game roles to a sliding window, wherein the action attribute value represents whether an action executed by each game role is abnormal or not;

dividing the sliding window into a first part of sub-windows and a second part of sub-windows;

when the absolute value of the difference between the first parameter value and the second parameter value is determined to be greater than or equal to a preset threshold, continuously discarding the latest data added to the sliding window and determining that the action data distribution has changed, until the absolute value is smaller than the preset threshold, wherein the first parameter value is the average value of the data in the first part of sub-windows, and the second parameter value is the average value of the data in the second part of sub-windows.

6. The method of claim 4, after creating the surrogate model, further comprising:

comparing the detection model with the substitution model by adopting a preset judgment index, wherein the preset judgment index comprises at least one of the following indexes: recall rate, accuracy;

and when the judgment result shows that the substitution model is better than the detection model, replacing the detection model by adopting the substitution model.

7. The method of claim 1, wherein the splitting step further comprises:

after determining the next building element to be generated, acquiring a control command, wherein the control command is used for indicating that the splitting is stopped at the next building element to be generated.

8. An apparatus for constructing a detection model, comprising:

an acquisition module, configured to acquire a plurality of information classes from data to be verified, wherein the data to be verified is extracted from newly added log data within a preset range, the information classes are used for determining a detection model to be used, and the detection model is used for detecting abnormal behaviors in a game;

the construction module is used for constructing the detection model by comparing the information gain of each information class in the plurality of information classes;

wherein the construction module comprises: a first comparing unit, configured to select a first information class and a second information class by comparing the information gain of each of the plurality of information classes, where the information gain of the first information class is the maximum of the information gains of the plurality of information classes, and the information gain of the second information class is the second largest of the information gains of the plurality of information classes; a processing unit, configured to set the action data corresponding to the first information class as a current construction element of the detection model when the difference between the information gain of the first information class and the information gain of the second information class is greater than a preset threshold; and a splitting unit, configured to determine the next construction element to be generated according to the splitting condition corresponding to the first information class, and to return to the first comparing unit.

9. The apparatus of claim 8, wherein the obtaining module comprises:

an extracting unit configured to extract the log data from a game server;

an obtaining unit, configured to select the data to be verified from the log data according to a preset condition, where the data to be verified includes a plurality of types of action data performed by a plurality of currently collected game characters within the same time dimension, each type of action data being set as one information class.

10. The apparatus of claim 8, further comprising:

the first calculation module is used for respectively calculating the information entropy corresponding to each information class according to the occupation proportion of each information class in the plurality of information classes;

the second calculation module is used for setting whether each information class is characterized as abnormal behavior in the game as a judgment condition and respectively calculating the condition entropy corresponding to each information class;

and the third calculating module is used for calculating the information gain of each information class by adopting the information entropy corresponding to each information class and the conditional entropy corresponding to each information class.

11. The apparatus of claim 8, wherein the processing unit is further configured to create a surrogate model associated with the detection model if it is determined at the current build element that a distribution of action data included in the plurality of information classes has changed, wherein the surrogate model comprises all construction elements already generated in the detection model.

12. The apparatus of claim 11, wherein the processing unit comprises:

the adding subunit is used for adding an action attribute value corresponding to each game role in the plurality of currently collected game roles to the sliding window, wherein the action attribute value represents whether the action executed by each game role is abnormal;

a dividing subunit, configured to divide the sliding window into a first part of sub-windows and a second part of sub-windows;

and the processing subunit is configured to, when it is determined that an absolute value of a difference between a first parameter value and a second parameter value is greater than or equal to a preset threshold, continuously discard the latest data added to the sliding window and determine that the action data distribution has changed, until the absolute value is smaller than the preset threshold, where the first parameter value is an average value of data in the first part of sub-windows, and the second parameter value is an average value of data in the second part of sub-windows.

13. The apparatus of claim 11, wherein the building module further comprises:

a second comparing unit, configured to compare the detection model and the surrogate model by using a preset determination index, where the preset determination index includes at least one of: recall rate, accuracy;

and the replacing unit is used for replacing the detection model by adopting the substitution model when the judgment result shows that the substitution model is better than the detection model.

14. The apparatus of claim 8, wherein the splitting unit is further configured to obtain a control command after determining the next to-be-generated building element, wherein the control command is configured to instruct to stop splitting at the next to-be-generated building element.

15. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute the method for constructing the detection model according to any one of claims 1 to 7.

16. A terminal, comprising: one or more processors, a memory, a display device, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs comprising instructions for performing the method of constructing the detection model of any of claims 1 to 7.