CN107866072B - System for detecting plug-in by adopting incremental decision tree - Google Patents

Info

Publication number
CN107866072B
Authority
CN
China
Prior art keywords
tree
decision tree
model
player
panel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711045371.1A
Other languages
Chinese (zh)
Other versions
CN107866072A (en
Inventor
陈为
陆俊华
巫英才
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201711045371.1A
Publication of CN107866072A
Application granted
Publication of CN107866072B
Active
Anticipated expiration

Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/70: Game security or game management aspects
    • A63F13/75: Enforcing rules, e.g. detecting foul play or generating lists of cheating players
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/55: Controlling game characters or game objects based on the game progress
    • A63F13/58: Controlling game characters or game objects based on the game progress by computing conditions of game characters, e.g. stamina, strength, motivation or energy level
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00: Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/80: Special adaptations for executing a specific game genre or game mode
    • A63F13/822: Strategy games; Role-playing games
    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63F: CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00: Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/80: Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game specially adapted for executing a specific type of game
    • A63F2300/807: Role playing or strategy games

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a system for detecting plug-ins (game cheat programs) using an incremental decision tree. The system comprises: a data preprocessing module, which cleans the raw data of player actions and extracts feature vectors; a model generation and interaction module, which takes the feature vectors from the data preprocessing module as input, generates and outputs a model, and receives feedback to adjust the model; and a high-level view visualization module, which generates a dynamic tree diagram, a recommendation and display panel, and an accuracy/recall line chart from the model output by the model generation and interaction module. The invention uses the dynamic decision tree to show the decision process in different time periods, analyzes the characteristics of plug-ins and the reasons a player is judged to be a plug-in, and finds characteristics that clearly distinguish plug-ins from normal players. Because of the properties of the model, characteristics of plug-in evolution can also be explored. In addition, the user can add their own knowledge to prune the decision tree and carry out other analyses by combining the various views.

Description

System for detecting plug-in by adopting incremental decision tree
Technical Field
The invention relates to the technical field of game plug-in detection, in particular to a system for plug-in detection by adopting an incremental decision tree.
Background
Multiplayer online role-playing games create a virtual social world in which players from all over the world take part. Players interact with each other and spend a great deal of time, effort, and money developing and upgrading their game characters. Surveys have found that multiplayer online role-playing games worldwide generated revenue in the region of $198 million in 2016 alone. However, the popularity of these games has also made them a hotbed for cybercrime. One important example is real-money trading that is not officially approved, that is, trading virtual items for real currency. Some advanced game plug-ins (cheat programs) now specialize in such violations for profit. They severely disturb game balance and harm game operations, the revenue of game companies, the player experience, the in-game economy, and even the sustainable development of the whole game.
Game operators have therefore taken many measures. Common plug-in detection methods fall into three kinds: client-side, network-side, and server-side detection. Client-side detection works on a principle similar to antivirus software, embedded in the game client on the user's machine, but some elaborately designed plug-ins currently escape it. Network-side detection needs to analyze network traffic, which may cause problems such as excessive network load and network delay. Server-side detection analyzes the players' log data on the server. It is currently the most popular approach, and it usually treats plug-in detection as an anomaly detection problem. However, existing methods have shortcomings. Plug-in designs have become more complex, and the input and output of a simple algorithm are hard for an analyst to understand. Advanced plug-ins can also update and upgrade themselves, which makes detection even harder. This is where visual analysis techniques become important.
Several visual analysis systems address anomaly detection. For example, the #FluxFlow system of Jian Zhao et al. visualizes the retweeting behavior of Twitter users and combines it with context information such as user attributes to study how anomalous information spreads on Twitter. TargetVue, by Nan Cao et al., uses a model that assigns each user a suspicion score over time to help analysts understand potential social bots on social media platforms (such as bot accounts that send spam), including their communication activities, behavioral characteristics, and social interactions. These systems improve the anomaly detection process by integrating human domain knowledge, but they still treat the internals of the algorithmic model as a black box, and analysts cannot learn about the model through them.
Although existing methods provide rich context information, researchers still cannot take a deep part in the model-building process. A desirable system would therefore let users participate in model building, so that they can better understand the model and look for potential explanations of observed phenomena.
Disclosure of Invention
The invention provides a system for detecting plug-ins using an incremental decision tree. It helps an analyst detect plug-ins, understand the plug-in detection model, discover how the behaviors and actions of plug-ins and normal players change over time, and interact with the model for interactive visual analysis and exploration.
A system for detecting plug-ins using an incremental decision tree, comprising:
a data preprocessing module, which cleans the raw data of player actions and extracts feature vectors;
a model generation and interaction module, which takes the feature vectors from the data preprocessing module as input, generates and outputs a model, and receives feedback to adjust the model;
a high-level view visualization module, which mainly represents the changing tree structure of the decision tree as icicle plots arranged side by side, and which generates a dynamic tree diagram, a recommendation and display panel, and an accuracy/recall line chart from the model output by the model generation and interaction module;
the dynamic tree diagram displays the tree structure of the decision tree compactly as an icicle plot; each node (rectangle) in an icicle plot represents a split node of the decision tree, and several icicle plots are arranged side by side to show how the decision tree changes over time;
in the recommendation and display panel, the recommendation panel uses a sortable table to show information about all nodes (rectangles of the decision tree) of the icicle plot or plots selected in the dynamic tree diagram, including accuracy, recall, precision, and frequency of occurrence; after switching to the display panel, if a node is selected in the dynamic tree diagram, the players contained in that node are shown as a radar distribution map;
the accuracy/recall line chart and the dynamic tree diagram are stacked vertically and aligned in time, and the line chart shows how the accuracy/recall of each decision tree's predictions changes over time.
The raw data of player actions are stored per action: each action has one log file per day (for example, the log for January 1, 2017), and the game has hundreds of actions. For the time period of interest, the frequency of each player's actions is counted in each time slice. The time slices may be based on natural time (year, month, day, hour, minute, second) or on game time (for example, which actions are performed at level 1, level 2, and so on, and how often). Each player thus has one feature vector per time slice. Because the number of actions is so large, the actions are also grouped into categories (task-related, attribute-related, battle-related, and item-related). The grouping can be defined by a user (for example, a game analysis expert) or taken from established classification schemes in the literature.
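The counting step can be illustrated with a minimal Python sketch. The record layout `(player_id, time_slice, action)` and the four action names are hypothetical stand-ins chosen for illustration; the patent does not specify the log format.

```python
from collections import Counter, defaultdict

# Hypothetical action ids; the real game is said to have hundreds of actions.
ACTIONS = ["accept_task", "level_up", "attack", "trade_item"]


def build_feature_vectors(log_records, actions=ACTIONS):
    """Count action frequencies per (player, time slice).

    log_records: iterable of (player_id, time_slice, action) tuples (assumed layout).
    Returns {(player_id, time_slice): [count for each action in `actions`]}.
    """
    counts = defaultdict(Counter)
    for player_id, time_slice, action in log_records:
        counts[(player_id, time_slice)][action] += 1
    # Fix the order of dimensions so that every vector is comparable.
    return {key: [c[a] for a in actions] for key, c in counts.items()}


records = [
    ("p1", 1, "attack"), ("p1", 1, "attack"), ("p1", 1, "trade_item"),
    ("p1", 2, "level_up"), ("p2", 1, "accept_task"),
]
vectors = build_feature_vectors(records)
```

Each key of `vectors` is one player in one time slice, which matches the statement that every player has one feature vector per time slice.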
A decision tree represents a decision flow. Each non-leaf node on the path from the root downward is a test condition: for an incoming instance, the node checks whether a certain attribute of the instance satisfies its condition and routes the instance to the matching child. When the instance reaches a leaf node, the leaf gives a class label that says which class the instance belongs to. In the conventional training process, the tree is grown recursively: for each leaf node of the current tree, an indicator such as information gain or the Gini index determines which attribute the child nodes below that leaf should split on.
As shown in Fig. 1, a decision tree describes a decision flow, for example deciding whether or not to go mountain climbing today. Each box is a node, followed by several decision conditions, and the decision depends on which condition holds. The nodes of the tree can be drawn as rectangles, and each node has several options, such as high or normal humidity, or strong wind or breeze. Following the tree downward until a leaf node gives a result, the paths that lead to not climbing are (weather: rain) → not climbing; (weather: sunny) → (humidity: high) → not climbing; and (weather: cloudy) → (wind: strong) → not climbing.
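The three non-climbing paths of the Fig. 1 example can be written as a small Python function. The text only lists the paths that end in "not climbing"; treating every other branch as "climbing" is an assumption made here to complete the tree.

```python
def decide_climb(weather, humidity=None, wind=None):
    """Walk the example tree from the root to a leaf and return the leaf label."""
    if weather == "rain":
        return "not climbing"
    if weather == "sunny":
        # Second-level test on humidity; the 'climbing' leaf is assumed.
        return "not climbing" if humidity == "high" else "climbing"
    if weather == "cloudy":
        # Second-level test on wind; the 'climbing' leaf is assumed.
        return "not climbing" if wind == "strong" else "climbing"
    raise ValueError(f"unknown weather: {weather!r}")
```

Each `if` corresponds to a non-leaf node and each `return` to a leaf, so one call traces exactly one root-to-leaf path.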
The model that the invention can adopt is a Hoeffding adaptive tree with Gaussian discretization. It is an online algorithm: using the Hoeffding bound, the decision tree can be trained online, that is, each data item is used for training as soon as it arrives and is used only once. A conventional decision tree, by contrast, needs a whole batch of data, and each data item is used many times to determine split conditions.
The Hoeffding bound states that, for a random variable with range $R$, after $n$ independent observations the true mean deviates from the estimated mean by no more than $\epsilon$ with probability $1-\delta$, where

$$\epsilon = \sqrt{\frac{R^{2}\ln(1/\delta)}{2n}}$$

When deciding which attribute a node should split on, the two attributes with the largest and second-largest information gain are found and the difference between their gains is computed. If this difference is larger than $\epsilon$, then with probability $1-\delta$ the attribute with the largest observed gain is truly the best one to split on. With such a bound the tree can be trained when only part of the data, or a small amount of data, is available, instead of waiting until all data have arrived. Such a decision tree is called a Hoeffding decision tree.
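As a minimal sketch of this split test, the following Python functions compute the standard Hoeffding bound $\epsilon = \sqrt{R^{2}\ln(1/\delta)/(2n)}$ and compare it with the gap between the two best information gains. The function and parameter names are illustrative, not taken from the patent.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split only when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because the bound shrinks as $n$ grows, a leaf that cannot yet separate its two best attributes simply waits for more instances before splitting.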
On this basis, the invention makes several improvements using existing techniques. First, because the Hoeffding decision tree supports only discrete-valued attributes, a robust incremental Gaussian discretization method is adopted so that continuous-valued attributes are supported. Second, the data exhibit concept drift, meaning that the data-generating process may not be stationary and may change over time. In game data the behavior of plug-ins also changes: plug-in operators notice when their accounts are banned and alter some characteristics to avoid further detection and banning. To handle this, adaptive windowing (ADWIN) is adopted: a window, a corresponding estimator, and a change detector are added to the original Hoeffding decision tree so that such concept drift can be discovered.
The whole method is therefore called a Hoeffding adaptive tree with Gaussian discretization. With this method the decision tree keeps growing as data flow in, and when a significant change is detected, a subtree of the decision tree can be replaced by another subtree. The result is a tree that changes over time, and for each time slice every player receives a judgment (0 for a normal player, 1 for a plug-in).
Specifically, the model building process is as follows:
Step 1: initialize the state of the decision tree
Establish a root node and initialize at the root node the statistics $A_{ijk}$, which are part of the ADWIN method (see step 3). For each instance $(x, y)$, where $x$ is the feature vector of a player's actions in a time slice and $y$ is the label indicating whether the player is a plug-in (0 for a normal player, 1 for a plug-in), the current tree is first used to test the instance, and then the Hoeffding adaptive tree growing step of step 2 is carried out.
Step 2: growth of the decision tree
First, the instance $(x, y)$ is sorted along a decision path of the existing tree to a leaf node, and the estimators $A_{ijk}$ of all nodes on the path and of the leaf node are updated. If the current node $l$ has an alternate (replaceable) tree $T_{alt}$, the same Hoeffding adaptive tree growing step is also applied to that alternate tree. The information gain $G$ of each action is then computed (another indicator commonly used for decision trees may be used instead); the purpose of this step is to evaluate whether the leaf node now meets the condition for splitting.
The information gain is based on the concept of information entropy. For a distribution over classes with proportions $p_i$, the entropy is $H = -\sum_i p_i \log(p_i)$. For example, with three classes A, B, and C in proportions $p(A) = 0.2$, $p(B) = 0.3$, and $p(C) = 1 - 0.2 - 0.3 = 0.5$, the entropy is $H = -(0.2\log 0.2 + 0.3\log 0.3 + 0.5\log 0.5) \approx 0.447$ with base-10 logarithms. In general, the smaller the entropy, the more ordered the data: when the data set contains only one class, the entropy is 0, its minimum value, which indicates that the data set is completely ordered. Building on this, the conditional entropy is $H(Y \mid X) = \sum_{x \in X} p(x)\, H(Y \mid X = x)$. Here $Y$ is the plug-in judgment and $X$ is some action, such as the number of times a player enters a dungeon instance in the game; the conditional entropy describes the entropy that remains after the data are divided according to $X$. The information gain of an action is the difference between the entropy before and after the division, $H(Y) - H(Y \mid X)$. The index $i$ above ranges over the classes A, B, and C. In general, $\sum$ denotes summation over an index set: if $N = \{1, 2, 3, \dots, n\}$, then $\sum_{i \in N} i = 1 + 2 + 3 + \dots + n$.
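The entropy and information gain definitions can be checked with a short Python sketch. It handles a discrete attribute only, and the function names and the `base` parameter are illustrative choices rather than part of the patent.

```python
import math
from collections import Counter


def entropy(probabilities, base=2.0):
    """H = -sum(p_i * log(p_i)); terms with p_i = 0 contribute nothing."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


def label_entropy(labels, base=2.0):
    """Entropy of the empirical class distribution of a list of labels."""
    total = len(labels)
    return entropy([count / total for count in Counter(labels).values()], base)


def information_gain(xs, ys, base=2.0):
    """H(Y) - H(Y|X) for a discrete attribute X observed together with labels Y."""
    total = len(ys)
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / total * label_entropy(g, base) for g in groups.values())
    return label_entropy(ys, base) - conditional
```

For the three-class example with proportions 0.2, 0.3, and 0.5, `entropy([0.2, 0.3, 0.5], base=10)` evaluates to about 0.447. An attribute that separates the two labels perfectly has a gain equal to the full label entropy, and an attribute unrelated to the labels has a gain of 0.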
Computing the information gain requires a discretization step when the value of an action is continuous rather than discrete, so that the node can be split. A robust incremental Gaussian discretization method is used, after which the continuous values are discretized for the subsequent split. If the largest information gain minus the second-largest information gain is greater than

$$\epsilon = \sqrt{\frac{R^{2}\ln(1/\delta)}{2n}}$$

then the node is split on the action with the largest information gain, and an estimator is initialized for each branch of the split. Splitting is binary: the interval of continuous values is divided at a split point into two parts, values less than or equal to the split point and values greater than it. Here $\epsilon$ is the Hoeffding bound: for a random variable with range $R$, after $n$ independent observations the true mean deviates from the estimated mean by no more than $\epsilon$ with probability $1-\delta$.
If the change detector (ADWIN is also used as the change detector) finds that the distribution generating the data has changed and node $l$ has no alternate tree, an alternate tree $T_{alt}$ is created at that node. If an alternate tree already exists and is more accurate, the current node $l$ is replaced by the alternate tree $T_{alt}$.
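Step 3: ADWIN method
Initialize a sliding window $W$ together with its variance and total.
a. When a new instance arrives, add it to the window $W$.
b. For every division of the window $W$ into two parts, $W = W_0 + W_1$, check whether

$$\left|\hat{\mu}_{W_0} - \hat{\mu}_{W_1}\right| < \epsilon_{cut}$$

If not, discard the last (oldest) element of the window, and repeat until the inequality is satisfied. Here $\epsilon_{cut}$ is the cut threshold, which depends on

$$m = \frac{1}{1/n_0 + 1/n_1}$$

(the harmonic mean of $n_0$ and $n_1$); in the standard ADWIN formulation, $\epsilon_{cut} = \sqrt{\frac{1}{2m}\ln\frac{4}{\delta'}}$ with $\delta' = \delta/n$, where $n$ is the length of $W$. $\hat{\mu}_{W_0}$ and $\hat{\mu}_{W_1}$ are the averages of the data in the sub-windows $W_0$ and $W_1$, and $n_0$ and $n_1$ are the lengths of the sub-windows.
c. If an instance was discarded in step b, ADWIN acts as a change detector and reports to the calling program that the data distribution has changed.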
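Steps a to c can be sketched in Python as follows. This is a deliberately naive version that rescans every split of the window on each insertion and assumes input values in [0, 1]; it uses the threshold from the original ADWIN formulation, $\epsilon_{cut} = \sqrt{\ln(4/\delta')/(2m)}$ with $\delta' = \delta/n$, which may differ from the exact expression in the patent figures. The class name and default `delta` are illustrative.

```python
import math


class SimpleAdwin:
    """Naive ADWIN-style change detector for values in [0, 1] (illustrative only)."""

    def __init__(self, delta=0.002):
        self.delta = delta
        self.window = []

    def _has_cut(self):
        """Return True if some split W0 | W1 has means that differ by >= eps_cut."""
        n = len(self.window)
        total = sum(self.window)
        delta_prime = self.delta / n
        head_sum = 0.0
        for n0 in range(1, n):
            head_sum += self.window[n0 - 1]
            n1 = n - n0
            mean0 = head_sum / n0
            mean1 = (total - head_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) >= eps_cut:
                return True
        return False

    def add(self, value):
        """Step a: append; step b: shrink while a cut exists; step c: report change."""
        self.window.append(value)
        changed = False
        while len(self.window) > 1 and self._has_cut():
            self.window.pop(0)  # drop the oldest element
            changed = True
        return changed
```

Feeding a constant stream never triggers a change, while a stream that jumps from 0 to 1 makes `add` return `True` shortly after the jump, at which point the tree learner would create or evaluate an alternate subtree.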
With this model, each player receives in every time slice a judgment of plug-in or normal player. The suspicion degree is the average of all judgments up to and including the current time slice. For example, if a player's judgments in time slices 1 to 7 are 0, 0, 0, 0, 1, 1, 0, the player's suspicion degree at time slice 7 is 2/7.
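The suspicion degree is a running mean of the 0/1 judgments, which the following Python sketch computes exactly with fractions. The function names are illustrative.

```python
from fractions import Fraction


def suspicion_degree(judgments):
    """Average of all 0/1 judgments up to and including the current time slice."""
    return Fraction(sum(judgments), len(judgments))


def suspicion_over_time(judgments):
    """Suspicion degree after each time slice, as a running average."""
    result, total = [], 0
    for slice_index, judgment in enumerate(judgments, start=1):
        total += judgment
        result.append(Fraction(total, slice_index))
    return result
```

For the judgments 0, 0, 0, 0, 1, 1, 0 the first function returns 2/7, matching the example in the text.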
In the table on the recommendation tab of the recommendation and display panel, each column represents one kind of information (accuracy, recall, precision, frequency of occurrence, and so on) and each row represents one node of the selected icicle plot. The length of the gray bar in a cell is proportional to the relative size of its value within the column. Clicking a column header (accuracy, recall, precision, or frequency of occurrence) sorts the rows by that column from high to low or from low to high.
Algorithm of the radar distribution map:
The projection takes a set of high-dimensional points and draws them on a two-dimensional plane; each point represents the multi-dimensional feature vector of a player in a certain time period. (The points are called high-dimensional because anything beyond three dimensions cannot be drawn directly, and people cannot picture four or more dimensions, so the points are converted to low-dimensional points that can be drawn and viewed.)
Constraint 1: distances in the low-dimensional space should be as close as possible to distances in the high-dimensional space. Distance is usually the Euclidean distance. For example, for two 4-dimensional points $x_1 = (x_{11}, x_{12}, x_{13}, x_{14})$ and $x_2 = (x_{21}, x_{22}, x_{23}, x_{24})$, the distance is

$$dist_{12} = \sqrt{(x_{11}-x_{21})^2 + (x_{12}-x_{22})^2 + (x_{13}-x_{23})^2 + (x_{14}-x_{24})^2}$$

After projection to two-dimensional points, their distance is the ordinary distance in the plane.
Constraint 2: the radar projection is circular, and the distance from a point to the center of the circle encodes the suspicion degree. The projection is subject to this radius constraint.
Consider a projected point on the two-dimensional plane in polar coordinates:
$p_i = (r_i \cdot s(k_i),\ \theta_i)$
An ordinary polar coordinate is $(r_i, \theta_i)$, where the former is the radius and the latter is the central angle; converted to xy (Cartesian) coordinates it is $(r_i\cos\theta_i,\ r_i\sin\theta_i)$. The extra factor here is $s(k_i)$. Because of the radius constraint, it is hard to keep the distances in the low-dimensional space as close as possible to those in the high-dimensional space, so $s(k_i)$ is introduced as a perturbation term on the radius. Here
Figure GDA0002398095170000071
where soft is a parameter that controls the size of the perturbation; it is set to 0.2 here and can be chosen as needed.
The projection is computed in the manner of multidimensional scaling (MDS): the difference between the distances of the projected points and the distances in the original high-dimensional space is minimized, that is, the following function is minimized:

$$\sum_{i,j}\left(dist_{ij} - \lVert p_i - p_j\rVert\right)^2$$

Here $\lVert p_i - p_j\rVert$ is also a Euclidean distance; it is written this way to distinguish it from $dist_{ij}$ above. Substituting the polar coordinates, the function to be minimized becomes

$$\sum_{i,j}\left(dist_{ij} - \sqrt{\left(r_i s(k_i)\cos\theta_i - r_j s(k_j)\cos\theta_j\right)^2 + \left(r_i s(k_i)\sin\theta_i - r_j s(k_j)\sin\theta_j\right)^2}\right)^2$$

The values of $k_i$ and $\theta_i$ can then be obtained by gradient descent, and each point is drawn on the two-dimensional circular surface at the Cartesian (xy) coordinates $(r_i s(k_i)\cos\theta_i,\ r_i s(k_i)\sin\theta_i)$.
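The constrained projection can be sketched in pure Python. Several choices here are assumptions and not from the patent: the perturbation is taken as `s(k) = 1 + soft * tanh(k)` (the patent gives $s(k_i)$ only as a figure), the radius is `r_i = 1 - suspicion_i`, gradients are estimated by finite differences, a step is accepted only if it lowers the stress, and the input vectors are assumed to be pre-scaled so that their distances are comparable to the unit disk.

```python
import math
import random

SOFT = 0.2  # size of the radius perturbation, as in the text


def s(k, soft=SOFT):
    # ASSUMED form of the perturbation term; bounded in (1 - soft, 1 + soft).
    return 1.0 + soft * math.tanh(k)


def positions(radii, ks, thetas):
    """Cartesian coordinates (r*s(k)*cos(theta), r*s(k)*sin(theta)) of every point."""
    return [(r * s(k) * math.cos(t), r * s(k) * math.sin(t))
            for r, k, t in zip(radii, ks, thetas)]


def stress(dist, radii, ks, thetas):
    """Sum over pairs of (dist_ij - ||p_i - p_j||)^2."""
    pts = positions(radii, ks, thetas)
    total = 0.0
    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            total += (dist[i][j] - math.dist(pts[i], pts[j])) ** 2
    return total


def project(vectors, suspicion, steps=200, lr=0.05, h=1e-5, seed=0):
    """Gradient descent on k_i and theta_i; returns (points, initial, final stress)."""
    rng = random.Random(seed)
    n = len(vectors)
    dist = [[math.dist(a, b) for b in vectors] for a in vectors]
    radii = [1.0 - value for value in suspicion]
    params = [0.0] * n + [rng.uniform(0, 2 * math.pi) for _ in range(n)]

    def f(p):
        return stress(dist, radii, p[:n], p[n:])

    initial = current = f(params)
    for _ in range(steps):
        # Finite-difference estimate of the gradient at the current parameters.
        grad = []
        for d in range(2 * n):
            bumped = params[:d] + [params[d] + h] + params[d + 1:]
            grad.append((f(bumped) - current) / h)
        # Backtracking: halve the step until the stress actually decreases.
        step = lr
        while step > 1e-8:
            trial = [p - step * g for p, g in zip(params, grad)]
            value = f(trial)
            if value < current:
                params, current = trial, value
                break
            step /= 2
    return positions(radii, params[:n], params[n:]), initial, current
```

Because `s(k)` is bounded, every point stays within a band of relative width `soft` around its target radius, which is how the sketch trades off constraint 1 against constraint 2. An analytic gradient would be faster, but finite differences keep the sketch short.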
The distance from a point on the radar map to the center of the circle generally represents the risk. As mentioned above, however, because constraint 1 and constraint 2 must be satisfied at the same time, some points may deviate from the position given exactly by their suspicion degree. Points (or clusters of points) that lie close to the center of the circle but not exactly at it can therefore sometimes be seen, and these can be targets for closer observation.
Preferably, the radar distribution map contains many points, each representing the state of one player at the current time; the frequencies of the player's actions form a multi-dimensional vector, and the player carries a label indicating whether the player is a plug-in (0 for a normal player, 1 for a plug-in);
the multi-dimensional vectors are projected onto a two-dimensional circular surface to show the relationships among the players and how dangerous each player is;
on the circular surface, the Euclidean distances between players preserve the distances between the original multi-dimensional vectors as far as possible; the distance from a player to the center of the circle is 1 minus the suspicion degree (the suspicion degree is a value between 0 and 1, the average of the player's labels output by the model up to the current time period);
preferably, the points are drawn as semi-transparent round dots, so that the color is darker where more points gather. The design borrows the metaphor of a military radar: the closer a point is to the center of the circle, the more dangerous it is.
Preferably, the high-level view visualization module further comprises tree thumbnails, a miniature of the dynamic tree diagram placed between the dynamic tree diagram and the accuracy/recall line chart (useful when there are many decision trees). The row of thumbnails can be regarded as a time axis: the user can brush over a range of thumbnails, and the dynamic tree diagram above is then limited to the trees inside the brushed range.
Preferably, the system also comprises a detail view visualization module, which comprises a personal panel and a grouping panel, made up of several columns of linked interactive views;
the personal panel shows how the behaviors, actions, and suspicion degree of an individual player change over time;
the grouping panel shows how the distribution of attribute values of two selected groups of players changes over time (for example, one group of plug-in players and one group of normal players, or two groups selected in the radar distribution map).
To present the data better and help users find plug-in players, preferably each player in the personal panel is represented in each time slice by a rectangular box containing a bar chart, and the player's information over time forms a whole row. Each bar of a bar chart represents the accumulated count of all actions under one behavior; each bar chart has 4 bars, one for each of the 4 behavior categories.
To present the data better and help users find plug-in players, preferably double-clicking a bar of a certain color in the personal panel shows a detailed line chart for each action classified under that behavior, showing how each action changes over time.
To present the data better and help users find plug-in players, preferably each row in the grouping panel represents one action and shows how the two groups of players change over time; each group is drawn as a horizontal band showing the interval between the maximum and minimum values.
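The band drawn for one group in the grouping panel can be derived as below. This is a minimal sketch that assumes one action's values are held as a dictionary from a hypothetical player id to a list with one value per time slice; the patent does not describe the data layout.

```python
def group_ranges(values_by_player):
    """Return the (minimum, maximum) interval of one action for each time slice.

    values_by_player: {player_id: [value per time slice]} for one group (assumed layout).
    """
    series = list(values_by_player.values())
    # zip(*series) regroups the values by time slice across all players.
    return [(min(column), max(column)) for column in zip(*series)]


plugin_group = {"p1": [5, 9, 7], "p2": [6, 4, 8]}
plugin_band = group_ranges(plugin_group)
```

Computing the same list for the second group and drawing both bands on a shared time axis gives the row-by-row comparison described above.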
The operation content of the system of the invention is as follows:
user interaction
Focus and context: brushing on the tree thumbnails (time axis) changes the range shown in the dynamic tree diagram. Selecting several trees in the dynamic tree diagram with a mouse box widens those trees horizontally and narrows the unselected trees, which makes details easier to observe. The recommendation panel then shows the information for the nodes in the selected trees. Double-clicking in the radar distribution map starts a local radial magnification mode: points around the mouse are enlarged in the radial direction and other points are compressed accordingly.
When the mouse hovers over a node in a tree, the identical nodes in other trees are joined to it by connecting lines.
Clicking the gray dot in the personal panel collapses a row (into a gray band). When the row is expanded, double-clicking a bar of a certain color shows all actions under that behavior in more detail, drawn as line charts in the corresponding color. All actions are designated by the id of the first action.
Searching: the ids of one or more players (separated by commas) can be entered in the search box at the top right; the players are then shown in the bottom-left panel.
Filtering: in the personal panel, clicking a bar of a certain color grays out the bars of other colors while the clicked color stays unchanged, so that an analyst can easily compare how one behavior changes over time and across players.
Dragging: in the personal panel, rows can be dragged by holding down the small gray dot and their positions swapped, to compare two or more players of interest side by side.
View linkage: when the mouse hovers over a node, the corresponding node in the display panel on the right is also highlighted.
Drill-down: left-clicking a node in the dynamic tree diagram shows the corresponding personal and grouping panels, and the radar distribution map appears in the display panel.
Interacting with the model: the system supports interaction with the model in the form of pruning. Decision tree pruning is a common interaction that controls the growth of the tree by stopping certain nodes from splitting.
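One way to picture the pruning interaction is a node that carries a flag the learner must respect. The `Node` class below is a hypothetical illustration, not the patent's data structure; in particular, removing the existing subtree when pruning is an assumption, since the text only says that splitting of the node is stopped.

```python
class Node:
    """Hypothetical decision tree node that supports analyst pruning."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.split_allowed = True  # cleared when the analyst prunes at this node

    def prune(self):
        """Analyst action: drop the subtree (assumed) and forbid further splits."""
        self.children = []
        self.split_allowed = False

    def try_split(self, child_names):
        """Called by the learner once its split test passes; honours pruning."""
        if not self.split_allowed:
            return False
        self.children = [Node(child) for child in child_names]
        return True
```

For example, after `node.prune()` a later `node.try_split(["<= 3", "> 3"])` returns `False`, so the learner keeps the node as a leaf however much data arrives.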
The analysis content of the system of the invention is as follows:
1. Show the dynamic evolution of the decision tree and its decision flow, that is, how a player comes to be judged a plug-in or not.
2. Analyze how player behaviors and actions evolve at different granularities.
3. Provide timely feedback, prompts, and appropriate context information for the analyst's interactions, which helps the user find the causes of certain patterns.
4. Let analysts interact with the model itself, so that their expertise enters the decision analysis process to tune the model and also helps in analyzing other patterns.
The operation process of the system of the invention comprises the following steps:
Model training is performed first.
After training, a series of trees is generated and arranged from left to right in the dynamic tree graph.
Hovering the mouse over a tree node shows where the same node occurs in different trees; framing an area enlarges the selected trees and shrinks the surrounding ones. A row of small tree thumbnails lies below the trees along the time axis, with an adjustable time selection box that helps choose which trees are presented in the dynamic tree graph.
For the enlarged tree selected in the box, the right recommendation panel shows various indicators of the nodes in the tree, such as precision, recall, F1 (computed as $F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$), frequency of occurrence, etc.
Clicking a node in a tree makes the radar distribution map appear on the display panel at the side, showing the relationships among the players in that node and the danger degree of each player. At the same time, the personal panel and the grouping panel are updated; their contents are described above. The time axes of the views are aligned, which makes comparison convenient. In addition, when an area is selected in the radar distribution map, the personal panel shows the information of the players in that area.
By default, the grouping panel shows aggregated information for the plug-ins and the normal players in the personal panel, namely the distribution of the frequency of each action. However, if the compare button in the personal panel is clicked, a second region can be framed in the radar distribution map, and a detailed view of the players in that region then appears in the personal panel for comparison, as with the rectangular frames in fig. 8.
The invention has the beneficial effects that:
The system for detecting plug-ins with an incremental decision tree uses the dynamic decision tree to show the decision process in different time periods, analyzes the characteristics of plug-ins and the reasons they are judged to be plug-ins, and finds characteristics that clearly distinguish them from normal players. Because of the nature of the model, some characteristics of plug-in evolution can also be explored. In addition, users can apply their own knowledge to prune the decision tree and carry out other analyses by combining the various views.
Drawings
FIG. 1 is an imaging schematic diagram of a dynamic tree view, a thumbnail of a tree, and an accuracy/recall line view in a high level view visualization module of the system of the present invention.
FIG. 2 is a schematic view of the imaging of a personal panel of the system of the present invention.
FIG. 3 is a schematic diagram of the line graph of actions shown after double-clicking a bar in the bar chart of FIG. 2.
FIG. 4 is a schematic diagram of imaging a grouped panel of the system of the present invention.
FIG. 5 is a schematic diagram of imaging a recommended panel of the system of the present invention.
FIG. 6 is a schematic diagram of imaging of a display panel of the system of the present invention.
FIG. 7 is an enlarged schematic view of three trees within the selected black box of FIG. 1.
FIG. 8 is a schematic diagram of the imaging of a radar profile with certain data for a display panel of the system of the present invention.
Detailed Description
The objects and effects of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.
As shown in figs. 1 to 7, the structure of the system for detecting plug-ins with an incremental decision tree according to the present embodiment includes the following steps and contents:
Step 1: Visual design
the data preprocessing module is used for cleaning original log data and extracting feature vectors;
the model generation and interaction module is used for generating and outputting a model by taking the above feature vector as the input of the model; meanwhile, the model can be adjusted by receiving the feedback of the visualization module;
a high-level view visualization module, which represents the tree structure of a decision tree that changes over time, mainly by side-by-side icicle diagrams, comprising:
the dynamic tree graph (the upper half of fig. 1, the main part), which shows the tree structure of the decision tree compactly as an icicle diagram, where each node represents a split node of the decision tree; several icicle diagrams are arranged side by side to reflect how the decision tree changes over time, as shown in fig. 7.
A recommendation and presentation panel: as shown in fig. 5, the recommendation panel presents the information of all nodes of the tree or trees selected in the dynamic tree graph in a sortable table, including accuracy, recall, precision, frequency of occurrence, and the like; as shown in fig. 6, after switching to the display panel, selecting a node in the dynamic tree graph displays the relationships, danger degree, and the like of the players contained in that node. Each rectangle (i.e., node) in each tree corresponds to an action; for example, 5300040 is the action of accepting a task. The number following the colon corresponds to the option mentioned above. If the count of 5300040 is greater than 6.4, the instance is assigned to one branch; if it is less than 6.4, it is assigned to the other.
The tree thumbnails (the middle part of fig. 1), which can be regarded as a time axis; the user can brush over a range of thumbnails, and the dynamic tree graph above is then limited to the brushed range of trees;
the accuracy/recall line graph (the lower half of fig. 1), which plots the prediction accuracy and recall of each decision tree over time;
a detailed view visualization module for multiple columns of interrelated interactive views, comprising:
the personal panel, as shown in FIG. 2, is used to show the behavior, action and suspicion of the individual players over time.
Double-clicking a bar of a given color on the personal panel opens a detailed line graph for each action classified under that behavior, showing how each action varies over time, as shown in fig. 3.
The grouping panel, as shown in FIG. 4, is used to show how the distributions of attribute values of two groups of players change over time (e.g., one group of plug-ins and one group of normal players, or two groups selected in the radar distribution map).
The raw data is stored per action, one file per action per day (e.g., the log for January 1, 2017), and the game has hundreds of actions. For the time period of interest, the frequency of each player's actions in each time slice is counted. Note that the time period here may be natural time (year, month, day, hour, minute, second) or game time (which actions are performed, and how often, at level 1, level 2, and so on). Thus each player has a corresponding feature vector for each time slice. In addition, because there are too many actions, they are grouped into categories (task related, attribute related, battle related, and item related). The categories can be specified by the user, or taken from a complete classification scheme in the existing literature.
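The counting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout `(player_id, action_id, timestamp)`, the one-hour slice length, and every action id except 5300040 are hypothetical and do not come from the original logs.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from action id to behavior category. Only 5300040
# ("accept task") appears in the description; the other ids are invented.
ACTION_CATEGORY = {
    5300040: "task",
    1000001: "attribute",
    2000001: "battle",
    3000001: "item",
}
CATEGORIES = ("task", "attribute", "battle", "item")


def build_feature_vectors(records, slice_seconds=3600):
    """Count how often each action occurs per (player, time slice).

    `records` is an iterable of (player_id, action_id, timestamp) tuples,
    which is an assumed log layout.
    """
    counts = defaultdict(Counter)
    for player_id, action_id, timestamp in records:
        time_slice = int(timestamp // slice_seconds)
        counts[(player_id, time_slice)][action_id] += 1
    return counts


def category_totals(action_counts):
    """Aggregate one feature vector into the four behavior categories."""
    totals = dict.fromkeys(CATEGORIES, 0)
    for action_id, n in action_counts.items():
        category = ACTION_CATEGORY.get(action_id)
        if category is not None:
            totals[category] += n
    return totals
```

Each `Counter` plays the role of one player's feature vector for one time slice, and `category_totals` gives the four per-category sums that the bar charts of the personal panel would display.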
The decision tree shows a decision flow: each non-leaf node on the path down from the root is a test condition, which checks whether a certain attribute of the incoming instance satisfies that node's condition. Once the instance reaches a leaf node, the leaf gives a class label indicating which class the instance belongs to. The conventional training process recursively decides, for each leaf node of the tree built so far, which attribute its child nodes should split on, according to some criterion (such as information gain or the Gini index).
The model used here is a Hoeffding adaptive decision tree with Gaussian discretization. It is an online algorithm: by using the Hoeffding bound, the decision tree can be trained as soon as data arrives, and each instance is used only once, whereas a conventional decision tree requires an entire batch of data and uses each instance several times to determine the split conditions.
The Hoeffding bound states that, for a random variable with range $R$, after $n$ independent observations the true mean deviates from the estimated mean, with probability $1-\delta$, by no more than $\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$.
When deciding which attribute a node should split on, the two attributes with the largest and second-largest information gain are found and the difference between their information gains is computed; if this difference is larger than the Hoeffding bound ε, an effective node split can be guaranteed. Such a bound makes it possible to grow the tree when only part or a small amount of the data is available, rather than waiting until all data is available. Such a decision tree is called a Hoeffding decision tree.
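The split test can be written down directly. The sketch below is not the patented implementation; the function names, the default `delta`, and the choice of `log2(num_classes)` as the range R of information gain are assumptions made for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability 1 - delta, the true mean
    lies within epsilon of the mean estimated from n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, n, delta=1e-7, num_classes=2):
    """Split when the gain gap between the two best attributes exceeds epsilon."""
    # Information gain is at most log2(number of classes), so that value
    # is used as the range R here (an assumption of this sketch).
    value_range = math.log2(num_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

Because ε shrinks as n grows, a leaf that cannot yet separate its two best attributes simply waits for more instances before splitting.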
On this basis, some improvements are made using existing techniques. First, because the Hoeffding decision tree only supports discrete-valued attributes, a robust incremental Gaussian discretization method is adopted so that continuous-valued attributes are supported. Second, the data exhibits concept drift, which means that the data-generating process may not be stationary and may change over time. In game data, the behavior of plug-ins may also change, because plug-in operators notice when their plug-in accounts are banned and change some characteristics to avoid being detected and banned again. To handle this, adaptive windowing (ADWIN) is adopted: a window, a corresponding estimator, and a change detector are added to the original Hoeffding decision tree in order to discover the concept drift described above.
The whole method is therefore called a Hoeffding adaptive decision tree with Gaussian discretization. With this method, the decision tree keeps growing as data flows in, and when a significant change is detected, some subtrees of the decision tree are replaced by other subtrees. The result is a tree that changes over time, and at each time slice every player receives a judgment result (0 for a normal player, 1 for a plug-in).
According to the above model, at each time slice every player has a judgment result: plug-in or normal player. The suspicion degree is the average of all judgment results up to and including the current time slice. For example, if the judgment results of a player in time slices 1 to 7 are 0, 0, 0, 0, 1, 1, 0, then the suspicion degree of that player at time slice 7 is 2/7.
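The worked example above can be reproduced with a short Python sketch. The function names are hypothetical, and `Fraction` is used only so that the result prints as the exact value 2/7.

```python
from fractions import Fraction


def suspicion_degree(judgments):
    """Mean of the 0/1 judgments up to and including the current time slice."""
    if not judgments:
        raise ValueError("at least one judgment is required")
    return Fraction(sum(judgments), len(judgments))


def suspicion_over_time(judgments):
    """Suspicion degree after each time slice, in order."""
    return [suspicion_degree(judgments[:i]) for i in range(1, len(judgments) + 1)]
```

Calling `suspicion_degree([0, 0, 0, 0, 1, 1, 0])` yields `Fraction(2, 7)`, matching the example for time slice 7.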
In the table under the recommendation tab of the recommendation and presentation panel, each column represents one kind of information (accuracy, recall, precision, frequency of occurrence, etc.) and each row represents a node in the selected icicle diagram. The length of the gray band in a cell is proportional to the relative size of its value within the column. Clicking a column header (accuracy, recall, precision, or frequency of occurrence) sorts the table by that column from high to low or from low to high.
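The per-node indicators shown in this table, together with the F1 score mentioned in the operating procedure, follow the standard definitions. The sketch below assumes that each node's true positives, false positives, and false negatives have already been counted; the function name and the convention of returning 0.0 for an empty denominator are choices of this sketch, not of the original system.

```python
def node_metrics(tp, fp, fn):
    """Return (precision, recall, F1) for one tree node.

    tp, fp, fn are the counts of true positives, false positives and
    false negatives among the players that reach the node.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1
```

A table row could then be built from `node_metrics` plus the node's frequency of occurrence, and sorted by any of these columns.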
The radar distribution map under the display tab of the recommendation and presentation panel contains many points, each representing a player's state at the current time. Each player's action frequencies are treated as a multidimensional vector carrying a label (0 or 1, where 0 represents a normal player and 1 a plug-in). These multidimensional vectors are then projected onto a two-dimensional disc to reveal the relationships among the players and their danger degree. On the disc, the Euclidean distances between players preserve the distances between the original multidimensional vectors as far as possible, and a player's distance from the center of the circle is 1 minus the suspicion degree (a value between 0 and 1, the average of the label values output by the model for that player at the current time period).
This design borrows from military radar, where a target is more dangerous the closer it is to the center of the circle.
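The radial part of this layout is simple to state in code. The sketch below covers only the rule that the distance to the center equals 1 minus the suspicion degree; the angle is assumed to come from whatever distance-preserving projection is used, which the description does not specify, and the function name is hypothetical.

```python
import math


def radar_position(angle, suspicion):
    """Place a player on the unit disc at radius 1 - suspicion.

    `angle` (in radians) is assumed to be supplied by a separate
    projection step that is not shown here.
    """
    if not 0.0 <= suspicion <= 1.0:
        raise ValueError("suspicion must lie between 0 and 1")
    radius = 1.0 - suspicion
    return (radius * math.cos(angle), radius * math.sin(angle))
```

A player with suspicion degree 1 lands exactly at the center, while a player who has never been flagged sits on the rim of the disc.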
In the detailed view visualization module, each player's information over time is shown in one full row. In each bar chart, one bar represents the accumulated count of the actions under one behavior, and the 4 bars correspond to the 4 categories into which player actions are divided.
Double-clicking a bar of a given color on the personal panel opens a detailed line graph for each action classified under that behavior, showing how each action varies over time.
In the grouping panel, each row represents an action and shows how two groups of players change over time. Each group of players is drawn as a horizontal bar whose left and right ends show the minimum and maximum values of that action within the group, so that the bar displays the interval between the minimum and the maximum.
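The interval drawn for one group and one action can be computed as follows. This is a minimal sketch assuming that the per-player values of the action are already grouped by time slice; the input layout and the function name are hypothetical.

```python
def group_intervals(values_by_slice):
    """Map each time slice to the (min, max) of one action within a group.

    `values_by_slice` maps a time slice to the list of per-player values
    of that action; empty slices are skipped.
    """
    return {
        time_slice: (min(values), max(values))
        for time_slice, values in sorted(values_by_slice.items())
        if values
    }
```

Running this once for the plug-in group and once for the normal-player group gives the two horizontal bars of a row, and overlapping intervals indicate that the action no longer separates the groups well.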
Step 2: user interaction
Focus and context: brushing over the tree thumbnails (the time axis) changes the range of trees displayed in the dynamic tree graph. Selecting several trees with a mouse-drawn box in the dynamic tree graph enlarges them horizontally and shrinks the unselected trees horizontally, which makes details easier to observe. The recommendation panel then presents the information of the nodes in the selected trees. Double-clicking in the radar distribution map starts a local radial magnification mode: points around the mouse are enlarged in the radial direction and other points are compressed accordingly.
When the mouse hovers over a node in a tree, the identical nodes in the other trees are connected to it by a line.
Clicking the gray dot in the personal panel collapses a player's row into a gray band. When the row is expanded, double-clicking a bar of a given color shows, in more detail, all the actions under that behavior, drawn as line graphs in the corresponding color. All actions are designated by the id of the first action.
Searching: the search box at the top right accepts the id of one or more players (separated by commas); the matching players are then presented in the bottom-left panel.
Filtering: in the personal panel, clicking a bar of a given color dims the bars of the other colors and leaves the selected color unchanged, so that an analyst can easily compare how a behavior changes over time and how it differs between players.
Dragging: in the personal panel, a row can be dragged by holding down its small gray dot, so that rows exchange positions and the analyst can compare two or more players of interest.
View linkage: when the mouse hovers over a node, the corresponding node in the right-hand display panel is also shown.
Amplification: left-clicking a node in the dynamic tree graph displays the corresponding personal and grouping panels; at the same time, the radar distribution map appears on the display panel.
Interacting with the model: the system supports pruning as a way of interacting with the model. Decision tree pruning is a common interaction that controls tree growth by stopping certain nodes from splitting.
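The pruning interaction described last can be pictured as a flag on a tree node that the training loop respects. The class below is a hypothetical sketch, not the system's data structure: `TreeNode`, `freeze`, and `try_split` are invented names, and the split labels are placeholders.

```python
class TreeNode:
    """A decision tree node that an analyst can stop from splitting."""

    def __init__(self, label):
        self.label = label
        self.children = []
        self.frozen = False  # set by the analyst's pruning interaction

    def freeze(self):
        """Pruning: forbid any further split of this node."""
        self.frozen = True

    def try_split(self, child_labels):
        """Grow children unless the node is frozen or already split."""
        if self.frozen or self.children:
            return False
        self.children = [TreeNode(label) for label in child_labels]
        return True
```

In an online learner, the split test would call `try_split` only after the statistical criterion is met, so a frozen node stays a leaf however much data arrives.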
Step 3: Analysis task
1. Showing the dynamic evolution of the decision tree and its decision flow, i.e. how a player comes to be judged a plug-in or a non-plug-in.
2. Analyzing how player behaviors and actions evolve at different granularities.
3. Providing timely feedback, prompts, and appropriate contextual information for analyst interaction. Based on this information, the user can be helped to find the cause of some patterns.
4. Allowing analysts to interact with the model itself. The analyst's expertise should be incorporated into the decision analysis process to tune the model, which also helps in analyzing other patterns.
Actual procedure
Model training is performed first.
After training, a series of trees is generated and arranged from left to right in the dynamic tree graph.
Hovering the mouse over a tree node shows where the same node occurs in different trees; framing an area enlarges the selected trees and shrinks the surrounding ones. A row of small tree thumbnails lies below the trees along the time axis, with an adjustable time selection box that helps choose which trees are presented in the dynamic tree graph.
For the enlarged tree selected in the box, the right recommendation panel shows various indicators of the nodes in the tree, such as precision, recall, F1 (computed as $F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$), frequency of occurrence, etc.
If a node in a tree is clicked, the radar distribution map appears on the display panel at the side, showing the relationships among the players in that node and the danger degree of each player. At the same time, the personal panel and the grouping panel are updated; their contents are described above. The time axes of the views are aligned, which makes comparison convenient and reveals the details at every level. In addition, when an area is selected in the radar distribution map, the personal panel shows the information of the players in that area.
By default, the grouping panel shows aggregated information for the plug-ins and the normal players in the personal panel, namely the distribution of the frequency of each action. However, if the compare button in the personal panel is clicked, a second region can be framed in the radar distribution map, and a detailed view of the players in that region then appears in the personal panel for comparison, as with the rectangular frames in fig. 8.
The first method for detecting plug-ins comprises the following steps:
Actions are first selected on the control panel and classified into 4 categories. After the model runs, a series of trees can be seen whose structure evolves over time. The tree grows gradually, and if its performance becomes inadequate at some stage, some of its subtrees are replaced. Information can be obtained from this, such as the strategies a plug-in may use to change its behavior pattern.
By hovering the mouse over tree nodes and following the red connecting lines, it can be seen that some nodes persist throughout while others disappear after a period of time. High-level nodes generally last a long time, which shows that they are good attributes for judgment. Tasks are an example: tasks often yield wealth, and wealth is what makes plug-ins profitable, so plug-ins tend to accept a large number of tasks, and this attribute may therefore discriminate well.
Next, a node that disappears after a period of time is selected, to consider what caused it to disappear. Clicking the node makes the panel on the right show the radar distribution map, as shown in fig. 8, with two clusters of points at the top (a cluster is a group of points gathered together). A double click starts the local magnification interaction, after which both clusters can be studied.
With the comparison function of the personal panel turned on, it can be seen that, from a certain point in time, the two clusters of plug-ins differ in how their behaviors and actions change over time. The node originally served to distinguish plug-ins from normal players (by default only these two classes are used), and its discriminating power decreased when the new plug-ins appeared. A similar conclusion can be drawn from the grouping panel, for example from the overlap between the value ranges of plug-ins and normal players.
A hypothesis is made: because there are two clusters of plug-ins, i.e. plug-ins whose behavior patterns are not uniform, the behavior patterns of newly arriving plug-ins may differ from the earlier ones; as more and more new plug-ins appear, the actions that could previously distinguish them gradually lose their discriminating power (insufficient information gain), which leads to this situation.
The second method for detecting plug-ins comprises the following steps:
The plug-in clusters are all concentrated in the middle of the radar distribution map, where the area appears dark. Since the dots are transparent, many of them must overlap to produce such a dark area, which indicates that many players are gathered there.
Both the projection result and the actual data selected by framing a region of the radar distribution map (the bar charts of the personal panel) are highly consistent: plug-ins, as batch products, are relatively similar to one another. The same conclusion is reached from the line graphs of the personal panel, whereas normal players are relatively scattered.
Since plug-ins are generally batch products, it is reasonable for them to be similar in all respects, because plug-in design requires mass production in order to maximize efficiency. Normal players, by comparison, are relatively scattered.

Claims (6)

1. A system for detecting a plug-in using an incremental decision tree, comprising:
the data preprocessing module is used for cleaning the original data of the actions of the player and extracting the feature vectors;
the model generation and interaction module is used for generating and outputting a model by taking the characteristic vector of the data preprocessing module as the input of the model, and receiving feedback to adjust the model;
the high-level view visualization module is used for generating a dynamic tree diagram, a recommendation and display panel and an accuracy/recall rate line diagram according to the output model of the model generation and interaction module; the high level view visualization module includes a thumbnail of a tree as a thumbnail of the dynamic tree graph disposed between the dynamic tree graph and the accuracy/recall line graph;
the dynamic tree diagram shows the tree structure of the decision tree by adopting an icicle diagram, each node in the icicle diagram represents a split node of the decision tree, and a plurality of icicle diagrams are arranged in parallel to show the change of the decision tree along with time;
in the recommendation and display panel, displaying information of all nodes of the icicle graph selected in the dynamic tree graph by using a table which can be sorted in the recommendation panel; if one node is selected in the dynamic tree graph in the display panel, displaying the condition that the node contains the player in a radar distribution graph mode;
the accuracy/recall rate line graph and the dynamic tree graph are arranged up and down according to a time corresponding relation and represent the accuracy/recall rate of each decision tree prediction along with the change of time;
the radar distribution map comprises a plurality of points, each point represents the state of a player at the current time, and the frequencies of each player's various actions serve as a multidimensional vector provided with a mark indicating whether the player is a plug-in;
projecting the multidimensional vectors on a two-dimensional circular surface for showing the relationship among the players and the danger degree of the players;
on the circular surface, the Euclidean distance between players keeps the distance of the original multidimensional vector, and the distance from the player to the center of the circle is represented by subtracting the suspicion degree from 1.
2. The system for detecting a plug-in using an incremental decision tree as claimed in claim 1, wherein said points are dots provided with transparency.
3. The system for detecting a plug-in using an incremental decision tree as claimed in claim 1, further comprising a detail view visualization module comprising a personal panel and a grouping panel;
the personal panel is used for showing the condition that the behavior, the action and the suspicion degree of an individual player change along with time;
the grouping panel is used for showing the condition that the distribution of the values of the two selected groups of player attributes changes along with time.
4. The system for detecting a plug-in using an incremental decision tree as claimed in claim 3, wherein each person in said personal panel is represented, for each time period, by a rectangular box and its internal bar graph, each person's information over time is represented by a whole row, and each bar in a bar graph represents the cumulative number of actions under one behavior.
5. The system for detecting a plug-in using an incremental decision tree as claimed in claim 4, wherein double-clicking a bar of a certain color on said personal panel shows a line graph of each action classified in detail under this behavior, showing how each action varies over time.
6. The system for detecting a plug-in using an incremental decision tree as claimed in claim 3, wherein in said grouping panel, each row represents an action and shows how two groups of players vary over time, and each group of players is a horizontal bar exhibiting the interval between a minimum and a maximum.
CN201711045371.1A 2017-10-31 2017-10-31 System for detecting plug-in by adopting incremental decision tree Active CN107866072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711045371.1A CN107866072B (en) 2017-10-31 2017-10-31 System for detecting plug-in by adopting incremental decision tree


Publications (2)

Publication Number Publication Date
CN107866072A CN107866072A (en) 2018-04-03
CN107866072B true CN107866072B (en) 2020-06-16

Family

ID=61753546


Country Status (1)

Country Link
CN (1) CN107866072B (en)




Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant