CN107866072B - A system for plug-in detection using an incremental decision tree

Info

Publication number: CN107866072B
Application number: CN201711045371.1A
Authority: CN
Inventors: 陈为; 陆俊华; 巫英才
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2017-10-31
Filing date: 2017-10-31
Publication date: 2020-06-16
Anticipated expiration: 2037-10-31
Also published as: CN107866072A

Abstract

The invention discloses a system for plug-in detection using an incremental decision tree, comprising: a data preprocessing module, which cleans the raw data of player actions and extracts feature vectors; a model generation and interaction module, which takes the feature vectors from the data preprocessing module as model input, generates and outputs a model, and receives feedback to adjust the model; and a high-level view visualization module, which generates a dynamic tree diagram, a recommendation and display panel, and an accuracy/recall line chart from the model output by the model generation and interaction module. The invention uses the dynamic decision tree to show the decision process in different time periods, analyzes the characteristics of plug-ins and the reasons a player is judged to be a plug-in, and finds features that clearly distinguish plug-ins from normal players. Because of the properties of the model, some characteristics of plug-in evolution can also be explored. In addition, the user can apply their own knowledge to prune the decision tree and carry out other analyses by combining the various views.

Description

A system for plug-in detection using an incremental decision tree

Technical Field

The invention relates to the technical field of game plug-in detection, and in particular to a system for plug-in detection using an incremental decision tree.

Background

Multiplayer online role-playing games create virtual social worlds in which players from all over the globe take part. Players interact with one another and spend a great deal of time, effort, and money developing and leveling up their characters. One survey found that in 2016 alone, multiplayer online role-playing games generated 19.8 billion US dollars in revenue worldwide. Because these games are so popular, however, they have also become a breeding ground for cybercrime. One important kind of abuse is real-money trading that the operator has not permitted or approved, in which virtual items are traded for real currency. Some advanced game plug-ins (third-party cheat programs) now specialize in this kind of violation for profit, but they seriously upset game balance and harm game operations, company revenue, player experience, the in-game economy, and the long-term sustainability of the game as a whole.

To this end, game operators have taken many measures. Common plug-in detection methods fall into three categories: client-side, network-side, and server-side. Client-side methods detect plug-ins using a principle similar to antivirus software embedded in the game client on the user's machine; however, some carefully crafted plug-ins can no longer be detected this way. Network-side methods need to analyze network traffic, which causes problems such as excessive network load and latency. Server-side methods analyze the players' log data on the server. This is currently the more popular approach, and it usually treats plug-in detection as an anomaly detection problem. Existing methods nevertheless have shortcomings: plug-in designs are increasingly complex, and the bare inputs and outputs of an algorithm are very hard for analysts to interpret; in addition, advanced plug-ins can update and upgrade themselves, which makes detection harder still. This is where introducing visual analytics becomes particularly important.

Some visual analytics systems for anomaly detection already exist. For example, the #FluxFlow system by Jian Zhao et al. visualizes the retweeting behavior of Twitter users, combined with contextual information such as user attributes, to study how anomalous information spreads on Twitter. TargetVue by Nan Cao et al. uses a model that assigns each user a suspicion level that changes over time to help analysts understand potential social bots on social media platforms (such as bot accounts that send spam advertisements), including their communication activities, behavioral features, and social interactions. These systems improve the anomaly detection process by incorporating human domain knowledge, but they still treat the internals of the algorithmic model as a black box; analysts cannot learn about the model itself through these systems.

Although existing methods provide rich contextual information, researchers still cannot take a deep part in the model-building process. A system is therefore desired that lets users participate in model building, so as to better understand the model and look for potential explanations of observed phenomena.

Summary of the Invention

The present invention provides a system for plug-in detection using an incremental decision tree. It helps analysts detect plug-ins, understand the plug-in detection model, and discover how the behaviors or actions of plug-ins and normal players change over time, and it lets analysts interact with the model for interactive visual analysis and exploration.

A system for plug-in detection using an incremental decision tree, comprising:

a data preprocessing module, which cleans the raw data of player actions and extracts feature vectors;

a model generation and interaction module, which takes the feature vectors from the data preprocessing module as model input to generate and output a model, and receives feedback to adjust the model;

a high-level view visualization module, which mainly uses icicle plots placed side by side to represent the tree structure of the decision tree as it changes over time, and which generates a dynamic tree diagram, a recommendation and display panel, and an accuracy/recall line chart from the model output by the model generation and interaction module;

the dynamic tree diagram shows the tree structure of the decision tree compactly as an icicle plot, in which each node represents a split node of the decision tree; multiple icicle plots are placed side by side to show how the decision tree changes over time;

in the recommendation and display panel, the recommendation panel uses a sortable table to show information on all nodes (the rectangles of the decision tree) of the icicle plot or plots selected in the dynamic tree diagram, including accuracy, recall, precision, and frequency of occurrence; after switching to the display panel, if a node is selected in the dynamic tree diagram, a radar distribution chart shows the players contained in that node;

the accuracy/recall line chart and the dynamic tree diagram are stacked vertically and aligned in time, so that the chart shows the prediction accuracy and recall of each decision tree over time.

The raw data of player actions is stored by action type, with one file per action per day (for example, the login log for January 1, 2017); the game has several hundred kinds of actions in total. For the time period of interest, the frequency of each player's actions in each time slice is counted. Note that the time period can be either natural time (year, month, day, hour, minute, second) or game time (which actions were performed at level 1, level 2, and so on, and how often). Each player thus has one feature vector per time slice. Because there are so many actions, they are also grouped into categories (task-related, attribute-related, combat-related, and item-related). This grouping can be specified by the user (for example, a game analyst) or taken from a well-established classification scheme in the literature.
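The counting step described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the record layout `(player_id, time_slice, action)`, the action names, and the helper names `build_feature_vectors` and `category_totals` are all assumptions made for the example; only the four category names come from the text.

```python
from collections import Counter, defaultdict

# Hypothetical action-to-category mapping. The four categories come from the
# text; the action names are invented for illustration.
ACTION_CATEGORY = {
    "accept_quest": "task",
    "add_stat_point": "attribute",
    "attack_monster": "combat",
    "sell_item": "item",
}

def build_feature_vectors(log_records, actions):
    """Count how often each player performed each action in each time slice.

    log_records: iterable of (player_id, time_slice, action) tuples.
    Returns {(player_id, time_slice): [count for each action in `actions`]}.
    """
    counts = defaultdict(Counter)
    for player_id, time_slice, action in log_records:
        counts[(player_id, time_slice)][action] += 1
    return {key: [c[a] for a in actions] for key, c in counts.items()}

def category_totals(vector, actions):
    """Sum a feature vector's counts per behavior category."""
    totals = Counter()
    for action, count in zip(actions, vector):
        totals[ACTION_CATEGORY[action]] += count
    return dict(totals)

actions = sorted(ACTION_CATEGORY)
records = [
    ("p1", 1, "attack_monster"), ("p1", 1, "attack_monster"),
    ("p1", 1, "sell_item"), ("p1", 2, "accept_quest"),
    ("p2", 1, "sell_item"),
]
vectors = build_feature_vectors(records, actions)
```

A time slice here is just a key, so the same code works whether slices are calendar days or character levels.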

A decision tree lays out a decision process. From the root downward, every non-leaf node is a test: an incoming instance is checked for whether one of its attributes satisfies that node's condition. Once the instance reaches a leaf, the leaf gives a class label saying which class the instance belongs to. Traditional decision tree training recursively takes each leaf of the tree built so far and uses some criterion (such as information gain or the Gini index) to decide which attribute to split on for the child nodes below that leaf.

As shown in Figure 1, consider deciding whether to go mountain climbing today. Each box is a node with several conditions beneath it, and a decision is made according to which condition holds. Tree nodes can be drawn as rectangles, and each node has several options, such as high or normal humidity, or strong or light wind. Following the tree down to a leaf yields the decision, so the paths that lead to not climbing are: (weather: rain) → no climb; (weather: sunny) → (humidity: high) → no climb; (weather: overcast) → (wind: strong) → no climb.
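The climbing example can be written down as a small nested structure and walked from root to leaf. This is only an illustrative sketch: the text lists the three "no climb" paths, and the remaining branches (sunny with normal humidity, overcast with light wind) are assumed here to lead to "climb". The names `TREE` and `classify` are hypothetical.

```python
# Each internal node tests one attribute; each leaf is a label (a string).
# ASSUMPTION: branches not spelled out in the text lead to "climb".
TREE = {
    "attribute": "weather",
    "branches": {
        "rain": "no climb",
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no climb", "normal": "climb"},
        },
        "overcast": {
            "attribute": "wind",
            "branches": {"strong": "no climb", "light": "climb"},
        },
    },
}

def classify(tree, instance):
    """Walk from the root to a leaf; return the decision path and the label."""
    path = []
    node = tree
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        path.append((node["attribute"], value))
        node = node["branches"][value]
    return path, node

path, label = classify(TREE, {"weather": "sunny", "humidity": "high"})
```

The returned path is the same kind of root-to-leaf sequence that the text writes as (weather: sunny) → (humidity: high) → no climb.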

The model that the invention can use is a Hoeffding adaptive tree with Gaussian discretization. This is an online algorithm: it uses the Hoeffding bound so that the decision tree can be trained online, that is, each data item is used for training as it arrives and is used only once, whereas a traditional decision tree needs a whole batch of data and uses each item many times when evaluating split conditions.

The Hoeffding bound states that for a random variable with range R, after n independent observations, the true mean differs from the estimated mean by no more than ε with probability 1-δ, where

$$\epsilon = \sqrt{\frac{R^{2}\ln(1/\delta)}{2n}}$$

When deciding which attribute to split a node on, find the attributes with the largest and second-largest information gain and compute the difference between the two gains. If the difference is greater than ε, the split is guaranteed to be a sound one with high confidence. Such a bound makes it possible to train the tree when only part of the data, or a small amount, is available, rather than waiting for all of the data to arrive. A decision tree of this kind is called a Hoeffding decision tree.
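As a rough numerical illustration of the split test, the sketch below computes the usual Hoeffding bound ε = sqrt(R² ln(1/δ) / (2n)) and compares it with the gap between the two best gains. The function names and the example values of δ, n, and the gains are assumptions chosen for the example, not values from the source.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the gap between the top two gains exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# The bound shrinks as more instances are seen at the leaf.
eps_small_n = hoeffding_bound(1.0, 1e-7, 200)
eps_large_n = hoeffding_bound(1.0, 1e-7, 5000)

# The same gain gap of 0.10 is not enough after 200 instances,
# but is enough after 5000.
early = should_split(0.30, 0.20, 1.0, 1e-7, 200)
late = should_split(0.30, 0.20, 1.0, 1e-7, 5000)
```

Because ε falls as 1/sqrt(n), a leaf that cannot yet justify a split simply waits for more instances rather than re-reading old ones.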

Building on this method, the invention makes several improvements using existing techniques. First, because the Hoeffding decision tree supports only discrete-valued attributes, a robust incremental Gaussian discretization method is adopted so that continuous-valued attributes are supported. Second, the data exhibits concept drift: data generation may not be stationary, and the generating process may change. In game data, plug-in behavior may likewise change, because a plug-in vendor that notices its plug-in accounts being banned will alter some of the plug-in's characteristics to avoid further bans. To handle this, ADaptive WINdowing (ADWIN) is adopted, which adds a window together with the corresponding estimators and change detectors to the original Hoeffding decision tree in order to detect the concept drift described above.

The method as a whole is called a Hoeffding adaptive tree with Gaussian discretization. Applied to the data, the decision tree keeps growing as data flows in; when a significant change is detected, some subtrees are replaced by other subtrees. The result is a tree that changes over time, and in every time slice each player receives a judgment of plug-in or normal player (0 for a normal player, 1 for a plug-in player).

Specifically, the model is built as follows:

Step 1: Initialize the state of the decision tree

Create a root node and initialize at the root a statistic named A_ijk, which is part of the ADWIN method (used in Step 2). Each instance is a pair (x, y), where x is the feature vector of one player's actions in one time period and y is the label indicating whether the player is a plug-in (0 for a normal player, 1 for a plug-in). Each instance is first tested with the tree built so far and then passed to the Hoeffding adaptive tree growth step in Step 2 below.

Step 2: Growth of the decision tree

First, route the instance (x, y) along a decision path in the existing tree to a leaf, and update the estimators A_ijk of every node on that path, including the leaf. If the current leaf l has an alternate tree T_alt, the alternate tree also performs the same Hoeffding adaptive tree growth step (that is, the step described here). Then compute the information gain G of each action (other criteria commonly used in decision trees can also be used); this evaluates whether the leaf is now ready to be split.

Information gain is computed as follows. It builds on the common notion of information entropy. For a distribution, say a data set with three classes A, B, and C in proportions p(A) = 0.2, p(B) = 0.3, and p(C) = 1 - 0.2 - 0.3 = 0.5, the entropy is H = -∑_i p_i log(p_i). Here that is -(0.2 log 0.2 + 0.3 log 0.3 + 0.5 log 0.5) ≈ 1.485 with base-2 logarithms (≈ 0.447 with base-10 logarithms). In general, the smaller the entropy, the more ordered the data: when the data set has only one class, the entropy is 0, its minimum, meaning the data set is completely ordered. There is also the conditional entropy, H(Y|X) = ∑_{x∈X} p(x) H(Y|X = x). Here Y is the judgment of whether a player is a plug-in, and X is some action, such as the number of times a player enters a dungeon instance; the conditional entropy describes the partition induced by X. The information gain is the difference in entropy before and after partitioning on an action, H(Y) - H(Y|X). The index i above simply ranges over the classes A, B, and C. As a reminder of the summation notation, if N = {1, 2, 3, …, n}, then ∑_{i∈N} i = 1 + 2 + 3 + … + n.
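The entropy and information gain definitions can be checked numerically with a short sketch. The function names and the toy data are assumptions for illustration; base-2 logarithms are used by default, which gives about 1.485 bits for the 0.2 / 0.3 / 0.5 example.

```python
import math
from collections import Counter

def entropy(labels, base=2):
    """H = -sum_i p_i * log(p_i) over the class proportions in `labels`."""
    total = len(labels)
    return -sum((c / total) * math.log(c / total, base)
                for c in Counter(labels).values())

def information_gain(values, labels, base=2):
    """H(Y) - H(Y|X) for a discrete attribute X given as `values`."""
    total = len(labels)
    groups = {}
    for x, y in zip(values, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / total * entropy(g, base) for g in groups.values())
    return entropy(labels, base) - conditional

# Three classes in proportions 0.2, 0.3, 0.5, as in the example above.
h = entropy(["A"] * 2 + ["B"] * 3 + ["C"] * 5)

# An attribute that separates the two classes perfectly recovers all of H(Y).
gain = information_gain(["lo", "lo", "hi", "hi"], [0, 0, 1, 1])
```

A data set with a single class gives an entropy of 0, matching the statement that a completely ordered data set has minimum entropy.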

Computing the information gain requires a discretization step when an action's value is continuous rather than discrete, so that the node can be split. A robust incremental Gaussian discretization method is used: the continuous values are discretized and then the split below is performed. If the information gain of the best attribute minus that of the second-best attribute is greater than

$$\epsilon = \sqrt{\frac{R^{2}\ln(1/\delta)}{2n}}$$

then the node is split on the action with the largest information gain, and an estimator is initialized for each resulting branch. Splits are binary: the continuous range is divided into values greater than the split point and values less than or equal to it. This uses the Hoeffding bound: for a random variable with range R, after n independent observations, the true mean differs from the estimated mean by no more than the ε given above, with probability 1-δ.

If the change detector (ADWIN is also used as the change detector) finds that the data distribution has changed and leaf l has no alternate tree, an alternate tree T_alt is created at the leaf; if an alternate tree already exists and is more accurate, the current node l is replaced by T_alt.
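The source does not spell out the incremental Gaussian discretization, so the following is only a simplified stand-in for the idea: keep one running Gaussian per class for a continuous attribute (updated with Welford's online algorithm) and use its cumulative distribution to estimate how many values of each class fall on the "less than or equal" side of a candidate binary split point. The class name `RunningGaussian`, its methods, and the sample values are assumptions for illustration.

```python
import math

class RunningGaussian:
    """Welford's online mean/variance; one instance per class label."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        # Single-pass update: each value is seen once and then discarded.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def count_at_most(self, split):
        """Estimated number of seen values <= split under the fitted Gaussian."""
        s = self.std()
        if s == 0.0:
            return float(self.n) if self.mean <= split else 0.0
        z = (split - self.mean) / (s * math.sqrt(2.0))
        return self.n * 0.5 * (1.0 + math.erf(z))

# Hypothetical per-slice counts of one action for normal and plug-in players.
normal, cheat = RunningGaussian(), RunningGaussian()
for x in [3, 4, 5, 4, 4]:
    normal.update(x)
for x in [40, 42, 38, 41, 39]:
    cheat.update(x)

# Estimated class counts on the "<= 20" side of a candidate split point.
left_normal = normal.count_at_most(20)
left_cheat = cheat.count_at_most(20)
```

With estimated class counts on each side of a candidate split point, the information gain of that binary split can be computed without storing the raw values.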

Step 3: ADWIN method

Initialize the sliding window W, the variance, and the total.

a. When a new instance arrives, add it to the window W;

b. For every split of the window W into two parts, W = W0 + W1, if

$$|\hat{\mu}_{W_0} - \hat{\mu}_{W_1}| < \epsilon_{cut}$$

does not hold, drop the last (oldest) element of the window, and repeat until the inequality holds. In the standard ADWIN formulation,

$$\epsilon_{cut} = \sqrt{\frac{1}{2m}\ln\frac{4}{\delta'}}, \qquad m = \frac{1}{1/n_0 + 1/n_1}, \qquad \delta' = \frac{\delta}{n}$$

where m is the harmonic mean of n₀ and n₁, \hat{\mu}_{W_0} and \hat{\mu}_{W_1} are the means of the data in W0 and W1, n₀ and n₁ are the lengths of the two sub-windows, and n = n₀ + n₁.

c. If any instances were dropped in step b, the change detector reports to the calling program that the data distribution has changed.
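Steps a to c can be sketched with a deliberately naive window that stores every value and checks every split point. The threshold used is the one from the standard ADWIN formulation, ε_cut = sqrt(ln(4/δ') / (2m)) with δ' = δ/n; this is an assumption, since the source's own formula is not reproduced in the text. Real ADWIN implementations compress the window into buckets, which this sketch does not attempt. The class name `SimpleAdwin` and the default δ are hypothetical.

```python
import math

class SimpleAdwin:
    """Naive ADWIN sketch: keeps the raw window and checks every split."""

    def __init__(self, delta=0.002):
        self.delta = delta
        self.window = []

    def _cut_found(self):
        n = len(self.window)
        total = sum(self.window)
        head = 0.0
        delta_prime = self.delta / n
        for i in range(1, n):  # W0 = window[:i] (older), W1 = window[i:] (newer)
            head += self.window[i - 1]
            n0, n1 = i, n - i
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(head / n0 - (total - head) / n1) >= eps_cut:
                return True
        return False

    def add(self, value):
        """Add a value in [0, 1]; return True if a change was detected."""
        self.window.append(value)
        changed = False
        while len(self.window) > 1 and self._cut_found():
            self.window.pop(0)  # step b: drop the oldest element
            changed = True
        return changed  # step c: tell the caller the distribution changed

detector = SimpleAdwin()
flags = [detector.add(0.0) for _ in range(100)] + \
        [detector.add(1.0) for _ in range(100)]
```

In this run no change is reported while the stream is constant, and a change is reported shortly after the values jump from 0 to 1, at which point the window shrinks to discard the stale data.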

With the model above, each player receives a judgment of plug-in or normal player in every time slice. The suspicion degree is the mean of all judgment results up to and including the current time slice. For example, if a player's judgments in time slices 1 to 7 are 0, 0, 0, 0, 1, 1, 0, the suspicion degree at time slice 7 is 2/7.
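The suspicion degree is a running mean of the 0/1 judgments, which a few lines make concrete. The function name `suspicion_over_time` is an assumption for illustration.

```python
def suspicion_over_time(judgments):
    """Running mean of 0/1 judgments: the value at slice t uses slices 1..t."""
    scores, positives = [], 0
    for t, judgment in enumerate(judgments, start=1):
        positives += judgment
        scores.append(positives / t)
    return scores

# The example from the text: judgments for time slices 1 to 7.
scores = suspicion_over_time([0, 0, 0, 0, 1, 1, 0])
# scores[-1] equals 2/7, the suspicion degree at time slice 7
```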

In the table on the recommendation tab of the recommendation and display panel, each column shows one kind of information (accuracy, recall, precision, frequency of occurrence, and so on) and each row corresponds to one node of the selected icicle plot. The length of the gray bar in each cell is proportional to the cell's value relative to the other values in its column. Clicking a column header (accuracy, recall, precision, or frequency of occurrence) sorts the table by that column in descending or ascending order.
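A minimal sketch of the table's contents follows, assuming that each node's metrics are derived from counts of true and false positives and negatives among the players routed through it; the source does not say how the per-node values are obtained, so this is an assumption, as are the function names and the sample counts.

```python
def node_metrics(tp, fp, tn, fn):
    """Accuracy, recall, and precision from confusion counts at one node."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

def bar_lengths(rows, column, max_width=100):
    """Gray-bar length per row, proportional to the value within its column."""
    top = max(r[column] for r in rows) or 1.0
    return [max_width * r[column] / top for r in rows]

rows = [
    {"node": "n1", **node_metrics(tp=30, fp=10, tn=50, fn=10)},
    {"node": "n2", **node_metrics(tp=5, fp=0, tn=90, fn=5)},
]

# Clicking the "recall" header would correspond to a sort like this one.
by_recall = sorted(rows, key=lambda r: r["recall"], reverse=True)
precision_bars = bar_lengths(rows, "precision")
```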

Algorithm for the radar distribution chart:

The input to the projection is a set of high-dimensional points, that is, multi-dimensional vectors. Data with more than three dimensions cannot be drawn directly, and people cannot picture four or more dimensions, so the points must be converted to a low-dimensional form that can be drawn. The goal is to map them to two dimensions and draw them on a plane, where each point represents the multi-dimensional feature vector of one player in a given time period.

Constraint 1: Distances in the low-dimensional space should be as close as possible to distances in the high-dimensional space. The distance used is normally the Euclidean distance. For example, for two 4-dimensional points x₁ = (x₁₁, x₁₂, x₁₃, x₁₄) and x₂ = (x₂₁, x₂₂, x₂₃, x₂₄), the distance is

$$dist_{12} = \sqrt{(x_{11}-x_{21})^{2} + (x_{12}-x_{22})^{2} + (x_{13}-x_{23})^{2} + (x_{14}-x_{24})^{2}}$$

The distance between their two-dimensional projections is the familiar planar distance.

Constraint 2: Because this is a radar projection, the layout is circular, and the distance from a point to the center is determined by the suspicion index. The projection therefore carries an additional radius constraint.

The projected points in the two-dimensional plane are expressed in polar coordinates:

p_i = (r_i · s(k_i), θ_i)

Plain polar coordinates would be (r_i, θ_i), where the first component is the radius and the second is the angle; in Cartesian (xy) coordinates this is (r_i cos θ_i, r_i sin θ_i). The extra factor s(k_i) exists because distances in the low-dimensional space should be as close as possible to those in the high-dimensional space, which is hard to achieve under the radius constraint, so s(k_i) acts as a perturbation of the polar coordinates. In the definition of s(k_i) [formula missing in the source text], soft is a parameter that controls the size of the perturbation; 0.2 is used here, and other values can be chosen as needed.

Following the projection approach of multidimensional scaling (MDS), the difference between the distances among the projected points and the original high-dimensional distances is minimized; that is, the following function is minimized: ∑_ij (dist_ij - ||p_i - p_j||)²

Here ||p_i - p_j|| is also a distance, again Euclidean; it is written this way to distinguish it from dist_ij above. Substituting the polar coordinates, the function to minimize becomes:

$$\sum_{ij}\left(dist_{ij} - \sqrt{r_i^{2}s(k_i)^{2} + r_j^{2}s(k_j)^{2} - 2\,r_i r_j\,s(k_i)\,s(k_j)\cos(\theta_i-\theta_j)}\right)^{2}$$

Gradient descent then gives the values of k_i and θ_i, and the points can be drawn on the two-dimensional disk at Cartesian (xy) coordinates (r_i s(k_i) cos θ_i, r_i s(k_i) sin θ_i).
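The radially constrained projection can be sketched in pure Python under several loudly stated assumptions. The source's formula for s(k) is not available, so `s(k) = 1 + SOFT * tanh(k)` is a stand-in that merely keeps the radial scale within ±20%. The radius is taken as 1 minus the suspicion degree, high-dimensional distances are rescaled to the disk's diameter, gradients are estimated by finite differences, and a step is accepted only when it lowers the stress. None of these choices is claimed to be the patented method.

```python
import math

SOFT = 0.2  # perturbation size quoted in the text

def s(k):
    # ASSUMPTION: stand-in for the source's s(k); stays in (1 - SOFT, 1 + SOFT).
    return 1.0 + SOFT * math.tanh(k)

def to_xy(r, k, theta):
    rho = r * s(k)
    return (rho * math.cos(theta), rho * math.sin(theta))

def stress(params, radii, dist):
    """Sum over pairs i < j of (dist_ij - ||p_i - p_j||)^2.

    params is the flat list [k_0, theta_0, k_1, theta_1, ...].
    """
    n = len(radii)
    pts = [to_xy(radii[i], params[2 * i], params[2 * i + 1]) for i in range(n)]
    return sum((dist[i][j] - math.dist(pts[i], pts[j])) ** 2
               for i in range(n) for j in range(i + 1, n))

def initial_params(n):
    # k_i = 0 (no perturbation); angles spread evenly around the circle.
    params = []
    for i in range(n):
        params += [0.0, 2.0 * math.pi * i / n]
    return params

def fit(radii, dist, steps=300, lr=0.05, h=1e-6):
    """Gradient descent with finite-difference gradients and step halving."""
    params = initial_params(len(radii))
    current = stress(params, radii, dist)
    for _ in range(steps):
        grad = []
        for idx in range(len(params)):
            up, down = params[:], params[:]
            up[idx] += h
            down[idx] -= h
            grad.append((stress(up, radii, dist) - stress(down, radii, dist)) / (2 * h))
        candidate = [p - lr * g for p, g in zip(params, grad)]
        value = stress(candidate, radii, dist)
        if value < current:
            params, current = candidate, value
        else:
            lr *= 0.5  # step too large: shrink it and retry
    return params

# Hypothetical feature vectors and suspicion degrees for four players.
features = [(0, 0), (0, 1), (5, 5), (5, 6)]
suspicion = [0.0, 0.1, 0.8, 0.9]
radii = [1.0 - x for x in suspicion]
raw = [[math.dist(a, b) for b in features] for a in features]
scale = 2.0 / max(max(row) for row in raw)  # ASSUMPTION: fit the disk diameter
dist = [[d * scale for d in row] for row in raw]

params = fit(radii, dist)
points = [to_xy(radii[i], params[2 * i], params[2 * i + 1])
          for i in range(len(radii))]
```

Because s(k) is bounded, every point stays within 20% of its prescribed radius, which is the behavior the text describes when it says some points sit slightly off their exact suspicion position.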

The distance from a point to the center of the radar chart generally indicates how dangerous the player is. As noted above, however, Constraints 1 and 2 must be satisfied together, so some points end up slightly away from the position that their suspicion index alone would give. As a result, one may sometimes see points (or a cluster of points) that are fairly close to the center without being exactly at it; such points are worth examining.

Preferably, the radar distribution chart contains many points, each representing the state of one player at the current time; each player's action frequencies form a multi-dimensional vector, which carries a label indicating whether the player is a plug-in (0 for a normal player, 1 for a plug-in);

these multi-dimensional vectors are projected onto a two-dimensional disk to show the relationships between players and how dangerous each player is;

on the disk, the Euclidean distances between players stay as close as possible to the distances between the original multi-dimensional vectors, and a player's distance from the center is 1 minus the suspicion degree (the suspicion degree is a value between 0 and 1, the mean of the labels the model has output for the player up to the current time period);

Preferably, the points are drawn as dots with transparency. Because the dots are semi-transparent, the color is darker where more dots overlap. This design borrows from military radar displays, where the closer something is to the center, the more dangerous it is.

Preferably, the high-level view visualization module further includes tree thumbnails, which serve as thumbnails of the dynamic tree diagram and are placed between the dynamic tree diagram and the accuracy/recall line chart (useful when there are many decision trees). The thumbnails can be read as a timeline: the user can brush over them to select a range, and the trees shown above are then restricted to the brushed range.

Preferably, the system further includes a detail view visualization module, comprising a personal panel and a group panel, which are multiple columns of linked interactive views;

the personal panel shows how an individual player's behaviors, actions, and suspicion degree change over time;

the group panel shows how the distributions of attribute values of two selected groups of players change over time (for example, one group of plug-in players and one group of normal players, or two groups selected in the radar distribution chart).

To present the data better and make it easier for the user to find plug-in players, preferably, in the personal panel each time period of each player is shown as a rectangular box containing a bar chart, and each player's changes over time occupy one full row. Each bar in a bar chart is the sum of the counts of all actions under one behavior type; each bar chart has 4 bars because player behavior is divided into 4 types here.

To present the data better and make it easier for the user to find plug-in players, preferably, double-clicking a bar of a given color in the personal panel shows a line chart for each action classified under that behavior, showing how each action changes over time.

To present the data better and make it easier for the user to find plug-in players, preferably, in the group panel each row shows, for one action, how the two groups of players change over time; each group is drawn as a horizontal band showing the range between the minimum and maximum values.
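The data behind one row of the group panel is a per-time-slice minimum and maximum for each group. A small sketch, with a hypothetical function name and invented sample values:

```python
def group_ranges(values_by_player):
    """Min/max band for one action across a group of players.

    values_by_player: {player_id: [value per time slice]}.
    Returns [(min, max) for each time slice].
    """
    series = list(values_by_player.values())
    return [(min(col), max(col)) for col in zip(*series)]

# Hypothetical counts of one action over three time slices.
cheaters = {"c1": [40, 42, 41], "c2": [38, 45, 40]}
normals = {"n1": [3, 5, 4], "n2": [4, 2, 6]}

cheater_band = group_ranges(cheaters)
normal_band = group_ranges(normals)
```

Two bands that never overlap, as in this toy data, would suggest an action that separates the two groups well.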

本发明系统的操作内容如下：The operation content of the system of the present invention is as follows:

用户交互User interaction

Focus and context: Brushing over the tree thumbnails (the timeline) changes the range shown in the dynamic tree diagram above. Box-selecting several trees in the dynamic tree diagram widens them horizontally and narrows the unselected trees, which makes details easier to inspect. The recommendation panel then shows information about the nodes in the selected trees. Double-clicking in the radar distribution chart turns on local radial magnification: points near the mouse pointer are magnified in the radial direction, and the other points are compressed correspondingly.

When the mouse hovers over a node in a tree, the matching nodes in the other trees are joined to it by connecting lines.

In the personal panel, clicking a gray dot collapses the row (flattening it into a gray band). When the bar charts are expanded, double-clicking a bar of a given color shows all actions under that behavior in more detail, drawn as line graphs in the corresponding color. Every action is labeled with the action ID at the start of its line.

Search: One or more player IDs (separated by commas) can be entered in the search box in the upper right corner; the matching players are then shown in the personal panel in the lower left corner.

Filtering: In the personal panel, clicking a bar of a given color dims the bars of all other colors and leaves the clicked color unchanged, so that analysts can compare how this one behavior changes over time and across players.

Drag and drop: In the personal panel, rows can be dragged by their small gray dots and swapped, so that two or more players of interest can be placed next to each other for comparison.

View linkage: When the mouse hovers over a node, the corresponding node in the display panel on the right is also shown.

Zoom in: Left-clicking a node in the dynamic tree diagram displays the corresponding personal and grouping panels; at the same time, the radar distribution chart appears in the display panel.

Interacting with the model: The system supports pruning as a way of interacting with the model. Decision tree pruning is a common interaction that controls how the tree grows by stopping certain nodes from splitting.
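The pruning interaction can be illustrated with a minimal Python sketch. The `Node` class, the `can_split` flag, and the functions `prune` and `try_split` are hypothetical names introduced here for illustration; the text only states that pruning stops certain nodes from splitting.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical decision-tree node (not the patent's data structure)."""
    name: str
    children: List["Node"] = field(default_factory=list)
    can_split: bool = True  # cleared when the analyst prunes at this node


def prune(node: Node) -> None:
    """Record the analyst's decision: this node may no longer split."""
    node.can_split = False


def try_split(node: Node, left: str, right: str) -> bool:
    """Split a leaf into two children unless it has been pruned."""
    if not node.can_split or node.children:
        return False
    node.children = [Node(left), Node(right)]
    return True


root = Node("5300040")
split_root = try_split(root, "low", "high")             # root becomes internal
prune(root.children[0])                                 # analyst prunes "low"
split_pruned = try_split(root.children[0], "a", "b")    # refused: pruned
split_other = try_split(root.children[1], "c", "d")     # still allowed
```

In an online learner the model would call something like `try_split` whenever its split test passes, so a pruned node simply stays a leaf while the rest of the tree keeps growing.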

The system of the present invention supports the following analyses:

1. Show the dynamic evolution of the decision tree and its decision flow, that is, how a player comes to be judged a plug-in or a non-plug-in.

2. Analyze how player behaviors and actions evolve at different granularities.

3. Give analysts timely feedback and prompts on their interactions, together with appropriate contextual information. This information helps users find the causes of the patterns they observe.

4. Let analysts interact with the model itself. The analyst's domain knowledge should be brought into the decision analysis process to tune the model; it also helps in analyzing other patterns.

The operation process of the system of the present invention:

First, start model training.

After training, a series of trees is produced and arranged from left to right in the dynamic tree diagram.

Hovering the mouse over a tree node shows where the same node occurs in the other trees; box-selecting a region enlarges the selected trees and shrinks the surrounding ones. Below the trees is a row of small tree thumbnails that serves as a timeline, with a resizable time selection box for choosing which trees to show in the dynamic tree diagram.

For the trees that have been box-selected and enlarged, the recommendation panel on the right shows indicators for the nodes in those trees, such as precision, recall, F1 and frequency of occurrence, where F1 is computed as $F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$.
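As a worked illustration of these per-node indicators, the following Python sketch computes precision, recall and F1 from counts of true positives, false positives and false negatives. The function name `node_metrics` and the example counts are assumptions made for illustration; they are not taken from the patent.

```python
def node_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall and F1 for one node, given plug-in counts.

    tp: plug-in players flagged as plug-ins
    fp: normal players flagged as plug-ins
    fn: plug-in players not flagged
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0  # convention used here when both terms are zero
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only
metrics = node_metrics(tp=8, fp=2, fn=8)
```

With these counts precision is 0.8 and recall is 0.5, so F1 lies between the two and closer to the smaller value, which is the usual behavior of a harmonic mean.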

Clicking a node in a tree brings up the radar distribution chart in the display panel on the right, showing the relationships among the players in that node and each player's degree of danger. At the same time, the personal panel and the grouping panel are updated; their contents are described above. Because the time axes are aligned, comparison is convenient, and details are shown at every level. Selecting a region in the radar distribution chart makes the personal panel show the players in that region.

The grouping panel normally shows aggregated frequency distributions, per action, for the plug-in players and ordinary players in the personal panel. However, after clicking the compare button in the personal panel, a second region can be box-selected in the radar chart, and a detail view for this second batch of players also appears in the personal panel for comparison, as in the region marked by the rectangular box in Figure 8.

Beneficial effects of the present invention:

The system of the present invention for plug-in detection using an incremental decision tree uses a dynamic decision tree to show the decision process in different time periods, analyzes the characteristics of plug-ins and the reasons they are judged to be plug-ins, and finds features that clearly distinguish them from normal players. Owing to the properties of the model, it can also reveal how plug-ins evolve. In addition, users can add their own knowledge to prune the decision tree and carry out other analyses by combining the various views.

Description of the drawings

FIG. 1 is a schematic view of the dynamic tree diagram, the tree thumbnails, and the accuracy/recall line graph in the high-level view visualization module of the system of the present invention.

FIG. 2 is a schematic view of the personal panel of the system of the present invention.

FIG. 3 is a schematic view of the per-action line graphs shown after double-clicking a bar in FIG. 2.

FIG. 4 is a schematic view of the grouping panel of the system of the present invention.

FIG. 5 is a schematic view of the recommendation panel of the system of the present invention.

FIG. 6 is a schematic view of the display panel of the system of the present invention.

FIG. 7 is an enlarged view of the three trees inside the black selection box in FIG. 1.

FIG. 8 is a schematic view of the radar distribution chart in the display panel of the system of the present invention for one particular data set.

Detailed description of the embodiments

The present invention is described in detail below with reference to the accompanying drawings, from which its objects and effects will become more apparent.

As shown in FIGS. 1 to 7, the structure of the system for plug-in detection using an incremental decision tree in this embodiment includes the following steps and contents:

Step 1: Visual design

Data preprocessing module: cleans the raw log data and extracts feature vectors;

Model generation and interaction module: takes the above feature vectors as model input to generate and output the model; it can also receive feedback from the visualization module to adjust the model;

High-level view visualization module: mainly uses icicle diagrams placed side by side to represent the structure of the decision tree as it changes over time, and includes:

Dynamic tree diagram (upper part of FIG. 1, the main part): shows the structure of the decision tree in a compact form (an icicle diagram), in which each node represents a split node of the decision tree; several icicle diagrams are placed side by side to show how the decision tree changes over time, as shown in FIG. 7.

Recommendation and display panel: as shown in FIG. 5, the recommendation panel uses a sortable table to show information on all nodes of the tree or trees selected in the dynamic tree diagram, including accuracy, recall, precision and frequency of occurrence. As shown in FIG. 6, after switching to the display panel, selecting a node in the dynamic tree diagram shows the relationships among the players contained in that node and their degree of danger. Each rectangle (that is, each node) in a tree corresponds to an action; for example, 5300040 is accepting a task. The number after the colon corresponds to the option mentioned above: players whose count of action 5300040 is greater than 6.4 fall into one class, and those whose count is less than 6.4 fall into the other.
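The node label described above (an action ID, a colon, and a threshold) can be read as a simple routing rule. The Python sketch below is illustrative only: the function names, the branch names "above" and "below", and the handling of a count exactly equal to the threshold are assumptions, since the text does not specify them.

```python
def parse_split(label: str):
    """Split a node label such as '5300040:6.4' into (action_id, threshold)."""
    action_id, threshold = label.split(":")
    return action_id.strip(), float(threshold)


def branch(action_counts: dict, label: str) -> str:
    """Route a player by comparing one action count with the node threshold.

    Counts equal to the threshold go to 'below' here; this tie rule is an
    assumption made for the sketch.
    """
    action_id, threshold = parse_split(label)
    count = action_counts.get(action_id, 0)
    return "above" if count > threshold else "below"


busy_player = branch({"5300040": 9}, "5300040:6.4")   # count 9 exceeds 6.4
idle_player = branch({"5300040": 2}, "5300040:6.4")   # count 2 does not
```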

Tree thumbnails (middle part of FIG. 1): these serve as a timeline. The user can brush over different thumbnails, and the trees shown above are then restricted to the selected range;

Accuracy/recall line graph (lower part of FIG. 1): the accuracy and recall predicted by each decision tree change over time, and these values are shown as a line graph;

Detail view visualization module: several columns of linked interactive views, including:

Personal panel, as shown in FIG. 2: shows how an individual player's behaviors, actions and degree of suspicion change over time.

Double-clicking a bar of a given color in the personal panel displays a detailed line graph for every action classified under that behavior, showing how each action changes over time, as shown in FIG. 3.

Grouping panel, as shown in FIG. 4: shows how the distribution of attribute values of two groups of players changes over time (for example, one group of plug-in players and one group of normal players, or two groups selected in the radar distribution chart).

The raw data are stored by action type, with one file per action per day (for example, the login log for January 1, 2017); the game has several hundred actions in total. For the time period of interest, the frequency of each player's actions in each time slice is counted. The time period can be either natural time (year, month, day, hour, minute, second) or game time (which actions were performed at level 1, level 2 and so on, and how often). In this way every player has a feature vector for every time slice. Because there are so many actions, they are also grouped into categories (task-related, attribute-related, combat-related and item-related). This grouping can be specified by the user or taken from a complete classification scheme in the existing literature.
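The counting step described above can be sketched in Python as follows. The event tuple layout `(player_id, time_slice, action_id)`, the function name `build_feature_vectors`, and all IDs other than 5300040 are assumptions made for illustration; the patent does not specify the log format.

```python
from collections import Counter, defaultdict


def build_feature_vectors(events, action_ids):
    """Count action frequencies per (player, time slice).

    events: iterable of (player_id, time_slice, action_id) tuples
            (a hypothetical, already-cleaned log format)
    action_ids: fixed ordering of actions that defines the vector layout
    Returns {(player_id, time_slice): [count for each action in action_ids]}.
    """
    counts = defaultdict(Counter)
    for player_id, time_slice, action_id in events:
        counts[(player_id, time_slice)][action_id] += 1
    # A Counter returns 0 for actions the player never performed
    return {key: [c[a] for a in action_ids] for key, c in counts.items()}


# Illustrative data only; "LOGIN" and "FIGHT" are made-up action IDs
events = [
    ("p1", 0, "5300040"), ("p1", 0, "5300040"), ("p1", 0, "LOGIN"),
    ("p1", 1, "FIGHT"),
    ("p2", 0, "LOGIN"),
]
vectors = build_feature_vectors(events, ["LOGIN", "5300040", "FIGHT"])
```

The time slice here is just an integer key, so the same function works whether slices are derived from wall-clock time or from game level.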

The model used here is a Hoeffding adaptive tree with Gaussian discretization. It is an online algorithm that exploits the Hoeffding bound so that the decision tree can be trained online: each data item can be used for training as soon as it arrives and is used only once in the tree. A traditional decision tree, by contrast, needs a whole batch of data, and each item is used many times when evaluating split conditions.
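For orientation, the sketch below shows the split test as it is commonly formulated in the Hoeffding tree literature; the patent itself does not spell out the formula. With probability 1 − δ, the true mean of a quantity with range R lies within ε = sqrt(R² ln(1/δ) / (2n)) of the mean observed over n samples, so a leaf can split once the gap between its two best candidate attributes exceeds ε. The function names and the example values are assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)) for n observed samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Illustrative values: the same gain gap is not yet convincing after 100
# samples but is after 1000, because epsilon shrinks as n grows.
early = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=100)
later = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
```

This is why each sample only needs to be seen once: the leaf keeps running statistics and decides to split as soon as enough evidence has accumulated.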

On top of this method, some improvements are made using existing techniques. First, because the Hoeffding decision tree supports only discrete-valued attributes, a robust incremental Gaussian discretization method is adopted so that continuous-valued attributes are supported. Second, the data exhibit concept drift, meaning that data generation may not be stationary and the generating process may change. In game data, the behavior of plug-ins may change because plug-in vendors, once they notice that their accounts are being banned, alter certain characteristics to avoid further bans. To handle this, the ADaptive WINdowing (ADWIN) technique is adopted, which adds a window with a corresponding estimator and change detector to the original Hoeffding decision tree in order to detect the concept drift described above.
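The patent does not detail the discretization itself. As a hedged illustration of the "incremental Gaussian" ingredient only, the Python sketch below keeps a running mean and variance of one continuous attribute using Welford's update, so that each value is processed once and then discarded. The class name `RunningGaussian` is hypothetical, and this is not presented as the patent's discretization method.

```python
class RunningGaussian:
    """Running mean and sample variance of a stream (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new observation into the estimate."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; 0.0 until at least two values have been seen."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Illustrative stream of action counts for one attribute
stats = RunningGaussian()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
```

A tree leaf could keep one such estimator per attribute and class, and derive candidate split points from the fitted Gaussians without storing the raw values.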

The method as a whole is called a Hoeffding adaptive decision tree with Gaussian discretization. When it is applied to the data, the decision tree keeps growing as data flow in; when a significant change is detected, some subtrees are replaced by other subtrees. The result is a tree that changes over time, and in every time slice each player receives a judgment of plug-in or normal player (0 for a normal player, 1 for a plug-in player).

In the table under the recommendation tab of the recommendation and display panel, each column represents one kind of information (accuracy, recall, precision, frequency of occurrence and so on), and each row represents one node in the selected icicle diagrams. The length of the gray band in a cell is proportional to the relative size of its value within that column. Clicking a column header (accuracy, recall, precision or frequency of occurrence) sorts that column from high to low or from low to high.

The radar chart under the display tab of the recommendation and display panel contains many points, each representing the state of one player at the current time. Each player's action frequencies form a multi-dimensional vector with a label (0 or 1, where 0 means a normal player and 1 means a plug-in). These multi-dimensional vectors are projected onto a two-dimensional disc to show the relationships among players and each player's degree of danger. On this disc, the Euclidean distances between players are kept as close as possible to the distances between the original multi-dimensional vectors, while a player's distance from the center is 1 minus the degree of suspicion (a value between 0 and 1, equal to the mean of the player's label values output by the model up to the current time period).
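The radial part of this layout can be sketched in Python as follows. The text defines only the radius (1 minus the degree of suspicion); the angle passed to `radar_position` is a placeholder for whatever projection preserves inter-player distances, and both function names are assumptions.

```python
import math


def suspicion(labels) -> float:
    """Mean of the model's 0/1 outputs for one player up to now."""
    return sum(labels) / len(labels) if labels else 0.0


def radar_position(labels, angle: float):
    """Place a player at radius 1 - suspicion along a given angle.

    The angle is a hypothetical input; in the described system it would come
    from the projection that tries to preserve inter-player distances.
    """
    radius = 1.0 - suspicion(labels)
    return (radius * math.cos(angle), radius * math.sin(angle))


always_flagged = radar_position([1, 1, 1, 1], angle=0.0)    # at the center
sometimes_flagged = radar_position([0, 1, 0, 1], angle=0.0)  # halfway out
never_flagged = radar_position([0, 0, 0], angle=0.0)         # on the rim
```

Players flagged in every time slice collapse onto the center, which matches the description of dense, dark clusters in the middle of the chart.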

This design borrows from military radar, where a target closer to the center of the display is more dangerous.

In the personal panel of the detail view visualization module, each time period of each player is represented by a rectangular box containing a bar chart, and each player's information over time occupies one full row. Each bar in a bar chart is the accumulated count of all actions belonging to one behavior; each bar chart has four bars because player behavior is divided here into four types.
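The four bars of one box can be derived from a player's action counts with a small aggregation, sketched below in Python. The mapping `ACTION_CATEGORY` is hypothetical: only 5300040 (accepting a task) is named in the text, and the other action IDs are invented for the example.

```python
from collections import defaultdict

# Hypothetical action-to-category mapping; IDs other than 5300040 are made up.
ACTION_CATEGORY = {
    "5300040": "task",
    "ATTR_UP": "attribute",
    "FIGHT": "combat",
    "ITEM_BUY": "item",
    "ITEM_USE": "item",
}
CATEGORIES = ["task", "attribute", "combat", "item"]


def bars_for_slice(action_counts: dict) -> list:
    """Sum one time slice's action counts into the four behavior bars."""
    totals = defaultdict(int)
    for action_id, count in action_counts.items():
        totals[ACTION_CATEGORY[action_id]] += count
    return [totals[c] for c in CATEGORIES]


bars = bars_for_slice({"5300040": 7, "ITEM_BUY": 2, "ITEM_USE": 3, "FIGHT": 1})
```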

Double-clicking a bar of a given color in the personal panel displays a detailed line graph for every action classified under that behavior, showing how each action changes over time.

In the grouping panel, each row represents one action and how two groups of players change over time. Each group is drawn as a horizontal bar whose left and right ends show the minimum and maximum values of that action within the group, so that the bar spans the interval between them.
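The interval drawn for one group and one action can be computed per time slice as in the Python sketch below. The input layout (a dictionary from time slice to the list of values in the group) and the name `group_intervals` are assumptions made for illustration.

```python
def group_intervals(values_by_slice: dict) -> dict:
    """Return {time_slice: (min, max)} for one action within one group.

    Time slices with no players are skipped rather than given an interval.
    """
    return {
        t: (min(values), max(values))
        for t, values in values_by_slice.items()
        if values
    }


# Illustrative counts of one action for two groups over two time slices
plugin_bars = group_intervals({0: [40, 42, 41], 1: [39, 45]})
normal_bars = group_intervals({0: [3, 18, 7], 1: [0, 25, 11]})
```

Drawing the two groups' intervals in the same row makes it easy to see whether they overlap, which is the comparison the grouping panel is meant to support.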

Step 2: User interaction

Focus and context: Brushing over the tree thumbnails (the timeline) changes the range shown in the dynamic tree diagram above. Box-selecting several trees in the dynamic tree diagram widens them horizontally and narrows the unselected trees, which makes details easier to inspect. The recommendation panel then shows information about the nodes in the selected trees. Double-clicking in the radar distribution chart turns on local radial magnification: points near the mouse pointer are magnified in the radial direction, and the other points are compressed correspondingly.
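One simple way to realize the horizontal focus-and-context effect is to give selected trees a larger weight when dividing the available width. The Python sketch below assumes a fixed total width and a hypothetical `zoom` factor of 3; neither value comes from the patent.

```python
def tree_widths(n_trees: int, selected: set, total_width: float,
                zoom: float = 3.0) -> list:
    """Divide total_width among trees, giving selected trees `zoom` times
    the width of unselected ones (the zoom factor is an assumption)."""
    weights = [zoom if i in selected else 1.0 for i in range(n_trees)]
    scale = total_width / sum(weights)
    return [w * scale for w in weights]


# Five trees, the second and third are box-selected
widths = tree_widths(5, {1, 2}, total_width=900.0)
```

Because the widths always sum to the total, enlarging the selection automatically narrows the remaining trees while keeping all of them visible.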

Zoom in: Left-clicking a node in the dynamic tree diagram displays the corresponding personal and grouping panels; at the same time, the radar distribution chart appears in the display panel.

Step 3: Analysis tasks

Actual operation process

First, start model training.

Clicking a node in a tree brings up the radar distribution chart in the display panel on the right, showing the relationships among the players in that node and each player's degree of danger. At the same time, the personal panel and the grouping panel are updated; their contents are described above. Because the time axes are aligned, comparison is convenient, and details are shown at every level. Selecting a region in the radar distribution chart makes the personal panel show the players in that region.

The first method of detecting plug-ins:

First, select all actions in the control panel; the actions are divided into four categories. After the model has run, a series of trees appears whose structure evolves over time. The tree grows gradually, and when its performance degrades at some stage, some subtrees are replaced. This yields information such as the strategies plug-ins may use to change their behavior patterns.

Hovering the mouse over tree nodes and following the red lines shows that some nodes persist from beginning to end while others appear only for a period and then disappear. High-level nodes generally persist for a long time, which indicates that they are good discriminating attributes, for example accepting tasks. Tasks usually yield wealth, and wealth is how plug-ins make a profit, so plug-ins tend to accept tasks in large numbers; this attribute is therefore likely to discriminate well.

Pick a node that persists for a while and then disappears, and consider what caused this. Clicking the node shows the radar distribution chart in the display panel on the right, as shown in FIG. 8, where two clusters of points can be seen (a cluster being a group of points gathered together). Double-clicking turns on local magnification so that the two clusters can be examined.

Turn on the comparison function of the personal panel: after a certain point in time, the two batches of plug-ins show noticeably different patterns of behavior and action over time. This node originally separated plug-ins from normal players (by default there are only normal players and plug-ins), but its discriminating power declined as new plug-ins appeared. The grouping panel leads to a similar conclusion, for example that the intervals of plug-ins and normal players overlap.

This suggests the following hypothesis: there are two clusters of plug-ins, that is, plug-ins with somewhat different behavior patterns, and newly arriving plug-ins may behave differently from earlier ones. As more new plug-ins arrive, an action that used to discriminate well gradually stops doing so (its information gain becomes insufficient), which produces the observed situation.

The second method of detecting plug-ins:

The plug-ins concentrated in the middle of the radar chart look very dark. Because the points are semi-transparent, a dark color arises only where many points overlap, which indicates that many players are gathered there.

Both the projection and the data actually selected (by box-selecting in the radar chart and viewing the bar charts in the personal panel) are highly consistent: plug-ins, being produced in batches, remain very similar to one another. The line graphs in the personal panel lead to the same conclusion, whereas normal players are distributed much more sparsely.

The reason is that plug-ins are generally produced in batches to maximize profit, so it is natural for them to be similar in every respect. Normal players, by comparison, are scattered.

Claims

1. A system for plug-in detection using an incremental decision tree, characterized by comprising:

a data preprocessing module, which cleans the raw data of player actions and extracts feature vectors;

a model generation and interaction module, which takes the feature vectors from the data preprocessing module as model input to generate and output a model, and receives feedback to adjust the model;

a high-level view visualization module, which generates a dynamic tree diagram, a recommendation and display panel, and an accuracy/recall line graph from the model output by the model generation and interaction module; the high-level view visualization module further includes tree thumbnails, which are thumbnails of the dynamic tree diagram and are arranged between the dynamic tree diagram and the accuracy/recall line graph;

the dynamic tree diagram displays the tree structure of the decision tree as an icicle diagram, in which each node represents a split node of the decision tree, and multiple icicle diagrams are placed side by side to show how the decision tree changes over time;

in the recommendation and display panel, the recommendation panel shows, in a sortable table, information on all nodes of the icicle diagrams selected in the dynamic tree diagram; in the display panel, when a node is selected in the dynamic tree diagram, the players contained in that node are shown in the form of a radar distribution chart;

the accuracy/recall line graph and the dynamic tree diagram are arranged one above the other, aligned in time, and the line graph shows the accuracy/recall predicted by each decision tree over time;

the radar distribution chart contains many points, each point representing the state of one player at the current time, and each player's action frequencies form a multi-dimensional vector with a label indicating whether the player is a plug-in;

the multi-dimensional vectors are projected onto a two-dimensional disc to show the relationships among players and each player's degree of danger;

on the disc, the Euclidean distances between players preserve the distances between the original multi-dimensional vectors, and a player's distance from the center is 1 minus the degree of suspicion.

2. The system for plug-in detection using an incremental decision tree according to claim 1, wherein the points are dots with transparency.

3. The system for plug-in detection using an incremental decision tree according to claim 1, further comprising a detail view visualization module, which comprises a personal panel and a grouping panel;

the personal panel is used to display how an individual player's behaviors, actions and degree of suspicion change over time;

the grouping panel is used to display how the distribution of attribute values of the two selected groups of players changes over time.

4. The system for plug-in detection using an incremental decision tree according to claim 3, wherein in the personal panel each time period of each player is represented by a rectangular box containing a bar chart, each player's information over time occupies one full row, and each bar in a bar chart represents the accumulated count of all actions belonging to one behavior.

5. The system for plug-in detection using an incremental decision tree according to claim 4, wherein double-clicking a bar of a given color in the personal panel displays a detailed line graph for every action classified under that behavior, showing how each action changes over time.

6. The system for plug-in detection using an incremental decision tree according to claim 3, wherein in the grouping panel each row represents one action and how two groups of players change over time, and each group is drawn as a horizontal bar showing the interval between the minimum and maximum values.