CN111586001A - Abnormal user identification method, device, electronic device and storage medium

Info

Publication number: CN111586001A
Application number: CN202010351557.5A
Authority: CN
Inventors: 王浩然; 邵传贤
Original assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Current assignee: China Mobile Communications Group Co Ltd; MIGU Culture Technology Co Ltd
Priority date: 2020-04-28
Filing date: 2020-04-28
Publication date: 2020-08-25
Anticipated expiration: 2040-04-28
Also published as: CN111586001B

Abstract

Embodiments of the present invention provide an abnormal user identification method, device, electronic device, and storage medium. The method includes: an initial grouping step of determining a central user from the target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the target users other than the central user; a grouping adjustment step of re-determining the central user according to the similarity between the users within the initial group, and regrouping the target users according to the similarity between the re-determined central user and the target users other than the re-determined central user; a grouping determination step of repeating the grouping adjustment step until the users included in each group no longer change; and an abnormal user identification step of determining an abnormal group according to the number of known abnormal users included in each group, and determining the users in the abnormal group as abnormal users.

Description

Abnormal user identification method, device, electronic device and storage medium

Technical Field

The present invention relates to the field of network security, and in particular to an abnormal user identification method, device, electronic device, and storage medium.

Background Art

An abnormal login is a login behavior that differs markedly from the user's daily habits. Because abnormal logins are a common sign of network intrusion, users who exhibit abnormal login behavior are likely to be the perpetrators of an intrusion, so identifying such users is of great significance in the field of network security.

In the prior art, abnormal login users are usually detected from the number of times a user logs in, the IP address used for login, and the device used for login. However, when network attackers log in from dispersed IP addresses or emulated devices, prior-art identification methods have difficulty detecting the abnormal users.

In summary, prior-art methods for identifying abnormal login users have difficulty finding hidden abnormal users, and their efficiency in finding abnormal users is low.

Summary of the Invention

Embodiments of the present invention provide an abnormal user identification method, device, electronic device, and storage medium, to overcome the defect that prior-art methods for identifying abnormal login users have difficulty finding hidden abnormal users and find abnormal users with low efficiency.

An embodiment of the first aspect of the present invention provides an abnormal user identification method, including:

an initial grouping step: determining a central user from the target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the users among the target users other than the central user;

a grouping adjustment step: re-determining the central user according to the similarity between the users within the initial group, and regrouping the target users according to the similarity between the re-determined central user and the users among the target users other than the re-determined central user;

a grouping determination step: repeating the grouping adjustment step until the users included in each group no longer change;

an abnormal user identification step: determining an abnormal group according to the number of known abnormal users included in each group, and determining the users in the abnormal group as abnormal users.

In the above technical solution, the similarity between the target users is the sum of the similarities between any one of the target users and all of the target users other than that user.

In the above technical solution, before the initial grouping step, the method further includes:

calculating the similarity between any one of the target users and one other user among the target users, where the similarity includes one or more of the following: similarity in the time dimension, similarity in user login platform, and similarity in user login device;

summing the similarities between any one of the target users and all of the target users other than that user, to obtain the sum of the similarities between that user and all of the other target users.

In the above technical solution, determining the central user from the target users according to the similarity between the target users includes:

determining a first user according to the maximum value of the similarity between the target users;

determining the first user as a central user;

determining a second user according to the minimum similarity between the non-central users among the target users and the central users, and a preset first threshold;

determining the second user as a new central user, and returning to the step of determining a second user according to the minimum similarity between the non-central users among the target users and the central users and the preset first threshold, repeating until the number of central users reaches a preset number threshold.

In the above technical solution, determining the second user according to the minimum similarity between the non-central users among the target users and the central users and the preset first threshold includes:

calculating, for each non-central user among the target users, the minimum similarity between that user and the central users;

when the sum of the minimum similarities between the first n non-central users among the target users and the central users is less than the first threshold, and the sum of the minimum similarities between the first n+1 non-central users among the target users and the central users is not less than the first threshold, determining the (n+1)-th non-central user as the second user;

where the first threshold is a random value between 0 and a first similarity sum, the first similarity sum being the sum of the minimum similarities between all non-central users among the target users and the central users; and n is a natural number.
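As an illustration only, the selection of the second user described above can be sketched in Python. The function name `pick_second_user`, its parameters, and the use of Python's `random` module are assumptions made for this sketch and do not appear in the original text; `min_sims[k]` is taken to be the minimum similarity between the k-th non-central user and the current central users.

```python
import random


def pick_second_user(min_sims, threshold=None, rng=random):
    """Return the index (0-based) of the non-central user chosen as the second user.

    min_sims[k] is the minimum similarity between the k-th non-central user
    and the current central users. `threshold` plays the role of the first
    threshold; when omitted it is drawn at random between 0 and the sum of
    all entries of min_sims (hypothetical default for this sketch).
    """
    if not min_sims:
        raise ValueError("at least one non-central user is required")
    total = sum(min_sims)
    if threshold is None:
        threshold = rng.uniform(0.0, total)
    running = 0.0
    for k, s in enumerate(min_sims):
        running += s
        # Sum over the first n users < threshold and sum over the first
        # n+1 users >= threshold: the (n+1)-th user is selected.
        if running >= threshold:
            return k
    # Guard against floating-point round-off when threshold is close to total.
    return len(min_sims) - 1
```

Passing an explicit `threshold` makes the choice deterministic, which is convenient for checking the accumulation rule; omitting it reproduces the random draw.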

In the above technical solution, calculating the similarity between any one of the target users and one other user among the target users includes:

calculating the degree of difference between a third user and a fourth user according to the login records of the third user and the login records of the fourth user;

where the third user is any one of the target users, and the fourth user is one of the target users other than the third user;

calculating the similarity between the third user and the fourth user according to the degree of difference between the third user and the fourth user.

In the above technical solution, the degree of difference between the third user and the fourth user is calculated from the login records of the third user and the login records of the fourth user using the following formula:

where d(i,j) denotes the degree of difference between the third user and the fourth user;

when the similarity includes similarity in the time dimension, H_i denotes the record of third user i logging in during a first time period, and H_j denotes the record of fourth user j logging in during the first time period;

when the similarity includes similarity in user login platform, H_i denotes the record of third user i logging in on a first platform, and H_j denotes the record of fourth user j logging in on the first platform;

when the similarity includes similarity in user login device, H_i denotes the record of third user i logging in on a first device, and H_j denotes the record of fourth user j logging in on the first device;

The similarity between the third user and the fourth user is calculated from the degree of difference between the third user and the fourth user using the following formula:

where sim(i,j) denotes the similarity between the third user and the fourth user.
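The two formulas referred to above are not reproduced in this text. The sketch below is therefore only one possible reading: it assumes that the degree of difference d(i,j) is the Euclidean distance between the two 0/1 record vectors and that sim(i,j) = 1 / (1 + d(i,j)). Both forms, and the function names `difference` and `similarity`, are assumptions made for illustration; they were chosen only because they make the similarity fall as the degree of difference grows.

```python
import math


def difference(h_i, h_j):
    """Degree of difference d(i, j) between two 0/1 login-record vectors.

    ASSUMPTION: Euclidean distance. The original formula is not available.
    """
    if len(h_i) != len(h_j):
        raise ValueError("record vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h_i, h_j)))


def similarity(h_i, h_j):
    """Similarity sim(i, j) derived from the degree of difference.

    ASSUMPTION: sim = 1 / (1 + d), so identical records give 1.0 and the
    value decreases toward 0 as the records diverge.
    """
    return 1.0 / (1.0 + difference(h_i, h_j))
```

Under these assumptions two users with identical records have similarity 1, and each additional mismatching entry lowers the similarity.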

An embodiment of the second aspect of the present invention provides an abnormal user identification device, including:

a preliminary grouping module, configured to determine a central user from the target users according to the similarity between the target users, and to initially group the target users according to the similarity between the central user and the users among the target users other than the central user;

a grouping adjustment module, configured to re-determine the central user according to the similarity between the users within the initial group, and to regroup the target users according to the similarity between the re-determined central user and the users among the target users other than the re-determined central user;

a grouping determination module, configured to repeat the grouping adjustment step until the users included in each group no longer change;

an abnormal user identification module, configured to determine an abnormal group according to the number of known abnormal users included in each group, and to determine the users in the abnormal group as abnormal users.

An embodiment of the third aspect of the present invention provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the steps of the abnormal user identification method according to the embodiment of the first aspect.

An embodiment of the fourth aspect of the present invention provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the abnormal user identification method according to the embodiment of the first aspect.

The abnormal user identification method, device, electronic device, and storage medium provided by the embodiments of the present invention cluster users by the similarity between users. Based on the characteristic that abnormal users have similar behavioral trajectories, already-discovered abnormal users are used to find the groups in which abnormal users are located, thereby discovering more hidden abnormal users. The approach has the advantages of high identification efficiency and strong identification capability.

Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

FIG. 1 is a flowchart of the abnormal user identification method provided by an embodiment of the present invention;

FIG. 2 is a structural diagram of the abnormal user identification device provided by an embodiment of the present invention;

FIG. 3 is a schematic diagram of the physical structure of an electronic device.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

FIG. 1 is a flowchart of the abnormal user identification method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:

Step 101: Determine a central user from the target users according to the similarity between the target users, and initially group the target users according to the similarity between the central user and the users among the target users other than the central user.

The basic idea of the abnormal user identification method provided by this embodiment is that abnormal users have similar behavioral trajectories, so already-discovered abnormal users can be used to discover more hidden abnormal users. Based on this idea, data for multiple target users is first collected. Some of these target users have already been identified as abnormal users, for example by prior-art methods that identify abnormal users from the number of logins, the IP address used for login, or the device used for login. The status of most of the target users, however, has not been determined: they may be normal users or hidden abnormal users.

Before the similarity between target users is calculated, the target users' data must be collected. The collected data includes each target user's login log. A login log generally contains a large number of behavior records for the target user, such as the login time, the login platform, and the device used at login. The data may also include identity labels for the target users; for example, users already identified as abnormal can be marked as such in their identity labels.

These target users form a user set. For any target user in the set, the similarity to each other individual target user in the set can be calculated. Based on these pairwise similarities, a similarity index between one user and multiple other target users can further be calculated. The similarity index reflects the overall similarity relationship between one user and the multiple other target users. In this embodiment of the present invention, the similarity index is the sum of similarities. In other embodiments of the present invention, the similarity index may take other forms, such as the sum of squared similarities.

In this embodiment, adding up the similarities between one target user and each other individual target user in the set gives the sum of similarities between that target user and all other target users in the set. This embodiment does not limit the method of calculating similarity between users: a similarity calculation method known to those skilled in the art may be used, as may the similarity calculation method described in another embodiment of the present invention. Nor does this embodiment limit when the similarity is calculated: it may be computed in advance before this step is executed, or in real time while this step is executed.

Based on the sum of similarities between each target user and all other target users, the central users of multiple groups can be determined from the target users.

The number of groups may be determined according to the actual situation, for example the number of target users.

When determining the central users of the groups, the target user whose sum of similarities to the other target users is the largest among all target users is first selected as the central user C1 of the first group. C1 is added to the grouping center set.
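The choice of C1 can be sketched as follows, assuming the pairwise similarities are already available as a square matrix `sim` in which `sim[i][j]` is the similarity between users i and j. The matrix layout and the function name `first_center` are assumptions made for this sketch.

```python
def first_center(sim):
    """Return (index of C1, list of similarity sums).

    sim is an n x n matrix of pairwise similarities. The similarity sum of
    user i adds sim[i][j] over every other user j; C1 is the user with the
    largest sum.
    """
    n = len(sim)
    sums = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    c1 = max(range(n), key=lambda i: sums[i])
    return c1, sums


# Example: user 0 is similar to both other users, so it becomes C1.
sim = [
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.1],
    [0.8, 0.1, 1.0],
]
c1, sums = first_center(sim)
```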

Then a target user with relatively high similarity to C1 is selected as the center C2 of another group. Selecting the target user with the highest similarity to C1 could let outlier users introduce noise, so a user with relatively high similarity to C1 is selected as C2 instead. The selection proceeds as follows: draw a random value greater than 0 and less than the sum of the similarities between the central user C1 and all other target users; after calculating the similarity between C1 and each individual target user other than C1, add these similarities up one by one; if, during the accumulation, the sum of the similarities of the first n target users is less than the random value while the sum of the similarities of the first n+1 target users is greater than the random value, the (n+1)-th target user is C2. C2 is added to the grouping center set. Here n is a positive integer.

Next, the sum of the similarities between all remaining target users other than C1 and C2 and the grouping center set is calculated. (When C2 was selected, calculating the sum of similarities between all target users and C1 can be regarded as calculating the sum of similarities between all target users and a grouping center set containing only C1.) The grouping center set now contains two users, C1 and C2, but it is treated as a whole, so only one value is needed for the similarity between another target user and the set. That value is the minimum of the similarities between the target user and the individual group-center users in the set; that is, to calculate the similarity between a target user and the grouping center set, the similarities between that user and C1 and between that user and C2 are calculated separately, and the smaller of the two is taken as the similarity between the target user and the grouping center set.
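The similarity between a target user and the grouping center set reduces to a minimum over the current centers. A minimal sketch, assuming a similarity matrix `sim` indexed by user number (the matrix and the function name are hypothetical):

```python
def similarity_to_center_set(sim, user, centers):
    """Similarity between `user` and the grouping center set.

    The set is treated as a whole: the value is the smallest similarity
    between the user and any single center in `centers`.
    """
    return min(sim[user][c] for c in centers)


# Example with centers C1 = user 0 and C2 = user 1.
sim = [
    [1.0, 0.8, 0.3, 0.1],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
value_for_user_2 = similarity_to_center_set(sim, 2, [0, 1])
value_for_user_3 = similarity_to_center_set(sim, 3, [0, 1])
```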

After the similarity between each remaining target user other than C1 and C2 and the grouping center set has been calculated, another random value is generated from the sum of these similarities; this random value is greater than 0 and less than the sum of the similarities between all remaining target users other than C1 and C2 and the grouping center set. The similarities between the remaining target users and the grouping center set are then added up one by one; if the sum of the similarities of the first n target users is less than this random value while the sum of the similarities of the first n+1 target users is greater than it, the (n+1)-th target user is C3. C3 is added to the grouping center set.

New group centers continue to be selected in this way until the number of group centers reaches K, where K is the desired number of groups.

After the central users of the groups have been obtained, the similarity between each target user not selected as a group center and the central user of each group is calculated, and each such target user is assigned to the group of the central user to which it is most similar. The groups obtained here are the initial groups.
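The initial assignment can be sketched as below. The similarity matrix `sim`, the representation of groups as a dictionary keyed by center, and the name `assign_to_groups` are assumptions made for this sketch; ties are broken in favor of the center listed first, which the original text does not specify.

```python
def assign_to_groups(sim, centers):
    """Assign every non-center user to the group of its most similar center.

    Returns a dict mapping each center to the list of users in its group
    (the center itself is listed first).
    """
    groups = {c: [c] for c in centers}
    for user in range(len(sim)):
        if user in groups:  # centers stay in their own group
            continue
        best = max(centers, key=lambda c: sim[user][c])
        groups[best].append(user)
    return groups


sim = [
    [1.0, 0.8, 0.3, 0.1],
    [0.8, 1.0, 0.4, 0.2],
    [0.3, 0.4, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
initial_groups = assign_to_groups(sim, [0, 3])
```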

Step 102: Re-determine the central user according to the similarity between the users within the initial group, and regroup the target users according to the similarity between the re-determined central user and the users among the target users other than the re-determined central user.

Step 103: Repeat step 102 until the users included in each group no longer change.

After the target users have been assigned to groups, a target user is not necessarily best matched to its group. The users included in each group therefore need to be adjusted.

During adjustment, the central user of each group is first re-determined. In this embodiment, the sum of the similarities between each target user and all other target users in the same group is calculated, and the target user with the largest sum is determined to be the new central user of the group.

After the central users have been re-determined, the similarity between each target user not selected as a group center and the new central user of each group is calculated, and each such target user is assigned to the group of the central user to which it is most similar.

This adjustment of the target users in each group is performed iteratively until the users included in each group no longer change.
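Steps 102 and 103 together resemble a medoid-style clustering loop. The following sketch is one way to express them, assuming a precomputed similarity matrix `sim` and a list of initial centers. The helper names and the `max_iter` safety limit are assumptions added for illustration and are not part of the described method.

```python
def assign(sim, centers):
    """Put each non-center user into the group of its most similar center."""
    groups = {c: [c] for c in centers}
    for user in range(len(sim)):
        if user not in groups:
            groups[max(centers, key=lambda c: sim[user][c])].append(user)
    return groups


def recenter(sim, members):
    """New center: the member with the largest similarity sum within the group."""
    return max(members, key=lambda u: sum(sim[u][v] for v in members if v != u))


def adjust_groups(sim, centers, max_iter=100):
    """Repeat re-centering and regrouping until group membership is stable."""
    groups = assign(sim, centers)
    for _ in range(max_iter):  # hypothetical upper bound on iterations
        new_centers = [recenter(sim, members) for members in groups.values()]
        new_groups = assign(sim, new_centers)
        unchanged = ({frozenset(m) for m in new_groups.values()}
                     == {frozenset(m) for m in groups.values()})
        groups = new_groups
        if unchanged:
            break
    return groups
```

For example, with six users placed at positions 0, 1, 2, 10, 11, 12 on a line and similarity 1 / (1 + distance), starting from the poorly chosen centers 0 and 1 the loop settles on the two natural groups of three users each.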

Step 104: Determine abnormal groups according to the number of known abnormal users in each group, and determine the users in the abnormal groups as abnormal users.

As mentioned above, some users have already been identified as abnormal users when the users' basic information is acquired. Therefore, once the users in each group have been finalized, the number of known abnormal users in each group can be obtained.

Because abnormal users have similar behavioral trajectories, a group in which the number of abnormal users exceeds an abnormality determination threshold can be determined to be an abnormal user group, and the users in that group determined to be abnormal users. As this description shows, hidden abnormal users in the abnormal user group that were not identified by other abnormality identification methods can be found by the method provided by this embodiment.

Within a group, once the number of known abnormal users, or their proportion of the group, is known, that number or proportion is compared with the abnormality determination threshold. If it exceeds the threshold, the group is an abnormal group and the users in it are abnormal users.

The abnormality determination threshold may be a specific count or a proportion; its value is determined according to the actual application.
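Step 104 can be sketched as a simple count or ratio test per group. The function name, the `use_ratio` switch, and the example user identifiers and threshold values are assumptions made for illustration.

```python
def find_abnormal_users(groups, known_abnormal, threshold, use_ratio=False):
    """Return the set of users that belong to abnormal groups.

    groups: iterable of lists of user identifiers.
    known_abnormal: set of users already identified as abnormal.
    threshold: abnormality determination threshold, compared with either the
        count of known abnormal users in a group or, when use_ratio is True,
        their proportion of the group.
    """
    flagged = set()
    for members in groups:
        if not members:
            continue
        hits = sum(1 for user in members if user in known_abnormal)
        score = hits / len(members) if use_ratio else hits
        if score > threshold:
            flagged.update(members)
    return flagged


groups = [["a", "b", "c", "d"], ["e", "f", "g"]]
known = {"a", "b", "e"}
by_count = find_abnormal_users(groups, known, threshold=1)
by_ratio = find_abnormal_users(groups, known, threshold=0.4, use_ratio=True)
```

In this example only the first group exceeds either threshold, so users "c" and "d" are flagged even though they were not previously known to be abnormal.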

The abnormal user identification method provided by this embodiment clusters users by the similarity between target users. Based on the characteristic that abnormal users have similar behavioral trajectories, already-discovered abnormal users are used to find the groups in which abnormal users are located, thereby discovering more hidden abnormal users. The method has the advantages of high identification efficiency and strong identification capability.

Based on any of the foregoing embodiments, in this embodiment of the present invention, before step 101, the method further includes calculating the similarity between users.

In this embodiment, the similarity index is the sum of similarities. When the similarity between one user and each other individual user in the user set is calculated, similarity is calculated separately in the time dimension, the platform dimension, and the device dimension, as described below.

(1) Similarity in the time dimension

A user's login times can be mapped to a set of discrete time periods of uniform length; every time period has the same fixed length. The length can be set as needed: the shorter the time period, the more precise the result, but the sparser the data.

In this embodiment, each day is divided into 48 consecutive time periods, so each period represents half an hour. This gives a login time set for each user: user i has a check-in time set S_it over the time sequence, {S_i1, S_i2, …, S_i48}, where S_it is the record of user i logging in during time period t; the corresponding vector entry is 1 if the user was logged in and 0 otherwise. Note that "login" in this embodiment is not limited to the moment of the login operation; it also covers all of the user's usage time after logging in and before logging out. Similarly, the check-in time set S_jt of user j over the same time, {S_j1, S_j2, …, S_j48}, can be obtained. From these, the similarity simt between user i and user j in the time dimension can be calculated:

where S_it denotes the record of user i logging in during time period t, and S_jt denotes the record of user j logging in during time period t.

As the above formula shows, the smaller the degree of difference d between user i and user j, the higher the similarity between the users.
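Building the 48-entry time vector can be sketched as follows. The input format, a list of (login minute, logout minute) pairs measured from midnight within one day, is an assumption made for this sketch, as are the constant and function names.

```python
SLOTS_PER_DAY = 48   # half-hour time periods per day
SLOT_MINUTES = 30


def time_vector(sessions):
    """Return the 0/1 vector (S_i1, ..., S_i48) for one user and one day.

    sessions: iterable of (start_minute, end_minute) pairs with
    0 <= start_minute <= end_minute < 1440. Every half-hour period touched
    by a session is marked 1, covering the whole time between login and logout.
    """
    vec = [0] * SLOTS_PER_DAY
    for start, end in sessions:
        if not 0 <= start <= end < SLOTS_PER_DAY * SLOT_MINUTES:
            raise ValueError("session must lie within a single day")
        for slot in range(start // SLOT_MINUTES, end // SLOT_MINUTES + 1):
            vec[slot] = 1
    return vec


# A session from 09:10 to 10:05 touches the periods starting at 09:00, 09:30, 10:00.
vec = time_vector([(9 * 60 + 10, 10 * 60 + 5)])
```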

(2)平台维度上的相似度(2) Similarity in platform dimension

对于用户登录的平台，使用P_i,p标识用户i在平台p上的登录行为。当用户i在平台p上有登录行为，则P_i,p的值为1，没有登录行为则为0。据此可得到用户i登录m个平台所对应的登录平台向量(P_i1,P_i2,…,P_im)。类似的，可以得到用户j登录同样的m个平台所对应的登录平台向量(P_j1,P_j2,…,P_jm)。For the platform where the user logs in, use P _i,p to identify the login behavior of user i on the platform p. When user i has a login behavior on the platform p, the value of P _i,p is 1, and if there is no login behavior, it is 0. According to this, the login platform vectors (P _i1 , P _i2 , . . . , P _im ) corresponding to the user i logging in to m platforms can be obtained. Similarly, the login platform vectors (P _j1 , P _j2 , . . . , P _jm ) corresponding to the user j logging in to the same m platforms can be obtained.

据此可计算用户i和用户j在平台维度上的相似度simp：According to this, the similarity simp between user i and user j in the platform dimension can be calculated:

where P_ip denotes the login record of user i on platform p, and P_jp denotes the login record of user j on platform p.

(3) Similarity in the device dimension

For the devices that a user logs in on, E_is denotes the login behavior of user i on device s. If user i has logged in on device s, the value of E_is is 1; otherwise it is 0. This gives the login device vector (E_i1, E_i2, …, E_iw) of user i over w devices. Similarly, the login device vector (E_j1, E_j2, …, E_jw) of user j over the same w devices can be obtained.

From these, the similarity sime between user i and user j in the device dimension can be calculated:

where E_is denotes the login record of user i on device s, and E_js denotes the login record of user j on device s.

It should be noted that, because there are many kinds of devices, the values in the device dimension are highly scattered. The embodiment of the present invention therefore mainly considers the devices that are used by a large number of users in the existing network. For a user who uses only devices with few users, all elements of the vector are 0. For users whose vectors are all 0, sime = 1.

After the similarities in the time dimension, the platform dimension, and the device dimension have been calculated, the similarity between users can be calculated. In this embodiment of the present invention, the similarity sim between users is the average of the similarities in the three dimensions, that is, sim = 1/3 × (simt + simp + sime). In other embodiments of the present invention, a weight may be set for the similarity in each of the three dimensions according to actual needs, and the similarity between users calculated accordingly.
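The combination of the three per-dimension similarities can be sketched as follows. The function name `combined_similarity` is hypothetical, and the requirement that the weights sum to 1 is an assumption made here for the weighted variant; the text only states that weights may be set according to actual needs.

```python
def combined_similarity(simt, simp, sime, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine time, platform and device similarities into one value.

    With the default weights this is the plain average described in the
    embodiment: sim = 1/3 * (simt + simp + sime). Requiring the weights to
    sum to 1 is an assumption of this sketch, not a rule from the source.
    """
    wt, wp, we = weights
    if abs(wt + wp + we - 1.0) > 1e-9:
        raise ValueError("weights are assumed to sum to 1")
    return wt * simt + wp * simp + we * sime


sim_avg = combined_similarity(0.9, 0.6, 0.3)                           # plain average
sim_weighted = combined_similarity(0.9, 0.6, 0.3, (0.5, 0.25, 0.25))   # time weighted more heavily
```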

The above description gives the similarity between one user and another user. Summing the similarities between a user and every other individual user gives the sum of similarities between that user and all other users.
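As a small illustration of this summation, the sketch below assumes the pairwise similarities are already available as a symmetric matrix `sim`, where `sim[i][j]` is the similarity between user i and user j. The matrix layout and the name `similarity_sums` are assumptions for illustration.

```python
def similarity_sums(sim):
    """For each user i, sum sim[i][j] over every other user j (j != i)."""
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]


# Hypothetical similarities among three users.
sim = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
sums = similarity_sums(sim)
# The user with the largest sum is the most similar to everyone else overall.
most_central = max(range(len(sums)), key=lambda i: sums[i])
```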

The abnormal user identification method provided by the embodiments of the present invention obtains the similarity between users by calculating their similarities in three dimensions (time, device, and platform) and uses that similarity to cluster the users. Because abnormal users have similar behavior trajectories, the abnormal users already discovered are used to locate the groups to which abnormal users belong, so that more hidden abnormal users are found. The method has the advantages of high identification efficiency and strong identification capability.

Based on any of the foregoing embodiments, in this embodiment of the present invention, the method further includes, between step 101 and step 102:

performing dimensionality reduction on the information of the target users in the initial grouping.

In this embodiment of the present invention, the dimensionality reduction of the information of the target users in the initial grouping is performed in the time dimension.

Because the target users' data is sparse in the time dimension, after the initial grouping is obtained, a subset is selected from the set of time periods of the users in the initial grouping.

To select the subset, the information entropy of all users in the initial grouping is first calculated for each time period. The information entropy is calculated as follows:

where e_ti denotes the information entropy of all target users in the initial grouping in the i-th time period t; P(u_j) denotes the probability that target user u_j logs in during the i-th time period t; and n is the total number of target users in the initial grouping. P(u_j) is calculated as:

total number of logins of the target user / total number of time periods.

Next, after the entropy of each time period has been calculated over all target users in the initial grouping, the time periods whose entropy is greater than a threshold a are selected as the login time periods of the initial grouping.
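The entropy-based selection of time periods can be sketched as follows. The entropy formula itself is not reproduced in this text, so the sketch assumes the standard Shannon form, e_t = −Σ_j P(u_j) log2 P(u_j), summed over the n users in the group. It also assumes that a per-user, per-period login probability is already available in `prob_matrix`. The function names, the base-2 logarithm, and the data layout are all illustrative assumptions.

```python
import math


def slot_entropy(probs):
    """Shannon-style entropy -sum(p * log2(p)); terms with p == 0 contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_slots(prob_matrix, a):
    """Return the indices of time periods whose entropy exceeds threshold a.

    prob_matrix[j][t] is the assumed login probability of user j in period t.
    """
    n_slots = len(prob_matrix[0])
    keep = []
    for t in range(n_slots):
        e_t = slot_entropy([row[t] for row in prob_matrix])
        if e_t > a:
            keep.append(t)
    return keep


# Two users, three periods (hypothetical values): only period 0 carries entropy.
kept = select_slots([[0.5, 0.0, 1.0], [0.5, 0.0, 1.0]], a=0.5)
```

Only the retained periods would then be used when similarities are recomputed within the group.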

Determining the login time periods of a group effectively reduces the number of time dimensions; for example, the 48 time periods per day in the earlier example may be reduced to 24.

Reducing the dimensionality of the target user information helps to reduce the computational load of subsequent operations.

In other embodiments of the present invention, after the login time periods of a group have been calculated, the similarity between the users in the group is calculated over the new login time periods. Users whose similarity is greater than a threshold b are retained, and users whose similarity is less than or equal to the threshold b are treated as outlier users and removed from the group. The removed outlier users may be regarded as suspected abnormal login users, and other methods in the prior art may be used to detect whether they are abnormal users.

By reducing the dimensionality of the user information in the initial grouping, the abnormal user identification method provided by the embodiments of the present invention reduces the amount of computation and improves the real-time performance of identification while preserving its effectiveness in identifying hidden abnormal users.

Based on any of the foregoing embodiments, FIG. 2 is a structural diagram of an abnormal user identification apparatus provided by an embodiment of the present invention. As shown in FIG. 2, the abnormal user identification apparatus provided by the embodiment of the present invention includes:

a preliminary grouping module 201, configured to determine a central user from target users according to the similarity between the target users, and to initially group the target users according to the similarity between the central user and the target users other than the central user;

a grouping adjustment module 202, configured to re-determine the central user according to the similarity between the users in the initial grouping, and to regroup the target users according to the similarity between the re-determined central user and the target users other than the re-determined central user;

a grouping determination module 203, configured to repeatedly execute the grouping adjustment step until the users contained in each group no longer change; and

an abnormal user identification module 204, configured to determine abnormal groups according to the number of known abnormal users contained in each group, and to determine the users in the abnormal groups as abnormal users.

The abnormal user identification apparatus provided by the embodiments of the present invention clusters users according to the similarity between target users. Because abnormal users have similar behavior trajectories, the abnormal users already discovered are used to locate the groups to which abnormal users belong, so that more hidden abnormal users are found. The apparatus has the advantages of high identification efficiency and strong identification capability.
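How the grouping adjustment, grouping determination, and abnormal user identification functions fit together can be sketched as follows. Several choices here are assumptions and not statements of the source: each non-central user joins the central user it is most similar to; the re-determined central user of a group is the member with the largest sum of similarities to the other members; and a group is treated as abnormal when it contains at least `min_known` known abnormal users. The `max_iter` safety bound and all function names are likewise hypothetical.

```python
def assign(sim, centers):
    """Group every user with the central user it is most similar to."""
    groups = {c: [c] for c in centers}
    for u in range(len(sim)):
        if u in groups:          # u is itself a central user
            continue
        best = max(centers, key=lambda c: sim[u][c])
        groups[best].append(u)
    return groups


def recenter(sim, members):
    """Assumed rule: the member with the largest within-group similarity sum."""
    return max(members, key=lambda u: sum(sim[u][v] for v in members if v != u))


def cluster(sim, centers, max_iter=100):
    """Repeat the grouping adjustment until group membership stops changing."""
    groups = assign(sim, centers)
    for _ in range(max_iter):
        centers = [recenter(sim, m) for m in groups.values()]
        new_groups = assign(sim, centers)
        same = sorted(map(sorted, new_groups.values())) == sorted(map(sorted, groups.values()))
        groups = new_groups
        if same:
            break
    return groups


def abnormal_users(groups, known_abnormal, min_known=1):
    """Flag every member of a group holding at least min_known known abnormal users."""
    flagged = set()
    for members in groups.values():
        if sum(1 for u in members if u in known_abnormal) >= min_known:
            flagged.update(members)
    return flagged


# Hypothetical similarities: users 0 and 1 behave alike, as do users 2 and 3.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
groups = cluster(sim, centers=[0, 2])
flagged = abnormal_users(groups, known_abnormal={3})
```

In this example user 3 is the only known abnormal user, so its whole group (users 2 and 3) is flagged, which mirrors the idea of using discovered abnormal users to surface hidden ones.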

FIG. 3 is a schematic diagram of the physical structure of an electronic device. As shown in FIG. 3, the electronic device may include a processor 310, a communications interface 320, a memory 330, and a communication bus 340, where the processor 310, the communications interface 320, and the memory 330 communicate with one another through the communication bus 340. The processor 310 may invoke logic instructions in the memory 330 to perform the following method: an initial grouping step of determining a central user from target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the target users other than the central user; a grouping adjustment step of re-determining the central user according to the similarity between the users in the initial grouping, and regrouping the target users according to the similarity between the re-determined central user and the target users other than the re-determined central user; a grouping determination step of repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step of determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.

It should be noted that, in a specific implementation, the electronic device in this embodiment may be a server, a PC, or another device, provided that its structure includes the processor 310, the communications interface 320, the memory 330, and the communication bus 340 shown in FIG. 3, that the processor 310, the communications interface 320, and the memory 330 communicate with one another through the communication bus 340, and that the processor 310 can invoke the logic instructions in the memory 330 to execute the above method. This embodiment does not limit the specific implementation form of the electronic device.

In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Further, an embodiment of the present invention discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can execute the methods provided by the foregoing method embodiments, for example including: an initial grouping step of determining a central user from target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the target users other than the central user; a grouping adjustment step of re-determining the central user according to the similarity between the users in the initial grouping, and regrouping the target users according to the similarity between the re-determined central user and the target users other than the re-determined central user; a grouping determination step of repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step of determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.

In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the methods provided by the foregoing embodiments, for example including: an initial grouping step of determining a central user from target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the target users other than the central user; a grouping adjustment step of re-determining the central user according to the similarity between the users in the initial grouping, and regrouping the target users according to the similarity between the re-determined central user and the target users other than the re-determined central user; a grouping determination step of repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step of determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.

The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.

From the description of the above implementations, those skilled in the art can clearly understand that each implementation can be realized by software together with a necessary general-purpose hardware platform, and can of course also be realized by hardware. Based on this understanding, the above technical solutions in essence, or the parts of them that contribute to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An abnormal user identification method is characterized by comprising the following steps:

an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users;

a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users;

a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change;

and an abnormal user identification step, namely determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.

2. The abnormal user identification method according to claim 1, wherein the similarity between the target users is a sum of similarities between any one of the target users and all users except the any one of the target users.

3. The abnormal user identification method according to claim 2, further comprising, before the initial grouping step:

calculating a similarity between any one of the target users and one of the target users other than the any user, the similarity including one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device;

summing the similarity between any user in the target users and all users except any user in the target users to obtain the sum of the similarity between any user in the target users and all users except any user in the target users.

4. The abnormal user identification method according to claim 2, wherein the determining a center user from the target users according to the similarity between the target users comprises:

determining a first user according to the maximum similarity between the target users;

determining the first user as a central user;

determining a second user according to a minimum similarity between a non-central user and the central user in the target users and a preset first threshold;

and determining the second user as a new central user, and returning to the step of determining the second user according to the minimum value of the similarity between the non-central user and the central user in the target users and a preset first threshold value, and repeating the steps until the number of the central users reaches a preset number threshold value.

5. The abnormal user identification method according to claim 4, wherein the determining the second user according to the minimum value of the similarity between the non-central user of the target user and the central user and a preset first threshold value comprises:

calculating the minimum similarity between the non-central user of the target user and the central user;

when the sum of the minimum similarity values between the first n non-central users of the target users and the central user is smaller than a first threshold value, and the sum of the minimum similarity values between the first n+1 non-central users of the target users and the central user is not smaller than the first threshold value, determining the (n+1)-th non-central user as the second user;

wherein the first threshold is a random value between 0 and a first similarity sum, and the first similarity sum is a similarity minimum sum between all non-central users in the target users and the central user; n is a positive integer.

6. The abnormal user identification method according to claim 3, wherein the calculating the similarity between any one of the target users and one of the target users other than the any user comprises:

calculating the difference degree between a third user and a fourth user according to the login record of the third user and the login record of the fourth user;

the third user is any user in the target users, and the fourth user is one user except any user in the target users;

and calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user.

7. The method for identifying an abnormal user according to claim 6, wherein the calculating the degree of difference between the third user and the fourth user according to the log-in record of the third user and the log-in record of the fourth user adopts the following formula:

wherein d (i, j) represents a degree of difference between the third user and the fourth user;

when the similarity includes a similarity in a time dimension, the parameter H_i represents a record of the third user i logging in within a first time period, and H_j represents a record of the fourth user j logging in within the first time period;

when the similarity includes a similarity on a user login platform, the parameter H_i represents a record of the third user i logging in on a first platform, and H_j represents a record of the fourth user j logging in on the first platform;

when the similarity includes a similarity on a user login device, the parameter H_i represents a record of the third user i logging in on a first device, and H_j represents a record of the fourth user j logging in on the first device;

the calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user adopts the following formula:

wherein sim (i, j) represents a similarity between the third user and the fourth user.

8. An abnormal user identification apparatus, comprising:

the preliminary grouping module is used for determining a central user from the target users according to the similarity between the target users and initially grouping the target users according to the similarity between the central user and the users except the central user in the target users;

a grouping adjustment module, configured to re-determine the central user according to a similarity between users in the initial grouping, and re-group the target users according to a similarity between the re-determined central user and a user other than the re-determined central user among the target users;

a grouping determining module for repeatedly executing the grouping adjusting step until the users contained in each group do not change any more;

and the abnormal user identification module is used for determining abnormal groups according to the number of the known abnormal users contained in each group and determining the users in the abnormal groups as the abnormal users.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method for abnormal user identification according to any of claims 1 to 7.

10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for abnormal user identification according to any one of claims 1 to 7.