CN111586001A - Abnormal user identification method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN111586001A
Authority
CN
China
Prior art keywords
user
users
similarity
abnormal
central
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010351557.5A
Other languages
Chinese (zh)
Other versions
CN111586001B (en)
Inventor
王浩然
邵传贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd and China Mobile Communications Group Co Ltd
Priority to CN202010351557.5A
Publication of CN111586001A
Application granted
Publication of CN111586001B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/552 Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/16 Threshold monitoring
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0876 Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computing Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Power Engineering (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides an abnormal user identification method, an abnormal user identification device, electronic equipment and a storage medium. The method comprises the following steps: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the users other than the central user among the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users other than the re-determined central user among the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.

Description

Abnormal user identification method and device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of network security, and in particular, to a method and an apparatus for identifying an abnormal user, an electronic device, and a storage medium.
Background
Abnormal login refers to login behavior that differs significantly from the user's daily habits. Since abnormal login is a common symptom of network intrusion, a user exhibiting abnormal login behavior is likely to be carrying out a network intrusion; identifying abnormal login users is therefore of great significance in the field of network security.
In the prior art, abnormal login users are usually discovered according to the number of logins, the IP address used during login, and the device used during login. However, when a network attacker logs in from dispersed IP addresses or simulated devices, the prior-art identification methods have difficulty finding the abnormal user.
In summary, the prior-art methods for identifying abnormal login users have difficulty discovering hidden abnormal users, and their efficiency in discovering abnormal users is low.
Disclosure of Invention
The embodiment of the invention provides an abnormal user identification method, an abnormal user identification device, electronic equipment and a storage medium, which are used for solving the defects that a hidden abnormal user is difficult to find by an abnormal login user identification method in the prior art and the efficiency of finding the abnormal user is low.
An embodiment of a first aspect of the present invention provides an abnormal user identification method, including:
an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users;
a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users;
a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change;
and an abnormal user identification step, namely determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
In the above technical solution, the similarity between the target users is, for any one of the target users, the sum of the similarities between that user and all of the other target users.
In the above technical solution, before the step of initially grouping, the method further includes:
calculating a similarity between any one of the target users and one of the target users other than the any user, the similarity including one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device;
summing the similarity between any user in the target users and all users except any user in the target users to obtain the sum of the similarity between any user in the target users and all users except any user in the target users.
In the above technical solution, the determining a center user from the target users according to the similarity between the target users includes:
determining a first user according to the maximum similarity between the target users;
determining the first user as a central user;
determining a second user according to a minimum similarity between a non-central user and the central user in the target users and a preset first threshold;
and determining the second user as a new central user, and returning to the step of determining the second user according to the minimum value of the similarity between the non-central user and the central user in the target users and a preset first threshold value, and repeating the steps until the number of the central users reaches a preset number threshold value.
In the above technical solution, the determining the second user according to the minimum value of the similarity between the non-central user of the target user and the central user and a preset first threshold includes:
calculating the minimum similarity between the non-central user of the target user and the central user;
when the sum of the minimum similarity values between the first n non-central users of the target users and the central user is smaller than the first threshold, and the sum of the minimum similarity values between the first n + 1 non-central users of the target users and the central user is not smaller than the first threshold, determining the (n + 1)-th non-central user as the second user;
wherein the first threshold is a random value between 0 and a first similarity sum, the first similarity sum being the sum of the minimum similarity values between all non-central users among the target users and the central user; and n is a natural number.
In the above technical solution, the calculating the similarity between any one of the target users and one of the target users other than the any user includes:
calculating the difference degree between a third user and a fourth user according to the login record of the third user and the login record of the fourth user;
the third user is any user in the target users, and the fourth user is one user except any user in the target users;
and calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user.
In the above technical solution, the following formula is adopted for calculating the difference between the third user and the fourth user according to the login record of the third user and the login record of the fourth user:
Figure BDA0002471989920000031
wherein d (i, j) represents a degree of difference between the third user and the fourth user;
when the similarity includes a similarity in a time dimension, the parameter H_i represents a record of the third user i logging in within a first time period, and H_j represents a record of the fourth user j logging in within the first time period;
when the similarity includes a similarity on a user login platform, the parameter H_i represents a record of the third user i logging in on a first platform, and H_j represents a record of the fourth user j logging in on the first platform;
when the similarity includes a similarity on a user login device, the parameter H_i represents a record of the third user i logging in on a first device, and H_j represents a record of the fourth user j logging in on the first device;
the calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user adopts the following formula:
Figure BDA0002471989920000041
wherein sim (i, j) represents a similarity between the third user and the fourth user.
An embodiment of a second aspect of the present invention provides an abnormal user identification apparatus, including:
the preliminary grouping module is used for determining a central user from the target users according to the similarity between the target users and initially grouping the target users according to the similarity between the central user and the users except the central user in the target users;
a grouping adjustment module, configured to re-determine the central user according to a similarity between users in the initial grouping, and re-group the target users according to a similarity between the re-determined central user and a user other than the re-determined central user among the target users;
a grouping determining module for repeatedly executing the grouping adjusting step until the users contained in each group do not change any more;
and the abnormal user identification module is used for determining abnormal groups according to the number of the known abnormal users contained in each group and determining the users in the abnormal groups as the abnormal users.
An embodiment of a third aspect of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the abnormal user identification method according to the embodiment of the first aspect are implemented.
An embodiment of a fourth aspect of the present invention provides a non-transitory computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the abnormal user identification method according to the embodiment of the first aspect.
According to the abnormal user identification method, the abnormal user identification device, the electronic equipment and the storage medium, clustering of users is achieved through similarity among the users, and based on the characteristic that the abnormal users have similar behavior tracks, the groups where the abnormal users are located are found out by utilizing the found abnormal users, so that more hidden abnormal users are found, and the abnormal user identification method and the abnormal user identification device have the advantages of being high in identification efficiency and strong in identification capacity.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of an abnormal user identification method according to an embodiment of the present invention;
Fig. 2 is a structural diagram of an abnormal user identification apparatus according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the physical structure of an electronic device.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of an abnormal user identification method according to an embodiment of the present invention, and as shown in fig. 1, the abnormal user identification method according to the embodiment of the present invention includes:
Step 101, determining a central user from the target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the users other than the central user among the target users.
The basic idea of the abnormal user identification method provided by the embodiment of the invention is that, because abnormal users have similar behavior tracks, already discovered abnormal users can be used to find more hidden abnormal users. Based on this idea, data of a plurality of target users is first collected. Some of the target users have already been identified as abnormal users, for example by a prior-art method that identifies abnormal users based on the number of logins, the IP address used during login, or the device used during login. However, the identities of most of the target users have not been determined; they may be normal users or hidden abnormal users.
The data of the target users is collected before the similarity of the target users is calculated. The collected data includes the login logs of the target users. A login log generally contains a large number of behavior records of the target user, such as the login time, the login platform, and the device used during login. The data of a target user may also include identity label information; for example, a user already identified as abnormal may be marked as such in the identity label information.
These target users may form a set of users. Any target user in the set can calculate the similarity with other single target users in the set, and based on the similarity between one target user and other single target users, the similarity index between one user and other multiple target users can be further calculated. The similarity index reflects the overall similarity relationship between one user and other multiple target users. In the embodiment of the present invention, the similarity index is a sum of similarities. In other embodiments of the present invention, the similarity index may also be other expression modes such as a square sum of similarities.
In the embodiment of the invention, the similarity between one target user and other single target users in the set is added, so that the sum of the similarity between the target user and all other target users except the target user in the set is obtained. In the embodiment of the present invention, the method for calculating the similarity between users is not limited, and a method for calculating the similarity known to those skilled in the art may be used, or a method for calculating the similarity described in another embodiment of the present invention may be used. In the embodiment of the present invention, the time point of the similarity calculation is not limited, and the similarity calculation may be completed in advance before the step is executed, or may be completed in real time during the step.
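The summation described above can be sketched as follows. This is a minimal illustration, assuming the pairwise similarities are already available as a symmetric matrix; the function name `similarity_sums` and the sample values are hypothetical and do not come from the patent.

```python
def similarity_sums(sim):
    """For each user i, add up sim[i][j] over every other user j."""
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]


# Hypothetical pairwise similarities for three target users.
sim = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.4],
    [0.2, 0.4, 1.0],
]
sums = similarity_sums(sim)

# The user with the largest sum is the most "central" one overall.
most_central = max(range(len(sums)), key=lambda i: sums[i])
```

The diagonal entry is skipped because the sum only covers users other than the user itself.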
A plurality of grouped center users may be determined from the plurality of target users based on a sum of similarities between any one of the plurality of target users and all other target users.
The number of the plurality of groups can be determined according to actual conditions, such as the number of target users and the like.
When determining the central users of the plurality of groups, the target user whose sum of similarities to the other target users is the highest among all target users is selected as the central user C1 of the first group. The resulting C1 may be added to the group center set.
Then, a target user with a relatively high similarity to C1 is selected as the central user C2 of another group. If the target user with the greatest similarity to C1 were always selected, an outlier user could introduce noise, so a user with a relatively high similarity to C1, rather than necessarily the highest, is selected as C2. The selection proceeds as follows: a random value greater than 0 and less than the sum of the similarities between the central user C1 and all other target users is taken; the similarity between each target user other than C1 and C1 is calculated, and these similarities are added one by one; when the sum for the first n target users is smaller than the random value and the sum for the first n + 1 target users is larger than the random value, the (n + 1)-th target user is C2. The resulting C2 may be added to the group center set. Here n is a positive integer.
Then, the sum of the similarities of all the other target users except C1 and C2 and the group center set is calculated (when C2 is found before, the sum of the similarities of all the target users and C1 is calculated, which can be regarded as the sum of the similarities of all the target users and the group center set only including C1). At this time, the group center set includes two users, C1 and C2, but when the group center set is regarded as a whole and the similarity between another target user and the group center set is calculated, only one value needs to be calculated. The value is the minimum value of the similarity between another target user and each group center user in the group center set, namely: when calculating the similarity between a target user and the group center set, the similarity between the target user and C1 and C2 is calculated respectively, and then the smaller similarity value is taken as the similarity between the target user and the group center set.
After the similarity between each target user other than C1 and C2 and the group center set has been calculated, another random value may be generated that is greater than 0 and less than the sum of these similarities. Then, the similarities between the target users other than C1 and C2 and the group center set are added one by one; when the sum for the first n target users is smaller than this random value and the sum for the first n + 1 target users is larger than this random value, the (n + 1)-th target user is C3. The resulting C3 may be added to the group center set.
New group centers continue to be selected in this way until the number of group centers reaches K, where K is the number of groups to be formed.
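The center selection procedure can be sketched roughly as below. This is an illustrative reading of the text, not the patented implementation: the function name `select_centers`, the `rng` parameter, and the sample matrix are assumptions, and the comparison uses "not smaller than" as in the claims.

```python
import random


def select_centers(sim, k, rng=None):
    """Pick k central users from a pairwise similarity matrix."""
    rng = rng or random.Random()
    n = len(sim)
    sums = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    # C1: the user with the largest sum of similarities to all others.
    centers = [max(range(n), key=lambda i: sums[i])]
    while len(centers) < k:
        candidates = [u for u in range(n) if u not in centers]
        # Similarity to the group center set = minimum over its members.
        weights = [min(sim[u][c] for c in centers) for u in candidates]
        threshold = rng.uniform(0, sum(weights))
        running = 0.0
        chosen = candidates[-1]  # fallback in case of rounding error
        for u, w in zip(candidates, weights):
            running += w
            if running >= threshold:  # first prefix sum reaching the threshold
                chosen = u
                break
        centers.append(chosen)
    return centers


# Hypothetical similarities for four target users.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
]
centers = select_centers(sim, 3, random.Random(0))
```

Passing a seeded `random.Random` makes a run reproducible; without it, each call may return different centers after the first one.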
After the central users of the plurality of groups are obtained, the similarity between each target user not determined as a central user and the central user of each group is calculated, and the target user is assigned to the group whose central user it is most similar to. The groups obtained here form the initial grouping.
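A minimal sketch of this assignment, assuming users are identified by their row index in a similarity matrix; `assign_to_centers` and the sample data are hypothetical names and values.

```python
def assign_to_centers(sim, centers):
    """Map each central user to the list of users in its group."""
    groups = {c: [c] for c in centers}
    for u in range(len(sim)):
        if u in groups:
            continue  # central users stay in their own group
        best = max(centers, key=lambda c: sim[u][c])
        groups[best].append(u)
    return groups


# Hypothetical similarities for four target users.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.1],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.1, 0.7, 1.0],
]
initial_groups = assign_to_centers(sim, [0, 3])
```

When a user is equally similar to several central users, `max` keeps the first one in `centers`; the patent text does not specify a tie-breaking rule.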
Step 102, re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users other than the re-determined central user.
Step 103, repeatedly executing step 102 until the users contained in each group no longer change.
After the target users are assigned to the groups, a target user may not be the best match for the group in which it is placed. The users contained in each group therefore need to be adjusted.
In the adjustment, the central users of the individual groups are first redetermined. In the embodiment of the invention, the sum of the similarity between the target user and all other target users in the group is calculated, and the target user with the maximum value of the sum of the similarities is determined as the new central user of the group.
After the central users of each group are re-determined, the similarity between the target user which is not determined as the central user of the group and the new central user of each group is calculated, and the target user is distributed to the group where the central user of the group with the highest similarity is located.
The above process of adjusting the target users included in the groups needs to be performed iteratively until the users included in each group do not change any more.
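Steps 102 and 103 can be sketched as the loop below. The helper names (`assign`, `recenter`, `membership`, `cluster`) and the `max_iter` safety cap are assumptions added for illustration; the text itself only states that iteration continues until group membership stops changing.

```python
def assign(sim, centers):
    """Assign every non-central user to its most similar central user."""
    groups = {c: [c] for c in centers}
    for u in range(len(sim)):
        if u not in groups:
            groups[max(centers, key=lambda c: sim[u][c])].append(u)
    return groups


def recenter(sim, members):
    """New central user: largest sum of similarities within the group."""
    return max(members, key=lambda u: sum(sim[u][v] for v in members if v != u))


def membership(groups):
    """Group composition, ignoring which member is the central user."""
    return {frozenset(m) for m in groups.values()}


def cluster(sim, centers, max_iter=100):
    groups = assign(sim, centers)
    for _ in range(max_iter):  # max_iter is a hypothetical safety cap
        new_centers = [recenter(sim, m) for m in groups.values()]
        new_groups = assign(sim, new_centers)
        if membership(new_groups) == membership(groups):
            return new_groups  # no user changed group: converged
        groups = new_groups
    return groups


# Hypothetical similarities: users 0-2 resemble each other, as do 3-4.
sim = [
    [1.0, 0.9, 0.8, 0.1, 0.2],
    [0.9, 1.0, 0.7, 0.2, 0.1],
    [0.8, 0.7, 1.0, 0.3, 0.2],
    [0.1, 0.2, 0.3, 1.0, 0.9],
    [0.2, 0.1, 0.2, 0.9, 1.0],
]
final_groups = cluster(sim, [2, 4])
```

Convergence is checked on membership only, so a change of central user that leaves every group's members the same still ends the loop.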
Step 104, determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.
It has been mentioned in the foregoing that, when acquiring basic information of users, some users have been identified as abnormal users, and therefore, after finally determining the users included in each group, the number of known abnormal users in each group can be acquired.
Since abnormal users have similar behavior tracks, a group in which the number of known abnormal users is higher than the abnormality determination threshold can be determined to be an abnormal user group, and the users in that group are determined to be abnormal users. As this step shows, the method provided by the embodiment of the present invention can find hidden abnormal users in the abnormal user group that other abnormality identification methods fail to identify.
In a group, after the number of abnormal users or the proportion of the abnormal users in the group is known, the number or the proportion is compared with an abnormal judgment threshold value, if the number or the proportion is higher than the abnormal judgment threshold value, the group is an abnormal group, and the users in the group are abnormal users.
The abnormality determination threshold may be a specific numerical value or a proportional value, and the specific value is determined according to the actual application condition.
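The threshold comparison in step 104 can be sketched as follows. The function name `flag_abnormal`, the `use_ratio` switch, the user identifiers, and the threshold value 0.4 are all hypothetical; the text leaves the concrete threshold to the application.

```python
def flag_abnormal(groups, known_abnormal, threshold, use_ratio=True):
    """Return every user belonging to a group judged abnormal."""
    flagged = set()
    for members in groups:
        if not members:
            continue
        hits = sum(1 for u in members if u in known_abnormal)
        # Compare either the proportion or the raw count with the threshold.
        score = hits / len(members) if use_ratio else hits
        if score > threshold:
            flagged.update(members)
    return flagged


# Hypothetical final groups and previously identified abnormal users.
groups = [["a", "b", "c", "d"], ["e", "f", "g"]]
known_abnormal = {"a", "b", "e"}
abnormal_users = flag_abnormal(groups, known_abnormal, threshold=0.4)
hidden = abnormal_users - known_abnormal
```

Here `hidden` holds the users that were not previously marked but share a group with enough known abnormal users.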
The abnormal user identification method provided by the embodiment of the invention realizes the clustering of users through the similarity between target users, and finds out the group where the abnormal user is located by utilizing the discovered abnormal user based on the characteristic that the abnormal user has similar behavior tracks, thereby discovering more hidden abnormal users, and having the advantages of high identification efficiency and strong identification capability.
Based on any one of the above embodiments, in an embodiment of the present invention, before step 101, the method further includes:
calculating a similarity between any one of the target users and one of the target users other than the any user, the similarity including one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device;
summing the similarity between any user in the target users and all users except any user in the target users to obtain the sum of the similarity between any user in the target users and all users except any user in the target users.
In the embodiment of the present invention, the similarity index is a sum of similarities. When the similarity between one user and other single users in the user set is calculated, the similarity is calculated from the time dimension, the platform dimension and the equipment dimension respectively. The specific description is as follows.
(1) Similarity in time dimension
The login time of a user can be mapped onto a set of discrete time periods of uniform length. The specific length of the time period can be set as needed: the shorter the time period, the more precise the record, but the sparser the data.
In the present embodiment, each day is taken as the statistical unit and divided into 48 consecutive time periods, i.e. each time period represents half an hour. A login time set is thus obtained for each user. User i corresponds to a login time set S_it over the time sequence: {S_i1, S_i2, …, S_i48}, where S_it records whether user i is logged in during time period t; the value is 1 if the user is logged in and 0 otherwise. It should be noted that login in the embodiment of the present invention is not limited to the time point of the login operation, but covers all of the user's usage time after login and before logout. Similarly, the login time set S_jt of user j over the same time periods can be obtained: {S_j1, S_j2, …, S_j48}. Accordingly, the similarity simt of user i and user j in the time dimension can be calculated:
Figure BDA0002471989920000091
Figure BDA0002471989920000092
wherein S_it represents the record of user i logging in during time period t, and S_jt represents the record of user j logging in during time period t.
It can be seen from the above formula that the smaller the difference d between user i and user j, the higher their similarity.
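The time-dimension calculation can be illustrated with the sketch below. The two formula images are not reproduced in this text, so the Euclidean distance used for d and the mapping sim = 1 / (1 + d) are assumptions, chosen only because they satisfy the stated property that a smaller d gives a higher similarity. The function names and the session format (start and end minute of day, end exclusive) are likewise hypothetical.

```python
import math


def login_slots(sessions, slots_per_day=48):
    """Build the 0/1 vector of half-hour periods covered by login sessions."""
    minutes_per_slot = 24 * 60 // slots_per_day
    vec = [0] * slots_per_day
    for start, end in sessions:
        first = start // minutes_per_slot
        last = (end - 1) // minutes_per_slot
        for t in range(first, last + 1):
            vec[t] = 1  # logged in at some point during period t
    return vec


def difference(a, b):
    # Assumed form of d(i, j): Euclidean distance between the vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity(a, b):
    # Assumed form of sim(i, j): decreases as the difference grows.
    return 1.0 / (1.0 + difference(a, b))


user_i = login_slots([(0, 45)])   # online 00:00-00:45, periods 0 and 1
user_j = login_slots([(0, 30)])   # online 00:00-00:30, period 0 only
simt = similarity(user_i, user_j)
```

The same two functions apply unchanged to the platform and device vectors, since all three dimensions use 0/1 records.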
(2) Similarity in platform dimensions
For the platforms on which users log in, P_i,p is used to identify the login behavior of user i on platform p. When user i has login behavior on platform p, the value of P_i,p is 1; when there is no login behavior, the value is 0. Accordingly, the login platform vector (P_i1, P_i2, …, P_im) of user i over the m platforms can be obtained. Similarly, the login platform vector (P_j1, P_j2, …, P_jm) of user j over the same m platforms can be obtained.
From this, the similarity simp of user i and user j in the platform dimension can be calculated:
Figure BDA0002471989920000101
Figure BDA0002471989920000102
wherein P_ip represents the record of user i logging in on platform p, and P_jp represents the record of user j logging in on platform p.
(3) Similarity in device dimensions
For the devices on which users log in, E_i,s is used to identify the login behavior of user i on device s. When user i has login behavior on device s, the value of E_i,s is 1; when there is no login behavior, the value is 0. The login device vector (E_i1, E_i2, …, E_iw) of user i over the w devices is thus obtained. Similarly, the login device vector (E_j1, E_j2, …, E_jw) of user j over the same w devices can be obtained.
Accordingly, the similarity sime of the user i and the user j in the device dimension can be calculated:
Figure BDA0002471989920000103
Figure BDA0002471989920000104
where E_is represents the record of user i logging in on device s, and E_js represents the record of user j logging in on device s.
It should be noted that, because device types are numerous, the values in the device dimension are relatively scattered. In the embodiment of the present invention, only devices used by a large number of users in the live network are considered. For users of devices with fewer users, all vector values are 0. For users whose vectors are all 0, sim is 1.
After the similarities in the time dimension, the platform dimension, and the device dimension have been calculated, the similarity between users can be calculated. In the embodiment of the present invention, the similarity sim between users is the average of the similarities in the three dimensions, that is, sim = 1/3 × (simt + simp + sime). In other embodiments of the present invention, weights may be assigned to the similarities of the three dimensions according to actual needs, and the similarity between users is then calculated from the weighted values.
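A minimal sketch of combining the three similarities is given below. The equal-weight default reproduces sim = 1/3 × (simt + simp + sime). For the weighted variant the source does not specify how weights are applied, so dividing by the weight total is an assumption, and the function name is hypothetical.

```python
from typing import Tuple


def combined_similarity(
    simt: float,
    simp: float,
    sime: float,
    weights: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> float:
    """Combine time, platform and device similarity into one value.

    With equal weights this is the plain average used in the embodiment.
    Normalising by the weight total is an ASSUMPTION made for this sketch.
    """
    wt, wp, we = weights
    total = wt + wp + we
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return (wt * simt + wp * simp + we * sime) / total
```

For example, passing weights=(2.0, 1.0, 1.0) would let the time dimension count twice as much as each of the other two.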
The above description gives the similarity between one user and another user. Summing the similarities between one user and every other user yields the similarity sum for that user.
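The pairwise similarities and the per-user sums can be organised as in the sketch below. It takes any pairwise similarity function as a parameter, so it does not depend on a particular formula. The names similarity_matrix and similarity_sums are hypothetical.

```python
from typing import Callable, List, Sequence, TypeVar

U = TypeVar("U")


def similarity_matrix(
    users: Sequence[U], pair_sim: Callable[[U, U], float]
) -> List[List[float]]:
    """Symmetric matrix of pairwise similarities; the diagonal is set to 1."""
    n = len(users)
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            sim[i][j] = sim[j][i] = pair_sim(users[i], users[j])
    return sim


def similarity_sums(sim: Sequence[Sequence[float]]) -> List[float]:
    """For each user, the sum of similarities to all OTHER users."""
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
```

The user with the largest sum is the natural candidate for the first central user, in line with the claims that follow.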
The abnormal user identification method provided by the embodiment of the invention obtains the similarity between users by calculating their similarities in three dimensions: the time dimension, the device dimension, and the platform dimension. It clusters the users by means of this similarity and, based on the characteristic that abnormal users have similar behavior tracks, uses the already discovered abnormal users to find the groups in which abnormal users are located. More hidden abnormal users are thereby discovered, with the advantages of high identification efficiency and strong identification capability.
Based on any one of the above embodiments, in an embodiment of the present invention, between step 101 and step 102, the method further includes:
and reducing the dimension of the information of the target user in the initial grouping.
In the embodiment of the invention, the dimension reduction of the information of the target user in the initial grouping is realized in the time dimension.
Since the target user has sparsity in the time dimension, after the initial grouping is obtained, a subset is selected for the time period set of the users in the initial grouping.
When the subset is selected, the information entropy of all users in the initial group in different time periods is calculated first. The calculation formula of the information entropy is as follows:
Figure BDA0002471989920000111
where e_ti represents the information entropy of all target users in the initial group in the i-th time period t; P(u_j) represents the probability that target user u_j logs in during the i-th time period t; and n is the number of all target users in the initial group. P(u_j) is calculated as follows:
P(u_j) = total number of logins by the target user / total number of time periods.
Then, after the entropy values of all time periods have been calculated over all target users in the initial group, the time periods whose entropy value is greater than a threshold a are selected as the login time periods of the initial group.
Selecting login time periods for the group in this way effectively reduces the number of time dimensions, for example from the 48 time periods per day of the previous example to 24 time periods.
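The time period selection can be sketched as follows. The entropy formula appears in the source only as an image, so the Shannon form e = −Σ P(u_j) · log2 P(u_j) over the n users is an assumption, as is the choice of base 2. The per-user probabilities are taken as input rather than computed here, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def period_entropy(probs: Sequence[float]) -> float:
    """ASSUMED Shannon-style entropy of one time period.

    probs[j] is P(u_j), the login probability of user u_j in this period.
    Terms with probability 0 contribute nothing.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_periods(
    prob_by_period: Sequence[Sequence[float]], threshold_a: float
) -> List[int]:
    """Indices of the time periods whose entropy is greater than threshold a."""
    return [
        t
        for t, probs in enumerate(prob_by_period)
        if period_entropy(probs) > threshold_a
    ]
```

The returned indices can then be used to project each user's 48-element record onto the retained time periods before similarities are recomputed.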
The dimension reduction of the target user information is beneficial to reducing the calculation amount of subsequent operation.
In other embodiments of the present invention, after the login time periods of the group set have been calculated, the similarity between users in the group set is recalculated over the new login time periods. Users whose similarity is greater than a threshold b are kept, and users whose similarity is less than or equal to the threshold b are removed from the group set as discrete users. The removed discrete users can be treated as suspected abnormal login users, and other methods in the prior art can be used to detect whether they are abnormal users.
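The removal of discrete users can be sketched as below. The source does not say against whom the similarity is measured, so comparing each user with the group's central user is an assumption, as is the 1 − d similarity form. The function name and argument layout are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def split_discrete_users(
    vectors: Dict[Hashable, Sequence[int]],
    center_id: Hashable,
    periods: Sequence[int],
    threshold_b: float,
) -> Tuple[List[Hashable], List[Hashable]]:
    """Split one group into kept users and discrete users.

    vectors maps user id -> full login record; periods lists the retained
    time period indices. Similarity is measured against the central user
    (ASSUMPTION) and computed as 1 - share of differing periods (ASSUMPTION).
    """
    def project(vec: Sequence[int]) -> List[int]:
        return [vec[t] for t in periods]

    center = project(vectors[center_id])
    kept: List[Hashable] = []
    discrete: List[Hashable] = []
    for uid, vec in vectors.items():
        if uid == center_id:
            kept.append(uid)
            continue
        reduced = project(vec)
        diff = (
            sum(abs(x - y) for x, y in zip(center, reduced)) / len(periods)
            if periods
            else 0.0
        )
        (kept if 1.0 - diff > threshold_b else discrete).append(uid)
    return kept, discrete
```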
By reducing the dimension of the user information in the initial grouping, the abnormal user identification method provided by the embodiment of the invention reduces the amount of calculation and improves the real-time performance of identification while preserving the effect of identifying hidden abnormal users.
Based on any of the above embodiments, fig. 2 is a structural diagram of an abnormal user identification apparatus according to an embodiment of the present invention, and as shown in fig. 2, the abnormal user identification apparatus according to the embodiment of the present invention includes:
a preliminary grouping module 201, configured to determine a central user from target users according to a similarity between the target users, and perform preliminary grouping on the target users according to a similarity between the central user and a user other than the central user in the target users;
a grouping adjustment module 202, configured to re-determine the central user according to the similarity between the users in the initial grouping, and re-group the target users according to the similarity between the re-determined central user and the users other than the re-determined central user in the target users;
a grouping determining module 203 for repeatedly executing the grouping adjusting step until the users included in each group do not change any more;
and the abnormal user identification module 204 is used for determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
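The cooperation of the four modules can be illustrated with the sketch below, which works on a precomputed similarity matrix indexed by user number. Several details are assumptions not fixed by the source: each user joins the central user it is most similar to, a group's new central user is the member with the largest similarity sum inside the group, and a group counts as abnormal once it holds at least min_known known abnormal users. All function names are hypothetical.

```python
from typing import Dict, List, Sequence, Set

Matrix = Sequence[Sequence[float]]


def assign(sim: Matrix, centers: Sequence[int]) -> Dict[int, List[int]]:
    """Put every user into the group of its most similar central user."""
    groups: Dict[int, List[int]] = {c: [] for c in centers}
    for u in range(len(sim)):
        if u in groups:
            groups[u].append(u)          # a central user stays in its own group
        else:
            best = max(centers, key=lambda c: sim[u][c])
            groups[best].append(u)
    return groups


def recenter(sim: Matrix, members: Sequence[int]) -> int:
    """ASSUMED rule: the member with the largest in-group similarity sum."""
    return max(members, key=lambda u: sum(sim[u][v] for v in members if v != u))


def cluster(sim: Matrix, centers: Sequence[int], max_rounds: int = 100) -> Dict[int, List[int]]:
    """Repeat the grouping adjustment until group membership stops changing."""
    groups = assign(sim, centers)
    for _ in range(max_rounds):
        new_groups = assign(sim, [recenter(sim, m) for m in groups.values()])
        same = sorted(map(sorted, new_groups.values())) == sorted(map(sorted, groups.values()))
        groups = new_groups
        if same:
            break
    return groups


def abnormal_users(groups: Dict[int, List[int]], known_abnormal: Set[int], min_known: int = 1) -> Set[int]:
    """Flag every member of a group holding at least min_known known abnormal users."""
    flagged: Set[int] = set()
    for members in groups.values():
        if sum(1 for u in members if u in known_abnormal) >= min_known:
            flagged.update(members)
    return flagged
```

The max_rounds guard is a safety limit added for this sketch; the source simply iterates until the groups no longer change.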
The abnormal user identification device provided by the embodiment of the invention realizes user clustering through the similarity between target users, and finds out the group where the abnormal user is located by utilizing the discovered abnormal user based on the characteristic that the abnormal user has similar behavior tracks, thereby discovering more hidden abnormal users, and having the advantages of high identification efficiency and strong identification capability.
Fig. 3 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 3: a processor (processor)310, a communication Interface (communication Interface)320, a memory (memory)330 and a communication bus 340, wherein the processor 310, the communication Interface 320 and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform the following method: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users; determining grouping, and repeatedly executing the grouping adjustment step until the users contained in each group do not change any more; and an abnormal user identification step, namely determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
It should be noted that, when being implemented specifically, the electronic device in this embodiment may be a server, a PC, or other devices, as long as the structure includes the processor 310, the communication interface 320, the memory 330, and the communication bus 340 shown in fig. 3, where the processor 310, the communication interface 320, and the memory 330 complete mutual communication through the communication bus 340, and the processor 310 may call the logic instruction in the memory 330 to execute the above method. The embodiment does not limit the specific implementation form of the electronic device.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Further, an embodiment of the present invention discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above method embodiments, for example, including: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the method provided by the foregoing embodiments, for example, including: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. An abnormal user identification method is characterized by comprising the following steps:
an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users;
a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users;
a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change;
and an abnormal user identification step, namely determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
2. The abnormal user identification method according to claim 1, wherein the similarity between the target users is a sum of similarities between any one of the target users and all users except the any one of the target users.
3. The abnormal user identification method according to claim 2, further comprising, before the initial grouping step:
calculating a similarity between any one of the target users and one of the target users other than the any user, the similarity including one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device;
summing the similarity between any user in the target users and all users except any user in the target users to obtain the sum of the similarity between any user in the target users and all users except any user in the target users.
4. The abnormal user identification method according to claim 2, wherein the determining a center user from the target users according to the similarity between the target users comprises:
determining a first user according to the maximum similarity between the target users;
determining the first user as a central user;
determining a second user according to a minimum similarity between a non-central user and the central user in the target users and a preset first threshold;
and determining the second user as a new central user, and returning to the step of determining the second user according to the minimum value of the similarity between the non-central user and the central user in the target users and a preset first threshold value, and repeating the steps until the number of the central users reaches a preset number threshold value.
5. The abnormal user identification method according to claim 4, wherein the determining the second user according to the minimum value of the similarity between the non-central user of the target user and the central user and a preset first threshold value comprises:
calculating the minimum similarity between the non-central user of the target user and the central user;
when the sum of the minimum similarity values between the first n non-central users of the target users and the central user is smaller than a first threshold value, and the sum of the minimum similarity values between the first n+1 non-central users of the target users and the central user is not smaller than the first threshold value, determining the (n+1)-th non-central user as the second user;
wherein the first threshold is a random value between 0 and a first similarity sum, and the first similarity sum is a similarity minimum sum between all non-central users in the target users and the central user; n is a positive integer.
6. The abnormal user identification method according to claim 3, wherein the calculating the similarity between any one of the target users and one of the target users other than the any user comprises:
calculating the difference degree between a third user and a fourth user according to the login record of the third user and the login record of the fourth user;
the third user is any user in the target users, and the fourth user is one user except any user in the target users;
and calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user.
7. The method for identifying an abnormal user according to claim 6, wherein the calculating the degree of difference between the third user and the fourth user according to the log-in record of the third user and the log-in record of the fourth user adopts the following formula:
Figure FDA0002471989910000031
wherein d (i, j) represents a degree of difference between the third user and the fourth user;
when the similarity includes a similarity in a time dimension, the parameter H_i represents a record of a third user i logging in within a first time period, and H_j represents a record of a fourth user j logging in within the first time period;
when the similarity includes a similarity on a user login platform, the parameter H_i represents a record of the third user i logging in on a first platform, and H_j represents a record of the fourth user j logging in on the first platform;
when the similarity includes a similarity on a user login device, the parameter H_i represents a record of the third user i logging in on a first device, and H_j represents a record of the fourth user j logging in on the first device;
the calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user adopts the following formula:
Figure FDA0002471989910000032
wherein sim (i, j) represents a similarity between the third user and the fourth user.
8. An abnormal user identification apparatus, comprising:
the preliminary grouping module is used for determining a central user from the target users according to the similarity between the target users and initially grouping the target users according to the similarity between the central user and the users except the central user in the target users;
a grouping adjustment module, configured to re-determine the central user according to a similarity between users in the initial grouping, and re-group the target users according to a similarity between the re-determined central user and a user other than the re-determined central user among the target users;
a grouping determining module for repeatedly executing the grouping adjusting step until the users contained in each group do not change any more;
and the abnormal user identification module is used for determining abnormal groups according to the number of the known abnormal users contained in each group and determining the users in the abnormal groups as the abnormal users.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method for abnormal user identification according to any of claims 1 to 7.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for abnormal user identification according to any one of claims 1 to 7.
CN202010351557.5A 2020-04-28 2020-04-28 Abnormal user identification method and device, electronic equipment and storage medium Active CN111586001B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010351557.5A CN111586001B (en) 2020-04-28 2020-04-28 Abnormal user identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010351557.5A CN111586001B (en) 2020-04-28 2020-04-28 Abnormal user identification method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111586001A true CN111586001A (en) 2020-08-25
CN111586001B CN111586001B (en) 2022-11-22

Family

ID=72120084

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010351557.5A Active CN111586001B (en) 2020-04-28 2020-04-28 Abnormal user identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111586001B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3003779A1 (en) * 2017-05-05 2018-11-05 Servicenow, Inc. Identifying clusters for service management operations
CN107579956A (en) * 2017-08-07 2018-01-12 北京奇安信科技有限公司 The detection method and device of a kind of user behavior
CN107730271A (en) * 2017-09-20 2018-02-23 北京奇艺世纪科技有限公司 Similar users based on virtual interacting object determine method, apparatus and electronic equipment
CN110876072A (en) * 2018-08-31 2020-03-10 武汉斗鱼网络科技有限公司 Batch registered user identification method, storage medium, electronic device and system
CN109873812A (en) * 2019-01-28 2019-06-11 腾讯科技(深圳)有限公司 Method for detecting abnormality, device and computer equipment
CN109873832A (en) * 2019-03-15 2019-06-11 北京三快在线科技有限公司 Method for recognizing flux, device, electronic equipment and storage medium
CN110225036A (en) * 2019-06-12 2019-09-10 北京奇艺世纪科技有限公司 A kind of account detection method, device, server and storage medium
CN110309424A (en) * 2019-07-04 2019-10-08 东北大学 A kind of socialization recommended method based on Rough clustering
CN110532429A (en) * 2019-09-04 2019-12-03 重庆邮电大学 It is a kind of based on cluster and correlation rule line on user group's classification method and device
CN110706092A (en) * 2019-09-23 2020-01-17 深圳中兴飞贷金融科技有限公司 Risk user identification method and device, storage medium and electronic equipment

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163096A (en) * 2020-09-18 2021-01-01 中国建设银行股份有限公司 Malicious group determination method and device, electronic equipment and storage medium
CN112488175A (en) * 2020-11-26 2021-03-12 中孚安全技术有限公司 Abnormal user detection method based on behavior aggregation characteristics, terminal and storage medium
CN112488175B (en) * 2020-11-26 2023-06-23 中孚安全技术有限公司 Abnormal user detection method based on behavior aggregation characteristics, terminal and storage medium
CN113521749A (en) * 2021-07-15 2021-10-22 珠海金山网络游戏科技有限公司 Abnormal account detection model training method and abnormal account detection method
CN113521749B (en) * 2021-07-15 2024-02-13 珠海金山数字网络科技有限公司 Abnormal account detection model training method and abnormal account detection method

Also Published As

Publication number Publication date
CN111586001B (en) 2022-11-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant