CN111586001A - Abnormal user identification method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN111586001A (application CN202010351557.5A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/552—Detecting local intrusion or implementing counter-measures involving long-term monitoring or reporting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/16—Threshold monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/08—Network architectures or network communication protocols for network security for authentication of entities
- H04L63/0876—Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Computing Systems (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Power Engineering (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention provides an abnormal user identification method, an abnormal user identification device, electronic equipment and a storage medium. The method comprises the following steps: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the users other than the central user among the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and regrouping the target users according to the similarity between the re-determined central user and the users other than the re-determined central user among the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.
Description
Technical Field
The present invention relates to the field of network security, and in particular, to a method and an apparatus for identifying an abnormal user, an electronic device, and a storage medium.
Background
Abnormal login refers to login behavior that is significantly different from the daily habits of the user. Since abnormal login is a common phenomenon of network intrusion behavior, a user with the abnormal login behavior is likely to be an implementer of the network intrusion behavior, and therefore, the identification of the abnormal login user is of great significance in the field of network security.
In the prior art, an abnormal login user is usually discovered according to the number of logins of the user, the IP address used during login and the device used during login. However, when a network attacker logs in from dispersed IP addresses or simulated devices, the prior-art methods have difficulty finding the abnormal user.
In summary, the prior-art methods for identifying abnormal login users have difficulty discovering hidden abnormal users, and their efficiency in discovering abnormal users is low.
Disclosure of Invention
The embodiment of the invention provides an abnormal user identification method, an abnormal user identification device, electronic equipment and a storage medium, which are used to overcome the defects of the prior-art methods for identifying abnormal login users, namely that hidden abnormal users are difficult to find and that the efficiency of finding abnormal users is low.
An embodiment of a first aspect of the present invention provides an abnormal user identification method, including:
an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users;
a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users;
a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change;
and an abnormal user identification step, namely determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
In the above technical solution, the similarity between the target users is, for any one of the target users, the sum of the similarities between that user and all the other target users.
In the above technical solution, before the step of initially grouping, the method further includes:
calculating a similarity between any one of the target users and one of the target users other than the any user, the similarity including one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device;
summing the similarity between any user in the target users and all users except any user in the target users to obtain the sum of the similarity between any user in the target users and all users except any user in the target users.
In the above technical solution, the determining a center user from the target users according to the similarity between the target users includes:
determining a first user according to the maximum similarity between the target users;
determining the first user as a central user;
determining a second user according to a minimum similarity between a non-central user and the central user in the target users and a preset first threshold;
and determining the second user as a new central user, and returning to the step of determining the second user according to the minimum value of the similarity between the non-central user and the central user in the target users and a preset first threshold value, and repeating the steps until the number of the central users reaches a preset number threshold value.
In the above technical solution, the determining the second user according to the minimum value of the similarity between the non-central user of the target user and the central user and a preset first threshold includes:
calculating the minimum similarity between the non-central user of the target user and the central user;
when the sum of the minimum similarity values between the first n non-central users among the target users and the central user is smaller than a first threshold, and the sum of the minimum similarity values between the first n + 1 non-central users among the target users and the central user is not smaller than the first threshold, determining the (n + 1)th non-central user as the second user;
wherein the first threshold is a random value between 0 and a first similarity sum, and the first similarity sum is a similarity minimum sum between all non-central users in the target users and the central user; n is a natural number.
In the above technical solution, the calculating the similarity between any one of the target users and one of the target users other than the any user includes:
calculating the difference degree between a third user and a fourth user according to the login record of the third user and the login record of the fourth user;
the third user is any user in the target users, and the fourth user is one user except any user in the target users;
and calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user.
In the above technical solution, the following formula is adopted for calculating the difference between the third user and the fourth user according to the login record of the third user and the login record of the fourth user:
wherein d (i, j) represents a degree of difference between the third user and the fourth user;
when the similarity includes a similarity in a time dimension, the parameter H_i represents a record of the third user i logging in within a first time period, and H_j represents a record of the fourth user j logging in within the first time period;
when the similarity includes a similarity on a user login platform, the parameter H_i represents a record of the third user i logging in on a first platform, and H_j represents a record of the fourth user j logging in on the first platform;
when the similarity includes a similarity on a user login device, the parameter H_i represents a record of the third user i logging in on a first device, and H_j represents a record of the fourth user j logging in on the first device;
the calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user adopts the following formula:
wherein sim (i, j) represents a similarity between the third user and the fourth user.
An embodiment of a second aspect of the present invention provides an abnormal user identification apparatus, including:
an initial grouping module, configured to determine a central user from the target users according to the similarity between the target users, and to initially group the target users according to the similarity between the central user and the users other than the central user among the target users;
a grouping adjustment module, configured to re-determine the central user according to a similarity between users in the initial grouping, and re-group the target users according to a similarity between the re-determined central user and a user other than the re-determined central user among the target users;
a grouping determining module for repeatedly executing the grouping adjusting step until the users contained in each group do not change any more;
and the abnormal user identification module is used for determining abnormal groups according to the number of the known abnormal users contained in each group and determining the users in the abnormal groups as the abnormal users.
An embodiment of a third aspect of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the abnormal user identification method according to the embodiment of the first aspect are implemented.
An embodiment of a fourth aspect of the present invention provides a non-transitory computer readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the steps of the abnormal user identification method according to the embodiment of the first aspect.
According to the abnormal user identification method, the abnormal user identification device, the electronic equipment and the storage medium, clustering of users is achieved through similarity among the users, and based on the characteristic that the abnormal users have similar behavior tracks, the groups where the abnormal users are located are found out by utilizing the found abnormal users, so that more hidden abnormal users are found, and the abnormal user identification method and the abnormal user identification device have the advantages of being high in identification efficiency and strong in identification capacity.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a flowchart of an abnormal user identification method according to an embodiment of the present invention;
Fig. 2 is a structural diagram of an abnormal user identification apparatus according to an embodiment of the present invention;
Fig. 3 is a physical structure diagram of an electronic device.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
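Fig. 1 is a flowchart of an abnormal user identification method according to an embodiment of the present invention. As shown in Fig. 1, the abnormal user identification method according to the embodiment of the present invention includes:
Step 101, determining a central user from target users according to the similarity between the target users, and initially grouping the target users according to the similarity between the central user and the users other than the central user among the target users.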
The basic idea of the abnormal user identification method provided by the embodiment of the invention is that, because abnormal users have similar behavior tracks, abnormal users that have already been found can be used to find more hidden abnormal users. Based on this idea, data of a plurality of target users is first collected. Some of these target users have already been identified as abnormal users, for example by a prior-art method based on the number of logins, the IP addresses used during login, or the devices used during login. For most of the target users, however, the identity has not yet been determined; they may be normal users or hidden abnormal users.
Before the similarity of the target users is calculated, data of the target users is collected. The collected data includes the logs of the target users. A log generally contains a large number of behavior records of a target user, such as the login time, the login platform, and the device used during login. The data may also include identity label information of the target user; for example, users who have already been identified as abnormal may be marked as such in their identity label information.
These target users may form a set of users. Any target user in the set can calculate the similarity with other single target users in the set, and based on the similarity between one target user and other single target users, the similarity index between one user and other multiple target users can be further calculated. The similarity index reflects the overall similarity relationship between one user and other multiple target users. In the embodiment of the present invention, the similarity index is a sum of similarities. In other embodiments of the present invention, the similarity index may also be other expression modes such as a square sum of similarities.
In the embodiment of the invention, the similarity between one target user and other single target users in the set is added, so that the sum of the similarity between the target user and all other target users except the target user in the set is obtained. In the embodiment of the present invention, the method for calculating the similarity between users is not limited, and a method for calculating the similarity known to those skilled in the art may be used, or a method for calculating the similarity described in another embodiment of the present invention may be used. In the embodiment of the present invention, the time point of the similarity calculation is not limited, and the similarity calculation may be completed in advance before the step is executed, or may be completed in real time during the step.
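The similarity index described above can be sketched as a row sum over a pairwise similarity matrix. This is only an illustrative sketch: the matrix layout, the function name `similarity_index` and the `squared` switch (for the square-sum variant mentioned as an alternative) are assumptions, not part of the described method.

```python
import numpy as np

def similarity_index(sim, squared=False):
    """Per-user similarity index: sum of similarities to all OTHER users.

    sim is assumed to be an n x n matrix where sim[i, j] is the similarity
    between user i and user j. The diagonal (self-similarity) is excluded.
    squared=True gives the sum-of-squares variant (illustrative option).
    """
    sim = np.asarray(sim, dtype=float)
    values = sim ** 2 if squared else sim
    return values.sum(axis=1) - np.diag(values)

# Hypothetical similarities between three users.
sim = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
])
sums = similarity_index(sim)
```

Because the method does not fix when the similarities are computed, such a matrix could equally be precomputed or filled in on demand.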
A plurality of grouped center users may be determined from the plurality of target users based on a sum of similarities between any one of the plurality of target users and all other target users.
The number of the plurality of groups can be determined according to actual conditions, such as the number of target users and the like.
When the central users of the groups are determined, the target user with the highest sum of similarities to all other target users is selected as the central user C1 of the first group. The resulting C1 may be added to the group center set.
Then, a target user with a relatively large similarity to C1 is selected as the center C2 of another group. If the target user with the greatest similarity to C1 were always selected, outlier users could introduce noise, so a user with a relatively large similarity to C1 is selected instead. The selection process is as follows: take a random value that is greater than 0 and less than the sum of the similarities between the central user C1 and all other target users; calculate the similarity between each target user other than C1 and C1, and add these similarities one by one; when the sum of the similarities of the first n target users is smaller than the random value and the sum of the similarities of the first n + 1 target users is not smaller than the random value, the (n + 1)th target user is C2, where n is a positive integer. The resulting C2 may be added to the group center set.
Then, for every target user other than C1 and C2, the similarity to the group center set is calculated. (When C2 was selected, the similarity of each target user to C1 was calculated; this can be regarded as the similarity to a group center set containing only C1.) The group center set now contains two users, C1 and C2, but it is treated as a whole, so only one value is needed per target user: the minimum of the similarities between that target user and each central user in the set. That is, the similarity between the target user and C1 and the similarity between the target user and C2 are calculated separately, and the smaller of the two is taken as the similarity between the target user and the group center set.
After the similarity between each target user other than C1 and C2 and the group center set has been calculated, another random value is generated that is greater than 0 and less than the sum of these similarities. The similarities between the target users other than C1 and C2 and the group center set are then added one by one; when the sum for the first n target users is smaller than this random value and the sum for the first n + 1 target users is not smaller than it, the (n + 1)th target user is C3. The resulting C3 may be added to the group center set.
New group centers continue to be selected in this way until the number of group centers reaches K, where K is the number of groups to be formed.
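The center selection procedure above can be sketched as follows. This is a minimal sketch under stated assumptions: similarities are non-negative and held in an n x n matrix, users are visited in index order when the running sum is accumulated, and the names `select_centers`, `sim`, `k` and `rng` are hypothetical.

```python
import numpy as np

def select_centers(sim, k, rng=None):
    """Pick k group centers from an n x n similarity matrix (illustrative)."""
    rng = np.random.default_rng() if rng is None else rng
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of users")

    # C1: the user with the largest sum of similarities to all other users.
    sums = sim.sum(axis=1) - np.diag(sim)
    centers = [int(np.argmax(sums))]

    while len(centers) < k:
        rest = [u for u in range(n) if u not in centers]
        # Similarity to the center set = minimum over the current centers.
        weights = sim[np.ix_(rest, centers)].min(axis=1)
        # Random value between 0 and the sum of those similarities.
        threshold = rng.uniform(0.0, weights.sum())
        running = np.cumsum(weights)
        # First user whose running sum is no longer smaller than the threshold.
        idx = int(np.searchsorted(running, threshold, side="left"))
        idx = min(idx, len(rest) - 1)  # guard against float rounding
        centers.append(rest[idx])
    return centers

# Hypothetical similarities between four users.
sim = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])
centers = select_centers(sim, k=3, rng=np.random.default_rng(0))
```

Passing a seeded generator makes the random thresholds reproducible; the first center is deterministic, while later centers depend on the random draws.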
After the central users of the groups are obtained, the similarity between each target user not selected as a central user and the central user of each group is calculated, and the target user is assigned to the group whose central user it is most similar to. The resulting groups form the initial grouping.
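As a rough sketch of this assignment, each user can be labeled with the index of its most similar central user. The matrix layout and the names `assign_to_groups`, `sim` and `centers` are assumptions for illustration only.

```python
import numpy as np

def assign_to_groups(sim, centers):
    """Label every user with the central user it is most similar to."""
    sim = np.asarray(sim, dtype=float)
    centers = list(centers)
    # For each user, position of the most similar center in `centers`.
    nearest = sim[:, centers].argmax(axis=1)
    labels = np.asarray(centers)[nearest]
    # A central user always stays in its own group.
    labels[centers] = centers
    return labels

# Hypothetical similarities between four users; users 0 and 3 are centers.
sim = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])
labels = assign_to_groups(sim, centers=[0, 3])
```

Here user 1 joins the group of user 0 and user 2 joins the group of user 3, because those are their most similar centers.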
Step 102, re-determining the central user according to the similarity between the users in the initial grouping, and regrouping the target users according to the similarity between the re-determined central user and the users other than the re-determined central user.
Step 103, repeatedly executing step 102 until the users contained in each group no longer change.
After the target users have been assigned to groups, a target user may not be the best match for the group it is in, so the users contained in each group need to be adjusted.
In the adjustment, the central user of each group is first re-determined. In the embodiment of the invention, for each target user in a group, the sum of its similarities to all other target users in the same group is calculated, and the target user with the largest sum is determined as the new central user of the group.
After the central users of each group are re-determined, the similarity between the target user which is not determined as the central user of the group and the new central user of each group is calculated, and the target user is distributed to the group where the central user of the group with the highest similarity is located.
The above process of adjusting the target users included in the groups needs to be performed iteratively until the users included in each group do not change any more.
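The adjustment loop can be sketched as below. This is an illustrative sketch, not a definitive implementation: the similarity matrix layout, the helper names `assign`, `partition` and `refine_groups`, and the `max_iter` safety cap are all assumptions (the text itself only says to iterate until the groups stop changing).

```python
import numpy as np

def assign(sim, centers):
    """Label each user with its most similar central user."""
    nearest = sim[:, centers].argmax(axis=1)
    labels = np.asarray(centers)[nearest]
    labels[centers] = centers  # centers stay in their own groups
    return labels

def partition(labels):
    """Group membership as a set of frozensets, independent of center ids."""
    return {frozenset(np.flatnonzero(labels == c).tolist())
            for c in np.unique(labels)}

def refine_groups(sim, centers, max_iter=100):
    sim = np.asarray(sim, dtype=float)
    centers = list(centers)
    labels = assign(sim, centers)
    for _ in range(max_iter):  # max_iter is a hypothetical safety cap
        new_centers = []
        for c in centers:
            members = np.flatnonzero(labels == c)
            # Sum of similarities to the other members of the same group.
            within = (sim[np.ix_(members, members)].sum(axis=1)
                      - sim[members, members])
            new_centers.append(int(members[np.argmax(within)]))
        new_labels = assign(sim, new_centers)
        unchanged = partition(new_labels) == partition(labels)
        centers, labels = new_centers, new_labels
        if unchanged:
            break
    return centers, labels

# Hypothetical similarities; users 0 and 1 are a deliberately poor start.
sim = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])
centers, labels = refine_groups(sim, centers=[0, 1])
```

Comparing memberships rather than raw labels avoids treating a group as changed merely because its central user was replaced.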
Step 104, determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
It has been mentioned in the foregoing that, when acquiring basic information of users, some users have been identified as abnormal users, and therefore, after finally determining the users included in each group, the number of known abnormal users in each group can be acquired.
Because abnormal users have similar behavior tracks, a group in which the number of known abnormal users is higher than the abnormality determination threshold can be determined to be an abnormal user group, and the users in that group are determined to be abnormal users. It can be seen from this step that hidden abnormal users in the abnormal user group, which were not identified by other abnormality identification methods, can be found by the method provided by the embodiment of the present invention.
For a given group, once the number of known abnormal users or their proportion in the group is known, the number or proportion is compared with the abnormality determination threshold; if it is higher than the threshold, the group is an abnormal group and the users in the group are abnormal users.
The abnormality determination threshold may be a specific numerical value or a proportional value, and the specific value is determined according to the actual application condition.
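The group-level decision can be sketched as follows. The function name `flag_abnormal_users`, the `use_ratio` switch and the example threshold values are hypothetical; as noted above, the actual threshold depends on the application.

```python
from collections import Counter

def flag_abnormal_users(labels, known_abnormal, threshold, use_ratio=False):
    """Return the users of every group whose known-abnormal count (or
    proportion, if use_ratio is True) is higher than the threshold."""
    labels = list(labels)
    group_size = Counter(labels)
    hits = Counter(labels[u] for u in known_abnormal)
    flagged = set()
    for group, size in group_size.items():
        score = hits[group] / size if use_ratio else hits[group]
        if score > threshold:
            flagged.update(u for u, lab in enumerate(labels) if lab == group)
    return flagged

# Five users in two groups (labeled by center id); users 0 and 1 are known abnormal.
labels = [0, 0, 0, 3, 3]
by_count = flag_abnormal_users(labels, known_abnormal={0, 1}, threshold=1)
by_ratio = flag_abnormal_users(labels, known_abnormal={0, 1},
                               threshold=0.5, use_ratio=True)
```

In this example user 2 is flagged as a hidden abnormal user because it shares a group with two known abnormal users.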
The abnormal user identification method provided by the embodiment of the invention realizes the clustering of users through the similarity between target users, and finds out the group where the abnormal user is located by utilizing the discovered abnormal user based on the characteristic that the abnormal user has similar behavior tracks, thereby discovering more hidden abnormal users, and having the advantages of high identification efficiency and strong identification capability.
Based on any one of the above embodiments, in an embodiment of the present invention, before step 101, the method further includes:
calculating a similarity between any one of the target users and one of the target users other than the any user, the similarity including one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device;
summing the similarity between any user in the target users and all users except any user in the target users to obtain the sum of the similarity between any user in the target users and all users except any user in the target users.
In the embodiment of the present invention, the similarity index is a sum of similarities. When the similarity between one user and other single users in the user set is calculated, the similarity is calculated from the time dimension, the platform dimension and the equipment dimension respectively. The specific description is as follows.
(1) Similarity in time dimension
The login time of a user can be mapped onto a set of discrete time periods of uniform, fixed length. The length of the time period can be set as needed: the shorter the time period, the finer the resolution, but the sparser the data.
In the present embodiment, each day is taken as the statistical unit and divided into 48 consecutive time periods, i.e. each time period represents half an hour. A login time set is thus obtained for each user; user i corresponds to the login time set {S_i1, S_i2, …, S_i48} over the time sequence, where S_it records whether user i was logged in during time period t: the value is 1 if the user was logged in, and 0 otherwise. It should be noted that the login considered in the embodiment of the present invention is not limited to the time point of the login operation, but also covers all the usage time of the user after login and before logout. Similarly, the login time set {S_j1, S_j2, …, S_j48} of user j over the same time sequence can be obtained. Accordingly, the similarity simt of user i and user j in the time dimension can be calculated:
wherein S_it represents a record of user i logging in during time period t, and S_jt represents a record of user j logging in during time period t.
It can be seen from the above formula that, as the difference d between the user i and the user j is smaller, the user similarity is higher.
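The construction of the 48-slot login vector can be sketched as below. The session format (pairs of login and logout times) and the name `time_vector` are assumptions. The formula for the similarity is not reproduced in this text, so `assumed_similarity` is purely a placeholder: it assumes a Euclidean difference d and the mapping 1 / (1 + d), chosen only because it makes a smaller difference give a higher similarity; it should not be read as the formula of the method.

```python
import math
from datetime import date, datetime, timedelta

SLOTS_PER_DAY = 48
SLOT = timedelta(minutes=30)

def time_vector(sessions, day):
    """48-element 0/1 vector; slot t is 1 if any session overlaps it.

    sessions: iterable of (login, logout) datetime pairs (assumed format).
    The whole span from login to logout counts, not just the login instant.
    """
    day_start = datetime.combine(day, datetime.min.time())
    vec = [0] * SLOTS_PER_DAY
    for login, logout in sessions:
        for t in range(SLOTS_PER_DAY):
            slot_start = day_start + t * SLOT
            slot_end = slot_start + SLOT
            if login < slot_end and logout >= slot_start:
                vec[t] = 1
    return vec

def assumed_similarity(a, b):
    """ASSUMPTION: Euclidean difference mapped to (0, 1] by 1 / (1 + d)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)

day = date(2020, 4, 28)
v_i = time_vector([(datetime(2020, 4, 28, 9, 10),
                    datetime(2020, 4, 28, 10, 5))], day)
v_j = time_vector([(datetime(2020, 4, 28, 9, 40),
                    datetime(2020, 4, 28, 9, 50))], day)
simt = assumed_similarity(v_i, v_j)
```

In this example user i occupies the half-hour slots starting at 09:00, 09:30 and 10:00, while user j occupies only the 09:30 slot.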
(2) Similarity in platform dimensions
For the platforms on which users log in, P_i,p denotes the login behavior of user i on platform p: P_i,p is 1 when user i has login behavior on platform p, and 0 otherwise. Accordingly, the login platform vector (P_i1, P_i2, …, P_im) of user i over the m platforms can be obtained. Similarly, the login platform vector (P_j1, P_j2, …, P_jm) of user j over the same m platforms can be obtained.
From this, the similarity simp of user i and user j in the platform dimension can be calculated:
wherein P_ip represents a record of user i logging in on platform p, and P_jp represents a record of user j logging in on platform p.
(3) Similarity in device dimensions
For the devices on which users log in, E_i,s denotes the login behavior of user i on device s: E_i,s is 1 when user i has login behavior on device s, and 0 otherwise. The login device vector (E_i1, E_i2, …, E_iw) of user i over the w devices is thus obtained. Similarly, the login device vector (E_j1, E_j2, …, E_jw) of user j over the same w devices can be obtained.
Accordingly, the similarity sime of the user i and the user j in the device dimension can be calculated:
wherein E_is represents a record of user i logging in on device s, and E_js represents a record of user j logging in on device s.
It should be noted that, because devices come in many types, the values in the device dimension are relatively scattered. In the embodiment of the present invention, mainly the devices used by a large number of users in the live network are taken. For a user who only uses devices with few users, all values of the vector are 0. For users whose vectors are all 0, sim is 1.
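Restricting the device dimension to widely used devices can be sketched as below. The input format (a mapping from user to the devices seen in that user's log), the cutoff `min_users` and the function names are hypothetical; the text does not state how "a large number of users" is quantified.

```python
from collections import Counter

def popular_devices(user_devices, min_users):
    """Devices used by at least min_users distinct users, in sorted order."""
    counts = Counter(d for devs in user_devices.values() for d in set(devs))
    return sorted(d for d, c in counts.items() if c >= min_users)

def device_vector(devices, vocabulary):
    """0/1 vector over the retained devices; rare devices are ignored."""
    devices = set(devices)
    return [1 if d in devices else 0 for d in vocabulary]

# Hypothetical login devices per user.
user_devices = {
    "u1": ["phone-A", "tablet-B"],
    "u2": ["phone-A"],
    "u3": ["rare-C"],
}
vocabulary = popular_devices(user_devices, min_users=2)
vectors = {u: device_vector(d, vocabulary) for u, d in user_devices.items()}
```

User u3 only uses a rarely seen device, so its vector is all zeros, which is the case singled out in the paragraph above.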
After the similarities in the time dimension, the platform dimension, and the device dimension have been calculated, the similarity between users can be calculated. In the embodiment of the present invention, the similarity sim between users is obtained by averaging the similarities in the three dimensions, that is, sim = 1/3 × (simt + simp + sime). In other embodiments of the present invention, weights may be set for the similarities of the three dimensions according to actual needs in order to calculate the similarity between users.
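As an illustration of the averaging and of the weighted variant mentioned above, here is a small sketch. The function name and the choice to normalize the weights by their sum are assumptions; the text only states that weights may be set according to actual needs.

```python
def combined_similarity(sim_t, sim_p, sim_e, weights=(1.0, 1.0, 1.0)):
    """Combine time, platform and device similarity.

    With the default equal weights this is 1/3 * (simt + simp + sime).
    Normalizing by the weight sum is an assumption, not stated in the source.
    """
    w_t, w_p, w_e = weights
    total = w_t + w_p + w_e
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return (w_t * sim_t + w_p * sim_p + w_e * sim_e) / total

# Equal weights reproduce the plain average of the three dimensions.
print(combined_similarity(0.5, 1.0, 0.6))
# Hypothetical weighting that counts the time dimension twice as much.
print(combined_similarity(0.5, 1.0, 0.6, weights=(2.0, 1.0, 1.0)))
```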
The above description gives the similarity between one user and another user. Summing the similarities between one user and each of the other users gives the sum of similarities between that user and all other users.
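A short sketch of this summation, assuming the pairwise similarities are already held in a symmetric matrix indexed by user (the matrix layout and the function name are assumptions). Claim 4 determines the first central user from the maximum similarity, which the last line illustrates.

```python
def similarity_sums(sim):
    """For each user i, sum sim[i][j] over all other users j (the diagonal is skipped)."""
    n = len(sim)
    return [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]

# Pairwise similarities for three users (illustrative values only).
sim = [
    [1.0, 0.75, 0.25],
    [0.75, 1.0, 0.5],
    [0.25, 0.5, 1.0],
]
sums = similarity_sums(sim)
print(sums)  # [1.0, 1.25, 0.75]

# The user with the largest similarity sum is a natural first central user.
first_center = max(range(len(sums)), key=sums.__getitem__)
print(first_center)  # 1
```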
The abnormal user identification method provided by the embodiment of the invention obtains the similarity between users by calculating their similarity in three dimensions (time, device and platform), and clusters the users using this similarity. Based on the characteristic that abnormal users have similar behavior tracks, it then uses the already discovered abnormal users to find the groups in which abnormal users are located, thereby discovering more hidden abnormal users. The method has the advantages of high identification efficiency and strong identification capability.
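The last stage, using known abnormal users to mark whole groups, can be sketched as follows. The text says abnormal groups are determined according to the number of known abnormal users in each group but does not give the rule, so the count threshold `min_known` is an assumption, as are the function names and the group layout.

```python
def abnormal_groups(groups, known_abnormal, min_known=1):
    """Return ids of groups that contain at least min_known known abnormal users.

    min_known is a hypothetical threshold; the source does not specify the rule.
    """
    known = set(known_abnormal)
    return [gid for gid, members in groups.items()
            if len(known & set(members)) >= min_known]

def newly_flagged_users(groups, known_abnormal, min_known=1):
    """Users in abnormal groups that were not already known to be abnormal."""
    known = set(known_abnormal)
    flagged = set()
    for gid in abnormal_groups(groups, known, min_known):
        flagged.update(groups[gid])
    return flagged - known

groups = {"g1": ["a", "b", "c"], "g2": ["d", "e"]}
print(abnormal_groups(groups, {"a", "b"}, min_known=2))      # ['g1']
print(newly_flagged_users(groups, {"a", "b"}, min_known=2))  # {'c'}
```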
Based on any one of the above embodiments, in an embodiment of the present invention, between step 101 and step 102, the method further includes:
and reducing the dimension of the information of the target user in the initial grouping.
In the embodiment of the invention, the dimension reduction of the information of the target user in the initial grouping is realized in the time dimension.
Since the login records of target users are sparse in the time dimension, after the initial grouping is obtained, a subset of the time periods of the users in the initial grouping is selected.
When the subset is selected, the information entropy of all users in the initial group in different time periods is calculated first. The calculation formula of the information entropy is as follows:
where $e_{t_i}$ represents the information entropy of all target users in the initial grouping in the i-th time period $t_i$; $P(u_j)$ represents the probability that target user $u_j$ logs in during the i-th time period; and n is the number of all target users in the initial grouping. $P(u_j)$ is calculated as follows:
$P(u_j)$ = total number of logins (check-ins) by the target user / total number of time periods.
Then, after the entropy values of all time periods have been calculated for all target users in the initial grouping, the time periods whose entropy value is greater than a threshold a are selected as the login time periods of the initial grouping.
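The entropy formula is not reproduced in this text, so the sketch below assumes the usual Shannon form, e = -Σ P(u_j) · log P(u_j), summed over the n users of the grouping for each time period. The per-user, per-period login probabilities are taken as given inputs; the matrix layout and function names are assumptions.

```python
import math

def period_entropy(probs):
    """Assumed Shannon-style score: -sum(p * log p) over users; terms with p == 0 are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_login_periods(prob, a):
    """prob[j][t] is the (assumed) login probability of user j in period t.

    Returns the indices of the periods whose entropy is greater than threshold a.
    """
    n_periods = len(prob[0])
    selected = []
    for t in range(n_periods):
        e_t = period_entropy([row[t] for row in prob])
        if e_t > a:
            selected.append(t)
    return selected

# Two users, three time periods (illustrative values only).
prob = [
    [0.5, 0.0, 1.0],
    [0.5, 0.0, 1.0],
]
print(select_login_periods(prob, a=0.1))  # [0]
```

In this example only period 0 carries a non-zero entropy, so it is the only period kept. Periods in which nobody logs in, or everybody always logs in, score 0 under the assumed formula and are dropped.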
By determining the login time periods of the grouping in this way, the number of time dimensions can be effectively reduced, for example from the 48 time periods into which a day is divided in the previous example to 24 time periods.
The dimension reduction of the target user information is beneficial to reducing the calculation amount of subsequent operation.
In other embodiments of the present invention, after the login time periods of the grouping have been calculated, the similarity between users in the grouping is recalculated over the new login time periods. Users whose similarity is greater than a threshold b are retained, and users whose similarity is less than or equal to the threshold b are removed from the grouping as discrete users. The removed discrete users can be treated as users suspected of abnormal login, and other methods in the prior art can be used to detect whether they are abnormal users.
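The filtering by threshold b can be sketched as below. The text does not say which similarity is compared with b; this sketch assumes it is each member's similarity to the central user of the grouping, recomputed over the reduced time periods. The function name and the `sim` callback are hypothetical.

```python
def split_discrete_users(members, center, sim, b):
    """Split a grouping into retained users and discrete users.

    sim(u, v) is assumed to return the similarity over the reduced time periods.
    Users with sim(u, center) > b are kept; the rest are removed as discrete users.
    """
    kept, discrete = [], []
    for u in members:
        if u == center or sim(u, center) > b:
            kept.append(u)
        else:
            discrete.append(u)
    return kept, discrete

# Illustrative similarities to the central user "c".
to_center = {"u1": 0.9, "u2": 0.3, "u3": 0.6}
kept, discrete = split_discrete_users(
    ["c", "u1", "u2", "u3"], "c", lambda u, v: to_center[u], b=0.5
)
print(kept)      # ['c', 'u1', 'u3']
print(discrete)  # ['u2']
```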
By reducing the dimension of the user information in the initial grouping, the abnormal user identification method provided by the embodiment of the invention reduces the amount of calculation and improves the real-time performance of identification while preserving the effectiveness of identifying hidden abnormal users.
Based on any of the above embodiments, fig. 2 is a structural diagram of an abnormal user identification apparatus according to an embodiment of the present invention, and as shown in fig. 2, the abnormal user identification apparatus according to the embodiment of the present invention includes:
a preliminary grouping module 201, configured to determine a central user from target users according to a similarity between the target users, and perform preliminary grouping on the target users according to a similarity between the central user and a user other than the central user in the target users;
a grouping adjustment module 202, configured to re-determine the central user according to the similarity between the users in the initial grouping, and re-group the target users according to the similarity between the re-determined central user and the users other than the re-determined central user in the target users;
a grouping determining module 203 for repeatedly executing the grouping adjusting step until the users included in each group do not change any more;
and the abnormal user identification module 204 is used for determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
The abnormal user identification device provided by the embodiment of the invention realizes user clustering through the similarity between target users, and finds out the group where the abnormal user is located by utilizing the discovered abnormal user based on the characteristic that the abnormal user has similar behavior tracks, thereby discovering more hidden abnormal users, and having the advantages of high identification efficiency and strong identification capability.
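The interplay of the grouping modules can be sketched as an iterative loop. This is a minimal sketch under stated assumptions: users are indexed 0..n-1, similarities are held in a symmetric matrix, each user joins the central user it is most similar to, and the new central user of a group is the member with the largest sum of similarities to the other members. The re-centering rule, the function names, and the `max_iter` guard are assumptions, not taken from the source.

```python
def assign(sim, centers):
    """Put every user into the group of the central user it is most similar to."""
    groups = {c: [] for c in centers}
    for u in range(len(sim)):
        if u in groups:
            groups[u].append(u)  # a central user stays in its own group
        else:
            best = max(centers, key=lambda c: sim[u][c])
            groups[best].append(u)
    return groups

def recenter(sim, members):
    """Assumed rule: the member with the largest similarity sum to the other members."""
    return max(members, key=lambda u: sum(sim[u][v] for v in members if v != u))

def cluster(sim, centers, max_iter=100):
    """Repeat the grouping adjustment until group membership no longer changes."""
    groups = assign(sim, centers)
    for _ in range(max_iter):  # max_iter is a safety guard added for this sketch
        new_centers = [recenter(sim, m) for m in groups.values()]
        new_groups = assign(sim, new_centers)
        unchanged = (sorted(sorted(m) for m in new_groups.values())
                     == sorted(sorted(m) for m in groups.values()))
        groups = new_groups
        if unchanged:
            break
    return groups

sim = [
    [1.0, 0.75, 0.25, 0.25],
    [0.75, 1.0, 0.25, 0.25],
    [0.25, 0.25, 1.0, 0.75],
    [0.25, 0.25, 0.75, 1.0],
]
print(sorted(sorted(m) for m in cluster(sim, [0, 2]).values()))  # [[0, 1], [2, 3]]
```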
Fig. 3 illustrates the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330 and a communication bus 340, where the processor 310, the communication interface 320 and the memory 330 communicate with one another via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform the following method: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and the users other than the central user among the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users other than the re-determined central user among the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.
It should be noted that, when being implemented specifically, the electronic device in this embodiment may be a server, a PC, or other devices, as long as the structure includes the processor 310, the communication interface 320, the memory 330, and the communication bus 340 shown in fig. 3, where the processor 310, the communication interface 320, and the memory 330 complete mutual communication through the communication bus 340, and the processor 310 may call the logic instruction in the memory 330 to execute the above method. The embodiment does not limit the specific implementation form of the electronic device.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Further, an embodiment of the present invention discloses a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the methods provided by the above method embodiments, for example comprising: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and the users other than the central user among the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users other than the re-determined central user among the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the method provided by the foregoing embodiments, for example comprising: an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and the users other than the central user among the target users; a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users other than the re-determined central user among the target users; a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change; and an abnormal user identification step, namely determining abnormal groups according to the number of known abnormal users contained in each group, and determining the users in the abnormal groups as abnormal users.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. An abnormal user identification method is characterized by comprising the following steps:
an initial grouping step, namely determining a central user from target users according to the similarity between the target users, and performing initial grouping on the target users according to the similarity between the central user and users except the central user in the target users;
a grouping adjustment step, namely re-determining the central user according to the similarity between the users in the initial grouping, and re-grouping the target users according to the similarity between the re-determined central user and the users except the re-determined central user in the target users;
a grouping determination step, namely repeatedly executing the grouping adjustment step until the users contained in each group no longer change;
and an abnormal user identification step, namely determining abnormal groups according to the number of the known abnormal users contained in each group, and determining the users in the abnormal groups as the abnormal users.
2. The abnormal user identification method according to claim 1, wherein the similarity between the target users is a sum of similarities between any one of the target users and all users except the any one of the target users.
3. The abnormal user identification method according to claim 2, further comprising, before the initial grouping step:
calculating a similarity between any one of the target users and one of the target users other than the any user, the similarity including one or more of the following similarities: similarity in time dimension, similarity on user login platform and similarity on user login device;
summing the similarity between any user in the target users and all users except any user in the target users to obtain the sum of the similarity between any user in the target users and all users except any user in the target users.
4. The abnormal user identification method according to claim 2, wherein the determining a center user from the target users according to the similarity between the target users comprises:
determining a first user according to the maximum similarity between the target users;
determining the first user as a central user;
determining a second user according to a minimum similarity between a non-central user and the central user in the target users and a preset first threshold;
and determining the second user as a new central user, and returning to the step of determining the second user according to the minimum value of the similarity between the non-central user and the central user in the target users and a preset first threshold value, and repeating the steps until the number of the central users reaches a preset number threshold value.
5. The abnormal user identification method according to claim 4, wherein the determining the second user according to the minimum value of the similarity between the non-central user of the target user and the central user and a preset first threshold value comprises:
calculating the minimum similarity between the non-central user of the target user and the central user;
when the sum of the minimum similarity values between the first n non-central users of the target users and the central user is smaller than a first threshold value, and the sum of the minimum similarity values between the first n+1 non-central users of the target users and the central user is not smaller than the first threshold value, determining the (n+1)-th non-central user as the second user;
wherein the first threshold is a random value between 0 and a first similarity sum, and the first similarity sum is a similarity minimum sum between all non-central users in the target users and the central user; n is a positive integer.
6. The abnormal user identification method according to claim 3, wherein the calculating the similarity between any one of the target users and one of the target users other than the any user comprises:
calculating the difference degree between a third user and a fourth user according to the login record of the third user and the login record of the fourth user;
the third user is any user in the target users, and the fourth user is one user except any user in the target users;
and calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user.
7. The method for identifying an abnormal user according to claim 6, wherein the calculating the degree of difference between the third user and the fourth user according to the log-in record of the third user and the log-in record of the fourth user adopts the following formula:
wherein d (i, j) represents a degree of difference between the third user and the fourth user;
when the similarity includes a similarity in the time dimension, the parameter $H_i$ represents the record of the third user i logging in within a first time period, and $H_j$ represents the record of the fourth user j logging in within the first time period;
when the similarity includes a similarity on a user login platform, the parameter $H_i$ represents the record of the third user i logging in on a first platform, and $H_j$ represents the record of the fourth user j logging in on the first platform;
when the similarity includes a similarity on a user login device, the parameter $H_i$ represents the record of the third user i logging in on a first device, and $H_j$ represents the record of the fourth user j logging in on the first device;
the calculating the similarity between the third user and the fourth user according to the difference between the third user and the fourth user adopts the following formula:
wherein sim (i, j) represents a similarity between the third user and the fourth user.
8. An abnormal user identification apparatus, comprising:
a preliminary grouping module, configured to determine a central user from target users according to the similarity between the target users, and to initially group the target users according to the similarity between the central user and the users other than the central user among the target users;
a grouping adjustment module, configured to re-determine the central user according to the similarity between the users in the initial grouping, and to re-group the target users according to the similarity between the re-determined central user and the users other than the re-determined central user among the target users;
a grouping determining module, configured to repeatedly execute the grouping adjustment step until the users contained in each group no longer change;
and an abnormal user identification module, configured to determine abnormal groups according to the number of known abnormal users contained in each group, and to determine the users in the abnormal groups as abnormal users.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method for abnormal user identification according to any of claims 1 to 7.
10. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for abnormal user identification according to any one of claims 1 to 7.
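The central-user selection recited in claims 4 and 5 can be read as a roulette-style draw, sketched below. Assumptions in this sketch: users are indexed 0..n-1 in a symmetric similarity matrix, the "first n non-central users" are taken in index order, and the first non-central user is returned if its value alone already reaches the threshold. The function names and the `rng` parameter are hypothetical.

```python
import random

def pick_next_center(sim, centers, rng=random):
    """Draw the next central user from the non-central users (claim 5, as read here)."""
    non_central = [u for u in range(len(sim)) if u not in centers]
    if not non_central:
        raise ValueError("no non-central users left")
    # Minimum similarity between each non-central user and the current central users.
    min_sims = [min(sim[u][c] for c in centers) for u in non_central]
    # First threshold: a random value between 0 and the sum of all minimum similarities.
    threshold = rng.uniform(0, sum(min_sims))
    running = 0.0
    for u, s in zip(non_central, min_sims):
        running += s
        if running >= threshold:  # cumulative sum is no longer below the threshold
            return u
    return non_central[-1]  # guard against floating-point round-off

def select_centers(sim, k, rng=random):
    """First center: largest similarity sum (claim 4); then draw until k centers exist."""
    n = len(sim)
    sums = [sum(sim[i][j] for j in range(n) if j != i) for i in range(n)]
    centers = [max(range(n), key=sums.__getitem__)]
    while len(centers) < k:
        centers.append(pick_next_center(sim, centers, rng))
    return centers

sim = [
    [1.0, 0.75, 0.25, 0.25],
    [0.75, 1.0, 0.5, 0.5],
    [0.25, 0.5, 1.0, 0.75],
    [0.25, 0.5, 0.75, 1.0],
]
centers = select_centers(sim, k=3, rng=random.Random(0))
print(centers[0])  # 1, the user with the largest similarity sum
```

Passing a seeded `random.Random` instance makes the draw repeatable, which helps when comparing groupings across runs.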
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010351557.5A CN111586001B (en) | 2020-04-28 | 2020-04-28 | Abnormal user identification method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111586001A (en) | 2020-08-25 |
CN111586001B (en) | 2022-11-22 |
Family
ID=72120084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010351557.5A (Active, CN111586001B) | Abnormal user identification method and device, electronic equipment and storage medium | 2020-04-28 | 2020-04-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111586001B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107579956A (en) * | 2017-08-07 | 2018-01-12 | 北京奇安信科技有限公司 | The detection method and device of a kind of user behavior |
CN107730271A (en) * | 2017-09-20 | 2018-02-23 | 北京奇艺世纪科技有限公司 | Similar users based on virtual interacting object determine method, apparatus and electronic equipment |
CA3003779A1 (en) * | 2017-05-05 | 2018-11-05 | Servicenow, Inc. | Identifying clusters for service management operations |
CN109873832A (en) * | 2019-03-15 | 2019-06-11 | 北京三快在线科技有限公司 | Method for recognizing flux, device, electronic equipment and storage medium |
CN109873812A (en) * | 2019-01-28 | 2019-06-11 | 腾讯科技(深圳)有限公司 | Method for detecting abnormality, device and computer equipment |
CN110225036A (en) * | 2019-06-12 | 2019-09-10 | 北京奇艺世纪科技有限公司 | A kind of account detection method, device, server and storage medium |
CN110309424A (en) * | 2019-07-04 | 2019-10-08 | 东北大学 | A kind of socialization recommended method based on Rough clustering |
CN110532429A (en) * | 2019-09-04 | 2019-12-03 | 重庆邮电大学 | It is a kind of based on cluster and correlation rule line on user group's classification method and device |
CN110706092A (en) * | 2019-09-23 | 2020-01-17 | 深圳中兴飞贷金融科技有限公司 | Risk user identification method and device, storage medium and electronic equipment |
CN110876072A (en) * | 2018-08-31 | 2020-03-10 | 武汉斗鱼网络科技有限公司 | Batch registered user identification method, storage medium, electronic device and system |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112163096A (en) * | 2020-09-18 | 2021-01-01 | 中国建设银行股份有限公司 | Malicious group determination method and device, electronic equipment and storage medium |
CN112488175A (en) * | 2020-11-26 | 2021-03-12 | 中孚安全技术有限公司 | Abnormal user detection method based on behavior aggregation characteristics, terminal and storage medium |
CN112488175B (en) * | 2020-11-26 | 2023-06-23 | 中孚安全技术有限公司 | Abnormal user detection method based on behavior aggregation characteristics, terminal and storage medium |
CN113521749A (en) * | 2021-07-15 | 2021-10-22 | 珠海金山网络游戏科技有限公司 | Abnormal account detection model training method and abnormal account detection method |
CN113521749B (en) * | 2021-07-15 | 2024-02-13 | 珠海金山数字网络科技有限公司 | Abnormal account detection model training method and abnormal account detection method |
Also Published As
Publication number | Publication date |
---|---|
CN111586001B (en) | 2022-11-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||