CN110876072A - Batch registered user identification method, storage medium, electronic device and system - Google Patents

Info

Publication number
CN110876072A
Authority
CN
China
Prior art keywords
user
user account
similarity
suspicion
registration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811014021.3A
Other languages
Chinese (zh)
Other versions
CN110876072B (en)
Inventor
王璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Douyu Network Technology Co Ltd
Original Assignee
Wuhan Douyu Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Douyu Network Technology Co Ltd
Priority to CN201811014021.3A
Publication of CN110876072A
Application granted
Publication of CN110876072B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N21/258 Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
    • H04N21/25866 Management of end-user data

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a batch registered user identification method, a storage medium, an electronic device and a system, relating to the field of big data risk control. The method comprises the following steps: acquiring a user account set; in the user account set, calculating the similarity between each user account and every other user account to obtain similarities w; constructing a user structure map from the user accounts in the user account set and the similarities w; clustering the user structure map to obtain a plurality of user clusters M; calculating the batch registration suspicion score of each user account to be identified according to the user clusters M; and, when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, indicating that a batch registration operation exists for the corresponding user account to be identified. The invention identifies user accounts and judges whether batch registration behavior exists with high identification efficiency, helping staff manage user accounts.

Description

Batch registered user identification method, storage medium, electronic device and system
Technical Field
The invention relates to the field of big data risk control, in particular to a batch registered user identification method, a storage medium, an electronic device and a system.
Background
With the development of live streaming, live content has become increasingly diverse, the audience keeps growing, and more and more user accounts are registered. Among this large number of accounts, however, there are some accounts registered in batch, which are usually used for abnormal operations such as malicious comment flooding or follow-count inflation. A live streaming platform therefore needs to examine user accounts, screen out those registered in batch, and take corresponding measures against them;
traditional batch registration identification mostly relies on staff setting multiple attribute indexes and manually inspecting user accounts; this approach is inefficient, procedurally complex and labor-intensive, which makes user account identification inconvenient;
therefore, a new batch registered user identification method is urgently needed to improve the efficiency of user account identification.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a batch registration user identification method, which is used for identifying user accounts and judging whether batch registration behaviors exist, is high in identification efficiency and provides help for workers to manage the user accounts.
In order to achieve the above purposes, the technical scheme adopted by the invention is as follows:
In a first aspect, the present invention provides a batch registered user identification method, comprising the following steps:
acquiring a user account set, wherein the user account set comprises a suspected user account set known to be suspected of batch registration and a to-be-identified user account set for which batch registration suspicion is to be determined;
in the user account set, performing similarity calculation between each user account and every other user account to obtain similarities w;
constructing a user structure map according to the user accounts in the user account set and the similarities w;
clustering the user structure map to obtain a plurality of user clusters M;
calculating the registration suspicion score of each user account to be identified according to the user clusters M;
and when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, indicating that a batch registration operation exists for the corresponding user account to be identified.
On the basis of the technical scheme, when similarity calculation is carried out between any user account and the other user accounts in the user account set to obtain the similarity w, a custom similarity formula is applied, wherein the custom similarity formula is as follows:
$$w_{uv} = w_1 \cdot \frac{\mathrm{edit}(nick_u, nick_v)}{\max\bigl(\mathrm{len}(nick_u), \mathrm{len}(nick_v)\bigr)} + w_2 \cdot \frac{1}{N} \sum_{i=1}^{N} I(x_{ui} = x_{vi})$$
wherein: $w_{uv}$ is the similarity score between user u and user v;
$nick_u$ is the nickname text registered by user u, and $nick_v$ is the nickname text registered by user v;
$\mathrm{edit}(nick_u, nick_v)$ is the edit distance between the nickname text registered by user u and the nickname text registered by user v;
$\mathrm{len}(nick_u)$ is the length of the nickname text registered by user u, and $\mathrm{len}(nick_v)$ is the length of the nickname text registered by user v;
x denotes the attribute indexes recorded in advance for registered account u and registered account v, the attribute indexes being one or more of the IP address used, the device used, or the registration time;
$x_{ui}$ is the i-th attribute index associated with registered account u, $x_{vi}$ is the i-th attribute index associated with registered account v, and the total number of attribute indexes is N;
$I(x_{ui} = x_{vi})$ is an indicator function that takes the value 1 if $x_{ui} = x_{vi}$ and 0 otherwise;
$w_i$ ($i = 1, 2$) are weight coefficients, each in the range 0 to 1, satisfying
$$w_1 + w_2 = 1$$
On the basis of the technical scheme, constructing the user structure map according to the user accounts in the user account set and the similarities w specifically comprises the following steps:
A1, each user account in the user account set is used as a node;
A2, a similarity threshold is set, and when the similarity w of two user accounts exceeds the preset similarity threshold, the nodes represented by the two user accounts are connected, forming the user structure map.
On the basis of the technical scheme, clustering the user structure map to obtain a plurality of user clusters M specifically comprises the following steps:
B1, for each edge in the user structure map, calculating the k-order common neighbors of the user accounts at its two ends and merging the k-order common neighbors of the two user accounts into a neighbor set C, thereby obtaining a plurality of neighbor sets C, wherein the k-order common neighbors of a user are the set of all nodes that can reach the user through fewer than k vertices, and k is a positive integer;
B2, aggregating the neighbor sets C to obtain a plurality of user clusters M.
On the basis of the above technical solution, when the neighbor sets C are aggregated, the specific operations are as follows:
select any two neighbor sets $C_p$ and $C_q$; when
Figure BDA0001785714510000041
holds, aggregate $C_p$ and $C_q$ to obtain a user cluster $M_n$,
wherein $|C_p \cap C_q|$ is the number of users common to $C_p$ and $C_q$, and $|C_p \cup C_q|$ is the total number of users in $C_p$ and $C_q$.
On the basis of the technical scheme, the registration suspicion score of each user account to be identified is calculated according to the user clusters M by applying a custom suspicion formula, wherein the custom suspicion formula is as follows:
$$S_u = \frac{1}{|Z_u|} \sum_{i \in Z_u} w_{iu}$$
wherein $S_u$ is the batch registration suspicion score of user u, $Z_u$ is the set of suspected user accounts in the user clusters M to which user u belongs, $i$ is any element of $Z_u$, $|Z_u|$ is the number of elements of $Z_u$, and $w_{iu}$ is the similarity score between user u and user i.
In a second aspect, the present invention further provides a storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the batch registered user identification method.
In a third aspect, the present invention further provides an electronic device, including a memory and a processor, where the memory stores a computer program running on the processor, and the electronic device is characterized in that: the processor realizes the batch registered user identification method when executing the computer program.
In a fourth aspect, the present invention further provides a batch registered user identification system, comprising:
the account acquisition unit, used for acquiring a user account set, wherein the user account set comprises a suspected user account set known to be suspected of batch registration and a to-be-identified user account set for which batch registration suspicion is to be determined;
the similarity calculation unit, used for performing similarity calculation between each user account and every other user account in the user account set to obtain similarities w;
the user structure map building unit, used for building a user structure map according to the user accounts in the user account set and the similarities w, taking each user account as a node in the user structure map, and connecting the nodes represented by two user accounts when the similarity w of the two user accounts exceeds a preset similarity threshold;
the user cluster acquisition unit, used for clustering the user structure map to obtain a plurality of user clusters M;
and the batch registration suspicion calculating unit, used for calculating the registration suspicion score of each user account to be identified according to the user clusters M and, when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, indicating that a batch registration operation exists for the corresponding user account to be identified.
On the basis of the technical scheme, when the batch registration suspicion calculating unit calculates the registration suspicion score of each user account to be identified according to the user clusters M, it applies a custom suspicion formula, wherein the custom suspicion formula is as follows:
$$S_u = \frac{1}{|Z_u|} \sum_{i \in Z_u} w_{iu}$$
wherein $S_u$ is the batch registration suspicion score of user u, $Z_u$ is the set of suspected user accounts in the user clusters M to which user u belongs, $i$ is any element of $Z_u$, $|Z_u|$ is the number of elements of $Z_u$, and $w_{iu}$ is the similarity score between user u and user i.
Compared with the prior art, the invention has the advantages that:
(1) The invention takes the similarity among users and the user structure map constructed from that similarity as its basis, uses the suspected user account set with confirmed batch registration behavior as a comparison reference, and thereby identifies the users who exhibit batch registration behavior.
Drawings
FIG. 1 is a flowchart of the batch registered user identification method according to the present invention;
FIG. 2 is a flowchart of constructing the user structure map in the batch registered user identification method according to the present invention;
FIG. 3 is a flowchart of obtaining the user clusters M in the batch registered user identification method according to the present invention;
FIG. 4 is a schematic diagram of the batch registered user identification system according to the present invention;
in the figure: 1. account acquisition unit; 2. similarity calculation unit; 3. user structure map construction unit; 4. user cluster acquisition unit; 5. batch registration suspicion calculating unit.
Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
The embodiment of the invention provides a batch registered user identification method, a storage medium, an electronic device and a system. The method takes the similarity among users and the user structure map constructed from that similarity as its basis, uses the suspected user account set with confirmed batch registration behavior as a comparison reference, and thereby identifies the users who exhibit batch registration behavior.
In order to achieve the technical effects, the general idea of the application is as follows:
a batch registered user identification method, comprising the following steps:
S1, acquiring a user account set, wherein the user account set comprises a suspected user account set known to be suspected of batch registration and a to-be-identified user account set for which batch registration suspicion is to be determined;
S2, in the user account set, performing similarity calculation between each user account and every other user account to obtain similarities w;
S3, constructing a user structure map according to the user accounts in the user account set and the similarities w;
S4, clustering the user structure map to obtain a plurality of user clusters M;
S5, calculating the registration suspicion score of each user account to be identified according to the user clusters M;
S6, when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, indicating that a batch registration operation exists for the corresponding user account to be identified.
Example one
Referring to fig. 1 to fig. 3, an embodiment of the present invention provides a method for identifying users registered in batch, including the following steps:
S1, acquiring a user account set, wherein the user account set comprises a suspected user account set known to be suspected of batch registration and a to-be-identified user account set for which batch registration suspicion is to be determined;
S2, in the user account set, performing similarity calculation between each user account and every other user account to obtain similarities w;
preferably, in the embodiment of the present invention, when the similarity w between any two accounts in the user account set is calculated, a custom similarity formula may be applied, wherein the custom similarity formula is:
$$w_{uv} = w_1 \cdot \frac{\mathrm{edit}(nick_u, nick_v)}{\max\bigl(\mathrm{len}(nick_u), \mathrm{len}(nick_v)\bigr)} + w_2 \cdot \frac{1}{N} \sum_{i=1}^{N} I(x_{ui} = x_{vi})$$
wherein: $w_{uv}$ is the similarity score between user u and user v;
$nick_u$ is the nickname text registered by user u, and $nick_v$ is the nickname text registered by user v;
$\mathrm{edit}(nick_u, nick_v)$ is the edit distance between the nickname text registered by user u and the nickname text registered by user v;
$\mathrm{len}(nick_u)$ is the length of the nickname text registered by user u, and $\mathrm{len}(nick_v)$ is the length of the nickname text registered by user v;
x denotes the attribute indexes recorded in advance for registered account u and registered account v, the attribute indexes being one or more of the IP address used, the device used, or the registration time;
$x_{ui}$ is the i-th attribute index associated with registered account u, $x_{vi}$ is the i-th attribute index associated with registered account v, and the total number of attribute indexes is N;
$I(x_{ui} = x_{vi})$ is an indicator function that takes the value 1 if $x_{ui} = x_{vi}$ and 0 otherwise;
$w_i$ ($i = 1, 2$) are weight coefficients, each in the range 0 to 1, satisfying
$$w_1 + w_2 = 1$$
Note that $\mathrm{edit}(nick_u, nick_v)$ is the edit distance, also called the Levenshtein distance, between the nickname text registered by user u and the nickname text registered by user v, that is, the minimum number of editing operations required to convert one string into the other, wherein the permitted editing operations are replacing one character with another, inserting one character and deleting one character; generally, the smaller the edit distance, the greater the similarity of the two strings;
here, the embodiment of the present invention gives a specific example:
assume two user accounts A and B, where the edit distance between the registered nickname of account A and the registered nickname of account B is 2, the nickname length of account A is 5, and the nickname length of account B is 4; 4 attribute indexes are considered, two of which are the same for A and B; the weight coefficients are each 0.5, so the similarity between A and B is:
0.5*(2/max(4,5))+0.5*(2/4)=0.45.
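The worked example above can be reproduced with a short Python sketch. It follows the arithmetic of the example (edit distance divided by the longer nickname length, plus the fraction of matching attribute indexes, each weighted 0.5). The function names, the sample nicknames and the sample attribute values are illustrative assumptions and do not come from the patent text.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace ca with cb
        prev = curr
    return prev[-1]


def similarity(nick_u: str, nick_v: str,
               attrs_u: Sequence[object], attrs_v: Sequence[object],
               w1: float = 0.5, w2: float = 0.5) -> float:
    """Similarity w_uv, following the arithmetic of the worked example."""
    if not attrs_u or len(attrs_u) != len(attrs_v):
        raise ValueError("both accounts need the same non-empty attribute list")
    longest = max(len(nick_u), len(nick_v))
    nick_term = edit_distance(nick_u, nick_v) / longest if longest else 0.0
    matches = sum(1 for xu, xv in zip(attrs_u, attrs_v) if xu == xv)
    return w1 * nick_term + w2 * matches / len(attrs_u)


# Hypothetical accounts A and B: nickname lengths 5 and 4, edit distance 2,
# and two of four attribute indexes equal (values are made up).
attrs_a = ("10.0.0.1", "device-1", "2018-08-31 10:00", "x")
attrs_b = ("10.0.0.1", "device-1", "2018-08-31 11:30", "y")
print(similarity("abcde", "abcx", attrs_a, attrs_b))  # approximately 0.45
```

The fourth attribute index is a placeholder, since the text names only IP, device and registration time as possible attribute indexes.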
S3, constructing a user structure map according to the user accounts in the user account set and the similarities w;
specifically, in the embodiment of the present invention, constructing the user structure map comprises the following steps:
A1, each user account in the user account set is used as a node;
A2, a similarity threshold is set, and when the similarity w of two user accounts exceeds the preset similarity threshold, the nodes represented by the two user accounts are connected, forming the user structure map;
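Steps A1 and A2 amount to building an undirected graph by thresholding pairwise similarity. A minimal Python sketch follows; the adjacency-dictionary representation, the function name `build_user_graph` and the toy similarity table are assumptions made for illustration, while the default threshold of 0.1 is the value the embodiment suggests.

```python
from itertools import combinations
from typing import Callable, Dict, Iterable, Set


def build_user_graph(accounts: Iterable[str],
                     similarity: Callable[[str, str], float],
                     threshold: float = 0.1) -> Dict[str, Set[str]]:
    """Return an adjacency dict: one node per account, one edge per similar pair."""
    nodes = list(accounts)
    graph: Dict[str, Set[str]] = {u: set() for u in nodes}  # A1: accounts as nodes
    for u, v in combinations(nodes, 2):                      # every unordered pair once
        if similarity(u, v) > threshold:                     # A2: connect similar accounts
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Toy similarity table for three hypothetical accounts.
table = {frozenset(("a", "b")): 0.45,
         frozenset(("a", "c")): 0.05,
         frozenset(("b", "c")): 0.30}
graph = build_user_graph(["a", "b", "c"], lambda u, v: table[frozenset((u, v))])
print(graph)  # a is linked to b, and b is linked to c; a and c are not linked
```

Comparing all pairs is quadratic in the number of accounts, which is acceptable for a sketch but would likely need blocking or indexing on a full day's registrations.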
S4, clustering the user structure map to obtain a plurality of user clusters M;
specifically, in the embodiment of the present invention, obtaining the user clusters M comprises the following steps:
B1, for each edge in the user structure map, calculating the k-order common neighbors of the user accounts at its two ends and merging the k-order common neighbors of the two user accounts into a neighbor set C, thereby obtaining a plurality of neighbor sets C, wherein the k-order common neighbors of a user are the set of all nodes that can reach the user through fewer than k vertices, and k is a positive integer;
B2, aggregating the neighbor sets C to obtain a plurality of user clusters M.
Specifically, in the embodiment of the present invention, since the similarity w is theoretically a decimal smaller than 1, the preset similarity threshold, denoted η, may be set to 0.1;
the value of k for the k-order common neighbors may be chosen according to the number of user accounts in the user account set, and in normal operation k takes a value from 5 to 10.
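Step B1 can be sketched in Python with a bounded breadth-first search. The text leaves some details open, so this sketch makes three explicit assumptions: "fewer than k vertices" is read as a hop distance strictly less than k, the start node counts as its own neighbor at distance 0, and "merging" means the union of the two endpoint neighborhoods. The function names are illustrative.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set

Graph = Dict[str, Set[str]]


def k_order_neighbors(graph: Graph, start: str, k: int) -> Set[str]:
    """Nodes whose hop distance from `start` is below k (assumed reading)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist + 1 >= k:          # the next hop would no longer be below k
            continue
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen


def neighbor_sets(graph: Graph, k: int) -> List[FrozenSet[str]]:
    """One neighbor set C per edge: union of both endpoints' k-order neighbors."""
    result: List[FrozenSet[str]] = []
    done = set()
    for u in graph:
        for v in graph[u]:
            edge = frozenset((u, v))
            if edge in done:       # each undirected edge is handled once
                continue
            done.add(edge)
            result.append(frozenset(k_order_neighbors(graph, u, k)
                                    | k_order_neighbors(graph, v, k)))
    return result


# Path graph a - b - c - d with k = 2.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
for c in neighbor_sets(path, 2):
    print(sorted(c))
```

If the intended definition counts intermediate vertices rather than hops, only the comparison inside `k_order_neighbors` would need to change.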
Preferably, in the embodiment of the present invention, when the neighbor sets C are aggregated, the specific operations are:
select any two neighbor sets $C_p$ and $C_q$; when
Figure BDA0001785714510000101
holds, aggregate $C_p$ and $C_q$ to obtain a user cluster $M_n$,
wherein $|C_p \cap C_q|$ is the number of users common to $C_p$ and $C_q$, and $|C_p \cup C_q|$ is the total number of users in $C_p$ and $C_q$;
aggregating the neighbor sets into user clusters reduces the number of sets, which reduces the workload of calculating the registration suspicion scores and improves accuracy.
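The aggregation condition itself is contained in a formula image that is not reproduced in this text. Since the accompanying definitions mention only $|C_p \cap C_q|$ and $|C_p \cup C_q|$, the Python sketch below assumes the condition is a lower bound on their ratio (the Jaccard index). Both that assumption and the `merge_threshold` parameter are hypothetical, and the greedy merge order is one of several possible strategies.

```python
from typing import Iterable, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|a ∩ b| / |a ∪ b|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def aggregate(neighbor_sets: Iterable[Set[str]],
              merge_threshold: float) -> List[Set[str]]:
    """Greedily merge sets whose overlap ratio exceeds the assumed threshold."""
    clusters = [set(c) for c in neighbor_sets]
    merged = True
    while merged:                  # repeat until no pair qualifies any more
        merged = False
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                if jaccard(clusters[p], clusters[q]) > merge_threshold:
                    clusters[p] |= clusters[q]   # C_p and C_q become one cluster
                    del clusters[q]
                    merged = True
                    break
            if merged:
                break
    return clusters


sets = [{"a", "b", "c"}, {"b", "c", "d"}, {"x", "y"}]
print(aggregate(sets, merge_threshold=0.4))
```

Each merge removes one set, so the loop terminates after at most one pass per original neighbor set. Clusters that are not merged may still overlap, which matches the later wording that a user can belong to several user clusters M.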
S5, calculating the registration suspicion score of each user account to be identified according to the user clusters M;
specifically, in the embodiment of the present invention, the registration suspicion score of each user account to be identified is calculated according to the user clusters M by applying a custom suspicion formula, where the custom suspicion formula is:
$$S_u = \frac{1}{|Z_u|} \sum_{i \in Z_u} w_{iu}$$
wherein $S_u$ is the batch registration suspicion score of user u, $Z_u$ is the set of suspected user accounts in the user clusters M to which user u belongs, $i$ is any element of $Z_u$, $|Z_u|$ is the number of elements of $Z_u$, and $w_{iu}$ is the similarity score between user u and user i;
in the embodiment of the present invention, $Z_u$ is obtained by first collecting every user cluster M to which user u belongs and then gathering the suspected user accounts in those user clusters into the set $Z_u$.
And S6, when the suspicion score of batch registration exceeds a preset suspicion threshold of batch registration, indicating that the batch registration operation exists for the corresponding user account to be identified.
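Steps S5 and S6 can be sketched in Python as follows. The sketch assumes the suspicion score is the mean similarity between user u and the suspected accounts in $Z_u$, which is what the symbol definitions suggest. Returning 0 when $Z_u$ is empty, the function names, and the threshold value used in the example are further assumptions not stated in the text.

```python
from typing import Callable, Iterable, List, Set


def suspicion_score(user: str,
                    clusters: List[Set[str]],
                    suspected: Set[str],
                    similarity: Callable[[str, str], float]) -> float:
    """Mean similarity between `user` and the suspected accounts sharing a cluster."""
    z_u: Set[str] = set()
    for cluster in clusters:
        if user in cluster:                 # every cluster M that contains u
            z_u |= cluster & suspected      # its suspected accounts go into Z_u
    z_u.discard(user)
    if not z_u:
        return 0.0                          # assumed convention for an empty Z_u
    return sum(similarity(i, user) for i in z_u) / len(z_u)


def flag_batch_registered(pending: Iterable[str],
                          clusters: List[Set[str]],
                          suspected: Set[str],
                          similarity: Callable[[str, str], float],
                          threshold: float) -> Set[str]:
    """S6: accounts whose score exceeds the preset suspicion threshold."""
    return {u for u in pending
            if suspicion_score(u, clusters, suspected, similarity) > threshold}


# Hypothetical data: u1 shares a cluster with two suspected accounts, u2 with one.
table = {frozenset(("u1", "s1")): 0.8,
         frozenset(("u1", "s2")): 0.6,
         frozenset(("u2", "s3")): 0.2}
sim = lambda a, b: table[frozenset((a, b))]
clusters = [{"u1", "s1", "s2"}, {"u2", "s3"}]
print(flag_batch_registered(["u1", "u2"], clusters, {"s1", "s2", "s3"}, sim, 0.5))
```

In this example u1 averages roughly 0.7 and is flagged, while u2 scores 0.2 and is not.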
In the embodiment of the invention, a staff member acquires a user account set, which may be the set of user accounts newly registered on a server on the current day or a set of user accounts acquired in another manner. The user account set comprises a plurality of user accounts, each of which belongs either to the suspected user account set or to the to-be-identified user account set: the suspected user account set contains user accounts known to be suspected of batch registration, namely suspected user accounts, and the to-be-identified user account set contains user accounts for which it remains to be determined whether they are suspected of batch registration;
next, in the user account set, similarity calculation is performed between each user account and every other user account to obtain the similarities w;
then, a user structure map is constructed from the user accounts in the user account set and the similarities w: each user account serves as a node, and the nodes of two user accounts are connected when their similarity w exceeds the preset similarity threshold; the user structure map serves as the input data for the subsequent user clustering;
then, the user structure map is clustered to obtain a plurality of user clusters M, and the registration suspicion score of each user account to be identified is calculated according to the user clusters M;
finally, a batch registration suspicion threshold is preset, and when the batch registration suspicion score exceeds this threshold, a batch registration operation is indicated for the corresponding user account to be identified.
The embodiment of the invention takes the similarity among users and the user structure map constructed from that similarity as its basis, uses the suspected user account set with confirmed batch registration behavior as a comparison reference, and thereby identifies the users who exhibit batch registration behavior.
It should be noted that the suspected user accounts in the suspected user account set and the to-be-identified user accounts in the to-be-identified user account set are all user accounts in the user account set.
Based on the same inventive concept, the application provides a batch registered user identification system corresponding to the embodiment, which is detailed in the second embodiment.
Example two
As shown in fig. 4, a second embodiment of the present invention provides a batch registered user identification system, which includes:
the account acquisition unit 1, used for acquiring a user account set, wherein the user account set comprises a suspected user account set known to be suspected of batch registration and a to-be-identified user account set for which batch registration suspicion is to be determined;
the similarity calculation unit 2, used for performing similarity calculation between each user account and every other user account in the user account set to obtain similarities w;
the user structure map building unit 3, used for building a user structure map according to the user accounts in the user account set and the similarities w;
the user cluster obtaining unit 4, used for clustering the user structure map to obtain a plurality of user clusters M;
and the batch registration suspicion calculating unit 5, used for calculating the registration suspicion score of each user account to be identified according to the user clusters M and, when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, indicating that a batch registration operation exists for the corresponding user account to be identified.
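One way to picture how the five units hand data to each other is a small Python pipeline in which each unit is a pluggable callable. The class name, the field names and the callable signatures below are illustrative assumptions and not an interface defined by the patent; the stub implementations exist only to show the data flow.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]
Graph = Dict[str, Set[str]]


@dataclass
class BatchRegistrationPipeline:
    # unit 1: returns (suspected accounts, accounts to be identified)
    acquire: Callable[[], Tuple[Set[str], Set[str]]]
    # unit 2: similarity w between two accounts
    similarity: Callable[[str, str], float]
    # unit 3: builds the user structure map from accounts and similarities
    build_graph: Callable[[Set[str], Dict[Pair, float]], Graph]
    # unit 4: clusters the map into user clusters M
    cluster: Callable[[Graph], List[Set[str]]]
    # unit 5: suspicion score of one account, compared with the threshold below
    score: Callable[[str, List[Set[str]], Set[str], Dict[Pair, float]], float]
    suspicion_threshold: float

    def run(self) -> Set[str]:
        suspected, pending = self.acquire()
        accounts = suspected | pending
        ordered = sorted(accounts)
        sims = {frozenset((u, v)): self.similarity(u, v)
                for idx, u in enumerate(ordered) for v in ordered[idx + 1:]}
        graph = self.build_graph(accounts, sims)
        clusters = self.cluster(graph)
        return {u for u in pending
                if self.score(u, clusters, suspected, sims) > self.suspicion_threshold}


# Stub units for three hypothetical accounts.
table = {frozenset(("s1", "u1")): 0.9,
         frozenset(("s1", "u2")): 0.05,
         frozenset(("u1", "u2")): 0.05}
pipeline = BatchRegistrationPipeline(
    acquire=lambda: ({"s1"}, {"u1", "u2"}),
    similarity=lambda u, v: table[frozenset((u, v))],
    build_graph=lambda accts, sims: {
        a: {b for b in accts if b != a and sims[frozenset((a, b))] > 0.1}
        for a in accts},
    cluster=lambda graph: [{a} | nbrs for a, nbrs in graph.items()],
    score=lambda u, cl, sus, sims: max(
        [sims[frozenset((u, s))] for c in cl if u in c for s in c & sus] or [0.0]),
    suspicion_threshold=0.5,
)
print(pipeline.run())  # {'u1'}
```

Keeping each unit behind a callable makes it possible to swap in different similarity, clustering or scoring implementations without touching the orchestration in `run`.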
In the embodiment of the invention, a staff member acquires a user account set, which may be the set of user accounts newly registered on a server on the current day or a set of user accounts acquired in another manner. The user account set comprises a plurality of user accounts, each of which belongs either to the suspected user account set or to the to-be-identified user account set: the suspected user account set contains user accounts known to be suspected of batch registration, namely suspected user accounts, and the to-be-identified user account set contains user accounts for which it remains to be determined whether they are suspected of batch registration;
next, in the user account set, similarity calculation is performed between each user account and every other user account to obtain the similarities w;
then, a user structure map is constructed from the user accounts in the user account set and the similarities w: each user account serves as a node, and the nodes of two user accounts are connected when their similarity w exceeds the preset similarity threshold; the user structure map serves as the input data for the subsequent user clustering;
then, the user structure map is clustered to obtain a plurality of user clusters M, and the registration suspicion score of each user account to be identified is calculated according to the user clusters M;
finally, a batch registration suspicion threshold is preset, and when the batch registration suspicion score exceeds this threshold, a batch registration operation is indicated for the corresponding user account to be identified.
The embodiment of the invention takes the similarity among users and the user structure map constructed from that similarity as its basis, uses the suspected user account set with confirmed batch registration behavior as a comparison reference, and thereby identifies the users who exhibit batch registration behavior.
It should be noted that the suspected user accounts in the suspected user account set and the to-be-identified user accounts in the to-be-identified user account set are all user accounts in the user account set.
Preferably, in the embodiment of the present invention, the similarity calculation unit 2 performs similarity calculation between each user account and every other user account in the user account set to obtain the similarities w;
namely, when calculating the similarity w between any two accounts in the user account set, a custom similarity formula is applied, and the custom similarity formula is as follows:
$$w_{uv} = w_1 \cdot \frac{\mathrm{edit}(nick_u, nick_v)}{\max\bigl(\mathrm{len}(nick_u), \mathrm{len}(nick_v)\bigr)} + w_2 \cdot \frac{1}{N} \sum_{i=1}^{N} I(x_{ui} = x_{vi})$$
wherein: $w_{uv}$ is the similarity score between user u and user v;
$nick_u$ is the nickname text registered by user u, and $nick_v$ is the nickname text registered by user v;
$\mathrm{edit}(nick_u, nick_v)$ is the edit distance between the nickname text registered by user u and the nickname text registered by user v;
$\mathrm{len}(nick_u)$ is the length of the nickname text registered by user u, and $\mathrm{len}(nick_v)$ is the length of the nickname text registered by user v;
x denotes the attribute indexes recorded in advance for registered account u and registered account v, the attribute indexes being one or more of the IP address used, the device used, or the registration time;
$x_{ui}$ is the i-th attribute index associated with registered account u, $x_{vi}$ is the i-th attribute index associated with registered account v, and the total number of attribute indexes is N;
$I(x_{ui} = x_{vi})$ is an indicator function that takes the value 1 if $x_{ui} = x_{vi}$ and 0 otherwise;
$w_i$ ($i = 1, 2$) are weight coefficients, each in the range 0 to 1, satisfying
$$w_1 + w_2 = 1$$
Note that $\mathrm{edit}(nick_u, nick_v)$ is the edit distance, also called the Levenshtein distance, between the nickname text registered by user u and the nickname text registered by user v, that is, the minimum number of editing operations required to convert one string into the other, wherein the permitted editing operations are replacing one character with another, inserting one character and deleting one character; generally, the smaller the edit distance, the greater the similarity of the two strings.
Specifically, in the embodiment of the present invention, the process of constructing the user structure graph by the user structure graph constructing unit 3 specifically includes the following steps:
a1, each user account in the user account set is used as a node;
a2, setting a similarity threshold, and connecting the nodes represented by the two user accounts when the similarity w of the two user accounts exceeds the preset similarity threshold to form a user structure map.
Specifically, in the embodiment of the present invention, the process of acquiring the user clusters M by the user cluster acquiring unit 4 includes the following steps:
B1, calculating the k-order common neighbors of the user accounts at the two ends of each connecting line in the user structure graph, and merging the k-order common neighbors of the two user accounts of the same connecting line into a neighbor set C, thereby obtaining a plurality of neighbor sets C, wherein the k-order common neighbors of a user are the set of all nodes for which the number of vertices required to reach the user is less than k, and k is a positive integer;
B2, aggregating the neighbor sets C to obtain a plurality of user clusters M.
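Step B1 can be sketched with a breadth-first search. The sketch rests on two assumptions that the text does not settle: "k-order common neighbors" is read as the nodes reachable in fewer than k hops, and "merging" is read as the union of the two endpoint neighborhoods together with the endpoints themselves. The names `nodes_within` and `neighbor_sets` are hypothetical.

```python
from collections import deque


def nodes_within(graph, start, k):
    """Nodes reachable from start in fewer than k hops (start itself excluded)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth + 1 >= k:
            continue  # neighbours of this node would be k or more hops away
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    seen.discard(start)
    return seen


def neighbor_sets(graph, k):
    """One neighbor set C per connecting line (edge) of the graph."""
    result = []
    visited_edges = set()
    for u in graph:
        for v in graph[u]:
            edge = frozenset((u, v))
            if edge in visited_edges:
                continue  # each undirected edge is handled once
            visited_edges.add(edge)
            # Assumption: C is the union of both neighbourhoods plus the endpoints.
            result.append(nodes_within(graph, u, k) | nodes_within(graph, v, k) | {u, v})
    return result
```

On a four-node path a-b-c-d with k = 2, the edge b-c yields the set {a, b, c, d}, while the outer edges yield {a, b, c} and {b, c, d}.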
Specifically, in the embodiment of the present invention, in actual operation, since the similarity w is theoretically a decimal smaller than 1, the preset similarity threshold, denoted η, may be set to 0.1;
the value of k for the k-order common neighbors can be chosen according to the number of user accounts in the user account set, and in normal operation k is 5 to 10.
Preferably, in the embodiment of the present invention, when the user cluster obtaining unit 4 performs aggregation on each neighbor set C, the specific operations are as follows:
select any two neighbor sets $C_p$ and $C_q$; when
Figure BDA0001785714510000161
holds, $C_p$ and $C_q$ are aggregated to obtain the user cluster $M_n$,
where $|C_p \cap C_q|$ is the number of users common to $C_p$ and $C_q$, and $|C_p \cup C_q|$ is the number of all users in $C_p$ and $C_q$;
aggregating the neighbor sets into user clusters reduces the number of sets, which reduces the workload of calculating the registration suspicion score and improves accuracy.
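The aggregation step can be sketched as below. The exact merge condition is given only as a formula image that is not reproduced in this text, so the sketch assumes it compares the ratio $|C_p \cap C_q| / |C_p \cup C_q|$ against a threshold. The names `overlap_ratio` and `aggregate`, the `threshold` parameter, and the "repeat until no pair qualifies" loop are all assumptions.

```python
def overlap_ratio(c_p, c_q):
    """|C_p ∩ C_q| / |C_p ∪ C_q|, taken as 0.0 when both sets are empty."""
    union = c_p | c_q
    return len(c_p & c_q) / len(union) if union else 0.0


def aggregate(neighbor_sets, threshold):
    """Merge neighbor sets until no remaining pair reaches the assumed threshold."""
    clusters = [set(c) for c in neighbor_sets]  # work on copies
    merged = True
    while merged:
        merged = False
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                if overlap_ratio(clusters[p], clusters[q]) >= threshold:
                    clusters[p] |= clusters.pop(q)  # fold C_q into C_p
                    merged = True
                    break
            if merged:
                break  # restart the scan, since the merged set has changed
    return clusters
```

Restarting the scan after each merge keeps the logic simple at the cost of extra comparisons, which is acceptable for a sketch but may need a more efficient structure on large account sets.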
Specifically, in the embodiment of the present invention, when the batch registration suspicion calculating unit 5 calculates the registration suspicion score of each user account to be identified according to each user cluster M, it applies a custom suspicion formula, where the custom suspicion formula is:
Figure BDA0001785714510000162
wherein $S_u$ is the batch registration suspicion score of user u, $Z_u$ is the set of suspected user accounts in each user cluster M in which user u is located, i is any element of the set $Z_u$, $|Z_u|$ is the number of elements of the set $Z_u$, and $w_{iu}$ is the similarity score between user u and user i.
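The suspicion formula itself is given only as an image that is not reproduced in this text. The sketch below assumes, from the symbols defined above, that the score is the mean of $w_{iu}$ over the set $Z_u$; that reading is an assumption, as are the function name `suspicion_score` and the `similarity` callback.

```python
def suspicion_score(u, clusters, known_suspects, similarity):
    """Assumed form: S_u = (1 / |Z_u|) * sum of w_iu for i in Z_u.

    clusters       -- iterable of sets of accounts (the user clusters M)
    known_suspects -- set of accounts already known to be suspected
    similarity     -- hypothetical callback returning w_iu for accounts i and u
    """
    z_u = set()
    for cluster in clusters:
        if u in cluster:
            z_u |= cluster & known_suspects  # suspects sharing a cluster with u
    z_u.discard(u)
    if not z_u:
        return 0.0  # no known suspect shares a cluster with u
    return sum(similarity(i, u) for i in z_u) / len(z_u)
```

An account would then be flagged when its score exceeds the preset batch registration suspicion threshold, whose value the text leaves to the implementer.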
Based on the same inventive concept, the present application provides an embodiment of a storage medium corresponding to the first embodiment, which is detailed in the third embodiment.
Example three
A third embodiment of the invention provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out all or part of the method steps of the first embodiment.
All or part of the flow in the method of the first embodiment may also be implemented by a computer program instructing related hardware. The computer program can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
Based on the same inventive concept, the present application provides an embodiment of an electronic device corresponding to the first embodiment, which is detailed in the fourth embodiment.
Example four
The fourth embodiment of the present invention further provides an electronic device, which includes a memory and a processor, wherein the memory stores a computer program running on the processor, and the processor executes the computer program to implement all or part of the method steps in the first embodiment.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the computer device and connects the various parts of the overall computer device through various interfaces and lines.
The memory may be used to store computer programs and/or modules, and the processor may implement various functions of the computer device by running or executing the computer programs and/or modules stored in the memory, as well as by invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, video data, etc.) created according to the use of the cellular phone, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, server, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), servers and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A batch registered user identification method, characterized by comprising the following steps:
acquiring a user account set, wherein the user account set comprises a set of suspected user accounts known to be suspected of batch registration and a set of to-be-identified user accounts for which it is to be identified whether they are suspected of batch registration;
in the user account set, similarity calculation is carried out on any user account and other user accounts to obtain similarity w;
constructing a user structure graph according to each user account and each similarity w in the user account set;
clustering the user structure graph to obtain a plurality of user clusters M;
calculating the registration suspicion score of each user account to be identified according to each user cluster M;
and when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, indicating that the batch registration operation exists for the corresponding user account to be identified.
2. The method according to claim 1, wherein, when similarity calculation is carried out on any user account and other user accounts in the user account set to obtain the similarity w, a custom similarity formula is applied, the custom similarity formula being:
Figure FDA0001785714500000011
wherein: $w_{uv}$ is the similarity score between user u and user v;
$nick_u$ is the nickname text registered by user u, and $nick_v$ is the nickname text registered by user v;
$edit(nick_u, nick_v)$ is the edit distance between the nickname text registered by user u and the nickname text registered by user v;
$len(nick_u)$ is the length of the nickname text registered by user u, and $len(nick_v)$ is the length of the nickname text registered by user v;
x is an attribute index that both registered account u and registered account v possess in advance, the attribute index being one or more of the IP used, the device used, or the registration time;
$x_{ui}$ is the i-th attribute index associated with registered account u, $x_{vi}$ is the i-th attribute index associated with registered account v, and the total number of attribute indexes is N;
$I(x_{ui} = x_{vi})$ is an indicator function that takes the value 1 if $x_{ui} = x_{vi}$ and 0 otherwise;
$w_i$ ($i = 1, 2$) is a weight coefficient in the range 0 to 1 that satisfies
Figure FDA0001785714500000021
3. The identification method according to claim 1, wherein the constructing a user structure graph according to each user account and each similarity w in the user account set specifically includes the following steps:
A1, taking each user account in the user account set as a node;
A2, setting a similarity threshold, and connecting the nodes represented by two user accounts when the similarity w of the two user accounts exceeds the preset similarity threshold, thereby forming the user structure graph.
4. The identification method according to claim 1, wherein the clustering process is performed on the user structure graph to obtain a plurality of user clusters M, and specifically comprises the following steps:
B1, calculating the k-order common neighbors of the user accounts at the two ends of each connecting line in the user structure graph, and merging the k-order common neighbors of the two user accounts of the same connecting line into a neighbor set C, thereby obtaining a plurality of neighbor sets C, wherein the k-order common neighbors of a user are the set of all nodes for which the number of vertices required to reach the user is less than k, and k is a positive integer;
B2, aggregating the neighbor sets C to obtain a plurality of user clusters M.
5. The identification method according to claim 4, wherein the aggregating of each neighbor set C is specifically performed by:
select any two neighbor sets $C_p$ and $C_q$; when
Figure FDA0001785714500000031
holds, $C_p$ and $C_q$ are aggregated to obtain the user cluster $M_n$,
where $|C_p \cap C_q|$ is the number of users common to $C_p$ and $C_q$, and $|C_p \cup C_q|$ is the number of all users in $C_p$ and $C_q$.
6. The identification method according to claim 1, wherein a custom suspicion formula is applied when the registration suspicion score of each user account to be identified is calculated according to each user cluster M, the custom suspicion formula being:
Figure FDA0001785714500000032
wherein $S_u$ is the batch registration suspicion score of user u, $Z_u$ is the set of suspected user accounts in each user cluster M in which user u is located, i is any element of the set $Z_u$, $|Z_u|$ is the number of elements of the set $Z_u$, and $w_{iu}$ is the similarity score between user u and user i.
7. A storage medium having a computer program stored thereon, characterized in that: the computer program, when executed by a processor, implements the method of any of claims 1 to 5.
8. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program that runs on the processor, characterized in that: the processor, when executing the computer program, implements the method of any of claims 1 to 5.
9. A batch registered user identification system, comprising:
the account acquisition unit is used for acquiring a user account set, wherein the user account set comprises a set of suspected user accounts known to be suspected of batch registration and a set of to-be-identified user accounts for which it is to be identified whether they are suspected of batch registration;
the similarity calculation unit is used for performing similarity calculation on any user account and other user accounts in the user account set to obtain similarity w;
the user structure map building unit is used for building a user structure map according to each user account and each similarity w in the user account set;
the user cluster acquisition unit is used for carrying out clustering processing on the user structure graph to obtain a plurality of user clusters M;
and the batch registration suspicion calculating unit is used for calculating the registration suspicion score of each user account to be identified according to each user cluster M, and when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, the batch registration suspicion calculating unit indicates that the corresponding user account to be identified has batch registration operation.
10. The identification system according to claim 9, wherein the batch registration suspicion calculation unit applies a custom suspicion formula when calculating the registration suspicion score of each user account to be identified according to each user cluster M, the custom suspicion formula being:
Figure FDA0001785714500000041
wherein $S_u$ is the batch registration suspicion score of user u, $Z_u$ is the set of suspected user accounts in each user cluster M in which user u is located, i is any element of the set $Z_u$, $|Z_u|$ is the number of elements of the set $Z_u$, and $w_{iu}$ is the similarity score between user u and user i.
CN201811014021.3A 2018-08-31 2018-08-31 Batch registered user identification method, storage medium, electronic device and system Active CN110876072B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811014021.3A CN110876072B (en) 2018-08-31 2018-08-31 Batch registered user identification method, storage medium, electronic device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811014021.3A CN110876072B (en) 2018-08-31 2018-08-31 Batch registered user identification method, storage medium, electronic device and system

Publications (2)

Publication Number Publication Date
CN110876072A true CN110876072A (en) 2020-03-10
CN110876072B CN110876072B (en) 2022-02-08

Family

ID=69715352

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811014021.3A Active CN110876072B (en) 2018-08-31 2018-08-31 Batch registered user identification method, storage medium, electronic device and system

Country Status (1)

Country Link
CN (1) CN110876072B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111586001A (en) * 2020-04-28 2020-08-25 咪咕文化科技有限公司 Abnormal user identification method and device, electronic equipment and storage medium
CN112000711A (en) * 2020-07-21 2020-11-27 微梦创科网络科技(中国)有限公司 Method and system for determining evaluation user based on Spark
CN112116007A (en) * 2020-09-18 2020-12-22 四川长虹电器股份有限公司 Batch registration account detection method based on graph algorithm and clustering algorithm
CN117874724A (en) * 2023-02-22 2024-04-12 南京宏润达企业管理有限公司 Internet risk assessment method

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100011032A1 (en) * 2008-07-11 2010-01-14 Canon Kabushiki Kaisha Document management apparatus, document management system, and document management method
CN105550175A (en) * 2014-10-28 2016-05-04 阿里巴巴集团控股有限公司 Malicious account identification method and apparatus
CN105634855A (en) * 2014-11-06 2016-06-01 阿里巴巴集团控股有限公司 Method and device for recognizing network address abnormity
CN105991620A (en) * 2015-03-05 2016-10-05 阿里巴巴集团控股有限公司 Malicious account identification method and device
CN106339615A (en) * 2016-08-29 2017-01-18 北京红马传媒文化发展有限公司 Abnormal registration behavior recognition method, system and equipment
CN106407212A (en) * 2015-07-31 2017-02-15 阿里巴巴集团控股有限公司 Network account category determination method and apparatus, and object clustering method and apparatus
CN106685898A (en) * 2015-11-09 2017-05-17 阿里巴巴集团控股有限公司 Method and device for identifying batch-registered accounts
CN107733883A (en) * 2017-10-09 2018-02-23 武汉斗鱼网络科技有限公司 A kind of method and device for detecting batch registration account
CN107835154A (en) * 2017-10-09 2018-03-23 武汉斗鱼网络科技有限公司 A kind of batch registration account recognition methods and system
CN108052543A (en) * 2017-11-23 2018-05-18 北京工业大学 A kind of similar account detection method of microblogging based on map analysis cluster

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fang Yong et al.: "Fake user detection based on hierarchical clustering", Journal of Tsinghua University (Science and Technology) *

Also Published As

Publication number Publication date
CN110876072B (en) 2022-02-08

Similar Documents

Publication Publication Date Title
CN110876072B (en) Batch registered user identification method, storage medium, electronic device and system
CN108174296A (en) Malicious user recognition methods and device
CN108600414B (en) Equipment fingerprint construction method and device, storage medium and terminal
CN108985954B (en) Method for establishing association relation of each identifier and related equipment
CN109858441A (en) A kind of monitoring abnormal state method and apparatus for construction site
CN114116705B (en) Method and device for determining contribution value of participants in joint learning
CN107885716B (en) Text recognition method and device
WO2019052162A1 (en) Method, apparatus and device for improving data cleaning efficiency, and readable storage medium
CN110457704B (en) Target field determination method and device, storage medium and electronic device
CN110222790B (en) User identity identification method and device and server
CN110516752A (en) Cluster quality evaluation method, device and equipment and storage medium
CN109102468A (en) Image enhancement method and device, terminal equipment and storage medium
CN113283351B (en) Video plagiarism detection method using CNN optimization similarity matrix
CN108463813B (en) Method and device for processing data
CN111414528B (en) Method and device for determining equipment identification, storage medium and electronic equipment
CN110222297B (en) Identification method of tag user and related equipment
CN107277640A (en) Interactive approach, device and storage medium based on live platform
CN110750681B (en) Account similarity calculation method, storage medium, electronic device and system
CN114723652A (en) Cell density determination method, cell density determination device, electronic apparatus, and storage medium
CN107092650A (en) A kind of Web Log Analysis method and device
CN110866437A (en) Color value determination model optimization method and device, electronic equipment and storage medium
CN115834231A (en) Honeypot system identification method and device, terminal equipment and storage medium
CN110647805B (en) Reticulate pattern image recognition method and device and terminal equipment
CN109995613B (en) Flow calculation method and device
CN114339689A (en) Internet of things machine card binding pool control method and device and related medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant