CN110876072A - Batch registered user identification method, storage medium, electronic device and system - Google Patents
- Publication number
- CN110876072A (application number CN201811014021.3A)
- Authority
- CN
- China
- Prior art keywords
- user
- user account
- similarity
- suspicion
- registration
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/25—Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
- H04N21/258—Client or end-user data management, e.g. managing client capabilities, user preferences or demographics, processing of multiple end-users preferences to derive collaborative data
- H04N21/25866—Management of end-user data
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computer Graphics (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a batch registered user identification method, a storage medium, an electronic device and a system in the field of big data risk control. The method comprises: acquiring a user account set; calculating, within the user account set, the similarity w between each user account and every other user account; constructing a user structure graph from the user accounts and the similarities w; clustering the user structure graph to obtain a plurality of user clusters M; calculating the batch registration suspicion score of each user account to be identified according to the user clusters M; and, when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, determining that batch registration has occurred for the corresponding user account to be identified. The invention identifies user accounts and determines whether batch registration behavior exists, achieves high identification efficiency, and helps staff manage user accounts.
Description
Technical Field
The invention relates to the field of big data risk control, and in particular to a batch registered user identification method, a storage medium, an electronic device and a system.
Background
With the development of live streaming, live-streaming content has become increasingly diverse and the number of viewers keeps growing, so more and more user accounts are being registered. Among this large number of accounts are a number of batch-registered accounts, which are typically used for abusive operations such as spam commenting or fake follows. A live-streaming platform therefore needs to examine user accounts, screen out the batch-registered ones, and take appropriate measures against them.
Traditional batch registration identification mostly relies on staff defining several attribute indicators and then reviewing user accounts manually. This approach is inefficient, procedurally complex and labor-intensive, and makes identifying user accounts inconvenient.
Therefore, a new batch registered user identification method is urgently needed to improve the efficiency of user account identification.
Disclosure of Invention
In view of the defects in the prior art, an object of the invention is to provide a batch registered user identification method that identifies user accounts and determines whether batch registration behavior exists, with high identification efficiency, thereby helping staff manage user accounts.
To achieve the above object, the invention adopts the following technical solution:
In a first aspect, the invention provides a batch registered user identification method, comprising the following steps:
acquiring a user account set, wherein the user account set comprises a suspected user account set, whose accounts are known to be suspected of batch registration, and a to-be-identified user account set, whose accounts have yet to be checked for batch registration suspicion;
calculating, within the user account set, the similarity w between each user account and every other user account;
constructing a user structure graph from the user accounts in the user account set and the similarities w;
clustering the user structure graph to obtain a plurality of user clusters M;
calculating the batch registration suspicion score of each user account to be identified according to the user clusters M;
and, when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, determining that batch registration has occurred for the corresponding user account to be identified.
On the basis of the above technical solution, the similarity w between any user account and each other user account in the user account set is calculated with the following custom similarity formula:
$$w_{uv} = w_1\left(1 - \frac{\mathrm{edit}(nick_u, nick_v)}{\max\left(\mathrm{len}(nick_u), \mathrm{len}(nick_v)\right)}\right) + w_2 \cdot \frac{1}{N}\sum_{i=1}^{N} I(x_{ui} = x_{vi})$$
where: w_uv is the similarity score between user u and user v;
nick_u is the nickname text registered by user u, and nick_v is the nickname text registered by user v;
edit(nick_u, nick_v) is the edit distance between the nickname texts registered by users u and v;
len(nick_u) is the length of the nickname text registered by user u, and len(nick_v) is the length of the nickname text registered by user v;
x denotes the attribute indicators that registered accounts u and v both possess, comprising one or more of the IP address used, the device used and the registration time;
x_ui is the i-th attribute indicator of registered account u, x_vi is the i-th attribute indicator of registered account v, and N is the total number of attribute indicators;
I(x_ui = x_vi) is the indicator function, which takes 1 if x_ui = x_vi and 0 otherwise;
w_1 and w_2 are weight coefficients in the range 0 to 1 satisfying w_1 + w_2 = 1.
On the basis of the above technical solution, constructing the user structure graph from the user accounts and the similarities w specifically comprises the following steps:
a1, taking each user account in the user account set as a node;
a2, setting a similarity threshold, and connecting the nodes of two user accounts with an edge when their similarity w exceeds the preset similarity threshold, thereby forming the user structure graph.
On the basis of the above technical solution, clustering the user structure graph to obtain a plurality of user clusters M specifically comprises the following steps:
b1, for each edge in the user structure graph, calculating the k-order common neighbors of the two user accounts at its ends and taking them together as a neighbor set C, thereby obtaining a plurality of neighbor sets C, where the k-order common neighbors are all nodes that can be reached from each of the two users by passing through fewer than k vertices, and k is a positive integer;
b2, aggregating the neighbor sets C to obtain a plurality of user clusters M.
On the basis of the above technical solution, the neighbor sets C are aggregated as follows:
select any two neighbor sets C_p and C_q; when the ratio |C_p ∩ C_q| / |C_p ∪ C_q| satisfies the preset aggregation condition, aggregate C_p and C_q to obtain a user cluster M_n;
where |C_p ∩ C_q| is the number of users common to C_p and C_q, and |C_p ∪ C_q| is the total number of users in C_p and C_q combined.
On the basis of the above technical solution, the batch registration suspicion score of each user account to be identified is calculated according to the user clusters M with the following custom suspicion formula:
$$S_u = \frac{1}{|Z_u|}\sum_{i \in Z_u} w_{iu}$$
where S_u is the batch registration suspicion score of user u; Z_u is the set of suspected user accounts contained in the user clusters M in which user u appears; i is any element of Z_u; |Z_u| is the number of elements in Z_u; and w_iu is the similarity score between user u and user i.
In a second aspect, the invention further provides a storage medium storing a computer program which, when executed by a processor, implements the above batch registered user identification method.
In a third aspect, the invention further provides an electronic device comprising a memory and a processor, the memory storing a computer program that runs on the processor, characterized in that the processor implements the above batch registered user identification method when executing the computer program.
In a fourth aspect, the invention further provides a batch registered user identification system, comprising:
an account acquisition unit, configured to acquire a user account set, wherein the user account set comprises a suspected user account set, whose accounts are known to be suspected of batch registration, and a to-be-identified user account set, whose accounts have yet to be checked for batch registration suspicion;
a similarity calculation unit, configured to calculate, within the user account set, the similarity w between each user account and every other user account;
a user structure graph construction unit, configured to construct a user structure graph from the user accounts in the user account set and the similarities w, taking each user account as a node of the graph and connecting the nodes of two user accounts when their similarity w exceeds a preset similarity threshold;
a user cluster acquisition unit, configured to cluster the user structure graph to obtain a plurality of user clusters M;
and a batch registration suspicion calculation unit, configured to calculate the batch registration suspicion score of each user account to be identified according to the user clusters M and, when the score exceeds a preset batch registration suspicion threshold, to determine that batch registration has occurred for the corresponding user account to be identified.
On the basis of the above technical solution, the batch registration suspicion calculation unit calculates the batch registration suspicion score of each user account to be identified according to the user clusters M with the following custom suspicion formula:
$$S_u = \frac{1}{|Z_u|}\sum_{i \in Z_u} w_{iu}$$
where S_u is the batch registration suspicion score of user u; Z_u is the set of suspected user accounts contained in the user clusters M in which user u appears; i is any element of Z_u; |Z_u| is the number of elements in Z_u; and w_iu is the similarity score between user u and user i.
Compared with the prior art, the invention has the following advantage:
(1) Based on the similarity between users and the user structure graph constructed from those similarities, and using the suspected user account set with confirmed batch registration behavior as a reference, the invention identifies batch registration behavior across all users and picks out the users engaged in it.
Drawings
FIG. 1 is a flow chart of the batch registered user identification method according to the invention;
FIG. 2 is a flow chart of constructing the user structure graph in the batch registered user identification method according to the invention;
FIG. 3 is a flow chart of obtaining the user clusters M in the batch registered user identification method according to the invention;
FIG. 4 is a schematic diagram of the batch registered user identification system according to the invention;
In the figures: 1. account acquisition unit; 2. similarity calculation unit; 3. user structure graph construction unit; 4. user cluster acquisition unit; 5. batch registration suspicion calculation unit.
Detailed Description
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
The embodiments of the invention provide a batch registered user identification method, a storage medium, an electronic device and a system. The method builds on the similarity between users and the user structure graph constructed from those similarities, and uses the suspected user account set with confirmed batch registration behavior as a reference, to identify batch registration behavior across all users and pick out the users engaged in it.
To achieve the above technical effects, the general idea of the application is as follows:
a batch registered user identification method, comprising the following steps:
S1, acquiring a user account set, wherein the user account set comprises a suspected user account set, whose accounts are known to be suspected of batch registration, and a to-be-identified user account set, whose accounts have yet to be checked for batch registration suspicion;
S2, calculating, within the user account set, the similarity w between each user account and every other user account;
S3, constructing a user structure graph from the user accounts in the user account set and the similarities w;
S4, clustering the user structure graph to obtain a plurality of user clusters M;
S5, calculating the batch registration suspicion score of each user account to be identified according to the user clusters M;
S6, when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, determining that batch registration has occurred for the corresponding user account to be identified.
Embodiment One
Referring to FIG. 1 to FIG. 3, an embodiment of the invention provides a batch registered user identification method, comprising the following steps:
S1, acquiring a user account set, wherein the user account set comprises a suspected user account set, whose accounts are known to be suspected of batch registration, and a to-be-identified user account set, whose accounts have yet to be checked for batch registration suspicion;
S2, calculating, within the user account set, the similarity w between each user account and every other user account;
Preferably, in the embodiment of the invention, the similarity w between any two accounts in the user account set may be calculated with the following custom similarity formula:
$$w_{uv} = w_1\left(1 - \frac{\mathrm{edit}(nick_u, nick_v)}{\max\left(\mathrm{len}(nick_u), \mathrm{len}(nick_v)\right)}\right) + w_2 \cdot \frac{1}{N}\sum_{i=1}^{N} I(x_{ui} = x_{vi})$$
where: w_uv is the similarity score between user u and user v;
nick_u is the nickname text registered by user u, and nick_v is the nickname text registered by user v;
edit(nick_u, nick_v) is the edit distance between the nickname texts registered by users u and v;
len(nick_u) is the length of the nickname text registered by user u, and len(nick_v) is the length of the nickname text registered by user v;
x denotes the attribute indicators that registered accounts u and v both possess, comprising one or more of the IP address used, the device used and the registration time;
x_ui is the i-th attribute indicator of registered account u, x_vi is the i-th attribute indicator of registered account v, and N is the total number of attribute indicators;
I(x_ui = x_vi) is the indicator function, which takes 1 if x_ui = x_vi and 0 otherwise;
w_1 and w_2 are weight coefficients in the range 0 to 1 satisfying w_1 + w_2 = 1.
Note that edit(nick_u, nick_v) is the edit distance, also called the Levenshtein distance, between the nickname texts registered by users u and v: the minimum number of editing operations needed to convert one string into the other, where the permitted operations are substituting one character for another, inserting a character and deleting a character. In general, the smaller the edit distance, the more similar the two strings.
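The edit distance described above can be computed with the standard dynamic-programming recurrence. The following Python sketch is illustrative only (the function name `edit_distance` is our own, not from the patent); it charges unit cost for each substitution, insertion and deletion and keeps only one row of the DP table at a time.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between a and b (unit-cost substitute/insert/delete)."""
    # prev[j] = distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute ca -> cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```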
A specific example is given here in the embodiment of the invention:
suppose there are two user accounts A and B; the edit distance between their registered nicknames is 2, the nickname of account A has length 5, and the nickname of account B has length 4; 4 attribute indicators are considered, of which A and B have 2 in common; both weight coefficients are 0.5. The similarity between A and B is then:
$$w_{AB} = 0.5 \times \left(1 - \frac{2}{\max(5, 4)}\right) + 0.5 \times \frac{2}{4} = 0.3 + 0.25 = 0.55$$
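A minimal Python sketch of the custom similarity formula, assuming the nickname term is the normalized edit-distance similarity 1 - edit/max(len), and that attribute indicators (IP address, device, registration time) have already been discretized into directly comparable values. The function names, the attribute encoding, the fourth "channel" attribute and the nicknames "user1"/"usr2" are hypothetical, chosen to match the example inputs (edit distance 2, lengths 5 and 4, two of four attributes equal, both weights 0.5); under this normalization those inputs give 0.5 × (1 − 2/5) + 0.5 × 2/4 = 0.55.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def similarity(nick_u: str, nick_v: str,
               attrs_u: Sequence[object], attrs_v: Sequence[object],
               w1: float = 0.5, w2: float = 0.5) -> float:
    """w_uv = w1 * (1 - edit / max_len) + w2 * (matching attributes / N)."""
    if len(attrs_u) != len(attrs_v) or not attrs_u:
        raise ValueError("both accounts need the same, non-empty list of N attributes")
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must satisfy w1 + w2 = 1")
    longest = max(len(nick_u), len(nick_v))
    # Assumed normalization: identical (or both empty) nicknames score 1.
    nick_sim = 1.0 - edit_distance(nick_u, nick_v) / longest if longest else 1.0
    # Indicator function I(x_ui = x_vi), averaged over the N attributes.
    attr_sim = sum(x == y for x, y in zip(attrs_u, attrs_v)) / len(attrs_u)
    return w1 * nick_sim + w2 * attr_sim


# Hypothetical accounts A and B: IP, device, registration hour, channel.
a_attrs = ("10.0.0.1", "device-7", "2018-08-31T10", "app")
b_attrs = ("10.0.0.1", "device-7", "2018-08-31T11", "web")
print(round(similarity("user1", "usr2", a_attrs, b_attrs), 2))  # 0.55
```

Discretizing registration time (here to the hour) is one way to make the indicator function meaningful for a continuous attribute; the bucket size is an assumption, not something the text specifies.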
S3, constructing a user structure graph from the user accounts in the user account set and the similarities w;
specifically, in the embodiment of the invention, constructing the user structure graph comprises the following steps:
a1, taking each user account in the user account set as a node;
a2, setting a similarity threshold, and connecting the nodes of two user accounts with an edge when their similarity w exceeds the preset similarity threshold, thereby forming the user structure graph;
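Steps a1 and a2 amount to building an undirected graph stored as an adjacency map. A minimal Python sketch, assuming the pairwise similarity function is supplied by the caller (here a hypothetical lookup table) and using the example threshold η = 0.1 given in this embodiment. The all-pairs loop is O(n²) and is meant only for illustration; `build_user_graph` is a name of our own.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]


def build_user_graph(accounts: Iterable[Hashable],
                     sim: Callable[[Hashable, Hashable], float],
                     eta: float = 0.1) -> Graph:
    """Adjacency-map user structure graph: nodes are accounts, and an
    undirected edge joins u and v when sim(u, v) exceeds eta."""
    graph: Graph = {account: set() for account in accounts}  # a1: one node per account
    for u, v in combinations(graph, 2):                      # each unordered pair once
        if sim(u, v) > eta:                                  # a2: threshold test
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Hypothetical pairwise similarities; unlisted pairs default to 0.
table = {frozenset(("a", "b")): 0.6, frozenset(("b", "c")): 0.3, frozenset(("c", "d")): 0.05}
g = build_user_graph("abcd", lambda u, v: table.get(frozenset((u, v)), 0.0))
print({node: sorted(nbrs) for node, nbrs in g.items()})
# {'a': ['b'], 'b': ['a', 'c'], 'c': ['b'], 'd': []}
```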
S4, clustering the user structure graph to obtain a plurality of user clusters M;
specifically, in the embodiment of the invention, obtaining the user clusters M comprises the following steps:
b1, for each edge in the user structure graph, calculating the k-order common neighbors of the two user accounts at its ends and taking them together as a neighbor set C, thereby obtaining a plurality of neighbor sets C, where the k-order common neighbors are all nodes that can be reached from each of the two users by passing through fewer than k vertices, and k is a positive integer;
b2, aggregating the neighbor sets C to obtain a plurality of user clusters M.
Specifically, in the embodiment of the invention, since the similarity w lies between 0 and 1, the preset similarity threshold, denoted η, may be set to 0.1 in practice;
the value of k for the k-order common neighbors may be chosen according to the number of user accounts in the user account set, and is typically between 5 and 10.
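Step b1 can be sketched with a breadth-first search. This Python sketch assumes that "reached by passing through fewer than k vertices" means a hop distance of less than k from the user (the user itself, at distance 0, is included), and that the neighbor set C of an edge (u, v) is the intersection of the two endpoints' k-order neighborhoods, so that for k ≥ 2 both endpoints belong to C. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def k_order_neighbors(graph: Graph, source: Hashable, k: int) -> Set[Hashable]:
    """Nodes whose hop distance from source is less than k (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] + 1 >= k:  # expanding further would reach distance >= k
            continue
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)


def neighbor_sets(graph: Graph, k: int) -> List[Set[Hashable]]:
    """One neighbor set C per edge: the common k-order neighbors of its endpoints."""
    cache = {node: k_order_neighbors(graph, node, k) for node in graph}
    seen = set()
    result = []
    for u, nbrs in graph.items():
        for v in nbrs:
            edge = frozenset((u, v))
            if edge not in seen:  # visit each undirected edge once
                seen.add(edge)
                result.append(cache[u] & cache[v])
    return result


# Path graph a - b - c - d with k = 3; real account sets would use a larger k.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(sorted(sorted(c) for c in neighbor_sets(path, 3)))
# [['a', 'b', 'c'], ['a', 'b', 'c', 'd'], ['b', 'c', 'd']]
```

Caching each node's neighborhood avoids repeating the BFS for every incident edge; for large graphs a bounded-depth search library routine could replace the hand-written BFS.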
Preferably, in the embodiment of the invention, the neighbor sets C are aggregated as follows:
select any two neighbor sets C_p and C_q; when the ratio |C_p ∩ C_q| / |C_p ∪ C_q| satisfies the preset aggregation condition, aggregate C_p and C_q to obtain a user cluster M_n;
where |C_p ∩ C_q| is the number of users common to C_p and C_q, and |C_p ∪ C_q| is the total number of users in C_p and C_q combined;
aggregating the neighbor sets into user clusters reduces the number of sets, which lowers the workload of calculating the batch registration suspicion scores and improves accuracy.
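This text defines the two quantities in the aggregation condition, |C_p ∩ C_q| and |C_p ∪ C_q|, but does not reproduce the condition itself. The Python sketch below therefore assumes a hypothetical rule: merge two sets whenever their Jaccard ratio |C_p ∩ C_q| / |C_p ∪ C_q| is at least a threshold `theta` (0.5 by default), repeat until no pair qualifies, and keep any set that never merges as a cluster of its own. The threshold value and the repeat-until-stable strategy are assumptions, not the patent's stated procedure. Requires Python 3.8+ for the `:=` operator.

```python
from typing import Iterable, List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def _mergeable_pair(clusters: List[Set[str]], theta: float) -> Optional[Tuple[int, int]]:
    """First pair (i, j) with i < j whose Jaccard ratio reaches theta, if any."""
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if jaccard(clusters[i], clusters[j]) >= theta:
                return i, j
    return None


def aggregate(neighbor_sets: Iterable[Set[str]], theta: float = 0.5) -> List[Set[str]]:
    """Merge neighbor sets C into user clusters M (hypothetical Jaccard rule)."""
    clusters = [set(c) for c in neighbor_sets if c]
    while (pair := _mergeable_pair(clusters, theta)) is not None:
        i, j = pair
        other = clusters.pop(j)  # j > i, so index i is still valid
        clusters[i] |= other
    return clusters


sets = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}]
print(sorted(sorted(m) for m in aggregate(sets)))  # [['a', 'b', 'c'], ['x', 'y']]
```

Each merge removes one set, so the loop always terminates; the quadratic pair scan is fine for a sketch but would need indexing (for example by shared members) at scale.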
S5, calculating the batch registration suspicion score of each user account to be identified according to the user clusters M;
specifically, in the embodiment of the invention, the batch registration suspicion score of each user account to be identified is calculated according to the user clusters M with the following custom suspicion formula:
$$S_u = \frac{1}{|Z_u|}\sum_{i \in Z_u} w_{iu}$$
where S_u is the batch registration suspicion score of user u; Z_u is the set of suspected user accounts contained in the user clusters M in which user u appears; i is any element of Z_u; |Z_u| is the number of elements in Z_u; and w_iu is the similarity score between user u and user i;
in the embodiment of the invention, Z_u is obtained by first finding every user cluster M that contains user u and then collecting the suspected user accounts in those clusters into the set Z_u.
S6, when the batch registration suspicion score exceeds the preset batch registration suspicion threshold, determining that batch registration has occurred for the corresponding user account to be identified.
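Steps S5 and S6 can be sketched together in Python, assuming the suspicion formula is the mean similarity between user u and the suspected accounts in Z_u (S_u = Σ_{i∈Z_u} w_iu / |Z_u|), and that a user whose clusters contain no suspected account receives a score of 0. The function names, clusters, similarity table and threshold value 0.45 in the usage lines are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Set

Sim = Callable[[str, str], float]


def suspicion_score(u: str, clusters: Iterable[Set[str]], suspected: Set[str], sim: Sim) -> float:
    """S_u: mean similarity between u and the suspected accounts in u's clusters."""
    z_u: Set[str] = set()
    for cluster in clusters:
        if u in cluster:
            z_u |= cluster & suspected  # suspected accounts sharing a cluster M with u
    z_u.discard(u)
    if not z_u:
        return 0.0  # assumption: no suspected account nearby -> no suspicion
    return sum(sim(i, u) for i in z_u) / len(z_u)


def flag_batch_registered(to_identify: Iterable[str], clusters: List[Set[str]],
                          suspected: Set[str], sim: Sim, threshold: float) -> Dict[str, float]:
    """S6: keep the accounts whose S_u exceeds the preset suspicion threshold."""
    scores = {u: suspicion_score(u, clusters, suspected, sim) for u in to_identify}
    return {u: s for u, s in scores.items() if s > threshold}


clusters = [{"s1", "s2", "u1"}, {"u1", "s3"}, {"u2", "x"}]
suspected = {"s1", "s2", "s3"}
table = {frozenset(("s1", "u1")): 0.6, frozenset(("s2", "u1")): 0.5, frozenset(("s3", "u1")): 0.4}


def sim(a: str, b: str) -> float:
    return table.get(frozenset((a, b)), 0.0)


flagged = flag_batch_registered(["u1", "u2"], clusters, suspected, sim, threshold=0.45)
print(sorted(flagged))  # ['u1']
```

Here u1 shares clusters with s1, s2 and s3, so Z_u1 = {s1, s2, s3} and S_u1 = (0.6 + 0.5 + 0.4) / 3 = 0.5, which exceeds 0.45; u2 shares a cluster only with a non-suspected account and scores 0.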
In the embodiment of the invention, a worker acquires a user account set. The set may consist of the accounts newly registered on the server that day, or of multiple user accounts obtained in other ways. Each user account in the set belongs either to the suspected user account set, which contains the accounts suspected of batch registration (the suspected user accounts), or to the to-be-identified user account set, which contains the accounts whose batch registration suspicion has yet to be determined (the to-be-identified user accounts);
next, within the user account set, the similarity w between each user account and every other user account is calculated;
then, a user structure graph is constructed from the user accounts and the similarities w: each user account becomes a node, and the nodes of two user accounts are connected when their similarity w exceeds the preset similarity threshold. The user structure graph serves as the input for the subsequent user clustering;
then, the user structure graph is clustered to obtain a plurality of user clusters M;
finally, the batch registration suspicion score of each user account to be identified is calculated from the user clusters M, and a batch registration suspicion threshold is preset; when the score exceeds this threshold, it is determined that batch registration has occurred for the corresponding user account to be identified.
In the embodiment of the invention, based on the similarity between users and the user structure graph constructed from those similarities, and using the suspected user account set with confirmed batch registration behavior as a reference, batch registration behavior is identified across all users and the users engaged in it are picked out.
It should be noted that the suspected user accounts in the suspected user account set and the to-be-identified user accounts in the to-be-identified user account set together make up the user account set.
Based on the same inventive concept, the application provides a batch registered user identification system corresponding to the first embodiment, which is detailed in the second embodiment.
Embodiment Two
As shown in FIG. 4, the second embodiment of the invention provides a batch registered user identification system, comprising:
an account acquisition unit 1, configured to acquire a user account set, wherein the user account set comprises a suspected user account set, whose accounts are known to be suspected of batch registration, and a to-be-identified user account set, whose accounts have yet to be checked for batch registration suspicion;
a similarity calculation unit 2, configured to calculate, within the user account set, the similarity w between each user account and every other user account;
a user structure graph construction unit 3, configured to construct a user structure graph from the user accounts in the user account set and the similarities w;
a user cluster acquisition unit 4, configured to cluster the user structure graph to obtain a plurality of user clusters M;
and a batch registration suspicion calculation unit 5, configured to calculate the batch registration suspicion score of each user account to be identified according to the user clusters M and, when the score exceeds a preset batch registration suspicion threshold, to determine that batch registration has occurred for the corresponding user account to be identified.
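The five units of the system map naturally onto pluggable components. This hypothetical Python sketch wires them together with injected callables; the class, field and toy function names are our own, not the patent's, and the toy callables in the usage lines merely stand in for concrete implementations of each step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]
Sim = Callable[[str, str], float]


@dataclass
class BatchRegistrationSystem:
    """Hypothetical wiring of units 1 to 5; each unit is an injected callable."""
    acquire_accounts: Callable[[], Tuple[Set[str], Set[str]]]     # unit 1: (suspected, to_identify)
    similarity: Sim                                                # unit 2: w_uv
    build_graph: Callable[[Set[str], Sim], Graph]                  # unit 3: user structure graph
    cluster: Callable[[Graph], List[Set[str]]]                     # unit 4: user clusters M
    score: Callable[[str, List[Set[str]], Set[str], Sim], float]   # unit 5: S_u
    threshold: float                                               # preset suspicion threshold

    def run(self) -> Set[str]:
        """Return the to-be-identified accounts judged to be batch registered."""
        suspected, to_identify = self.acquire_accounts()
        graph = self.build_graph(suspected | to_identify, self.similarity)
        clusters = self.cluster(graph)
        return {u for u in to_identify
                if self.score(u, clusters, suspected, self.similarity) > self.threshold}


# Toy stand-ins for each unit, for illustration only.
def toy_sim(a: str, b: str) -> float:
    return 0.9 if {a, b} == {"s1", "u1"} else 0.0


def toy_graph(accounts: Set[str], sim: Sim) -> Graph:
    return {a: {b for b in accounts if b != a and sim(a, b) > 0.1} for a in accounts}


def toy_cluster(graph: Graph) -> List[Set[str]]:
    return [{node} | nbrs for node, nbrs in graph.items() if nbrs]


def toy_score(u: str, clusters: List[Set[str]], suspected: Set[str], sim: Sim) -> float:
    z_u = set().union(*(c & suspected for c in clusters if u in c))
    return sum(sim(i, u) for i in z_u) / len(z_u) if z_u else 0.0


system = BatchRegistrationSystem(lambda: ({"s1"}, {"u1", "u2"}),
                                 toy_sim, toy_graph, toy_cluster, toy_score, threshold=0.5)
print(system.run())  # {'u1'}
```

Injecting each unit keeps the orchestration independent of any particular similarity, clustering or scoring choice, which mirrors the unit decomposition in FIG. 4.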
Preferably, in the embodiment of the invention, the similarity calculation unit 2 calculates the similarity w between any two accounts in the user account set with the same custom similarity formula as in the first embodiment:
$$w_{uv} = w_1\left(1 - \frac{\mathrm{edit}(nick_u, nick_v)}{\max\left(\mathrm{len}(nick_u), \mathrm{len}(nick_v)\right)}\right) + w_2 \cdot \frac{1}{N}\sum_{i=1}^{N} I(x_{ui} = x_{vi})$$
where the symbols have the same meanings as in the first embodiment: w_uv is the similarity score between users u and v; edit(nick_u, nick_v) is the edit distance (Levenshtein distance) between their registered nickname texts, i.e. the minimum number of single-character substitutions, insertions and deletions needed to convert one into the other, so that a smaller edit distance indicates more similar strings; len(nick_u) and len(nick_v) are the nickname text lengths; x_ui and x_vi are the i-th of N attribute indicators, such as the IP address used, the device used or the registration time; I(x_ui = x_vi) is the indicator function, taking 1 if x_ui = x_vi and 0 otherwise; and w_1 and w_2 are weight coefficients in the range 0 to 1 satisfying w_1 + w_2 = 1.
Specifically, in the embodiment of the invention, the user structure graph construction unit 3 constructs the user structure graph through the following steps:
a1, taking each user account in the user account set as a node;
a2, setting a similarity threshold, and connecting the nodes of two user accounts with an edge when their similarity w exceeds the preset similarity threshold, thereby forming the user structure graph.
Specifically, in the embodiment of the invention, the user cluster acquisition unit 4 obtains the user clusters M through the following steps:
b1, for each edge in the user structure graph, calculating the k-order common neighbors of the two user accounts at its ends and taking them together as a neighbor set C, thereby obtaining a plurality of neighbor sets C, where the k-order common neighbors are all nodes that can be reached from each of the two users by passing through fewer than k vertices, and k is a positive integer;
b2, aggregating the neighbor sets C to obtain a plurality of user clusters M.
Specifically, in the embodiment of the invention, since the similarity w lies between 0 and 1, the preset similarity threshold, denoted η, may be set to 0.1 in practice;
the value of k for the k-order common neighbors may be chosen according to the number of user accounts in the user account set, and is typically between 5 and 10.
Preferably, in the embodiment of the invention, the user cluster acquisition unit 4 aggregates the neighbor sets C as follows:
select any two neighbor sets C_p and C_q; when the ratio |C_p ∩ C_q| / |C_p ∪ C_q| satisfies the preset aggregation condition, aggregate C_p and C_q to obtain a user cluster M_n;
where |C_p ∩ C_q| is the number of users common to C_p and C_q, and |C_p ∪ C_q| is the total number of users in C_p and C_q combined;
aggregating the neighbor sets into user clusters reduces the number of sets, which lowers the workload of calculating the batch registration suspicion scores and improves accuracy.
Specifically, in the embodiment of the present invention, when calculating the registration suspicion score of each to-be-identified user account according to the user clusters M, the batch registration suspicion calculating unit 5 applies a custom suspicion formula, the custom suspicion formula being:
where $S_u$ is the batch registration suspicion score of user $u$; $Z_u$ is the set of suspected user accounts in the user clusters M in which user $u$ is located; $i$ is any element of the set $Z_u$; $|Z_u|$ is the number of elements in $Z_u$; and $w_{iu}$ is the similarity score between user $u$ and user $i$.
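The suspicion formula itself is likewise missing from this text. One natural reading of the symbols defined above is the mean similarity between $u$ and the suspected accounts in $Z_u$, that is, $S_u = \frac{1}{|Z_u|}\sum_{i \in Z_u} w_{iu}$. This reading is an assumption, not a formula taken from the source. The sketch below computes that assumed score and flags the accounts whose score exceeds a preset threshold. The function names and the threshold value in the example are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Similarity = Dict[Tuple[str, str], float]


def similarity_of(similarity: Similarity, a: str, b: str) -> float:
    """Symmetric lookup of w_ab; pairs without a stored score count as 0.0."""
    return similarity.get((a, b), similarity.get((b, a), 0.0))


def suspicion_score(u: str, clusters: Iterable[FrozenSet[str]],
                    suspected: Set[str], similarity: Similarity) -> float:
    """Assumed form: S_u = (1 / |Z_u|) * sum of w_iu over i in Z_u."""
    z_u: Set[str] = set()
    for cluster in clusters:
        if u in cluster:
            z_u |= cluster & suspected  # suspected accounts sharing a cluster with u
    z_u.discard(u)
    if not z_u:
        return 0.0  # no suspected account shares a cluster with u
    return sum(similarity_of(similarity, i, u) for i in z_u) / len(z_u)


def flag_batch_registered(candidates: Iterable[str],
                          clusters: Iterable[FrozenSet[str]],
                          suspected: Set[str], similarity: Similarity,
                          threshold: float) -> Set[str]:
    """To-be-identified accounts whose score exceeds the preset threshold."""
    cluster_list = list(clusters)
    return {u for u in candidates
            if suspicion_score(u, cluster_list, suspected, similarity) > threshold}
```

As an example, take a single cluster {a, b, c, d} with suspected accounts a and b, and similarities w(a, c) = 0.5 and w(b, c) = 0.75. The assumed score of c is (0.5 + 0.75) / 2 = 0.625, so c is flagged at a threshold of 0.5. Account d has no stored similarity to a or b, so its score is 0.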
Based on the same inventive concept, the present application provides an embodiment of a storage medium corresponding to the first embodiment, as detailed in the third embodiment.
Example three
The third embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements all or part of the method steps of the first embodiment.
All or part of the flow of the method of the first embodiment may also be implemented by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, and the like. It should be noted that what the computer-readable medium contains may be appropriately extended or restricted according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, under legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
Based on the same inventive concept, the present application provides an embodiment of an electronic device corresponding to the first embodiment, as detailed in the fourth embodiment.
Example four
The fourth embodiment of the present invention further provides an electronic device comprising a memory and a processor, wherein the memory stores a computer program that runs on the processor, and the processor, when executing the computer program, implements all or part of the method steps of the first embodiment.
The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The processor is the control center of the computer device and connects the various parts of the entire computer device through various interfaces and lines.
The memory may be used to store computer programs and/or modules. The processor implements the various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and by invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area. The program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like. The data storage area may store data created according to the use of the cellular phone (such as audio data and video data), and the like. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, server, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), servers and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (10)
1. A method for identifying batch-registered users, characterized by comprising the following steps:
acquiring a user account set, wherein the user account set comprises a set of suspected user accounts already known to be suspected of batch registration and a set of to-be-identified user accounts for which it is to be determined whether they are suspected of batch registration;
performing, in the user account set, a similarity calculation between each user account and the other user accounts to obtain similarities w;
constructing a user structure graph according to each user account and each similarity w in the user account set;
performing clustering on the user structure graph to obtain a plurality of user clusters M;
calculating a batch registration suspicion score for each to-be-identified user account according to the user clusters M;
and, when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, determining that a batch registration operation exists for the corresponding to-be-identified user account.
2. The identification method according to claim 1, wherein a custom similarity formula is applied when the similarity calculation is performed between each user account and the other user accounts in the user account set to obtain the similarity w, the custom similarity formula being:
where $w_{uv}$ is the similarity score between user $u$ and user $v$;
$nick_u$ is the nickname text registered by user $u$, and $nick_v$ is the nickname text registered by user $v$;
$edit(nick_u, nick_v)$ is the edit distance between the nickname text registered by user $u$ and the nickname text registered by user $v$;
$len(nick_u)$ is the length of the nickname text registered by user $u$, and $len(nick_v)$ is the length of the nickname text registered by user $v$;
$x$ denotes the attribute indicators that registered accounts $u$ and $v$ both have, the attribute indicators being one or more of the IP address used, the device used, and the registration time;
$x_{ui}$ is the $i$-th attribute indicator of registered account $u$, $x_{vi}$ is the $i$-th attribute indicator of registered account $v$, and $N$ is the total number of attribute indicators;
$I(x_{ui} = x_{vi})$ is an indicator function that takes the value 1 if $x_{ui} = x_{vi}$ and 0 otherwise.
3. The identification method according to claim 1, wherein constructing the user structure graph according to each user account and each similarity w in the user account set specifically comprises the following steps:
A1: taking each user account in the user account set as a node;
A2: setting a similarity threshold and, when the similarity w between two user accounts exceeds the preset similarity threshold, connecting the nodes representing the two user accounts, thereby forming the user structure graph.
4. The identification method according to claim 1, wherein performing clustering on the user structure graph to obtain the plurality of user clusters M specifically comprises the following steps:
B1: for each edge in the user structure graph, calculating the k-order common neighbors of the user accounts at its two ends, and merging the k-order common neighbors of the two user accounts on the same edge into a neighbor set C, thereby obtaining a plurality of neighbor sets C, where the k-order common neighbors of a user are the set of all nodes from which that user can be reached via fewer than k vertices, and k is a positive integer;
B2: aggregating the neighbor sets C to obtain a plurality of user clusters M.
5. The identification method according to claim 4, wherein aggregating the neighbor sets C specifically comprises:
selecting any two neighbor sets $C_p$ and $C_q$ and, when $C_p$ and $C_q$ satisfy the overlap condition expressed in terms of $|C_p \cap C_q|$ and $|C_p \cup C_q|$, aggregating $C_p$ and $C_q$ to obtain a user cluster $M_n$;
where $|C_p \cap C_q|$ is the number of users common to $C_p$ and $C_q$, and $|C_p \cup C_q|$ is the total number of users in $C_p$ and $C_q$ together.
6. The identification method according to claim 1, wherein a custom suspicion formula is applied when calculating the batch registration suspicion score of each to-be-identified user account according to the user clusters M, the custom suspicion formula being:
where $S_u$ is the batch registration suspicion score of user $u$; $Z_u$ is the set of suspected user accounts in the user clusters M in which user $u$ is located; $i$ is any element of the set $Z_u$; $|Z_u|$ is the number of elements in $Z_u$; and $w_{iu}$ is the similarity score between user $u$ and user $i$.
7. A storage medium having a computer program stored thereon, characterized in that: the computer program, when executed by a processor, implements the method of any of claims 1 to 5.
8. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program that runs on the processor, characterized in that: the processor, when executing the computer program, implements the method of any of claims 1 to 5.
9. A batch-registered user identification system, comprising:
an account acquisition unit, configured to acquire a user account set, wherein the user account set comprises a set of suspected user accounts already known to be suspected of batch registration and a set of to-be-identified user accounts for which it is to be determined whether they are suspected of batch registration;
a similarity calculation unit, configured to perform, in the user account set, a similarity calculation between each user account and the other user accounts to obtain similarities w;
a user structure graph building unit, configured to construct a user structure graph according to each user account and each similarity w in the user account set;
a user cluster acquisition unit, configured to perform clustering on the user structure graph to obtain a plurality of user clusters M;
and a batch registration suspicion calculating unit, configured to calculate the batch registration suspicion score of each to-be-identified user account according to the user clusters M and, when the batch registration suspicion score exceeds a preset batch registration suspicion threshold, to determine that a batch registration operation exists for the corresponding to-be-identified user account.
10. The identification system according to claim 9, wherein the batch registration suspicion calculating unit applies a custom suspicion formula when calculating the batch registration suspicion score of each to-be-identified user account according to the user clusters M, the custom suspicion formula being:
where $S_u$ is the batch registration suspicion score of user $u$; $Z_u$ is the set of suspected user accounts in the user clusters M in which user $u$ is located; $i$ is any element of the set $Z_u$; $|Z_u|$ is the number of elements in $Z_u$; and $w_{iu}$ is the similarity score between user $u$ and user $i$.
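Claim 2 lists the ingredients of the similarity $w_{uv}$: the edit distance and lengths of the two nicknames, and an indicator function over N attribute indicators. The formula that combines them is not reproduced here, so the sketch below rests on three assumptions, none confirmed by the text:

- Nickname similarity is taken as $1 - edit(nick_u, nick_v) / \max(len(nick_u), len(nick_v))$.
- Attribute agreement is the fraction of the N indicators that are equal.
- The two parts are blended with a hypothetical weight `alpha`.

All function names are hypothetical.

```python
from typing import Sequence


def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance (insertion, deletion and substitution each cost 1)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute if different
        prev = cur
    return prev[-1]


def nickname_similarity(nick_u: str, nick_v: str) -> float:
    """Assumed normalisation: 1 - edit / max(len_u, len_v), which lies in [0, 1]."""
    longest = max(len(nick_u), len(nick_v))
    if longest == 0:
        return 0.0  # two empty nicknames are treated as carrying no evidence
    return 1.0 - edit_distance(nick_u, nick_v) / longest


def attribute_agreement(x_u: Sequence[str], x_v: Sequence[str]) -> float:
    """(1/N) * sum of I(x_ui == x_vi) over the N attribute indicators."""
    if len(x_u) != len(x_v) or not x_u:
        raise ValueError("both accounts need the same, non-empty list of indicators")
    return sum(a == b for a, b in zip(x_u, x_v)) / len(x_u)


def similarity(nick_u: str, nick_v: str, x_u: Sequence[str], x_v: Sequence[str],
               alpha: float = 0.5) -> float:
    """Hypothetical weighted blend of the two parts; alpha is not from the patent."""
    return (alpha * nickname_similarity(nick_u, nick_v)
            + (1 - alpha) * attribute_agreement(x_u, x_v))
```

For example, the nicknames "abcd" and "abce" differ by one substitution, giving a nickname similarity of 0.75. Two accounts sharing the same IP address and registration date but different devices agree on 2 of 3 indicators. Exact registration timestamps would rarely match, so in practice they would presumably be bucketed (for example by day) before the equality test; that too is an assumption.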
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811014021.3A CN110876072B (en) | 2018-08-31 | 2018-08-31 | Batch registered user identification method, storage medium, electronic device and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811014021.3A CN110876072B (en) | 2018-08-31 | 2018-08-31 | Batch registered user identification method, storage medium, electronic device and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110876072A | 2020-03-10 |
CN110876072B | 2022-02-08 |
Family ID: 69715352
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811014021.3A Active CN110876072B (en) | 2018-08-31 | 2018-08-31 | Batch registered user identification method, storage medium, electronic device and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110876072B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111586001A (en) * | 2020-04-28 | 2020-08-25 | 咪咕文化科技有限公司 | Abnormal user identification method and device, electronic equipment and storage medium |
CN112000711A (en) * | 2020-07-21 | 2020-11-27 | 微梦创科网络科技(中国)有限公司 | Method and system for determining evaluation user based on Spark |
CN112116007A (en) * | 2020-09-18 | 2020-12-22 | 四川长虹电器股份有限公司 | Batch registration account detection method based on graph algorithm and clustering algorithm |
CN117874724A (en) * | 2023-02-22 | 2024-04-12 | 南京宏润达企业管理有限公司 | Internet risk assessment method |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100011032A1 (en) * | 2008-07-11 | 2010-01-14 | Canon Kabushiki Kaisha | Document management apparatus, document management system, and document management method |
CN105550175A (en) * | 2014-10-28 | 2016-05-04 | 阿里巴巴集团控股有限公司 | Malicious account identification method and apparatus |
CN105634855A (en) * | 2014-11-06 | 2016-06-01 | 阿里巴巴集团控股有限公司 | Method and device for recognizing network address abnormity |
CN105991620A (en) * | 2015-03-05 | 2016-10-05 | 阿里巴巴集团控股有限公司 | Malicious account identification method and device |
CN106339615A (en) * | 2016-08-29 | 2017-01-18 | 北京红马传媒文化发展有限公司 | Abnormal registration behavior recognition method, system and equipment |
CN106407212A (en) * | 2015-07-31 | 2017-02-15 | 阿里巴巴集团控股有限公司 | Network account category determination method and apparatus, and object clustering method and apparatus |
CN106685898A (en) * | 2015-11-09 | 2017-05-17 | 阿里巴巴集团控股有限公司 | Method and device for identifying batch-registered accounts |
CN107733883A (en) * | 2017-10-09 | 2018-02-23 | 武汉斗鱼网络科技有限公司 | Method and device for detecting batch-registered accounts |
CN107835154A (en) * | 2017-10-09 | 2018-03-23 | 武汉斗鱼网络科技有限公司 | Batch-registered account identification method and system |
CN108052543A (en) * | 2017-11-23 | 2018-05-18 | 北京工业大学 | Microblog similar-account detection method based on graph analysis clustering |
Non-Patent Citations (1)
Title |
---|
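Fang Yong (方勇) et al.: "Fake user detection based on hierarchical clustering", Journal of Tsinghua University (Science and Technology) (《清华大学学报(自然科学版)》) * |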
Also Published As
Publication number | Publication date |
---|---|
CN110876072B (en) | 2022-02-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110876072B (en) | Batch registered user identification method, storage medium, electronic device and system | |
CN108174296A (en) | Malicious user recognition methods and device | |
CN108600414B (en) | Equipment fingerprint construction method and device, storage medium and terminal | |
CN108985954B (en) | Method for establishing association relation of each identifier and related equipment | |
CN109858441A (en) | Abnormal state monitoring method and apparatus for a construction site | |
CN114116705B (en) | Method and device for determining contribution value of participants in joint learning | |
CN107885716B (en) | Text recognition method and device | |
WO2019052162A1 (en) | Method, apparatus and device for improving data cleaning efficiency, and readable storage medium | |
CN110457704B (en) | Target field determination method and device, storage medium and electronic device | |
CN110222790B (en) | User identity identification method and device and server | |
CN110516752A (en) | Cluster quality evaluation method, device and equipment and storage medium | |
CN109102468A (en) | Image enhancement method and device, terminal equipment and storage medium | |
CN113283351B (en) | Video plagiarism detection method using CNN optimization similarity matrix | |
CN108463813B (en) | Method and device for processing data | |
CN111414528B (en) | Method and device for determining equipment identification, storage medium and electronic equipment | |
CN110222297B (en) | Identification method of tag user and related equipment | |
CN107277640A (en) | Interactive approach, device and storage medium based on live platform | |
CN110750681B (en) | Account similarity calculation method, storage medium, electronic device and system | |
CN114723652A (en) | Cell density determination method, cell density determination device, electronic apparatus, and storage medium | |
CN107092650A (en) | Web log analysis method and device | |
CN110866437A (en) | Color value determination model optimization method and device, electronic equipment and storage medium | |
CN115834231A (en) | Honeypot system identification method and device, terminal equipment and storage medium | |
CN110647805B (en) | Reticulate pattern image recognition method and device and terminal equipment | |
CN109995613B (en) | Flow calculation method and device | |
CN114339689A (en) | Internet of things machine card binding pool control method and device and related medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||