CN115329212A

CN115329212A - Account number obtaining method and device, computer equipment and storage medium

Info

Publication number: CN115329212A
Application number: CN202210938124.9A
Authority: CN
Inventors: 琚诚诚; 赵强
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2022-08-05
Filing date: 2022-08-05
Publication date: 2022-11-11

Abstract

The disclosure relates to an account obtaining method and apparatus, a computer device, and a storage medium, and belongs to the technical field of computers. The method includes: determining, based on a sensitive content item issued by a reference account, a first account that has performed a forward interaction behavior on the sensitive content item; determining, based on a target content item on which the first account has performed the forward interaction behavior, a candidate account that issued the target content item; generating an account relation graph associated with the forward interaction behavior based on the reference account, the first account, and the candidate account; and screening, from the candidate accounts and based on the account relation graph, target accounts that form an account group with the reference account. By introducing the sensitive content item and the reference account as a supervision signal and performing community mining under that signal, the disclosure avoids mining account groups that are contrary to the business meaning, so that the accuracy of identifying target accounts that potentially issue sensitive content is greatly improved.

Description

Account number obtaining method and device, computer equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to an account obtaining method and apparatus, a computer device, and a storage medium.

Background

With the development of computer technology and the diversification of terminal functions, a user can use a terminal to browse content published on a platform by an author (such as a streamer or an uploader) at any time and place. However, some authors publish sensitive content on the platform, so how to accurately identify and crack down on accounts that publish sensitive content has become an urgent problem in the platform's security risk control scenario.

Currently, accounts that issue sensitive content are identified by attempting to mine communities (i.e., account groups) formed by accounts that frequently issue sensitive content, using unsupervised community mining algorithms such as the modularity-based Fast Unfolding algorithm and the Label Propagation Algorithm (LPA). In a security risk control scenario, however, such algorithms may identify accounts that frequently issue homogeneous content as a community even though the accounts in that community have never issued sensitive content. That is, these algorithms tend to output communities that are contrary to the business meaning, so the accuracy of identifying accounts that issue sensitive content is poor.

Disclosure of Invention

The disclosure provides an account number obtaining method, an account number obtaining device, computer equipment and a storage medium, so as to at least improve the identification accuracy of an account number for issuing sensitive content. The technical scheme of the disclosure is as follows:

according to an aspect of the embodiments of the present disclosure, an account obtaining method is provided, including:

determining, based on a sensitive content item issued by a reference account, a first account that has performed a forward interaction behavior on the sensitive content item, wherein the forward interaction behavior refers to an interaction behavior of a positive form performed on the sensitive content item;

determining, based on a target content item on which the first account has performed the forward interaction behavior, a candidate account that issued the target content item;

generating an account relation graph associated with the forward interaction behavior based on the reference account, the first account, and the candidate account, wherein the account relation graph represents a topological structure of the social relations between publisher accounts of content items on which the first account has performed the forward interaction behavior;

and screening, from the candidate accounts and based on the account relation graph, target accounts that form an account group with the reference account.

In some embodiments, the generating an account relationship graph associated with the forward interaction activity based on the reference account number, the first account number, and the candidate account number comprises:

constructing nodes in the account relation graph based on the reference account and the candidate accounts;

and constructing edges for connecting nodes in the account relation graph based on the forward interactive behaviors executed by the first account.

In some embodiments, the constructing edges for connecting nodes in the account relationship graph based on the forward interactive behavior performed by the first account includes:

and under the condition that any one first account performs the forward interaction behavior on the sensitive content items issued by the reference account and the target content items issued by the candidate account, generating an edge for connecting the node of the reference account and the node of the candidate account in the account relation graph.
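
As a non-limiting illustration, the edge rule described above can be sketched in Python as follows. The function name `build_account_relation_edges`, the `(consumer, item)` input format, and the simplification that every content item issued by a reference account is treated as sensitive are assumptions introduced for this sketch and are not part of the disclosure.

```python
from collections import defaultdict


def build_account_relation_edges(interactions, publisher_of, reference_accounts):
    """Return the set of (reference, candidate) edges of the account relation graph.

    interactions: iterable of (consumer_account, content_item) pairs, one per
        forward interaction behavior (hypothetical input format).
    publisher_of: dict mapping each content item to its publisher account.
    reference_accounts: set of known reference accounts; every item they
        published is treated as sensitive (simplifying assumption).
    """
    # Collect, per consumer, the publishers whose items the consumer interacted with.
    publishers_by_consumer = defaultdict(set)
    for consumer, item in interactions:
        publishers_by_consumer[consumer].add(publisher_of[item])

    edges = set()
    for publishers in publishers_by_consumer.values():
        references = publishers & reference_accounts
        candidates = publishers - reference_accounts
        # A consumer that never touched a reference account is not a first
        # account, so `references` is empty and no edge is produced.
        for reference in references:
            for candidate in candidates:
                edges.add((reference, candidate))
    return edges
```

In this sketch, a consumer who interacted with items from both a reference account and a candidate account yields one edge between those two publishers, while a consumer who only interacted with candidate content contributes nothing.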

In some embodiments, the screening, based on the account relationship diagram, target accounts forming an account group with the reference account from the candidate accounts includes:

acquiring a maximum spanning tree of the account relation graph, wherein the maximum spanning tree has a maximum weight in a plurality of spanning trees of the account relation graph;

clustering the accounts indicated by the nodes contained in the maximum spanning tree to obtain a plurality of candidate account groups;

merging the candidate account groups to obtain a target account group;

and determining the candidate accounts included in the target account group as the target accounts.

In some embodiments, the obtaining the maximum spanning tree of the account relationship diagram includes:

based on the node similarity between two nodes connected with each edge in the account relation graph, assigning a weight to the edge;

and generating the maximum spanning tree based on the weight of each edge in the account relation graph, wherein the sum of the weights of the edges contained in the maximum spanning tree is maximum in the plurality of spanning trees.

In some embodiments, the node similarity refers to a ratio between the number of common neighbor nodes of the two nodes and a sum of the number of respective neighbor nodes of the two nodes.
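
The ratio described above can be illustrated with a short Python sketch. The adjacency representation (a dict mapping each node to the set of its neighbor nodes) and the function name `node_similarity` are assumptions made for illustration only, as is returning 0.0 when neither node has any neighbor.

```python
def node_similarity(adjacency, u, v):
    """Common-neighbor count divided by the sum of the two neighbor counts."""
    neighbors_u = adjacency.get(u, set())
    neighbors_v = adjacency.get(v, set())
    total = len(neighbors_u) + len(neighbors_v)
    if total == 0:
        # Assumption: two isolated nodes are given similarity 0.
        return 0.0
    return len(neighbors_u & neighbors_v) / total
```

For a triangle of nodes a, b, and c, nodes a and b share one common neighbor (c) and have two neighbors each, so the similarity is 1 / (2 + 2) = 0.25.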

In some embodiments, the generating the maximum spanning tree based on the weight of each edge in the account relationship graph includes:

initializing an empty tree, wherein the empty tree comprises a node set and an edge set, the node set comprises a randomly selected starting node, and the edge set is an empty set;

acquiring a plurality of candidate edges for connecting nodes in the node set and nodes outside the node set from the account relation graph;

adding a target edge with the maximum weight value in the candidate edges to the edge set, and adding nodes outside the node set connected by the target edge to the node set;

and repeating the operation of adding the target edges into the edge set until each target edge is added into the edge set, and determining the spanning tree formed by the edge set and the node set when the addition is stopped as the maximum spanning tree.
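
The steps above follow the pattern of Prim's algorithm applied with maximum rather than minimum weights. A minimal Python sketch is given below under stated assumptions: the graph is passed as a dict of dicts (`adjacency[u][v]` is the weight of edge u-v), the starting node is supplied by the caller instead of being chosen at random, and the loop stops when no candidate edge remains, so only the connected component containing the starting node is covered.

```python
import heapq


def maximum_spanning_tree(adjacency, start):
    """Grow a maximum spanning tree from `start`.

    Returns (node_set, edge_list) where edge_list holds (u, v, weight) tuples.
    """
    node_set = {start}
    edge_list = []
    # heapq is a min-heap, so weights are negated to pop the heaviest edge first.
    heap = [(-weight, start, v) for v, weight in adjacency[start].items()]
    heapq.heapify(heap)
    while heap:
        neg_weight, u, v = heapq.heappop(heap)
        if v in node_set:
            # Both endpoints are already in the tree; not a candidate edge.
            continue
        node_set.add(v)
        edge_list.append((u, v, -neg_weight))
        # Edges leaving the newly added node become candidate edges.
        for nxt, weight in adjacency[v].items():
            if nxt not in node_set:
                heapq.heappush(heap, (-weight, v, nxt))
    return node_set, edge_list
```

Using a heap keeps the selection of the heaviest candidate edge efficient; a plain scan over all candidate edges at each step would match the described steps equally well.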

In some embodiments, the clustering the accounts indicated by the nodes included in the maximum spanning tree to obtain a plurality of candidate account groups includes:

screening each node contained in the maximum spanning tree to obtain a plurality of core nodes;

determining a plurality of candidate account groups by taking the plurality of core nodes as clustering centers respectively;

and clustering the nodes except the core nodes in the maximum spanning tree into a candidate account group in which the core node with the highest path similarity with the nodes is positioned.

In some embodiments, the screening, from the nodes included in the maximum spanning tree, a plurality of core nodes includes:

for any node contained in the maximum spanning tree, determining at least one neighbor node that is connected to the node by an edge in the maximum spanning tree;

screening, from the at least one neighbor node, target neighbor nodes for which the weight of the connecting edge between the node and the neighbor node is greater than or equal to a first weight threshold;

and under the condition that the sum of the weights of the connecting edges between the node and each target neighbor node is greater than a second weight threshold, determining the node as the core node.
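
The two-threshold rule above can be sketched as follows. The dict-of-dicts representation of the maximum spanning tree (`tree_adjacency[u][v]` is the weight of the tree edge u-v) and the function name `find_core_nodes` are assumptions made for illustration; the threshold values used in any example are arbitrary.

```python
def find_core_nodes(tree_adjacency, first_threshold, second_threshold):
    """Return the nodes whose strong tree edges are heavy enough in total."""
    core_nodes = set()
    for node, neighbors in tree_adjacency.items():
        # Keep only edges to target neighbor nodes (weight >= first threshold).
        strong_weights = [w for w in neighbors.values() if w >= first_threshold]
        # The node is a core node when those weights sum above the second threshold.
        if sum(strong_weights) > second_threshold:
            core_nodes.add(node)
    return core_nodes
```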

In some embodiments, the method further comprises:

for any node except the core node in the maximum spanning tree, determining a communication path between the node and any core node in the maximum spanning tree, wherein the communication path is a path starting from the node and reaching the core node through a plurality of edges;

and determining the path similarity between the node and the core node based on the weight of each edge contained in the communication path.
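
The text above states only that the path similarity is determined from the weights of the edges on the communication path; it does not fix a formula. The following Python sketch therefore assumes, purely for illustration, that the path similarity is the product of the edge weights along the (unique) tree path, and assigns each non-core node to the core node with the highest such value. The function name `assign_to_core_nodes` and the dict-of-dicts tree representation are likewise hypothetical.

```python
def assign_to_core_nodes(tree_adjacency, core_nodes):
    """Group every non-core node with its most similar core node.

    Path similarity is ASSUMED to be the product of the edge weights on the
    tree path; the source does not specify the formula.
    """
    best = {}  # node -> (path similarity, core node)
    for core in core_nodes:
        # Depth-first walk over the tree, carrying the running product.
        stack = [(core, None, 1.0)]
        while stack:
            node, parent, similarity = stack.pop()
            if node not in core_nodes and similarity > best.get(node, (0.0, None))[0]:
                best[node] = (similarity, core)
            for nxt, weight in tree_adjacency[node].items():
                if nxt != parent:
                    stack.append((nxt, node, similarity * weight))
    groups = {core: {core} for core in core_nodes}
    for node, (_, core) in best.items():
        groups[core].add(node)
    return groups
```

Because a spanning tree contains exactly one path between any two nodes, a single traversal from each core node is enough to obtain every path similarity under this assumption.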

In some embodiments, the merging the candidate account groups to obtain the target account group includes:

for any candidate account number group in the candidate account number groups, merging the candidate account number group and other candidate account number groups under the condition that a connecting edge exists between a core node of the candidate account number group and a core node of other candidate account number groups and the weight of the connecting edge is greater than a first weight threshold;

otherwise, merging the candidate account group with other candidate account groups with the maximum group similarity;

and repeatedly executing the operation of merging the candidate account groups until no candidate account groups can be merged, and screening the target account groups from the account groups obtained by merging each time.

In some embodiments, the method further comprises:

for any candidate account number group in the multiple candidate account number groups, determining the sum value of degree parameters of nodes associated with each account number contained in the candidate account number group as the group degree parameter of the candidate account number group, wherein the degree parameter of the node represents the number of edges connected with the node in the maximum spanning tree;

determining group similarity between the candidate account group and the other candidate account groups based on the group degree parameter of the candidate account group, the group degree parameters of the other candidate account groups, and the number of common edges in the candidate account group and the other candidate account groups.
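
The quantities named above can be illustrated with the following Python sketch. The group degree parameter follows the text directly. How the three quantities are combined is not specified in the text, so the ratio used in `group_similarity` (common edges divided by the sum of the two group degree parameters) is an assumption chosen by analogy with the node similarity, and "common edges" is interpreted here as tree edges with one endpoint in each group, which is also an assumption.

```python
def group_degree(tree_adjacency, group):
    """Sum of the degree parameters of the nodes in the group."""
    return sum(len(tree_adjacency[node]) for node in group)


def common_edge_count(tree_adjacency, group_a, group_b):
    """Number of tree edges with one endpoint in each group (assumed meaning)."""
    return sum(1 for u in group_a for v in tree_adjacency[u] if v in group_b)


def group_similarity(tree_adjacency, group_a, group_b):
    """ASSUMED combination: common edges / (group degree A + group degree B)."""
    total = group_degree(tree_adjacency, group_a) + group_degree(tree_adjacency, group_b)
    if total == 0:
        return 0.0
    return common_edge_count(tree_adjacency, group_a, group_b) / total
```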

In some embodiments, the filtering the target account group from the account groups obtained from each merging includes:

for the account group obtained by each combination, acquiring the modularity of the account group, wherein the modularity is used for measuring the dividing quality of the account group divided from the account relation graph;

and determining the account group with the highest modularity as the target account group.

In some embodiments, the forward interaction behavior comprises at least one of: an approval behavior on the content item, a follow behavior on the publisher account of the content item, a sharing behavior, or a downloading behavior.

In some embodiments, after determining the candidate account that issued the target content item, the method further includes:

deleting, from the candidate accounts, candidate accounts that have been authenticated and candidate accounts registered by preset organizations.

According to another aspect of the embodiments of the present disclosure, an account acquisition apparatus is provided, including:

a determining unit configured to determine, based on a sensitive content item issued by a reference account, a first account that has performed a forward interaction behavior on the sensitive content item, wherein the forward interaction behavior refers to an interaction behavior of a positive form performed on the sensitive content item;

the determining unit being further configured to determine, based on a target content item on which the first account has performed the forward interaction behavior, a candidate account that issued the target content item;

a generating unit configured to generate an account relation graph associated with the forward interaction behavior based on the reference account, the first account and the candidate account, wherein the account relation graph is used for representing a topological structure of a social relation between publisher accounts of content items of which the forward interaction behavior is performed by the first account;

and the screening unit is configured to screen the candidate accounts to obtain target accounts forming an account group with the reference account based on the account relation graph.

In some embodiments, the generating unit comprises:

a node construction subunit configured to perform construction of a node in the account relationship graph based on the reference account and the candidate account;

and an edge construction subunit configured to construct, based on the forward interaction behaviors performed by the first account, edges for connecting nodes in the account relation graph.

In some embodiments, the edge construction subunit is configured to perform:

In some embodiments, the screening unit comprises:

an obtaining subunit, configured to perform obtaining a maximum spanning tree of the account relationship diagram, where the maximum spanning tree has a maximum weight among a plurality of spanning trees of the account relationship diagram;

the clustering subunit is configured to perform clustering on the accounts indicated by the nodes contained in the maximum spanning tree to obtain a plurality of candidate account groups;

a merging subunit, configured to merge the multiple candidate account groups to obtain a target account group;

and the determining subunit is configured to determine the candidate accounts included in the target account group as the target accounts.

In some embodiments, the obtaining subunit comprises:

an evaluation subunit configured to assign a weight to each edge in the account relation graph based on the node similarity between the two nodes connected by the edge;

and the generating sub-unit is configured to execute generating the maximum spanning tree based on the weight of each edge in the account relation graph, wherein the sum of the weights of the edges contained in the maximum spanning tree is maximum in the plurality of spanning trees.

In some embodiments, the generating subunit is configured to perform:

In some embodiments, the clustering subunit comprises:

a screening subunit configured to perform screening to obtain a plurality of core nodes from each node included in the maximum spanning tree;

the determining subunit is configured to determine a plurality of candidate account groups by taking the plurality of core nodes as clustering centers respectively;

and the clustering subunit is configured to cluster the nodes except the core nodes in the maximum spanning tree into the candidate account group in which the core node with the highest path similarity with the nodes is positioned.

In some embodiments, the screening subunit is configured to perform:

In some embodiments, the determining subunit is further configured to perform:

In some embodiments, the merging subunit comprises:

the merging sub-unit is configured to execute merging of any one of the candidate account groups and the other candidate account groups when a connecting edge exists between a core node of the candidate account group and core nodes of the other candidate account groups and the weight of the connecting edge is greater than a first weight threshold;

the merging subunit is further configured to otherwise merge the candidate account group with the other candidate account group having the largest group similarity;

the merging sub-unit is also configured to execute the operation of repeatedly merging the candidate account number groups until no candidate account number group can be merged;

and the group screening subunit is configured to perform screening to obtain the target account group from the account groups obtained by each merging.

In some embodiments, the merging subunit is further configured to perform:

In some embodiments, the group screening subunit is configured to perform:

In some embodiments, the apparatus further comprises:

and the deleting unit is configured to delete the authenticated candidate account and the preset organization registered candidate account.

According to another aspect of the embodiments of the present disclosure, there is provided a computer apparatus including:

one or more processors;

one or more memories for storing the one or more processor-executable instructions;

wherein the one or more processors are configured to perform the account acquisition method in any one of the possible implementations of the above-mentioned aspect.

According to another aspect of embodiments of the present disclosure, a computer-readable storage medium is provided, where at least one instruction of the computer-readable storage medium, when executed by one or more processors of a computer device, enables the computer device to perform an account acquisition method in any one of the possible implementations of the above-described aspect.

According to another aspect of the embodiments of the present disclosure, there is provided a computer program product including one or more instructions executable by one or more processors of a computer device, so that the computer device can perform the account obtaining method in any one of the possible implementations of the above-mentioned aspect.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

Because a first account performs forward interaction behaviors on different content items, the reference account that issued the sensitive content item and the candidate account that issued the target content item can be connected in the account relation graph. The account relation graph is thus constructed from first accounts that have performed forward interaction behaviors on content items issued by different accounts, and community mining is then performed on the basis of the constructed account relation graph.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

Fig. 1 is a schematic diagram of an implementation environment of an account acquisition method according to an exemplary embodiment;

Fig. 2 is a flowchart illustrating an account acquisition method according to an exemplary embodiment;

Fig. 3 is a flowchart illustrating an account acquisition method according to an exemplary embodiment;

Fig. 4 is a schematic diagram comparing a homogeneous graph and a heterogeneous graph according to an embodiment of the present disclosure;

Fig. 5 is a flowchart of a method for obtaining a maximum spanning tree according to an embodiment of the present disclosure;

Fig. 6 is a flowchart of acquiring candidate account groups according to an embodiment of the present disclosure;

Fig. 7 is a flowchart of merging candidate account groups according to an embodiment of the present disclosure;

Fig. 8 is a schematic diagram of suppressing negative accounts after target accounts are screened based on an account relation graph according to an embodiment of the present disclosure;

Fig. 9 is a schematic flowchart of an account acquisition method according to an embodiment of the present disclosure;

Fig. 10 is a schematic diagram of the effect of an account acquisition method according to an embodiment of the present disclosure;

Fig. 11 is a block diagram illustrating a logical structure of an account acquisition apparatus according to an exemplary embodiment;

Fig. 12 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.

Information (including but not limited to user device information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals to which the present disclosure relates are all authorized by the user or sufficiently authorized by various parties, and the collection, use, and processing of the relevant data requires compliance with relevant laws and regulations and standards in the relevant countries and regions. For example, the account number and the interactive behavior performed by the account number in the present disclosure are obtained under sufficient authorization.

In some embodiments, the meaning of A and/or B includes three cases: A alone, B alone, and both A and B.

Hereinafter, terms related to the embodiments of the present disclosure are explained.

Degree parameter (Degree): in a topological graph, the degree parameter of each node refers to the number of edges connected with the node in the topological graph, that is, the number of edges starting from the node in the topological graph is the value of the degree parameter of the node.

Modularity: also called the modularity metric, modularity is a measure of the structural strength of network communities, and is usually denoted by the symbol "Q" in community mining algorithms. The value of the modularity depends on how the nodes of the social network are distributed among communities (that is, on the community division of the network, which in the embodiments of the present disclosure refers to the division into account groups). Modularity can be used to quantitatively measure the quality of a community division: the larger the modularity, the stronger the community structure of the division and the better the division quality; conversely, the smaller the modularity, the weaker the community structure and the worse the division quality. Therefore, selecting the account group division with the highest modularity yields the optimal community division of the social network, and this process is also called searching for the optimal division of the social network.
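
For illustration, the standard (Newman) modularity of an unweighted, undirected graph can be computed as Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ], where m is the total number of edges, L_c is the number of edges inside community c, and d_c is the sum of the degree parameters of the nodes in c. The present text does not state which modularity formula the embodiments use, so the Python sketch below is an assumption based on that standard definition; the function name `modularity` and the input format are hypothetical.

```python
def modularity(edges, communities):
    """Newman modularity Q of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    communities: list of disjoint node sets that together cover all nodes.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    community_of = {
        node: index for index, group in enumerate(communities) for node in group
    }
    intra_edges = [0] * len(communities)  # L_c
    degree_sum = [0] * len(communities)   # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
    return sum(
        intra_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in range(len(communities))
    )
```

For two triangles joined by a single edge, dividing the nodes into the two triangles gives Q = 5/14, whereas placing all six nodes in one community gives Q = 0, which matches the intuition that the former is the better division.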

With the development of computer technology and the diversification of terminal functions, a user can use a terminal to browse content published on a platform by creators (such as streamers and uploaders) at any time and place. However, some creators publish sensitive content on the platform, so how to accurately identify and crack down on accounts that publish sensitive content has become an urgent problem in the platform's security risk control scenario.

Currently, accounts that issue sensitive content are identified by attempting to mine communities (i.e., account groups) formed by accounts that frequently issue sensitive content, using unsupervised community mining algorithms such as the modularity-based Fast Unfolding algorithm and the Label Propagation Algorithm (LPA).

For the Fast Unfolding algorithm, accounts that output homogeneous content are simply divided into the same community during community mining, yet the accounts in such a community may never have issued sensitive content. In other words, the algorithm lacks a mechanism for eliminating invalid nodes, and the invalid nodes distort community construction, so many account groups that are contrary to the business meaning are easily produced, and the accuracy of identifying accounts that issue sensitive content is poor. In addition, because the Fast Unfolding algorithm is an unsupervised community mining algorithm, how to evaluate communities, how to find communities with business significance, how to construct edges and nodes with business significance, and how to set edge weights are all difficult problems in a security risk control scenario.

The LPA algorithm propagates labels through a data set and assigns labels to unlabeled nodes. Each node is initialized with a label at the initial stage, and in each iteration a node updates its own label to the label that occurs most often among the nodes connected to it. As community labels continue to propagate, closely connected nodes eventually share a common label. The LPA algorithm therefore also lacks a mechanism for removing noise nodes, i.e., invalid nodes, and how to configure a screening mechanism for node labels is likewise a technical difficulty. Moreover, the order in which node labels are updated is random, although updating more important nodes earlier would clearly accelerate convergence. In addition, if more than one label occurs most often among the nodes connected to a given node, one of those labels is selected at random as the label of the node. This randomness can have a compounding effect: a community division error caused by randomness at the start may be continuously amplified as labels propagate. As a result, the accuracy with which the LPA algorithm identifies accounts that issue sensitive content is also relatively poor.
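
To make the sources of randomness described above concrete, a minimal Python sketch of asynchronous label propagation is given below. It is a generic textbook-style illustration rather than the implementation of any particular library; the function name `label_propagation`, the `max_iterations` and `seed` parameters, and the choice to keep a node's current label when it is already among the most frequent labels are assumptions of this sketch.

```python
import random
from collections import Counter


def label_propagation(adjacency, max_iterations=20, seed=0):
    """Asynchronous label propagation on an undirected graph.

    adjacency: dict mapping each node to the set of its neighbor nodes.
    Returns a dict mapping each node to its final label.
    """
    rng = random.Random(seed)
    labels = {node: node for node in adjacency}  # every node starts with its own label
    nodes = list(adjacency)
    for _ in range(max_iterations):
        rng.shuffle(nodes)  # the update order is random
        changed = False
        for node in nodes:
            if not adjacency[node]:
                continue
            counts = Counter(labels[neighbor] for neighbor in adjacency[node])
            top = max(counts.values())
            winners = sorted(label for label, count in counts.items() if count == top)
            if labels[node] in winners:
                continue  # current label is already among the most frequent
            labels[node] = rng.choice(winners)  # ties are broken at random
            changed = True
        if not changed:
            break
    return labels
```

Both the shuffled update order and the random tie-breaking depend on the seed, so different seeds may yield different community divisions of the same graph, which is the instability discussed above.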

In view of this, the embodiments of the present disclosure provide an account obtaining method based on a Common Consumer-based Community Mining (CCCM) algorithm. In the field of security risk control, given a set of known reference accounts that have issued sensitive content items, the known reference accounts and unknown candidate accounts are linked through common consumers to construct a complex community, and unknown target accounts that may issue sensitive content items are mined from the complex community. Here, a common consumer refers to a viewer who has performed the same forward interaction behavior on both a sensitive content item issued by a reference account and a target content item issued by a candidate account. The CCCM algorithm can therefore help the platform discover potential security risks, and complex communities can be constructed for application scenarios involving various kinds of content items (such as videos, live streams, and comments), so that security risks are discovered automatically.

Hereinafter, a system architecture according to an embodiment of the present disclosure will be explained.

Fig. 1 is a schematic diagram of an implementation environment of an account acquisition method according to an exemplary embodiment. Referring to Fig. 1, the implementation environment may include a terminal 101 and a server 102, each of which is a computer device, as described in detail below.

The terminal 101 may be any computer device supporting a content item browsing service, and an application program for browsing a content item is installed on the terminal 101, where the content item refers to an information item carrying a certain content in the form of a multimedia resource, for example, the content item includes but is not limited to audio, video, live broadcast, short video, information, comments, and the like, and the application program may include at least one of a short video application, a live broadcast application, an audio-video application, or a social application, for example.

In some embodiments, after registering an account in the application program, a user can log in to the account in the application program, and can then publish video works on the platform through the account, or start live streaming after being authenticated as a streamer.

In some embodiments, after the user logs in to the account, the server 102 may push content items to the account (for example, in the form of a video stream, a feed stream, or an information stream). When the user is interested in a pushed content item, the user may click to browse the content item and use the account to initiate an interaction behavior on it. Taking a video work issued by another person as an example of the content item, interaction behaviors may be divided into positive interaction behaviors and negative interaction behaviors: positive interaction behaviors include approving the video work, following the account of the publisher of the video work, sharing the video work, and downloading the video work, while negative interaction behaviors include reporting the video work or other negative behaviors.

The terminal 101 and the server 102 may be connected through a wired network or a wireless network.

The server 102 may be a computer device for providing a background service for the application program, and the server 102 may include at least one of a server, a plurality of servers, a cloud computing platform, or a virtualization center. Alternatively, the server 102 may undertake primary computational tasks and the terminal 101 may undertake secondary computational tasks; or, the server 102 undertakes the secondary computing work, and the terminal 101 undertakes the primary computing work; alternatively, the terminal 101 and the server 102 cooperatively compute by adopting a distributed computing architecture.

In some embodiments, when performing security risk control on the platform, the server 102 may already know a set of discovered reference accounts that frequently issue sensitive content items, typically obtained by filtering user reports, manually reviewing content items, and the like; for example, such reference accounts may be malicious accounts, negative accounts, abnormal accounts, and the like. With the account obtaining method provided by the embodiment of the disclosure, potential target accounts that form an account group with the reference accounts in the platform can be mined based on the CCCM algorithm, starting from the known reference accounts and the known sensitive content items, and the target accounts can be accurately suppressed, so that abnormal accounts and negative videos are suppressed automatically. Compared with an approach that judges, from the content dimension, whether the content items issued by each account are sensitive content items and then suppresses the publisher accounts of the items identified as sensitive, the method has higher account identification accuracy. This is because a machine easily misjudges sensitive content items, and the content-based approach ignores the connections between the creators of content items. The method, by contrast, is a community mining algorithm based on connections through common consumers: the nodes and edges in the account relation graph are constructed based on common consumers, so the nodes and edges have business significance, and the whole community mining process therefore has higher accuracy. The inherent connections between the creators of sensitive content items can be found, which helps to find and suppress more abnormal accounts and to process abnormal accounts in batches. A complex relation network is established from the discovered reference accounts, so that more potential and unknown target accounts are found, accurate community risk discovery is realized, and better risk control and business effects are achieved.

Optionally, the terminal 101 may refer generally to one of a plurality of terminals, and the device type of the terminal 101 includes but is not limited to at least one of: a vehicle-mounted terminal, a television, a smart phone, a smart speaker, a tablet computer, an electronic book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer. The following embodiments are described taking the case where the terminal 101 includes a smartphone as an example.

Those skilled in the art will appreciate that the number of terminals 101 may be greater or less. For example, the number of the terminals 101 may be only one, or the number of the terminals 101 may be several tens or hundreds, or more. The number and the device type of the terminals 101 are not limited in the embodiment of the present disclosure.

Fig. 2 is a flowchart of an account obtaining method according to an exemplary embodiment, and referring to fig. 2, the account obtaining method is applied to a computer device, and the following description takes the computer device as a server as an example.

In step 201, the server determines, based on a sensitive content item issued by a reference account, a first account that has performed a forward interactive behavior on the sensitive content item, where the forward interactive behavior is an interactive behavior in a forward form on the sensitive content item.

The reference account related in the embodiment of the present disclosure refers to a marked, known, and discovered account that has previously issued a sensitive content item, for example, the reference account refers to a malicious account, a negative account, an abnormal account, and the like.

The sensitive content item in the embodiment of the present disclosure refers to a content item that includes sensitive information or implies sensitive information, and the content item refers to an information item that is in a form of a multimedia resource and carries specific content, for example, the content item may be a video, an audio, a live broadcast, information, a comment, and the like, which is not specifically limited in the embodiment of the present disclosure.

The interactive behavior related in the embodiment of the present disclosure refers to a behavior for implementing human-computer interaction with a browsed sensitive content item. Optionally, the interactive behavior includes a positive interactive behavior and a negative interactive behavior: the positive interactive behavior refers to an interactive behavior that performs a positive-form operation on the sensitive content item, and the negative interactive behavior refers to an interactive behavior that performs a negative-form operation on the sensitive content item. For example, the positive interactive behavior includes at least one of the following: a forward-form behavior for the content item, an attention behavior (that is, following) for a publisher account of the content item, a sharing behavior, or a downloading behavior, and the like, where the forward-form behavior includes, but is not limited to, liking the content item, rewarding the content item, favoriting the content item, and the like. The negative interaction behavior includes at least one of the following: reporting the sensitive content item, complaining about the sensitive content item, reporting the publisher account of the content item, or complaining about the publisher account of the content item, and the like, which are not specifically limited in this embodiment of the disclosure.

In some embodiments, the server may add a sensitive tag to a found, known, detected account that has issued a sensitive content item, and determine an account carrying the sensitive tag in the database as a reference account, for example, the server adds a sensitive tag to an account that is reported and verified to issue a sensitive content item, or adds a sensitive tag to an account that has issued a sensitive content item and is prohibited in a target time period, and the like. The target time period is a time period set by a technician, for example, the target time period is the last 1 week, or the last 1 month, and the like, which is not specifically limited in the embodiment of the present disclosure.

In some embodiments, after the reference account is obtained, if a sensitive content item issued by the reference account has already been marked, the marked sensitive content item may be directly obtained, or if a content item issued by the reference account has not been marked, the content item issued by the reference account may be detected to determine whether a current content item is a sensitive content item, so as to obtain the detected sensitive content item.

In some embodiments, after the server obtains the labeled reference account numbers in the platform and the labeled sensitive content items issued by the reference account numbers, the account numbers that have performed forward interaction with the sensitive content items are determined as the first account numbers, and optionally, since there may be more than one forward interaction behavior, the server may perform the method provided by the embodiments of the present disclosure on each type of forward interaction behavior, so that different first account numbers can be defined for different forward interaction behaviors, and more potential target account numbers that issue the sensitive content items are mined from dimensions of multiple forward interaction behaviors.

In step 202, the server determines a candidate account for issuing a target content item based on the target content item for which the first account performed the forward interaction behavior.

The target content item related to the embodiment of the present disclosure refers to other content items, except the sensitive content item, on which the first account has performed the forward interactive behavior. For example, in a case that the forward interactive behavior is a forward-form behavior for the content item, such as liking the content item, after the reference account and the sensitive content items issued by the reference account are acquired, the first account that has liked the sensitive content items is determined, and then other content items that the first account has liked are acquired as target content items.

In some embodiments, after the server acquires the first account through the step 201, it may determine a target content item for which the first account has performed a forward interaction behavior, for example, the target content item for which the forward interaction behavior has been performed is queried from a historical browsing record of the first account for content items, or a historical interaction list is maintained for each content item for which the forward interaction behavior has been performed by the first account, and all content items on the historical interaction list are determined as the target content item.

Optionally, in a case that a technician sets a statistical time period, the server may obtain only the target content items on which the first account performed the forward interaction behavior within the statistical time period. Because the interest preference of the first account may change dynamically over time, obtaining target content items only within the preset statistical time period implements a pre-screening mechanism for the target content items. For example, the statistical time period is the last 1 week or the last 1 month; for another example, the statistical time period is the 1 week before and the 1 week after the first account browses the sensitive content item. The setting manner of the statistical time period is not specifically limited in the embodiment of the present disclosure.

In some embodiments, after determining the target content item, the server determines the account that issued the target content item, that is, the account of the publisher of the target content item, as a candidate account. It should be noted that there may be one or more reference accounts in step 201, one or more first accounts that performed the forward interaction behavior on the sensitive content items issued by each reference account, and one or more target content items on which each first account also performed the forward interaction behavior; therefore, there may also be one or more candidate accounts finally determined as publishers of the target content items, which is not specifically limited in the embodiment of the present disclosure.
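As a minimal illustrative sketch (not part of the disclosed embodiments), steps 201 and 202 can be expressed as two set lookups over an interaction log. The `Interaction` record, the `find_candidates` function, the `publisher_of` mapping, and the sample identifiers are all hypothetical names introduced here; excluding the reference accounts themselves from the candidates is likewise an assumption.

```python
from collections import namedtuple

# Hypothetical record of one forward interaction (e.g. a like) by an account on an item.
Interaction = namedtuple("Interaction", ["account", "item"])


def find_candidates(reference_accounts, publisher_of, sensitive_items, interactions):
    """Return (first_accounts, candidate_accounts) for steps 201 and 202."""
    # Sensitive content items issued by the reference accounts.
    seed_items = {i for i in sensitive_items if publisher_of[i] in reference_accounts}
    # Step 201: accounts that performed the forward interaction on those items.
    first_accounts = {x.account for x in interactions if x.item in seed_items}
    # Step 202: other (non-sensitive) items the first accounts interacted with.
    target_items = {
        x.item for x in interactions
        if x.account in first_accounts and x.item not in sensitive_items
    }
    # Publishers of the target items; reference accounts are excluded (assumption).
    candidates = {publisher_of[i] for i in target_items} - set(reference_accounts)
    return first_accounts, candidates


publisher_of = {"v1": "ref", "v2": "p1", "v3": "p2"}
interactions = [Interaction("k1", "v1"), Interaction("k1", "v2"), Interaction("k2", "v3")]
first, candidates = find_candidates({"ref"}, publisher_of, {"v1"}, interactions)
```

In this toy data, only account k1 liked the sensitive item v1, so only the publisher of the other item k1 liked becomes a candidate.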

In step 203, the server generates an account relationship diagram associated with the forward interaction behavior based on the reference account, the first account and the candidate account, where the account relationship diagram is used to characterize a topology structure of a social relationship between publisher accounts of content items that have been executed by the first account for the forward interaction behavior.

In some embodiments, the server constructs nodes in an account relationship graph by using a reference account and a candidate account, and constructs edges used for connecting the nodes in the account relationship graph based on whether the same first account performs a forward interaction behavior on both a sensitive content item issued by the reference account and a target content item issued by the candidate account, so as to finally obtain the account relationship graph formed by each node and each edge, for example, the account relationship graph may be characterized by using G (V, E), G represents the account relationship graph, V represents a node set formed by the nodes in the account relationship graph, and E represents an edge set formed by the edges in the account relationship graph.

In step 204, the server screens the candidate accounts to obtain target accounts forming an account group with the reference account based on the account relationship diagram.

In some embodiments, after constructing the original account relationship diagram, the server may perform community mining on the basis of the account relationship diagram based on a CCCM algorithm to find account groups formed by clustering known reference accounts and unknown candidate accounts, where the candidate accounts included in the account groups are target accounts capable of forming account groups with the reference accounts.

In the process, the reference account number for issuing the sensitive content item and the candidate account number for issuing the target content item can be connected in the account relation graph through the characteristic that the forward interaction action is executed on different content items by the first account number, so that the account relation graph based on common consumers (the first account number for executing the forward interaction action on the content items issued by the different account numbers) is constructed, and then the community mining is carried out on the basis of the constructed account relation graph, so that the identification accuracy of the target account number is improved, and more potential target account numbers are mined.

According to the method provided by the embodiment of the disclosure, by the characteristic that the first account performs forward interactive behaviors on different content items, the reference account for issuing the sensitive content item and the candidate account for issuing the target content item can be linked in the account relationship graph, so that the account relationship graph generated based on the first accounts performing the forward interactive behaviors on the content items issued by different accounts is constructed, and then community mining is performed on the basis of the constructed account relationship graph.

In one possible embodiment, generating an account relationship graph associated with the forward interaction behavior based on the reference account, the first account, and the candidate account includes:

constructing nodes in the account relation graph based on the reference account and the candidate account;

and constructing edges for connecting nodes in the account relation graph based on the forward interaction behavior executed by the first account.

In one possible embodiment, constructing edges for connecting nodes in the account relationship graph based on the forward interaction behavior performed by the first account includes:

and under the condition that any one first account performs the forward interaction behavior on the sensitive content items issued by the reference account and the target content items issued by the candidate account, generating an edge for connecting the node of the reference account and the node of the candidate account in the account relation diagram.

In a possible implementation manner, screening, based on the account relationship diagram, target accounts forming an account group with the reference account from the candidate accounts includes:

obtaining a maximum spanning tree of the account relation graph, wherein the maximum spanning tree has a maximum weight in a plurality of spanning trees of the account relation graph;

clustering accounts indicated by nodes contained in the maximum spanning tree to obtain a plurality of candidate account groups;

merging the candidate account groups to obtain a target account group;

and determining the candidate accounts contained in the target account group as the target account.

In one possible implementation, obtaining the maximum spanning tree of the account relationship graph includes:

based on the node similarity between two nodes connected with each edge in the account number relation graph, assigning a weight to the edge;

and generating the maximum spanning tree based on the weight of each edge in the account number relation graph, wherein the sum of the weights of the edges contained in the maximum spanning tree is maximum in the plurality of spanning trees.

In one possible implementation, the node similarity refers to a ratio between the number of common neighbor nodes of the two nodes and the sum of the number of respective neighbor nodes of the two nodes.
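The ratio described above can be sketched as follows, assuming the graph is stored as a hypothetical adjacency dictionary mapping each node to the set of its neighbor nodes. The function name `node_similarity` and the zero-denominator fallback are assumptions made for illustration.

```python
def node_similarity(adjacency, u, v):
    """Common neighbors of u and v divided by the sum of their neighbor counts."""
    common = len(adjacency[u] & adjacency[v])
    total = len(adjacency[u]) + len(adjacency[v])
    # Fallback for two isolated nodes (assumption; the text does not cover this case).
    return common / total if total else 0.0


adjacency = {
    "u": {"a", "b", "v"},
    "v": {"a", "u"},
    "a": {"u", "v"},
    "b": {"u"},
}
weight_uv = node_similarity(adjacency, "u", "v")  # 1 common neighbor / (3 + 2) neighbors
```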

In one possible embodiment, generating the maximum spanning tree based on the weight of each edge in the account relationship graph includes:

acquiring a plurality of candidate edges for connecting the nodes in the node set and the nodes outside the node set from the account relation graph;

adding a target edge with the maximum weight value in the candidate edges to the edge set, and adding nodes outside the node set connected with the target edge to the node set;

and repeating the operation of adding the target edges into the edge set until all the target edges are added into the edge set, and determining the spanning tree formed by the edge set and the node set when the addition is stopped as the maximum spanning tree.
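The iterative procedure above resembles Prim's algorithm applied with maximum rather than minimum weights. A minimal sketch under that reading is given below; the function name, the tuple-keyed `weighted_edges` dictionary, the choice of start node, and the use of a heap are assumptions, and the sketch only covers the connected component containing the start node.

```python
import heapq


def maximum_spanning_tree(nodes, weighted_edges, start):
    """weighted_edges: dict mapping (u, v) -> weight. Returns a list of (u, v, weight)."""
    adjacency = {n: [] for n in nodes}
    for (u, v), w in weighted_edges.items():
        adjacency[u].append((w, v))
        adjacency[v].append((w, u))

    node_set = {start}
    edge_set = []
    # Max-heap via negated weights: candidate edges leaving the node set.
    heap = [(-w, start, v) for w, v in adjacency[start]]
    heapq.heapify(heap)
    while heap and len(node_set) < len(nodes):
        neg_w, u, v = heapq.heappop(heap)
        if v in node_set:
            continue  # both endpoints are already inside the node set
        node_set.add(v)                  # add the outside node to the node set
        edge_set.append((u, v, -neg_w))  # add the target edge to the edge set
        for w, nxt in adjacency[v]:
            if nxt not in node_set:
                heapq.heappush(heap, (-w, v, nxt))
    return edge_set


tree = maximum_spanning_tree(
    {"a", "b", "c"},
    {("a", "b"): 0.5, ("b", "c"): 0.2, ("a", "c"): 0.4},
    start="a",
)
```

In the example, the two heaviest edges (a, b) and (a, c) are kept and the lightest edge (b, c) is left out of the tree.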

In one possible implementation, clustering accounts indicated by nodes included in the maximum spanning tree to obtain a plurality of candidate account groups includes:

screening a plurality of core nodes from each node contained in the maximum spanning tree;

and clustering nodes except the core node in the maximum spanning tree into a candidate account group in which the core node with the highest path similarity with the node is located.

In one possible embodiment, the screening a plurality of core nodes from the nodes included in the maximum spanning tree includes:

and determining the node as the core node under the condition that the sum of the weights of the connecting edges between the node and each target neighbor node is greater than a second weight threshold.
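A minimal sketch of this screening is shown below. It assumes, purely for illustration, that the target neighbor nodes of a node are all of its neighbors in the maximum spanning tree; the function name `screen_core_nodes` and the tuple layout of `tree_edges` are also hypothetical.

```python
def screen_core_nodes(tree_edges, second_weight_threshold):
    """tree_edges: iterable of (u, v, weight). Returns the set of core nodes."""
    weight_sum = {}
    for u, v, w in tree_edges:
        # Accumulate the weights of the connecting edges at both endpoints.
        weight_sum[u] = weight_sum.get(u, 0.0) + w
        weight_sum[v] = weight_sum.get(v, 0.0) + w
    return {n for n, s in weight_sum.items() if s > second_weight_threshold}


cores = screen_core_nodes([("a", "b", 0.5), ("a", "c", 0.4)], second_weight_threshold=0.6)
```

Here node a accumulates 0.9 and exceeds the threshold, while b (0.5) and c (0.4) do not.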

In one possible embodiment, the method further comprises:

for any node except the core node in the maximum spanning tree, determining a communication path between the node and any core node in the maximum spanning tree, wherein the communication path is a path which starts from the node and can reach the core node through a plurality of edges;

In a possible implementation manner, merging the multiple candidate account groups to obtain a target account group includes:

for any candidate account group in the plurality of candidate account groups, merging the candidate account group and other candidate account groups under the condition that a connecting edge exists between the core node of the candidate account group and the core nodes of other candidate account groups and the weight of the connecting edge is greater than a first weight threshold;

and repeatedly executing the operation of merging the candidate account groups until no candidate account group can be merged, and screening the target account group from the account groups obtained by merging each time.
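The repeated merging can be sketched with a union-find structure keyed by core node, recording the grouping obtained after every merge so that one of them can be selected later. The function name `merge_groups`, the input layout, and the choice to process the heaviest connecting edge first are assumptions made for this sketch.

```python
def merge_groups(groups, core_edges, first_weight_threshold):
    """groups: dict core_node -> set of accounts.
    core_edges: dict (core_u, core_v) -> weight of the edge connecting two core nodes.
    Returns the list of groupings obtained after each merge."""
    parent = {core: core for core in groups}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    snapshots = []
    # Heaviest connecting edge first (assumption; the text does not fix an order).
    for (u, v), w in sorted(core_edges.items(), key=lambda kv: -kv[1]):
        if w > first_weight_threshold and find(u) != find(v):
            parent[find(v)] = find(u)
            merged = {}
            for core, members in groups.items():
                merged.setdefault(find(core), set()).update(members)
            snapshots.append(list(merged.values()))
    return snapshots


groups = {"a": {"a", "x"}, "b": {"b"}, "c": {"c", "y"}}
snapshots = merge_groups(groups, {("a", "b"): 0.8, ("b", "c"): 0.3}, first_weight_threshold=0.5)
```

Only the edge between core nodes a and b exceeds the threshold, so a single merge is recorded and the group around c stays separate.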

In one possible embodiment, the method further comprises:

for any candidate account number group in the multiple candidate account number groups, determining a sum value of degree parameters of nodes associated with each account number contained in the candidate account number group as the group degree parameter of the candidate account number group, wherein the degree parameter of the node represents the number of edges connected with the node in the maximum spanning tree;

and determining the group similarity between the candidate account group and the other candidate account groups based on the group degree parameter of the candidate account group, the group degree parameters of the other candidate account groups and the number of the common edges in the candidate account group and the other candidate account groups.

In one possible embodiment, the screening of the target account group from the account groups obtained from each merging includes:

for the account group obtained by each combination, acquiring the modularity of the account group, wherein the modularity is used for measuring the dividing quality of the account group divided from the account relation diagram;

and determining the account group with the maximum modularity as the target account group.
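The exact modularity formula is not given in this passage. As a sketch, the widely used Newman modularity for an unweighted graph, Q = Σ_c [ l_c / m − (d_c / 2m)² ], can stand in, where m is the number of edges, l_c the number of edges inside group c, and d_c the sum of node degrees in group c. The function names and the toy graph are hypothetical.

```python
def modularity(edges, communities):
    """edges: list of (u, v); communities: list of disjoint node sets covering all nodes."""
    m = len(edges)
    if m == 0:
        return 0.0
    community_of = {n: idx for idx, members in enumerate(communities) for n in members}
    internal = [0] * len(communities)    # l_c: edges with both endpoints in group c
    degree_sum = [0] * len(communities)  # d_c: sum of node degrees in group c
    for u, v in edges:
        degree_sum[community_of[u]] += 1
        degree_sum[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(internal, degree_sum))


def best_grouping(edges, groupings):
    """Pick the grouping (from those recorded after each merge) with maximum modularity."""
    return max(groupings, key=lambda g: modularity(edges, g))


# Two triangles joined by a single edge (c, d).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = [{"a", "b", "c"}, {"d", "e", "f"}]
whole = [{"a", "b", "c", "d", "e", "f"}]
chosen = best_grouping(edges, [whole, split])
```

For this toy graph, keeping the two triangles as separate groups scores 5/14, while merging everything into one group scores 0, so the split grouping is chosen.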

In one possible embodiment, the forward interaction behavior comprises at least one of: a forward-form behavior for the content item, an attention behavior for a publisher account of the content item, a sharing behavior, or a downloading behavior.

In one possible implementation, after determining the candidate account for publishing the target content item, the method further includes:

and deleting the authenticated candidate account and the candidate account registered by the preset organization.

All the above optional technical solutions may be combined arbitrarily to form optional embodiments of the present disclosure, and are not described in detail herein.

Fig. 3 is a flowchart illustrating an account acquisition method according to an exemplary embodiment, where as shown in fig. 3, the account acquisition method is applied to a computer device, and is described by taking the computer device as a server as an example, where the embodiment includes the following steps.

In step 301, the server determines, based on a sensitive content item issued by a reference account, a first account that has performed a forward interactive behavior on the sensitive content item, where the forward interactive behavior is an interactive behavior in a forward form on the sensitive content item.

In some embodiments, the interactive behaviors that an account can initiate on a browsed content item include a positive interactive behavior and a negative interactive behavior, where the positive interactive behavior refers to an interactive behavior that performs a positive-form operation on the content item, and the negative interactive behavior refers to an interactive behavior that performs a negative-form operation on the content item.

Optionally, the forward interaction behavior comprises at least one of: a forward-form behavior for the content item, an attention behavior (that is, following) for a publisher account of the content item, a sharing behavior, or a downloading behavior, and the like, where the forward-form behavior includes, but is not limited to, liking the content item, rewarding the content item, favoriting the content item, and the like. Optionally, the negative interaction behavior comprises at least one of: reporting the content item, complaining about the content item, reporting the publisher account of the content item, or complaining about the publisher account of the content item, which is not specifically limited in this embodiment of the disclosure.

In some embodiments, the server may add a sensitive tag to a found, known and detected account that has issued a sensitive content item, and determine an account carrying the sensitive tag in the database as a reference account, for example, the server adds a sensitive tag to an account that has been reported and verified to issue a sensitive content item, or the server adds a sensitive tag to an account that has issued a sensitive content item and is blocked in a target time period, and the like. The target time period is a time period set by a technician, for example, the target time period is the last 1 week, or the last 1 month, and the like, which is not specifically limited in the embodiment of the present disclosure.

In some embodiments, after the reference account is obtained, if the sensitive content item issued by the reference account has been marked, the marked sensitive content item may be directly obtained, or if the content item issued by the reference account has not been marked, the content item issued by the reference account may be detected to determine whether the current content item is a sensitive content item, so as to obtain the detected sensitive content item.

In some embodiments, after the server obtains the labeled reference accounts in the platform and the labeled sensitive content items issued by the reference accounts, the server determines each account that has performed the forward interaction behavior on the sensitive content items as a first account. Optionally, since there may be more than one type of forward interaction behavior, the server may perform the method provided by the embodiments of the present disclosure for each type of forward interaction behavior, so that different first accounts are defined and different account relationship diagrams are constructed for different forward interaction behaviors. Target accounts that potentially publish sensitive content items are then mined from the account relationship diagrams associated with each of the multiple forward interaction behaviors, which improves the coverage of target account identification.

In step 302, the server determines a candidate account for issuing a target content item based on the target content item for which the first account performed the forward interaction behavior.

In some embodiments, after the server acquires the first account through step 301, it may determine a target content item for which the first account has performed the forward interaction behavior, for example, query the target content item for which the forward interaction behavior has been performed from a history browsing record of the first account for the content item, or maintain a history interaction list for each content item for which the first account has performed the forward interaction behavior, and determine all content items on the history interaction list as the target content item.

Optionally, in a case that a technician sets a statistical time period, the server may obtain only the target content items on which the first account performed the forward interaction behavior within the statistical time period. Because the interest preference of the first account may change dynamically over time, obtaining target content items only within the preset statistical time period implements a pre-screening mechanism for the target content items. For example, the statistical time period is the last 1 week or the last 1 month; for another example, the statistical time period is the 1 week before and the 1 week after the first account browses the sensitive content item. The setting manner of the statistical time period is not specifically limited in the embodiment of the present disclosure.
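A minimal sketch of such a statistical time period filter is given below, assuming each forward interaction carries a timestamp. The function name `within_statistical_period`, its parameters, and the sample timestamps are hypothetical; a "last 1 week" window can be obtained by anchoring at the current time and setting `after` to zero.

```python
from datetime import datetime, timedelta


def within_statistical_period(interaction_time, anchor_time,
                              before=timedelta(weeks=1), after=timedelta(weeks=1)):
    """True if the interaction falls inside [anchor - before, anchor + after]."""
    return anchor_time - before <= interaction_time <= anchor_time + after


# Time at which the first account browsed the sensitive content item (sample value).
browse_time = datetime(2022, 8, 5, 12, 0)
history = {
    "v2": datetime(2022, 8, 3, 9, 30),  # 2 days before the anchor: kept
    "v3": datetime(2022, 6, 1, 18, 0),  # about 2 months before the anchor: dropped
}
target_items = {
    item for item, t in history.items() if within_statistical_period(t, browse_time)
}
```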

In some embodiments, after determining the target content item, the server determines the account that published the target content item, that is, the account of the publisher of the target content item, as a candidate account. It should be noted that there may be one or more reference accounts in step 301, one or more first accounts that performed the forward interaction behavior on the sensitive content items published by each reference account, and one or more target content items on which each first account also performed the forward interaction behavior; therefore, there may also be one or more candidate accounts finally determined as publishers of the target content items, which is not specifically limited in the embodiment of the present disclosure.

In some embodiments, after the server determines a plurality of candidate accounts, it may further perform preliminary screening on the candidate accounts. For example, the server deletes the authenticated candidate accounts and the candidate accounts registered by a preset organization. Optionally, the preset organization may be an official organization or an organization authenticated by the platform. A white list may be created on the server side, and the official organizations and the organizations authenticated by the platform are added to the white list, so that the candidate accounts registered by the preset organizations on the white list can be deleted directly, which improves the account deletion efficiency.
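The preliminary screening can be sketched as a set filter. The function name `prescreen_candidates`, the `org_of` mapping from an account to its registering organization, and the sample values are hypothetical; accounts with no known organization are kept.

```python
def prescreen_candidates(candidates, authenticated_accounts, whitelist_orgs, org_of):
    """Drop authenticated accounts and accounts registered by white-listed organizations."""
    return {
        acc for acc in candidates
        if acc not in authenticated_accounts
        and org_of.get(acc) not in whitelist_orgs  # None (no organization) is never white-listed
    }


remaining = prescreen_candidates(
    candidates={"p1", "p2", "p3"},
    authenticated_accounts={"p2"},
    whitelist_orgs={"official_org"},
    org_of={"p3": "official_org"},
)
```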

In step 303, the server constructs a node in an account relationship diagram based on the reference account and the candidate account, where the account relationship diagram is used to characterize a topology structure of a social relationship between publisher accounts of content items on which the first account has performed the forward interaction behavior.

In some embodiments, the server uses each reference account obtained in step 301 and each candidate account obtained in step 302 as a node in an account relationship diagram, where each node has a one-to-one association relationship with one account (reference account or candidate account). In other words, a corresponding node is constructed in the account relationship diagram for each reference account, and similarly, a corresponding node is constructed in the account relationship diagram for each candidate account.

In some embodiments, the notation G is used to characterize the account relationship diagram, and then when storing the account relationship diagram, a node set V and an edge set E may be used for storage, that is, G (V, E) represents the account relationship diagram G determined by the node set V and the edge set E, where V represents a node set formed by nodes in the account relationship diagram, and E represents an edge set formed by edges in the account relationship diagram.

In the above case, the server may add all reference accounts and all candidate accounts to one account set, then allocate a uniquely associated node identifier to each account in the account set, and add the respective associated node identifiers of all accounts to the node set V, so as to construct nodes in the account relationship diagram based on the reference accounts and the candidate accounts.

In step 304, the server constructs an edge for connecting nodes in the account relationship graph based on the forward interaction performed by the first account.

In some embodiments, the server constructs edges for connecting nodes in the account relationship graph based on whether the same first account performs a forward interaction behavior on both the sensitive content item issued by the reference account and the target content item issued by the candidate account, and finally obtains an account relationship graph formed by each node and each edge.

In some embodiments, for any reference account and any candidate account, when any first account performs the forward interaction behavior on both the sensitive content item issued by the reference account and the target content item issued by the candidate account, an edge for connecting the node of the reference account and the node of the candidate account is generated in the account relation graph, that is, the edge for connecting the node of the reference account and the node of the candidate account is added to the edge set E.

In an exemplary scenario, for a reference account i and a candidate account j, if there is a first account k, the first account k performs a forward interaction behavior (e.g., a praise behavior in a forward morphological behavior) on a sensitive content item issued by the reference account i, and also performs the same forward interaction behavior (e.g., a praise behavior in the forward morphological behavior) on a target content item issued by the candidate account j, that is, the first account k approves both the sensitive content item issued by the reference account i and the target content item issued by the candidate account j, then an edge < u, v > for connecting a node u of the reference account i and a node v of the candidate account j is constructed in an account relationship diagram, and the edge < u, v > is added to an edge set E.
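The scenario above can be sketched as follows for a single type of forward interaction behavior. The function name `build_account_graph` and its inputs are hypothetical, and the sketch assumes that `likes` has been pre-filtered so that every item of a reference account appearing in it is a sensitive content item.

```python
from collections import defaultdict
from itertools import product


def build_account_graph(reference_accounts, publisher_of, likes):
    """likes: iterable of (consumer_account, item_id) pairs. Returns (V, E)."""
    reference_accounts = set(reference_accounts)
    # Map each consumer k to the set of publishers whose items it liked.
    publishers_by_consumer = defaultdict(set)
    for consumer, item in likes:
        publishers_by_consumer[consumer].add(publisher_of[item])

    nodes, edges = set(reference_accounts), set()
    for publishers in publishers_by_consumer.values():
        refs = publishers & reference_accounts
        others = publishers - reference_accounts
        # A shared consumer k links every reference account i to every candidate j.
        for i, j in product(refs, others):
            nodes.add(j)
            edges.add((i, j))
    return nodes, edges


V, E = build_account_graph(
    reference_accounts={"i"},
    publisher_of={"s1": "i", "t1": "j", "t2": "m"},
    likes=[("k", "s1"), ("k", "t1"), ("q", "t2")],
)
```

Consumer k liked items from both i and j, so the edge (i, j) is added; consumer q never touched a sensitive item, so publisher m is not connected.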

From the business perspective, an account that issues content items (including the reference account and the candidate account) may be referred to as a content producer (e.g., a video producer, a live broadcast producer, or a host), and an account that browses content items (e.g., the first account) may be referred to as a content consumer (e.g., a video consumer or a live broadcast room viewer). In the process of constructing the account relationship graph, content producers are used as nodes, and edges connecting the nodes are constructed based on content consumers. Optionally, accounts that are authenticated or registered by a preset organization are removed, which implements a screening mechanism for invalid nodes. Optionally, negative interaction behaviors executed by content consumers are removed and edges are constructed only for positive interaction behaviors; a negative consumer performs a negative interaction behavior on a sensitive content item, which indicates that the consumer is not interested in the sensitive content item and would only introduce a noise signal. This implements a screening mechanism for invalid edges, and thus the accuracy of the constructed account relationship graph can be improved.

Taking the case where the content item is a video as an example, the process of constructing the account relationship graph may be as shown in Table 1:

TABLE 1

On the basis of Table 1, video producers are used as the nodes of the account relationship graph, and the edges in the account relationship graph are constructed based on forward consumers. The constructed account relationship graph is a heterogeneous graph of communities. A heterogeneous graph is defined relative to a homogeneous graph: a homogeneous graph is a topological graph in which the types of nodes and edges do not need to be distinguished, that is, the number of node types plus the number of edge types equals 2; a heterogeneous graph is a topological graph in which the types of nodes and edges need to be distinguished, that is, the number of node types plus the number of edge types is greater than 2. Fig. 4 is a schematic diagram comparing a homogeneous graph and a heterogeneous graph provided by the embodiment of the present disclosure, and shows a homogeneous graph 401 and a heterogeneous graph 402. In the homogeneous graph 401, the social topological relationships of four social accounts A to D are constructed based on social friend relationships: accounts A and D are in a friend relationship, accounts B and C are in a friend relationship with each other, and accounts B and D are also in a friend relationship with each other; different types of nodes and different types of edges do not need to be distinguished in the homogeneous graph 401. In the heterogeneous graph 402, account A executes a forward interaction behavior of consuming videos (i.e., browsing videos), account B executes two forward interaction behaviors, commenting and private messaging, and account C executes a forward interaction behavior of watching live broadcasts. Four different account relationship graphs can be constructed according to the different types of forward interaction behaviors; that is, different edge types can be distinguished according to the type of forward interaction behavior, so as to construct account relationship graphs associated with different forward interaction behaviors.

On the one hand, removing accounts that are authenticated or registered by a preset organization deletes invalid nodes from the account relationship graph. Both first accounts that are interested in sensitive content items and accounts that are not interested in them generally follow the content items (such as news, trending topics, and information) issued by authenticated or organization-registered accounts, so these accounts introduce noise interference into community mining. Removing them in the preprocessing stage effectively suppresses invalid nodes in the account relationship graph and improves the accuracy of subsequent community mining.

On the other hand, only positive consumers (i.e., the first accounts that perform the positive interaction behavior) are used to construct edges in the account relationship graph, which is equivalent to removing negative consumers. A negative consumer performs a negative interaction behavior on a sensitive content item, which indicates that the negative consumer is not interested in the sensitive content item and would therefore introduce noise interference. A positive consumer performs the positive interaction behavior on a sensitive content item, which indicates that the positive consumer is relatively interested in the sensitive content item and may browse other similar sensitive content items. Therefore, the publisher accounts (i.e., candidate accounts) of the target content items on which a positive consumer has performed the positive interaction behavior are connected with the corresponding reference accounts, so that the account relationship graph is constructed through common positive consumers, which effectively suppresses invalid edges in the account relationship graph and improves the accuracy of subsequent community mining.

The above steps 303 to 304 show one possible implementation of generating an account relationship graph associated with the forward interaction behavior based on the reference account, the first account, and the candidate account. Since there may be more than one type of forward interaction behavior, the server may determine the corresponding first accounts for each forward interaction behavior and generate an account relationship graph associated with that behavior, for example, an account relationship graph G₁ for the like behavior, an account relationship graph G₂ for the follow behavior on the publisher account of the content item (following the reference account or following the candidate account), an account relationship graph G₃ for the sharing behavior, and an account relationship graph G₄ for the downloading behavior. For each account relationship graph, the target accounts that potentially publish sensitive content can be mined through the following steps 305 to 308, which improves the coverage of target account identification and allows security risks that may exist on the platform to be discovered comprehensively from multiple dimensions.

In step 305, the server obtains a maximum spanning tree of the account relationship graph, the maximum spanning tree having a maximum weight among the spanning trees of the account relationship graph.

It should be noted that a connected graph in which no cycle exists is called a tree, and if a spanning subgraph T of a graph G is a tree, the tree T is called a spanning tree of the graph G.

In the embodiment of the present disclosure, a connected, cycle-free spanning subgraph of the account relationship graph G is a spanning tree T of the account relationship graph G. Further, when the account relationship graph G is an undirected weighted graph, each edge in the account relationship graph is an undirected edge carrying a weight, so the sum of the weights of all edges included in each spanning tree T can be obtained, and the spanning tree T having the largest weight (that is, the largest sum of the weights of all its edges) is referred to as the maximum spanning tree.
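The definition above can be written compactly as follows. The symbols W(T) for the weight of a spanning tree, E(T) for its edge set, \mathcal{T}(G) for the set of all spanning trees of G, and T^{*} for the maximum spanning tree are introduced here for illustration only and do not appear in the original text; ω(u, v) is the edge weight assigned in step 3051.

```latex
W(T) = \sum_{\langle u, v \rangle \in E(T)} \omega(u, v),
\qquad
T^{*} = \arg\max_{T \in \mathcal{T}(G)} W(T)
```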

In some embodiments, the above steps 303 to 304 only describe how to construct the nodes and edges of the account relationship graph, whereas obtaining the maximum spanning tree requires a weight on each edge. The server may therefore assign a weight to each edge through the following step 3051, and then obtain the maximum spanning tree from the resulting undirected weighted account relationship graph through the following step 3052.

Fig. 5 is a flowchart of a method for obtaining a maximum spanning tree according to an embodiment of the present disclosure, and as shown in fig. 5, a possible implementation of obtaining the maximum spanning tree is as follows, involving steps 3051 and 3052:

In step 3051, the server assigns a weight to each edge based on the node similarity between two nodes connected by the edge in the account relationship graph.

In some embodiments, for each edge included in the edge set of the account number relationship graph, the server may determine two nodes connected by the edge, then obtain the node similarity between the two nodes, and further assign a weight to the edge according to the node similarity.

In one exemplary scenario, taking the edge <v_k, v_{k+1}> connecting the kth node v_k and the (k+1)th node v_{k+1} as an example, the node similarity between the kth node v_k and the (k+1)th node v_{k+1} is used as the weight of the edge <v_k, v_{k+1}>, which can be recorded as ω(v_k, v_{k+1}). Each edge in the edge set E of the account relationship graph G can be assigned a weight in the above manner, and after all edges in the edge set E have been traversed, all edges in the account relationship graph G have been assigned weights.

In some embodiments, the node similarity between the two nodes connected by each edge in the account relationship graph may be determined based on the number of common neighbor nodes of the two nodes and the number of neighbor nodes owned by each of the two nodes. For example, the node similarity is the ratio of the number of common neighbor nodes of the two nodes to the sum of the numbers of neighbor nodes owned by each of the two nodes.

Illustratively, for the kth node v_k and the (k+1)th node v_{k+1}, suppose node v_k has N₁ neighbor nodes, that is, the number of nodes connected with node v_k through an edge in the account relationship graph is N₁; put another way, the number of edges in the edge set of the account relationship graph that have node v_k as an endpoint is N₁. Suppose node v_{k+1} has N₂ neighbor nodes, that is, the number of nodes connected with node v_{k+1} through an edge in the account relationship graph is N₂; in other words, the number of edges in the edge set of the account relationship graph that have node v_{k+1} as an endpoint is N₂. Suppose the number of common neighbor nodes of node v_k and node v_{k+1} is N₃, that is, there are N₃ neighbor nodes that are connected with both node v_k and node v_{k+1}. The node similarity between node v_k and node v_{k+1} is then ω(v_k, v_{k+1}) = N₃ / (N₁ + N₂).

In the above process, the node similarity is obtained as the ratio of the number of common neighbor nodes of two nodes to the sum of the numbers of neighbor nodes of the two nodes. The number of common neighbor nodes represents the number of associated accounts shared by the two accounts indicated by the two nodes, where the association is the social topological relationship constructed from common positive consumers. The ratio of the number of shared associated accounts to the sum of the numbers of associated accounts of the two accounts indicates the degree of overlap between the consumers of the content items issued by the two accounts, so that a node similarity with business significance is provided and the expressive power of the node similarity is improved.
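A minimal Python sketch of this weight assignment is given below, assuming the graph is held as a dictionary that maps each node to the set of its neighbor nodes. The names node_similarity and assign_weights and the adjacency-dictionary representation are assumptions of the sketch, not part of the disclosure.

```python
def node_similarity(adjacency, u, v):
    """Return N3 / (N1 + N2) for nodes u and v.

    adjacency maps each node to the set of its neighbor nodes
    (a hypothetical representation chosen for this sketch).
    """
    n1 = len(adjacency[u])                 # neighbors of u
    n2 = len(adjacency[v])                 # neighbors of v
    n3 = len(adjacency[u] & adjacency[v])  # common neighbors
    return n3 / (n1 + n2) if (n1 + n2) else 0.0


def assign_weights(adjacency):
    """Assign the node similarity as the weight of every undirected edge."""
    weights = {}
    for u, neighbors in adjacency.items():
        for v in neighbors:
            weights[frozenset((u, v))] = node_similarity(adjacency, u, v)
    return weights
```

For instance, if nodes a and b each have two neighbors and share exactly one of them, then N₁ = 2, N₂ = 2, N₃ = 1, and the edge between a and b receives the weight 1 / (2 + 2) = 0.25.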

In step 3052, the server generates the maximum spanning tree based on the weight of each edge in the account relationship diagram, wherein the sum of the weights of the edges included in the maximum spanning tree is maximum among the spanning trees.

In some embodiments, each edge in the account relationship graph G is given a weight in the above step 3051, so the account relationship graph G changes from an undirected graph to an undirected weighted graph. At this time, all spanning trees of the account relationship graph G may be obtained first, the sum of the weights of all edges is then calculated for each spanning tree, and the spanning tree with the largest sum of weights is taken as the maximum spanning tree.

In other embodiments, the server may start from an empty tree T and add edges to the tree T one by one, each time selecting the candidate edge with the largest weight, until n-1 edges have been added, thereby generating a maximum spanning tree including n nodes and n-1 edges, through the following steps A1 to A4:

A1, initializing an empty tree, wherein the empty tree comprises a node set and an edge set, the node set comprises a randomly selected starting node, and the edge set is an empty set.

In some embodiments, when initializing the empty tree, a node is randomly sampled from the account relationship graph as the starting node x; the node set of the empty tree is then initialized to V_new = {x}, and the edge set of the empty tree is initialized to an empty set, i.e., E_new = {}.

A2, acquiring a plurality of candidate edges for connecting the nodes in the node set and the nodes outside the node set from the account relationship graph.

In some embodiments, for each node u in the node set V_new, the candidate edges <u, v> connecting the node u inside V_new and a node v outside V_new are determined. This operation is performed for every element, i.e., every node, of the node set V_new, and each node in the set may form one or more candidate edges with nodes outside the set. After all elements of the node set V_new, that is, all nodes in the set, have been traversed, the candidate edges obtained for all the nodes are collected. Here the node v is an element of the node set V of the account relationship graph but not of the node set V_new of the current tree; in other words, v ∈ V and v ∉ V_new.

A3, adding the target edge with the maximum weight among the candidate edges to the edge set, and adding the node outside the node set connected by the target edge to the node set.

In some embodiments, for all the candidate edges obtained in the above step A2, the weight given to each candidate edge in the above step 3051 is obtained, and the candidate edge with the largest weight is determined as the target edge. Since each candidate edge is itself an element of the edge set E of the account relationship graph, suppose the candidate edge with the largest weight is the edge <u, v>; the target edge <u, v> is then added to the edge set E_new of the current tree. After the target edge <u, v> is added to the edge set E_new, the out-of-set node v connected by the target edge <u, v> also needs to be added to the node set V_new.

A4, repeatedly executing the operations A2 to A3 of adding target edges to the edge set until all the target edges have been added to the edge set, and determining the spanning tree formed by the edge set and the node set when the addition stops as the maximum spanning tree.

In some embodiments, the above steps A2 to A3 are repeated: each time, the target edge with the largest weight among the candidate edges is added to the edge set E_new, and the out-of-set node connected by the target edge is added to the node set V_new. When no target edge can be found any more, adding new edges to the edge set E_new stops, and adding new nodes to the node set V_new naturally stops as well. This is equivalent to all target edges having been added to the edge set E_new and all interconnected nodes having been added to the node set V_new. The node set V_new and the edge set E_new formed at this time uniquely determine the maximum spanning tree.

The maximum spanning tree is obtained through the steps A1-A4, so that the maximum spanning tree can be determined without obtaining all the spanning trees of the account relation graph and calculating the sum of the weights of all the edges contained in each spanning tree one by one, the calculation amount for obtaining the maximum spanning tree can be reduced, and the calculation efficiency for the maximum spanning tree is improved.
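Steps A1 to A4 follow the same greedy pattern as Prim's algorithm, applied to the largest rather than the smallest weight. A minimal Python sketch is given below; the function name maximum_spanning_tree, the optional start parameter, and the representation of each edge as a two-element frozenset are assumptions of the sketch. It rescans all edges in every round for clarity; a priority queue would be the usual choice for large graphs.

```python
import random


def maximum_spanning_tree(nodes, weights, start=None):
    """Grow a tree by repeatedly adding the heaviest candidate edge.

    nodes: iterable of node identifiers.
    weights: dict mapping frozenset({u, v}) to the edge weight.
    start: optional starting node (hypothetical parameter); if omitted,
        the starting node is chosen at random as in step A1.
    """
    nodes = list(nodes)
    if start is None:
        start = random.choice(nodes)
    v_new = {start}  # A1: node set holds only the starting node
    e_new = []       # A1: edge set starts empty
    while True:
        # A2: candidate edges have exactly one endpoint inside v_new.
        candidates = [
            (w, edge) for edge, w in weights.items()
            if len(edge) == 2 and len(edge & v_new) == 1
        ]
        if not candidates:
            break  # A4: no target edge can be found any more
        # A3: add the heaviest candidate edge and its outside endpoint.
        _, target = max(candidates, key=lambda item: item[0])
        e_new.append(target)
        v_new |= target
    return v_new, e_new
```

If the account relationship graph is not connected, this sketch returns only the tree of the connected component that contains the starting node.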

In the steps 3051-3052, the weight is assigned based on the node similarity, and the maximum spanning tree is generated based on the assigned account relation graph, so that nodes except for the maximum spanning tree can be eliminated during subsequent community mining, the calculation amount of the community mining algorithm is reduced, invalid nodes are further eliminated, and noise interference caused by the invalid nodes to the subsequent community mining algorithm is avoided.

In other embodiments, the maximum spanning tree may not be obtained, but the community mining algorithm may be directly executed on the original account relationship diagram, so that the amount of calculation in the process of obtaining the maximum spanning tree can be saved.

In step 306, the server clusters the accounts indicated by the nodes included in the maximum spanning tree to obtain a plurality of candidate account groups.

In some embodiments, the server clusters the accounts indicated by the nodes of the maximum spanning tree, and the candidate account groups are the groups formed when clustering stops; a candidate account group is also referred to as a local community to be merged. Before clustering starts, all core nodes may be found in the maximum spanning tree. Each core node is then used as the clustering center of one candidate account group, and every node that is not a core node is clustered into the candidate account group of the core node most similar to it, so that local communities are obtained by gradually clustering around the core nodes.

Fig. 6 is a flowchart for acquiring a candidate account group according to an embodiment of the present disclosure, and as shown in fig. 6, before a node clustering starts, all core nodes in a maximum spanning tree are found, and then the clustering starts with the core nodes as a clustering center, which is described below.

In step 3061, the server obtains a plurality of core nodes from each node included in the maximum spanning tree.

In some embodiments, for any node included in the maximum spanning tree, the server determines at least one neighbor node in the maximum spanning tree that is connected to the node by an edge, in other words, finds all neighbor nodes that can be connected to the node by an edge from the maximum spanning tree, that is, finds another end point of all edges existing from the maximum spanning tree with the node as an end point as a neighbor node of the node, and since the maximum spanning tree is a connected spanning subgraph of the account relationship graph, the number of neighbor nodes of each node is at least one, in other words, the number of neighbor nodes of each node is greater than or equal to 1.

In some embodiments, the server obtains, from the at least one neighbor node, a target neighbor node in which a weight of a connection edge between the node and the neighbor node is greater than or equal to a first weight threshold by screening, in other words, after obtaining the at least one neighbor node of each node, for each neighbor node, it may determine whether the weight of the connection edge used to connect the node and the neighbor node is greater than or equal to the first weight threshold, if the weight of the connection edge is greater than or equal to the first weight threshold, determine the current neighbor node as a target neighbor node, if the weight of the connection edge is less than the first weight threshold, continue to perform the above determination operation on a next neighbor node, and repeat the above operations until all neighbor nodes of the node are traversed, at this time, all the target neighbor nodes that are screened out may be obtained. The first weight threshold is a numerical value preset by a technician, or the first weight threshold is a numerical value sampled from a weight set formed by weights of all edges in the maximum spanning tree.

In some embodiments, when the sum of the weights of the connection edges between the node and each of the target neighbor nodes is greater than the second weight threshold, the server determines the node as a core node. In other words, after the server obtains all the target neighbor nodes through screening, the connection edges between the node and its target neighbor nodes form a connection edge set Γ_ε(u, v) = { v ∈ Nghb(u) | ω(u, v) ≥ ε }, where ε represents the first weight threshold, ω(u, v) represents the weight of the connection edge between node u and its target neighbor node v, and Nghb(u) represents the neighbor nodes of node u. That is, a target neighbor node v is a node among the neighbor nodes Nghb(u) whose connection edge weight ω(u, v) is greater than or equal to the first weight threshold ε, and the connection edges <u, v> formed by node u and all its target neighbor nodes v constitute the connection edge set Γ_ε(u, v). Let |Γ_ε(u, v)| denote the sum of the weights of all connection edges <u, v> contained in Γ_ε(u, v), and let μ denote the second weight threshold; if |Γ_ε(u, v)| > μ is satisfied, the current node u is determined as a core node.

In the above process, for each node, the connection edges whose weights are greater than or equal to the first weight threshold are added to the connection edge set; when the sum of the weights of the connection edges in the set is greater than the second weight threshold, the current node is taken as a core node. These operations are repeated until all nodes in the maximum spanning tree have been traversed, so that all core nodes in the maximum spanning tree are screened out.
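The core node screening can be sketched in Python as follows, assuming the maximum spanning tree is given as an adjacency dictionary and the edge weights as a dictionary keyed by unordered node pairs. The function name find_core_nodes and both representations are assumptions of the sketch; epsilon and mu stand for the first and second weight thresholds.

```python
def find_core_nodes(tree_adjacency, weights, epsilon, mu):
    """Return the nodes whose strong incident edges sum to more than mu.

    tree_adjacency: node -> set of neighbors in the maximum spanning tree.
    weights: dict mapping frozenset({u, v}) to the edge weight.
    epsilon: first weight threshold (an edge is kept if weight >= epsilon).
    mu: second weight threshold (a node is core if the kept sum > mu).
    """
    core = set()
    for u, neighbors in tree_adjacency.items():
        # Connection edge set: edges from u with weight >= epsilon.
        strong = [
            weights[frozenset((u, v))] for v in neighbors
            if weights[frozenset((u, v))] >= epsilon
        ]
        if sum(strong) > mu:
            core.add(u)
    return core
```

In a tree with edges a-b (0.5), a-c (0.4), and c-d (0.1), thresholds epsilon = 0.3 and mu = 0.6 make a the only core node, since its two strong edges sum to 0.9 while no other node exceeds 0.6.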

In other embodiments, the server may instead select, as core nodes, the nodes whose connection edge sets contain more connection edges than a quantity threshold, which ensures that the number of connection edges whose weights are greater than or equal to the first weight threshold exceeds the quantity threshold. The quantity threshold may be any value greater than or equal to 1 preset by a technician, which is not specifically limited in the embodiment of the present disclosure.

In step 3062, the server determines a plurality of candidate account groups with the plurality of core nodes as clustering centers, respectively.

In some embodiments, for each core node obtained by screening in the above step 3061, the server uses the core node as a clustering center and initializes a candidate account group that contains only the account indicated by that core node. Performing this operation for every core node yields all the initialized candidate account groups. Before clustering starts, each candidate account group contains only the account indicated by its core node, and candidate account groups are in one-to-one correspondence with core nodes.

In step 3063, the server clusters the nodes in the maximum spanning tree except the core node into the candidate account group where the core node with the highest path similarity with the node is located.

In some embodiments, the server needs to perform node clustering on all nodes (i.e., common nodes that are not core nodes) except for the core nodes in the maximum spanning tree, so as to divide the node into the candidate account group in which the core node with the highest path similarity is located. Optionally, for each node except the core node, the path similarity between the node and all the core nodes obtained by screening in step 3061 is calculated first, and then the node is clustered into the candidate account group where the core node with the highest path similarity is located, or the core nodes are sorted according to the sequence of the path similarities from large to small, and the node is divided into the candidate account group where the core node with the highest order of the path similarities is located.

In some embodiments, for any node in the maximum spanning tree other than the core nodes, the server may calculate the path similarity between the node and each core node screened out in the above step 3061 as follows. First, in the maximum spanning tree, a communication path between the node and the core node is determined, where a communication path is a path that starts from the node and reaches the core node through one or more edges; for example, for a non-core node s and a core node t, one or more communication paths p(s, t) connecting node s and core node t are determined. Then, the path similarity between the node and the core node is determined based on the weight of each edge included in the communication path. For example, for each communication path p(s, t), the reciprocal of the weight ω(v_k, v_{k+1}) of the connection edge between the kth node v_k and its neighbor node v_{k+1} on the communication path is taken, and the sum of the reciprocals of the weights of the connection edges is used as the contribution of the current communication path p(s, t); the reciprocal of the sum of the contributions of all communication paths between node s and core node t is used as the path similarity S(s, t) between node s and core node t. In other words, the path similarity S(s, t) between node s and core node t can be expressed as the following formula:

$$S(s, t) = \frac{1}{\sum_{p(s, t)} \sum_{k} \frac{1}{\omega(v_k, v_{k+1})}}$$

where S(s, t) represents the path similarity between node s and core node t, p(s, t) represents a communication path between node s and core node t, k represents the index of the kth node v_k on the communication path p(s, t), and ω(v_k, v_{k+1}) represents the weight of the connection edge between the kth node v_k and its neighbor node v_{k+1} on the communication path.

In the above process, the communication paths between a node and the core nodes are taken into account when determining the path similarity, so that both the local information between the node and a core node and the global topology of the whole maximum spanning tree are considered. The path similarity therefore measures the similarity between a node and a core node more accurately, so that each node can be accurately divided into the candidate account group of its most similar core node, which greatly improves the accuracy of dividing local communities in the maximum spanning tree.
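A minimal Python sketch of the path similarity is given below. It assumes the calculation is performed on the maximum spanning tree, where exactly one path connects any two nodes, so the sum over communication paths reduces to a single path. The function names tree_path and path_similarity, the adjacency-dictionary representation, and the handling of zero-weight edges are assumptions of the sketch.

```python
def tree_path(tree_adjacency, s, t):
    """Return the node sequence of the path from s to t, or None."""
    stack = [(s, [s])]
    visited = {s}
    while stack:
        node, path = stack.pop()
        if node == t:
            return path
        for nxt in tree_adjacency[node]:
            if nxt not in visited:
                visited.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None


def path_similarity(tree_adjacency, weights, s, t):
    """Reciprocal of the summed reciprocal edge weights along the path.

    Assumes s != t. Returns 0.0 when no path exists or when the path
    contains a zero-weight edge (an assumption made for this sketch).
    """
    path = tree_path(tree_adjacency, s, t)
    if path is None or len(path) < 2:
        return 0.0
    total = 0.0
    for a, b in zip(path, path[1:]):
        w = weights[frozenset((a, b))]
        if w == 0:
            return 0.0
        total += 1.0 / w
    return 1.0 / total
```

For a path d, c, a with edge weights 0.1 and 0.4, the sum of reciprocals is 10 + 2.5 = 12.5, so the path similarity is 1 / 12.5 = 0.08; a single weak edge on the path therefore dominates the result.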

In other embodiments, the server may also perform node clustering without using the path similarity, but perform node clustering based on the node similarity introduced in step 3051, and since the node similarity is obtained in step 3051, the node similarity calculated in step 3051 may be directly reused, so as to save the computing resources of the server.

In the above steps 3061 to 3063, core nodes are first screened out of the maximum spanning tree and then used as clustering centers to divide local communities, namely the candidate account groups. When the candidate account groups are initialized, this guarantees that their clustering centers are nodes with a high weight contribution in the whole maximum spanning tree. Compared with randomly sampling several nodes of the maximum spanning tree as initial clustering centers, this improves the efficiency with which clustering converges to the candidate account groups and reduces the randomness of center initialization, thereby improving clustering accuracy.
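Steps 3062 and 3063 can be sketched together in Python as follows. The function name cluster_around_cores and the similarity callback, which stands in for the path similarity (or, in the alternative embodiment, the node similarity), are assumptions of the sketch; at least one core node is assumed to exist.

```python
def cluster_around_cores(nodes, core_nodes, similarity):
    """Assign every non-core node to the group of its most similar core.

    nodes: iterable of all nodes in the maximum spanning tree.
    core_nodes: iterable of core nodes (must not be empty).
    similarity: callable (node, core_node) -> float, a hypothetical
        stand-in for the path similarity between the two nodes.
    Returns a dict mapping each core node to its candidate account group.
    """
    # Step 3062: each group initially holds only its core node.
    groups = {core: {core} for core in core_nodes}
    # Step 3063: attach each remaining node to the most similar core.
    for node in nodes:
        if node in groups:
            continue
        best = max(groups, key=lambda core: similarity(node, core))
        groups[best].add(node)
    return groups
```

With core nodes a and c, a node b whose similarity to a is 0.5 and to c is 0.1 joins the group of a, while a node d whose similarity to a is 0.08 and to c is 0.1 joins the group of c.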

In other embodiments, the server may also randomly sample a plurality of nodes from the maximum spanning tree as a clustering center, determine a plurality of candidate account groups based on the randomly sampled nodes, and then perform node clustering based on some KNN (K-nearest neighbor) algorithms or K-means (K-means) algorithms, where the node clustering method is not specifically limited in the embodiments of the present disclosure.

In step 307, the server merges the candidate account groups to obtain a target account group.

In some embodiments, after the nodes are clustered into candidate account groups through step 306, similar candidate account groups are merged to obtain one or more final target account groups. If the candidate account groups are regarded as local communities in the maximum spanning tree, the merging process is equivalent to merging the local communities to obtain a large community which can not be merged any more.

In some embodiments, before the candidate account groups are merged, it may be determined whether the core nodes of candidate account groups are directly connected. If the core nodes of two candidate account groups are directly connected and the weight of the connection edge is greater than the first weight threshold, the two candidate account groups are merged directly. After all candidate account groups whose core nodes are connected by an edge with a weight greater than the first weight threshold have been merged, each remaining candidate account group is iteratively merged into the other candidate account group with the highest group similarity. Finally, when no candidate account groups can be merged any further, the entire network is presented as a single large community; at this time, the account group with the largest modularity may be selected, from the account groups generated in each iteration of merging, as the target account group.

Fig. 7 is a flowchart of merging candidate account groups according to an embodiment of the present disclosure, and as shown in fig. 7, a merging process of the candidate account groups by the server in the foregoing manner is shown, which is described below.

In step 3071, for any candidate account group in the plurality of candidate account groups, if a connection edge exists between the core node of the candidate account group and the core node of another candidate account group and the weight of the connection edge is greater than the first weight threshold, the server merges the candidate account group with the other candidate account group.

In some embodiments, before merging the candidate account groups, it is determined whether a situation that core nodes of the candidate account groups are directly connected exists or not, and if the core nodes of the two candidate account groups are directly connected and a weight of a connection edge between the two core nodes is greater than a first weight threshold, the two candidate account groups are directly merged.

In some embodiments, the server traverses all the candidate account groups. For each candidate account group, if a connecting edge exists between its core node and the core node of any other candidate account group, the two core nodes are directly connected, and the server then determines whether the weight of the connecting edge between the two core nodes is greater than the first weight threshold. If the weight is greater than the first weight threshold, the candidate account group and the other candidate account group are merged to obtain a new candidate account group. Otherwise, if the weight is less than or equal to the first weight threshold, the two candidate account groups cannot be merged directly even though their core nodes are directly connected.

It should be noted that a connecting edge may exist between the core node of a candidate account group and the core nodes of more than one other candidate account group. In that case, if the weights of the connecting edges between the core node and more than one other core node are all greater than the first weight threshold, the candidate account group can be merged together with all of the candidate account groups whose core nodes are directly connected to it by connecting edges with weights greater than the first weight threshold. For example, connecting edges exist between the core node $u_A$ of candidate account group A and the core nodes $u_B$, $u_C$ and $u_D$ of the other candidate account groups B, C and D, respectively. The weight $\omega(u_A, u_B)$ of the connecting edge $\langle u_A, u_B \rangle$ between core node $u_A$ and core node $u_B$ is greater than the first weight threshold $\epsilon$, the weight $\omega(u_A, u_C)$ of the connecting edge $\langle u_A, u_C \rangle$ between core node $u_A$ and core node $u_C$ is greater than the first weight threshold $\epsilon$, and the weight $\omega(u_A, u_D)$ of the connecting edge $\langle u_A, u_D \rangle$ between core node $u_A$ and core node $u_D$ is less than the first weight threshold $\epsilon$. The candidate account group A can then be merged with the other candidate account groups B and C, but cannot be merged with the other candidate account group D; that is, the three candidate account groups A, B and C can be merged into a new candidate account group.
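The first-round merge described above can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the disclosed implementation: the function name `merge_by_core_edges`, the dictionary-based inputs, and the use of a union-find structure are all hypothetical choices made for illustration. The example data reproduce the A, B, C, D case from the preceding paragraph with assumed weights 0.9, 0.8 and 0.3 and an assumed threshold of 0.5.

```python
from itertools import combinations


def merge_by_core_edges(groups, core_of, edge_weight, epsilon):
    """Merge groups whose core nodes share an edge with weight > epsilon.

    groups:      dict mapping a group id to a set of accounts (hypothetical layout)
    core_of:     dict mapping a group id to its core node
    edge_weight: dict mapping frozenset({u, v}) to the weight of edge <u, v>
    epsilon:     the first weight threshold
    """
    parent = {g: g for g in groups}

    def find(g):
        # Follow parent links to the representative, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(sorted(groups), 2):
        w = edge_weight.get(frozenset((core_of[a], core_of[b])))
        # Condition 1: an edge exists; condition 2: its weight exceeds epsilon.
        if w is not None and w > epsilon:
            parent[find(b)] = find(a)

    merged = {}
    for g, members in groups.items():
        merged.setdefault(find(g), set()).update(members)
    return merged


groups = {"A": {"uA", "a1"}, "B": {"uB"}, "C": {"uC", "c1"}, "D": {"uD"}}
core_of = {"A": "uA", "B": "uB", "C": "uC", "D": "uD"}
edge_weight = {
    frozenset(("uA", "uB")): 0.9,
    frozenset(("uA", "uC")): 0.8,
    frozenset(("uA", "uD")): 0.3,
}
merged = merge_by_core_edges(groups, core_of, edge_weight, epsilon=0.5)
print(merged)
```

Union-find merges transitively, so groups B and C land in the same merged group because both are linked to A. How the core node of a merged group is chosen afterwards is not covered by this sketch.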

It should be noted that, if no connecting edge exists between the core node of the current candidate account group and the core node of any other candidate account group, the candidate account group has no directly connected counterpart. Alternatively, connecting edges may exist between the core node of the current candidate account group and the core nodes of other candidate account groups, but the weight of none of these connecting edges is greater than the first weight threshold. In either case, merging is performed through the following step 3072. By traversing all candidate account groups, every pair of candidate account groups that simultaneously satisfies the following two conditions can be found and merged: 1) a connecting edge exists between the core nodes of the two candidate account groups; 2) the weight of the connecting edge is greater than the first weight threshold. After this round of merging, that is, once no two candidate account groups (whether original or obtained by merging) satisfy both conditions 1) and 2), the following step 3072 is performed.

In other embodiments, in the first round of merging, condition 2), namely that the weight of the connecting edge is greater than the first weight threshold, may be modified as follows: the path similarity between the two core nodes is greater than the first weight threshold. That is, the path similarity of the two core nodes is evaluated instead of the weight of the connecting edge between them (as can be seen from step 3051, the weight of the connecting edge actually represents the node similarity between the two core nodes). Because the path similarity takes into account the communication path between the two core nodes, the criterion for directly merging candidate account groups becomes stricter, and the order in which communities are merged becomes more accurate to a certain extent.

In step 3072, otherwise, the server merges the candidate account group with the other candidate account group with which it has the greatest group similarity.

In some embodiments, when there are no two account groups (which may be the original candidate account group or the candidate account group obtained by merging in step 3071) that satisfy both of the above two conditions 1) and 2), the server calculates the group similarity between the candidate account group and each other candidate account group for each remaining candidate account group after performing one round of merging in step 3071, and then merges the candidate account group with the other candidate account group with the highest group similarity.

In some embodiments, for any candidate account group in the plurality of candidate account groups, the server may calculate the group similarity between the candidate account group and any other candidate account group as follows. First, the degree parameter of the node associated with each account included in the candidate account group is obtained, where the degree parameter of a node represents the number of edges connected to the node in the maximum spanning tree, and the sum of these degree parameters is determined as the group degree parameter of the candidate account group. Then, in the same way, the degree parameters of the nodes associated with the accounts in the other candidate account group are obtained, and their sum is determined as the group degree parameter of the other candidate account group. Finally, the group similarity between the candidate account group and the other candidate account group is determined based on three variables: the group degree parameter of the candidate account group, the group degree parameter of the other candidate account group, and the number of common edges in the two candidate account groups. That is, a mapping relationship is established from these three variables to the group similarity.

It should be noted that, because the candidate account number group is essentially a sub-graph on the maximum spanning tree, the common edge refers to the number of the same edges existing on the sub-graphs corresponding to the two candidate account number groups, and also represents the number of the same elements existing in the edge set of the sub-graphs corresponding to the two candidate account number groups, which is equivalent to reflecting the number of the common edges shared in the two local communities.

In some embodiments, assuming that the candidate account group is the $i$-th candidate account group $c_i$ and the other candidate account group is the $j$-th candidate account group $c_j$, the server determines the number of common edges $num(c_i, c_j)$ in the candidate account group $c_i$ and the other candidate account group $c_j$, then obtains the product of the group degree parameter of the candidate account group $c_i$ and the group degree parameter of the other candidate account group $c_j$, takes the arithmetic square root of the product, and divides the number of common edges $num(c_i, c_j)$ by the arithmetic square root. The resulting value is used as the group similarity between the candidate account group $c_i$ and the other candidate account group $c_j$. In other words, the group similarity can be characterized by the following formula:

$$S(c_i, c_j) = \frac{num(c_i, c_j)}{\sqrt{\left(\sum_{k=1}^{|c_i|} degree(v_k)\right) \cdot \left(\sum_{l=1}^{|c_j|} degree(v_l)\right)}}$$

where $S(c_i, c_j)$ characterizes the group similarity between the $i$-th candidate account group $c_i$ and the $j$-th candidate account group $c_j$; $num(c_i, c_j)$ characterizes the number of common edges in the candidate account group $c_i$ and the candidate account group $c_j$; $\sum_{k=1}^{|c_i|} degree(v_k)$ characterizes the group degree parameter of the candidate account group $c_i$; $\sum_{l=1}^{|c_j|} degree(v_l)$ characterizes the group degree parameter of the candidate account group $c_j$; $|c_i|$ characterizes the total number of accounts (i.e., the total number of nodes) contained in the candidate account group $c_i$; $v_k$ characterizes the $k$-th node in the candidate account group $c_i$; and $degree(v_k)$ characterizes the degree parameter of node $v_k$.

In the above process, a group similarity that accurately represents the degree of similarity between two groups can be obtained based on the respective group degree parameters of the two candidate account groups and the number of common edges in the two candidate account groups. The larger the group similarity, the more similar the two candidate account groups are; the smaller the group similarity, the less similar they are. The candidate account groups can therefore be further merged under the guidance of the group similarity. Because the merging order of the candidate account groups determines the modularity of the account group obtained after each merge, preferentially merging the candidate account groups with the largest group similarity makes it possible to screen out the target account group with the optimal division.
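The group similarity computation can be illustrated with a short Python sketch. The text does not spell out exactly which edges belong to a group's sub-graph, so the sketch makes an explicit assumption: a group's edge set contains every tree edge with at least one endpoint in the group, which means the common edges of two disjoint groups are the tree edges running between them. The function names `group_edges` and `group_similarity` are hypothetical.

```python
import math


def group_edges(tree_edges, members):
    # Assumption: a group's edge set holds every tree edge touching the group.
    return {frozenset(e) for e in tree_edges if e[0] in members or e[1] in members}


def group_similarity(tree_edges, group_i, group_j):
    """S(c_i, c_j) = num(c_i, c_j) / sqrt(group_degree(c_i) * group_degree(c_j))."""
    # Degree parameter of a node: number of tree edges connected to it.
    degree = {}
    for u, v in tree_edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1

    # Group degree parameter: sum of the degree parameters of the group's nodes.
    d_i = sum(degree.get(n, 0) for n in group_i)
    d_j = sum(degree.get(n, 0) for n in group_j)
    if d_i == 0 or d_j == 0:
        return 0.0  # isolated groups share no edges

    common = len(group_edges(tree_edges, group_i) & group_edges(tree_edges, group_j))
    return common / math.sqrt(d_i * d_j)


# A path-shaped tree a-b-c-d-e split into two groups.
tree_edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
s = group_similarity(tree_edges, {"a", "b"}, {"c", "d", "e"})
print(round(s, 4))
```

In this example the group degree parameters are 3 and 5 and exactly one edge (b, c) is shared, so the similarity equals 1 divided by the square root of 15.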

It should be noted that, because each candidate account group can find another candidate account group with which it has the largest group similarity, when deciding which pair of candidate account groups to merge first, the group similarities of all pairs of candidate account groups may be compared against one another, so that the pair with the highest group similarity overall is merged first. For example, suppose that the highest group similarity for candidate account group A is with candidate account group B and equals 0.8, the highest group similarity for candidate account group C is with candidate account group D and equals 0.5, and the highest group similarity for candidate account group E is with candidate account group F and equals 0.6. In the overall merging process, candidate account groups A and B, which have the highest group similarity after this comparison, are merged first to obtain a new candidate account group A+B.

In some embodiments, after merging each candidate account group with other candidate account groups with the greatest group similarity, which two candidate account groups are merged may be recorded, and simultaneously, the modularity of the merged account groups is recorded, and the merging operation on the account groups is iteratively performed until no candidate account groups can be merged continuously, which is equivalent to that all the candidate account groups are aggregated into a large account group, that is, a large community, and at this time, which merged account group is an optimally divided target account group may be screened according to the modularity recorded by the account groups obtained by merging each time.

In step 3073, the server repeatedly performs the operation of merging the candidate account groups until no candidate account group can be merged, and screens the target account group from the account groups obtained by merging each time.

In some embodiments, after any two candidate account groups are merged in step 3072, the server records which candidate account groups were merged to obtain the resulting account group, and also records the modularity of the resulting account group, so that the merged account group with the largest modularity can finally be selected as the target account group under the guidance of the modularity.

In other words, after performing the merging operation in step 3072 each time, the server obtains the modularity of the account group for the account group obtained by the merging this time, where the modularity is used to measure the dividing quality of the account group divided from the account relation diagram, and iteratively performs the operation of merging the candidate account groups in step 3072 until no candidate account group can be merged continuously, at this time, the recorded modularity of the account group obtained by merging each time may be queried, and the account group with the highest modularity is determined as the target account group.

In one example, the modularity of an account group is represented by the symbol $Q$. Assume that in the first merge, candidate account groups A and B are merged to obtain a new candidate account group A+B whose modularity is $Q_1$. In the second merge, candidate account group C is merged with the candidate account group A+B obtained by the previous merge to obtain a new candidate account group A+B+C whose modularity is $Q_2$. In the third merge, candidate account group D is merged with the candidate account group A+B+C obtained by the previous merge to obtain a new candidate account group A+B+C+D whose modularity is $Q_3$. After that, no other candidate account groups can be merged, and a query of the queue $\{Q_1, Q_2, Q_3\}$ yields the maximum value $Q_{max} = Q_2$. This indicates that the account group A+B+C obtained by the second merge is the optimal division of account groups in the current maximum spanning tree (i.e., it has the highest community division quality), and therefore the account group A+B+C obtained by the second merge is selected as the final output target account group.

In the above process, the modularity of the account group obtained by each merge is recorded while the merging operation is executed iteratively, and the target account group with the optimal community division quality is selected with the modularity as a guide. The output target account group therefore represents the optimal division of account groups in the maximum spanning tree, which improves the division precision of the selected target account group and ensures that the target account group has the optimal community division quality.
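The iterative merge-and-select loop of steps 3072 and 3073 can be sketched in Python as follows. Two points are assumptions rather than statements of the disclosed method: this passage does not give a modularity formula, so the sketch substitutes the standard Newman modularity of the whole partition, and a group's edge set is taken to be every tree edge touching the group. All function names are hypothetical.

```python
import math
from itertools import combinations


def node_degrees(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def similarity(edges, gi, gj, deg):
    # Assumption: a group's edge set holds every tree edge touching the group.
    ei = {frozenset(e) for e in edges if e[0] in gi or e[1] in gi}
    ej = {frozenset(e) for e in edges if e[0] in gj or e[1] in gj}
    denom = math.sqrt(sum(deg.get(n, 0) for n in gi) * sum(deg.get(n, 0) for n in gj))
    return len(ei & ej) / denom if denom else 0.0


def modularity(edges, partition, deg):
    # Assumption: standard Newman modularity, used here as a stand-in for Q.
    m = len(edges)
    q = 0.0
    for group in partition:
        internal = sum(1 for u, v in edges if u in group and v in group)
        total = sum(deg.get(n, 0) for n in group)
        q += internal / m - (total / (2 * m)) ** 2
    return q


def merge_and_select(edges, groups):
    """Merge the most similar pair repeatedly; return the best recorded state.

    Expects at least two groups so that at least one merge is recorded.
    """
    deg = node_degrees(edges)
    partition = [frozenset(g) for g in groups]
    history = []  # one (Q, partition) record per merge
    while len(partition) > 1:
        gi, gj = max(combinations(partition, 2),
                     key=lambda pair: similarity(edges, pair[0], pair[1], deg))
        partition = [g for g in partition if g != gi and g != gj] + [gi | gj]
        history.append((modularity(edges, partition, deg), list(partition)))
    best_q, best_partition = max(history, key=lambda item: item[0])
    return best_q, best_partition, history


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
groups = [{"a", "b"}, {"c"}, {"d"}, {"e", "f"}]
best_q, best_partition, history = merge_and_select(edges, groups)
print(round(best_q, 2), [sorted(g) for g in best_partition])
```

With this toy tree the first merge joins {c} and {d}, and that state has the highest recorded value of Q, so it is returned even though merging continues until a single community remains.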

In the above steps 3071-3073, a first round of merging is performed on the candidate account groups based on the connection status of the core nodes and the weights of the connecting edges, then a second round of merging is performed iteratively based on the group similarity between different candidate account groups, and finally the target account group to be output is selected according to the modularity of the account group obtained by each merge. In this way, account groups that are more tightly connected and more similar are merged first, and the target account group with the optimal community division quality is selected, which ensures both the identification accuracy and the community division quality of the target account group.

In some embodiments, the above steps 305-307 amount to a community mining algorithm based on common consumers. The input of the algorithm is an undirected unweighted network, i.e., the original unweighted undirected account relation graph $G(V, E)$, and the output of the algorithm is the mined target account group (i.e., the mined community structure), the modularity $Q$ of the target account group, and the first weight threshold $\epsilon$. Illustratively, the algorithm includes the following steps:

(I) Generating a maximum spanning tree: first, the weight of each edge in the edge set $E$ of the account relation graph $G$ is calculated to convert the undirected unweighted network $G(V, E)$ into an undirected weighted network $G(V, E, \omega)$, and then a maximum spanning tree $T(V, E_T)$ is generated for the undirected weighted network $G(V, E, \omega)$;

(II) determining core nodes: the weights of the edges in the edge set $E_T$ of the maximum spanning tree $T(V, E_T)$ are sorted, and all the edges are recorded in sorted order into a candidate queue $W_T$; next, a new edge weight is selected from the queue $W_T$ as the first weight threshold $\epsilon$, and all core nodes in the maximum spanning tree $T(V, E_T)$ are then screened out by the method provided in step 3061;

(III) node clustering: the path similarity $S(s, t)$ from every node to each core node is calculated, and each node is merged with the core node with which it has the maximum path similarity, thereby generating the local communities (namely the candidate account groups); then, for the candidate account groups generated by node clustering, it is determined whether two core nodes are directly connected and the weight of the connecting edge is greater than the first weight threshold $\epsilon$, and if so, the local communities in which the two core nodes are located are merged directly; alternatively, it is determined whether two core nodes are directly connected and the path similarity between them is greater than the first weight threshold $\epsilon$, and if so, the local communities in which the two core nodes are located are merged directly;

(IV) local community merging: calculating the group similarity between the local communities, selecting two local communities with the maximum group similarity for combination, namely calculating the group similarity between all the local communities pairwise, then combining the two local communities with the highest group similarity, and then calculating and recording the modularity Q value of the communities obtained by combination;

(V) repeating step (IV) until the whole network is merged into one large community, then finding the maximum $Q$ value and recording it in a queue $Q_S$;

(VI) determining whether all the edges in the candidate queue $W_T$ have been traversed; if untraversed edges remain, repeating steps (II) to (V); if no untraversed edges remain, the traversal is finished and the algorithm ends, whereupon the largest $Q$ value in the queue $Q_S$ (e.g., $Q^*$) is selected, and $Q^*$, the community structure corresponding to $Q^*$ (namely the community structure whose modularity value is $Q^*$), and the first weight threshold $\epsilon$ used in this case are output.

For the algorithm, assuming that the account relation graph contains $n$ nodes and $m$ edges in total, the worst-case complexity of calculating the maximum spanning tree is $O(m \log n)$. Assuming that the number of core nodes on the maximum spanning tree is $k$, the complexity of calculating the path similarity between each node and each core node is $O(nk)$. Clustering around the $k$ core nodes forms $k$ local communities, and these $k$ local communities are gradually merged into one community in $k-1$ steps, so the time complexity of local community merging is $O(k-1)$. Because the algorithm uses the weights of the edges of the maximum spanning tree $T(V, E_T)$ as the candidate set for the first weight threshold $\epsilon$, the above process is repeated $n-1$ times in the worst case. In summary, the time complexity of the community partitioning algorithm based on the maximum spanning tree is $O(mn)$ over the whole operation process.

In step 308, the server determines the candidate accounts included in the target account group as the target accounts.

In some embodiments, after the target account group is obtained in step 307, the target account group may include both reference accounts and candidate accounts. Because the reference accounts are already known to issue sensitive content items, only the candidate accounts included in the target account group need to be output as the target accounts when performing security risk control. The target accounts, and the content items they issue, can then be suppressed precisely, so that both accurate identification and accurate suppression of the target accounts are achieved.
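Step 308 reduces to a set difference, which the following Python sketch illustrates. The function name `select_target_accounts` and the account identifiers are hypothetical and chosen only for the example.

```python
def select_target_accounts(target_group, reference_accounts):
    """Return the candidate accounts left after removing known reference accounts."""
    # Reference accounts are already known to issue sensitive content,
    # so only the remaining members of the group are newly identified.
    return sorted(set(target_group) - set(reference_accounts))


target_group = {"ref_1", "ref_2", "cand_7", "cand_9"}
reference_accounts = {"ref_1", "ref_2", "ref_3"}
targets = select_target_accounts(target_group, reference_accounts)
print(targets)
```

Sorting the result is only for stable output; the order of target accounts carries no meaning in the description above.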

The above steps 305 to 308 show one possible implementation of screening, from the candidate accounts and based on the account relation graph, the target accounts that form an account group with the reference account. That is, node clustering and account group merging are performed in succession on the maximum spanning tree of the account relation graph to mine an optimally divided target account group, and the target accounts are then screened from the target account group. In this way, some invalid nodes located outside the maximum spanning tree can be filtered out through the maximum spanning tree, scattered nodes are merged through node clustering into the candidate account group in which the most similar core node is located, and similar candidate account groups are then merged further. With the modularity of each merged account group used as the screening index, the optimally divided target account group is obtained from the maximum spanning tree, which greatly improves the identification accuracy of the target accounts.

Fig. 8 is a schematic diagram of suppressing negative accounts after target accounts are screened based on an account relation graph according to an embodiment of the present disclosure. As shown in fig. 8, assume that two reference accounts are known: video producer 801 and video producer 802. Through n (n ≥ 1) positive consumers C1 to Cn shared by video producer 801 and video producer 802, a target account 803 that potentially issues sensitive content items (namely, a potential negative video producer) is mined from the account relation graph. Suppression measures, such as blocking or a 30-day ban, can then be applied to the target account 803, so that potential security risks on the platform are discovered and controlled in time.

Fig. 9 is a schematic flow chart of an account acquisition method provided in an embodiment of the present disclosure. As shown in fig. 9, in the stage of building the account relation graph, the nodes of the account relation graph are built from content producer A (e.g., a reference account) and content producer B (e.g., a candidate account), and the edges connecting the nodes are built based on common consumers (e.g., first accounts). Then, in the pruning stage, invalid nodes are deleted by removing authenticated or officially registered accounts, and invalid edges are deleted by removing negative consumers (i.e., common consumers who performed negative interaction behaviors). Next, in the cluster discovery stage, node clustering and account group merging are performed with the core nodes as clustering centers, and the optimally divided target account group with the maximum Q value is output. Finally, in the target account discovery stage, the known reference accounts A and B are removed from the target account group, what remains is the mined target account C, and the target account C is handled automatically by machine identification, so that the risk posed by the target account group is reduced.

Fig. 10 is a schematic diagram illustrating the effect of the account number obtaining method provided in the embodiment of the present disclosure, as shown in fig. 10, it can be seen that black nodes represent known reference account numbers, white nodes represent mined target account numbers, and gray nodes represent normal account numbers.

In testing of the account acquisition method of the embodiments of the present disclosure, after the method was put into use on the platform, about 400 newly identified target accounts were added every day, the accuracy of identifying negative accounts among the target accounts reached 72%, and the service recall rate was 24%, where the service recall rate refers to the gain in newly recalled negative accounts after the CCCM algorithm is used compared with before it is used. Further, in a test that took sensitive videos as an example, the test results are shown in table 2:

TABLE 2

Typical community	Number of accounts	Negative account ratio	Algorithm gain	Community label
Reference accounts	19	100%	/	Sensitive cartoon
Target accounts	40	72%	49%	Sensitive cartoon

The negative account ratio refers to the ratio of real negative accounts to the target accounts identified by the algorithm. The algorithm gain refers to the gain in target account identification brought by applying the CCCM algorithm, that is, the increase in the number of identified negative accounts after the CCCM algorithm is applied, measured relative to the number of known reference accounts.

In the embodiments of the present disclosure, different content producers are connected based on common consumers, which provides a new approach to constructing the account relation graph. Reference accounts are introduced as a supervision signal, and malicious communities (namely target account groups) formed by negative accounts are mined under the influence of the supervision signal. In addition, different account relation graphs can be constructed according to the different positive interaction behaviors (such as liking, following, sharing and downloading) performed by common consumers, so that malicious communities can be mined along different dimensions, and security risk control of the platform and suppression of negative accounts can be realized automatically.

Fig. 11 is a block diagram illustrating a logical structure of an account acquisition apparatus according to an exemplary embodiment. Referring to fig. 11, the apparatus includes a determination unit 1101, a generation unit 1102, and a filtering unit 1103.

A determining unit 1101 configured to determine, based on a sensitive content item issued by a reference account, a first account that has executed a forward interactive behavior on the sensitive content item, where the forward interactive behavior is an interactive behavior in a forward form on the sensitive content item;

the determining unit 1101 is further configured to determine, based on a target content item on which the first account has executed the forward interactive behavior, a candidate account that issued the target content item;

a generating unit 1102 configured to generate, based on the reference account, the first account, and the candidate account, an account relation graph associated with the forward interactive behavior, where the account relation graph is used to represent a topological structure of the social relationships between the publisher accounts of the content items on which the first account has executed the forward interactive behavior;

a screening unit 1103 configured to perform screening of target accounts forming an account group with the reference account from the candidate accounts based on the account relationship diagram.

According to the device provided by the embodiment of the disclosure, by the characteristic that the first account performs forward interactive behaviors on different content items, the reference account for issuing the sensitive content item and the candidate account for issuing the target content item can be linked in the account relationship graph, so that the account relationship graph generated based on the first accounts performing the forward interactive behaviors on the content items issued by different accounts is constructed, and then community mining is performed on the basis of the constructed account relationship graph.

In some embodiments, based on the apparatus composition of fig. 11, the generating unit 1102 includes:

a node construction subunit configured to perform construction of a node in the account relation graph based on the reference account and the candidate account;

and the edge construction subunit is configured to construct, based on the forward interactive behavior executed by the first account, an edge used for connecting nodes in the account relation graph.

In some embodiments, the edge construction subunit is configured to perform:

and under the condition that any one of the first accounts executes the forward interaction behavior on the sensitive content items issued by the reference account and the target content items issued by the candidate accounts, generating an edge for connecting the node of the reference account and the node of the candidate account in the account relation graph.
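The node and edge construction performed by the generating unit can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the input is a hypothetical list of (consumer, publisher) pairs with one entry per forward interactive behavior, a consumer counts as a first account only if it interacted with at least one reference account, and every pair of publishers sharing a first account is connected, which covers the reference-to-candidate case described above. Edge weights are not computed here; each edge simply keeps the set of common consumers.

```python
from collections import defaultdict
from itertools import combinations


def build_account_graph(interactions, reference_accounts):
    """Build nodes and edges of an account relation graph from common consumers."""
    publishers_by_consumer = defaultdict(set)
    for consumer, publisher in interactions:
        publishers_by_consumer[consumer].add(publisher)

    edges = defaultdict(set)  # (publisher_u, publisher_v) -> common consumers
    for consumer, publishers in publishers_by_consumer.items():
        # Keep only first accounts: consumers of some reference account's content.
        if not publishers & reference_accounts:
            continue
        for u, v in combinations(sorted(publishers), 2):
            edges[(u, v)].add(consumer)

    nodes = set(reference_accounts)
    for u, v in edges:
        nodes.update((u, v))
    return nodes, dict(edges)


interactions = [
    ("c1", "ref"), ("c1", "p1"),
    ("c2", "ref"), ("c2", "p1"), ("c2", "p2"),
    ("c3", "p2"), ("c3", "p3"),  # c3 never touched reference content
]
nodes, edges = build_account_graph(interactions, {"ref"})
print(sorted(nodes))
print({pair: sorted(consumers) for pair, consumers in edges.items()})
```

Publisher `p3` does not enter the graph because its only consumer, `c3`, is not a first account. Restricting edges to reference-to-candidate pairs only would be a small change to the inner loop.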

In some embodiments, based on the apparatus composition of fig. 11, the screening unit 1103 includes:

an obtaining subunit configured to perform obtaining a maximum spanning tree of the account relation diagram, the maximum spanning tree having a maximum weight among a plurality of spanning trees of the account relation diagram;

and the determining subunit is configured to perform determination of the candidate accounts included in the target account group as the target account.

In some embodiments, based on the apparatus composition of fig. 11, the acquiring subunit includes:

and the generating subunit is configured to execute generating the maximum spanning tree based on the weight values of all the edges in the account number relation graph, wherein the sum of the weight values of the edges contained in the maximum spanning tree is maximum in the plurality of spanning trees.
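One common way to generate a maximum spanning tree is Kruskal's algorithm run over the edges in descending weight order. The Python sketch below uses that technique as an assumption; the text above only requires that the resulting tree have the largest sum of edge weights among all spanning trees. The function name and the example weights are hypothetical.

```python
def maximum_spanning_tree(nodes, weighted_edges):
    """Kruskal's algorithm on edges sorted by descending weight.

    nodes:          iterable of node identifiers
    weighted_edges: iterable of (u, v, weight) tuples of an undirected graph
    Returns the list of tree edges; a disconnected graph yields a forest.
    """
    parent = {n: n for n in nodes}

    def find(n):
        # Locate the component representative with path compression.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    tree = []
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it creates no cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree


nodes = ["a", "b", "c", "d"]
weighted_edges = [
    ("a", "b", 0.9), ("b", "c", 0.4), ("a", "c", 0.7),
    ("c", "d", 0.5), ("b", "d", 0.2),
]
tree = maximum_spanning_tree(nodes, weighted_edges)
print(tree)
```

Sorting dominates the running time, which is consistent with the O(m log n) worst-case bound stated for this step in the method embodiments.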

In some embodiments, the generating subunit is configured to perform:

In some embodiments, based on the apparatus composition of fig. 11, the clustering subunit includes:

a determining subunit configured to perform determining a plurality of candidate account groups with the plurality of core nodes as clustering centers, respectively;

and the clustering subunit is configured to cluster the nodes except the core node in the maximum spanning tree into the candidate account group in which the core node with the highest path similarity with the node is positioned.

In some embodiments, the screening subunit is configured to perform:

In some embodiments, the determining subunit is further configured to perform:

In some embodiments, based on the apparatus composition of fig. 11, the merging subunit includes:

the merging subunit is configured to, for any candidate account group among the plurality of candidate account groups, merge the candidate account group with another candidate account group when a connecting edge exists between the core node of the candidate account group and the core node of the other candidate account group and the weight of the connecting edge is greater than a first weight threshold;

the merging subunit is further configured to, when the above merging condition is not met, merge the candidate account group with the other candidate account group having the greatest group similarity with it;

the merging subunit is further configured to repeat the merging of the candidate account groups until no candidate account groups can be merged;
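The first merging rule, repeated until nothing more can be merged, can be sketched with a union-find structure. This is a partial illustration only: the group-similarity fallback is omitted because its definition does not appear in this passage. The sketch further assumes that a merged group keeps all of its original core nodes, so that repeated merging amounts to a transitive closure over qualifying core-to-core edges. The names `merge_groups` and `core_edges` are hypothetical.

```python
def merge_groups(groups, core_edges, first_weight_threshold):
    """Merge candidate groups whose core nodes share a heavy enough edge.

    groups: dict mapping a core node to the set of accounts in its group.
    core_edges: dict mapping a (core_a, core_b) pair to the edge weight;
        both cores are assumed to be keys of `groups`.
    Returns a list of merged account sets.
    """
    parent = {core: core for core in groups}

    def find(core):
        # Union-find lookup with path halving.
        while parent[core] != core:
            parent[core] = parent[parent[core]]
            core = parent[core]
        return core

    for (core_a, core_b), weight in core_edges.items():
        if weight > first_weight_threshold:
            parent[find(core_a)] = find(core_b)

    merged = {}
    for core, members in groups.items():
        merged.setdefault(find(core), set()).update(members)
    return list(merged.values())


example_merged = merge_groups(
    groups={"c1": {"c1", "x"}, "c2": {"c2", "y"}, "c3": {"c3", "z"}},
    core_edges={("c1", "c2"): 0.8, ("c2", "c3"): 0.1},
    first_weight_threshold=0.5,
)
```

Here the groups of `c1` and `c2` are merged because their connecting edge exceeds the threshold, while the group of `c3` stays separate.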

In some embodiments, the merging subunit is further configured to perform:

In some embodiments, the group filtering subunit is configured to perform:

In some embodiments, based on the apparatus composition of fig. 11, the apparatus further comprises:

All the above optional technical solutions may be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.

With regard to the apparatuses in the above-described embodiments, the specific manner in which each unit performs operations has been described in detail in the embodiments of the account acquisition method, and will not be described in detail here.

Fig. 12 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure. The computer device 1200 may vary considerably depending on its configuration or performance, and may include one or more processors (Central Processing Units, CPUs) 1201 and one or more memories 1202, where the memory 1202 stores at least one program code, and the at least one program code is loaded and executed by the processor 1201 to implement the account acquisition method provided by the foregoing embodiments. The computer device 1200 may also have a wired or wireless network interface, a keyboard, an input/output interface, and other components for input and output, and may further include other components for implementing device functions, which are not described herein again.

In an exemplary embodiment, a computer-readable storage medium including at least one instruction, such as a memory, is also provided, where the at least one instruction is executable by a processor in a computer device to perform the account acquisition method in the above embodiments. Optionally, the computer-readable storage medium may be a non-transitory computer-readable storage medium, which may include, for example, a ROM (Read-Only Memory), a RAM (Random-Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product is further provided, which includes one or more instructions that can be executed by a processor of a computer device to implement the account obtaining method provided in the foregoing embodiments.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An account acquisition method, comprising:

determining, based on a sensitive content item issued by a reference account, a first account that executes a forward interaction behavior on the sensitive content item, wherein the forward interaction behavior refers to an interaction behavior of a forward (positive) form performed on the sensitive content item;

2. The account acquisition method according to claim 1, wherein the generating an account relationship diagram associated with the forward interaction behavior based on the reference account, the first account, and the candidate account includes:

3. The account acquisition method according to claim 2, wherein the constructing edges for connecting nodes in the account relationship diagram based on the forward interaction behavior executed by the first account includes:

in a case where any one first account executes the forward interaction behavior on both a sensitive content item issued by the reference account and a target content item issued by a candidate account, generating an edge that connects the node of the reference account and the node of the candidate account in the account relationship graph.

4. The account acquisition method according to claim 1, wherein the screening, from the candidate accounts, of target accounts forming an account group with the reference account based on the account relationship diagram comprises:

merging the candidate account groups to obtain a target account group;

5. The account acquisition method according to claim 4, wherein the obtaining the maximum spanning tree of the account relationship diagram includes:

assigning a weight to each edge based on the node similarity between two nodes connected with the edge in the account relation graph;

and generating the maximum spanning tree based on the weight values of all edges in the account relation graph, wherein the sum of the weight values of the edges contained in the maximum spanning tree is maximum in the plurality of spanning trees.

6. The account acquisition method according to claim 5, wherein the node similarity is a ratio between the number of common neighbor nodes of the two nodes and the sum of the numbers of respective neighbor nodes of the two nodes.
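The ratio in claim 6 can be written as sim(u, v) = |N(u) ∩ N(v)| / (|N(u)| + |N(v)|), where N(x) is the neighbor set of node x. A minimal sketch follows; the name `node_similarity` and the adjacency-dict layout are hypothetical, and returning 0.0 when neither node has neighbors is an assumption added to avoid division by zero.

```python
def node_similarity(adjacency, u, v):
    """Common-neighbor count divided by the sum of both neighbor counts.

    adjacency: dict mapping each node to the set of its neighbor nodes.
    """
    neighbors_u = adjacency.get(u, set())
    neighbors_v = adjacency.get(v, set())
    total = len(neighbors_u) + len(neighbors_v)
    if total == 0:
        return 0.0  # assumption: isolated nodes have zero similarity
    return len(neighbors_u & neighbors_v) / total


triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
example_similarity = node_similarity(triangle, "a", "b")  # 1 / (2 + 2) = 0.25
```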

7. The account acquisition method according to claim 5, wherein the generating the maximum spanning tree based on the weight of each edge in the account relationship diagram includes:

8. The account acquisition method according to claim 4, wherein the clustering the accounts indicated by the nodes included in the maximum spanning tree to obtain a plurality of candidate account groups comprises:

and clustering each node in the maximum spanning tree other than the core nodes into the candidate account group in which the core node having the highest path similarity with that node is located.

9. The account acquisition method according to claim 8, wherein the obtaining of the plurality of core nodes by screening from the nodes included in the maximum spanning tree comprises:

10. The account acquisition method according to claim 8, further comprising:

11. The account acquisition method according to claim 4, wherein the merging the candidate account groups to obtain the target account group comprises:

for any candidate account group in the plurality of candidate account groups, merging the candidate account group and other candidate account groups under the condition that a connecting edge exists between a core node of the candidate account group and core nodes of other candidate account groups and the weight of the connecting edge is greater than a first weight threshold;

12. The account acquisition method according to claim 11, wherein the method further comprises:

for any candidate account group in the plurality of candidate account groups, determining the sum of the degree parameters of the nodes associated with all accounts contained in the candidate account group as the group degree parameter of the candidate account group, wherein the degree parameter of a node represents the number of edges connected to the node in the maximum spanning tree;
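The group degree parameter described above can be computed directly from the tree's edge list. The sketch below is illustrative only; the name `group_degree` and the `(u, v)` edge layout are hypothetical.

```python
from collections import Counter


def group_degree(tree_edges, group):
    """Sum, over the accounts in a group, of each node's degree in the tree.

    tree_edges: iterable of (u, v) edges of the maximum spanning tree.
    group: iterable of the nodes (accounts) in one candidate account group.
    """
    degree = Counter()
    for u, v in tree_edges:
        # Each edge contributes one to the degree of both endpoints.
        degree[u] += 1
        degree[v] += 1
    return sum(degree[node] for node in group)


# Path a-b, a-c, c-d: degrees are a=2, b=1, c=2, d=1.
example_degree = group_degree([("a", "b"), ("a", "c"), ("c", "d")], {"a", "b"})
```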

13. The account acquisition method according to claim 11, wherein the screening of the target account group from the account groups obtained by merging each time includes:

14. The account acquisition method according to claim 1, wherein the forward interaction behavior includes at least one of: a forward-form behavior for the content item, an attention (follow) behavior for the publisher account of the content item, a sharing behavior, or a downloading behavior.

15. The account acquisition method according to claim 1, wherein after determining the candidate account for issuing the target content item, the method further includes:

16. An account acquisition apparatus, comprising:

a determining unit configured to determine, based on a sensitive content item issued by a reference account, a first account that executes a forward interaction behavior on the sensitive content item, wherein the forward interaction behavior refers to an interaction behavior of a forward (positive) form performed on the sensitive content item;

the determining unit being further configured to determine, based on a target content item on which the first account executes the forward interaction behavior, a candidate account that issued the target content item;

17. A computer device, comprising:

one or more processors;

wherein the one or more processors are configured to execute the instructions to implement the account acquisition method of any one of claims 1 to 15.

18. A computer-readable storage medium, wherein at least one instruction stored in the computer-readable storage medium, when executed by one or more processors of a computer device, enables the computer device to perform the account acquisition method of any one of claims 1 to 15.

19. A computer program product comprising one or more instructions that when executed by one or more processors of a computer device enable the computer device to perform the account acquisition method of any one of claims 1 to 15.