CN114092268A - User community detection method and device, computer equipment and storage medium - Google Patents

Info

Publication number
CN114092268A
Authority
CN
China
Prior art keywords
node
graph network
user
data
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111432615.8A
Other languages
Chinese (zh)
Inventor
敖琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd filed Critical Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202111432615.8A priority Critical patent/CN114092268A/en
Publication of CN114092268A publication Critical patent/CN114092268A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Computing Systems (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application belong to the field of artificial intelligence and relate to a user community detection method, a user community detection apparatus, computer equipment and a storage medium. The method comprises the following steps: acquiring user activity information; determining node data and connection edge data in the user activity information; constructing a first graph network according to the node data and the connection edge data; determining the node weight of each node in the first graph network through a preset ranking algorithm, and updating the first graph network according to the node weights to obtain a second graph network; determining the edge weight of each connecting edge in the second graph network based on the node weights of the nodes in the second graph network, and updating the second graph network according to the edge weights to obtain a third graph network; and running a community discovery algorithm on the third graph network to obtain a user community detection result. In addition, the application also relates to blockchain technology: the user activity information may be stored in a blockchain. The application improves the accuracy of user community detection.

Description

User community detection method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a user community detection method, apparatus, computer device, and storage medium.
Background
With the development of computer technology, more and more production and daily-life activities are carried out over the internet, and the users taking part in these activities can form a graph network. A graph network comprises nodes and connecting edges; some of the nodes in the graph network may be strongly associated with one another and can form a community.
When mining data from a graph network, it is often necessary to detect the user communities in the graph network. Several community discovery algorithms exist, and the InfoMap algorithm is one of the more commonly used at present. The InfoMap algorithm can process the nodes and edges of a graph network to mine the user communities in it, but the existing InfoMap algorithm ignores the importance of the nodes and the edges, so the accuracy of user community detection is low.
Disclosure of Invention
An embodiment of the present application provides a user community detection method, an apparatus, a computer device, and a storage medium, so as to solve the problem of low user community detection accuracy.
In order to solve the above technical problem, an embodiment of the present application provides a user community detection method, which adopts the following technical solutions:
acquiring user activity information;
determining node data and connection edge data in the user activity information;
constructing a first graph network according to the node data and the connection edge data;
determining the node weight of each node in the first graph network through a preset ranking algorithm, and updating the first graph network according to the node weight to obtain a second graph network;
determining edge weights of all connecting edges in the second graph network based on the node weights of all nodes in the second graph network, and updating the second graph network according to the edge weights to obtain a third graph network;
and calculating the third graph network through a community discovery algorithm to obtain a user community detection result.
In order to solve the above technical problem, an embodiment of the present application further provides a user community detection apparatus, which adopts the following technical solutions:
the information acquisition module is used for acquiring user activity information;
the data determining module is used for determining node data and connection edge data in the user activity information;
the network construction module is used for constructing a first graph network according to the node data and the connection edge data;
the first updating module is used for determining the node weight of each node in the first graph network through a preset ranking algorithm so as to update the first graph network according to the node weight to obtain a second graph network;
a second updating module, configured to determine an edge weight of each connection edge in the second graph network based on a node weight of each node in the second graph network, so as to update the second graph network according to the edge weight to obtain a third graph network;
and the network computing module is used for computing the third graph network through a community discovery algorithm to obtain a user community detection result.
In order to solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solutions:
acquiring user activity information;
determining node data and connection edge data in the user activity information;
constructing a first graph network according to the node data and the connection edge data;
determining the node weight of each node in the first graph network through a preset ranking algorithm, and updating the first graph network according to the node weight to obtain a second graph network;
determining edge weights of all connecting edges in the second graph network based on the node weights of all nodes in the second graph network, and updating the second graph network according to the edge weights to obtain a third graph network;
and calculating the third graph network through a community discovery algorithm to obtain a user community detection result.
In order to solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solutions:
acquiring user activity information;
determining node data and connection edge data in the user activity information;
constructing a first graph network according to the node data and the connection edge data;
determining the node weight of each node in the first graph network through a preset ranking algorithm, and updating the first graph network according to the node weight to obtain a second graph network;
determining edge weights of all connecting edges in the second graph network based on the node weights of all nodes in the second graph network, and updating the second graph network according to the edge weights to obtain a third graph network;
and calculating the third graph network through a community discovery algorithm to obtain a user community detection result.
Compared with the prior art, the embodiments of the application mainly have the following beneficial effects: after user activity information is acquired, node data and connection edge data are extracted and a first graph network is constructed, and a node weight is added to each node in the first graph network through a preset ranking algorithm to obtain a second graph network; then, based on the node weight of each node in the second graph network, the edge weight of each connecting edge is determined, thereby obtaining a third graph network with edge weights. The edge weight represents the importance of the connecting edge and enriches the input information of the community discovery algorithm, so that the accuracy of the user community detection result calculated by the community discovery algorithm is improved.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a user community detection method according to the present application;
FIG. 3 is a schematic diagram of an embodiment of a user community detection apparatus according to the present application;
FIG. 4 is a schematic block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 players (Moving Picture Experts Group Audio Layer III), MP4 players (Moving Picture Experts Group Audio Layer IV), laptop computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the user community detection method provided in the embodiment of the present application is generally executed by a server, and accordingly, the user community detection apparatus is generally disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow diagram of one embodiment of a user community detection method in accordance with the present application is shown. The user community detection method comprises the following steps:
in step S201, user activity information is acquired.
In the present embodiment, the electronic device (e.g., the server shown in fig. 1) on which the user community detection method runs may communicate with the terminal through a wired connection or a wireless connection. It should be noted that the wireless connection may include, but is not limited to, a 3G/4G/5G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra-wideband) connection, and other wireless connection means now known or developed in the future.
Specifically, user activity information is first acquired. The user activity information may be information for recording user activities, and may include interaction information between different users in addition to basic information of a single user and activity operation information. The user activity information may be obtained from a database.
It is emphasized that, in order to further ensure the privacy and security of the user activity information, the user activity information may also be stored in a node of a blockchain.
The blockchain referred to in the application is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, in which each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Step S202, determining node data and connection edge data in the user activity information.
Specifically, the application needs to build a graph network and performs user community detection by running calculations on that graph network. The graph network includes nodes and connecting edges, and the connecting edges connect the nodes. The description information corresponding to a node is node data, and the description information corresponding to a connecting edge is connection edge data. The node data and the connection edge data may be extracted from the user activity information.
Step S203, a first graph network is constructed according to the node data and the connection edge data.
Specifically, the graph network established first is the first graph network. Nodes are constructed from the node data, and connecting edges between the nodes are constructed from the connection edge data, yielding the first graph network, which is a directed graph.
In one embodiment, the first graph network is constructed as follows:
(1) importing the node data and the connection edge data using the read_csv function of Python's pandas library;
(2) initializing a directed single-edge graph G using Python's networkx; networkx is a Python package for building and manipulating complex graph structures, and it also provides algorithms for analyzing graphs;
(3) constructing the nodes of the directed single-edge graph G using networkx's add_nodes_from, and constructing its connecting edges using networkx's add_edges_from, thereby obtaining the first graph network.
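The three construction steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the column names (node_id, node_type, source, target, relation) and the inline CSV contents are assumptions made for the example, and networkx's DiGraph class is assumed to correspond to the "directed single-edge graph" named in the text.

```python
import io

import networkx as nx
import pandas as pd

# Hypothetical CSV contents; in practice pd.read_csv would be given file paths.
NODES_CSV = """node_id,node_type
phone_A,phone
phone_B,phone
device_1,device
"""
EDGES_CSV = """source,target,relation
phone_A,phone_B,invite
phone_A,device_1,uses_device
phone_B,device_1,uses_device
"""

# (1) Import the node data and the connection edge data.
nodes = pd.read_csv(io.StringIO(NODES_CSV))
edges = pd.read_csv(io.StringIO(EDGES_CSV))

# (2) Initialize a directed graph with at most one edge per ordered node pair.
G = nx.DiGraph()

# (3) Add nodes and connecting edges, keeping the type/relation as attributes.
G.add_nodes_from(
    (row.node_id, {"node_type": row.node_type})
    for row in nodes.itertuples(index=False)
)
G.add_edges_from(
    (row.source, row.target, {"relation": row.relation})
    for row in edges.itertuples(index=False)
)
```

At this point G carries no node weights or edge weights, which matches the description of the first graph network.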
Step S204, determining the node weight of each node in the first graph network through a preset ranking algorithm, and updating the first graph network according to the node weight to obtain a second graph network.
Specifically, the nodes in the first graph network do not have node weights, and the connecting edges do not have edge weights. An existing community discovery algorithm could be run directly on the first graph network, but doing so ignores the importance of the nodes and connecting edges, so the accuracy is low.
In the method, the node weight of each node in the first graph network is calculated by using a preset ranking algorithm, and the calculated node weight is added to each node, so that the first graph network is updated, and the second graph network is obtained.
When the preset ranking algorithm is used, the node weight of each node needs to be initialized, and then the initialized node weight of each node is updated through the preset ranking algorithm. In the initialization, the node weights of the nodes may be the same or different.
Step S205, determining an edge weight of each connection edge in the second graph network based on the node weight of each node in the second graph network, so as to update the second graph network according to the edge weight to obtain a third graph network.
Specifically, the connecting edge in the second graph network does not have an edge weight, and an edge weight needs to be added to the connecting edge in the second graph network. In a graph network, the higher the node weight, the more important the node is, and the more important the connecting edges associated with the node are. Therefore, according to the node weight of each node in the second graph network, an edge weight can be added to each connecting edge, the importance of the node is transferred to the connecting edge associated with the node, and the second graph network is updated to obtain a third graph network.
In one embodiment, the node weight of a node may be added to the connecting edge pointing to the node, resulting in an edge weight of the connecting edge.
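The embodiment above, in which a connecting edge takes the node weight of the node it points to, can be sketched in plain Python. The node names, the weight values, and the helper name add_edge_weights are hypothetical and chosen only for illustration.

```python
def add_edge_weights(edges, node_weight):
    """Return {(source, target): weight} for a list of directed edges.

    Each directed edge receives the node weight of its target node,
    i.e. the node the edge points to.
    """
    return {(src, dst): node_weight[dst] for src, dst in edges}


# Hypothetical node weights of the second graph network.
node_weight = {"phone_A": 0.2, "phone_B": 0.5, "device_1": 0.3}
edges = [
    ("phone_A", "phone_B"),
    ("phone_A", "device_1"),
    ("phone_B", "device_1"),
]

# Edge weights of the third graph network.
edge_weight = add_edge_weights(edges, node_weight)
```

Both edges that point to device_1 receive the same weight, because under this rule the weight depends only on the target node.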
Step S206, calculating the third graph network through a community discovery algorithm to obtain a user community detection result.
Specifically, the connecting edges in the third graph network have edge weights, and a community discovery algorithm may be run on the third graph network with the edge weights of the connecting edges as input information. The community discovery algorithm may detect user communities in the third graph network; within each user community, the nodes are closely connected. The detected user communities are taken as the user community detection result.
Further, the step S206 may include: calculating the third graph network with the edge weight through a community discovery algorithm to obtain a user community in the third graph network, wherein the community discovery algorithm is an InfoMap algorithm; and determining the obtained user community as a user community detection result.
Specifically, when the third graph network is processed by the community discovery algorithm, the edge weights of the connecting edges are included in the calculation, while the node weights of the nodes need not be considered. The community discovery algorithm used may be the InfoMap algorithm. From an information-theoretic perspective, the InfoMap algorithm encodes random walk paths in the graph network; the fewer bits required on average to describe a path, the better the community division. The iterative process of the InfoMap algorithm is as follows:
(1) initialization: each node is regarded as an independent community;
(2) the nodes are traversed in random order, and each node is assigned to the adjacent community that yields the largest decrease in the average bit value;
(3) step (2) is repeated (with a different random order each time) until the average bit value of the whole graph no longer changes.
In the application, whether the InfoMap algorithm jumps from one node to another while running is influenced by the edge weight of the connecting edge between the two nodes: the greater the edge weight of a connecting edge, the greater the probability of jumping to the other node through that connecting edge.
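The influence of edge weight on the jump probability can be illustrated with a short sketch. It assumes the usual weighted random walk, in which the probability of leaving a node along an edge is that edge's weight divided by the total weight of the node's outgoing edges; the description does not spell out the exact formula, so this normalization is an assumption, and the sketch covers only the transition step, not the full InfoMap optimization.

```python
from collections import defaultdict


def transition_probabilities(edge_weight):
    """Map each directed edge to the probability of following it.

    Assumes a weighted random walk: p(u -> v) = w(u, v) / sum of w(u, x)
    over all outgoing edges (u, x) of u.
    """
    out_total = defaultdict(float)
    for (src, _dst), weight in edge_weight.items():
        out_total[src] += weight
    return {
        (src, dst): weight / out_total[src]
        for (src, dst), weight in edge_weight.items()
        if out_total[src] > 0
    }


# Hypothetical edge weights of a small third graph network.
edge_weight = {("a", "b"): 3.0, ("a", "c"): 1.0, ("b", "c"): 2.0}
probs = transition_probabilities(edge_weight)
```

In this example the walker at node a is three times as likely to move to b as to c, because the edge to b carries three times the weight.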
The InfoMap algorithm divides at least one user community from the third graph network, and the detected user community is used as a user community detection result.
In the embodiment, the community discovery algorithm is an InfoMap algorithm, and the InfoMap algorithm takes the edge weight of the connecting edge as input information, so that the community division can be performed more accurately, and the accuracy of the generated user community detection result is improved.
In this embodiment, after user activity information is acquired, node data and connection edge data are extracted and a first graph network is constructed, and a node weight is added to each node in the first graph network through a preset ranking algorithm to obtain a second graph network; then, based on the node weight of each node in the second graph network, the edge weight of each connecting edge is determined, thereby obtaining a third graph network with edge weights. The edge weight represents the importance of the connecting edge and enriches the input information of the community discovery algorithm, so that the accuracy of the user community detection result calculated by the community discovery algorithm is improved.
Further, the step S202 may include: identifying entity data and relationship data in the user activity information; acquiring a scene identifier of the user community detection; screening the entity data and the relationship data according to the scene identifier; and determining the entity data obtained after screening as node data, and determining the relationship data obtained after screening as connection edge data.
The entity data may be data describing an entity, and the relationship data may be data describing a relationship between entities. For example, in the user activity information, a user account may be an entity; the user account is bound to a mobile phone number, and the mobile phone number may be entity data. An invitation relationship may exist between two user accounts, and such an invitation relationship may serve as relationship data. The scene identifier may be an identifier of an application scenario. User community detection may have a plurality of application scenarios, for example fraud group identification or product recommendation based on the detected user communities, and different application scenarios can be distinguished by the scene identifier.
Specifically, the entity data and relationship data in the user activity information may be identified through natural language processing or similar techniques. The scene identifier of the user community detection can be acquired; the scene identifier represents an application scenario, and the required entity data and relationship data may differ between application scenarios. In different application scenarios, the same entity data and relationship data may also carry different weights in the calculation. Therefore, the identified entity data and relationship data can be screened according to the scene identifier, retaining the entity data and relationship data required in the application scenario corresponding to the scene identifier; the entity data obtained after screening is determined as node data, and the relationship data obtained after screening is determined as connection edge data.
For example, in a customer-acquisition (referral) campaign, some malicious groups may register accounts in bulk to obtain benefits, so fraud group detection is required. In this application scenario, 6 types of entity data are screened out as node data: the mobile phone numbers of the new and existing customers who participated in the campaign in the preceding 30 days; the registration time of the mobile phone number (accurate to the day); the device numbers used by the mobile phone number in the preceding 30 days; the license plate numbers bound to the mobile phone number in the preceding 30 days; the vehicle frame numbers bound to the mobile phone number in the preceding 30 days; and the IP addresses (Internet Protocol addresses) used by the mobile phone number in the preceding 30 days. The following relationship data are screened out as connection edge data: the invitation relationship between the inviting mobile phone number and the invited mobile phone number; the association between the mobile phone number and the registration time; the association between the mobile phone number and the device number; the association between the mobile phone number and the license plate number; the association between the mobile phone number and the frame number; and the association between the mobile phone number and the IP address.
In this embodiment, after the entity data and the relationship data in the user activity information are identified, they are screened according to the scene identifier, so that the required node data and connection edge data are obtained and the data preparation for user community detection is completed.
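The screening by scene identifier can be sketched as a lookup in a rule table. Everything named here is hypothetical: the SCENE_RULES table, the scene identifier string, the record fields (id, type, source, target), and the extra rule that a relationship is kept only when both of its endpoints survive screening are assumptions made for illustration, not details stated in the description.

```python
# Hypothetical rule table: scene identifier -> entity and relation types to keep.
SCENE_RULES = {
    "fraud_group_detection": {
        "entity_types": {"phone", "device", "ip"},
        "relation_types": {"invite", "uses_device", "uses_ip"},
    },
}


def screen_by_scene(entities, relations, scene_id):
    """Return (node_data, edge_data) retained for the given scene identifier."""
    rules = SCENE_RULES[scene_id]
    node_data = [e for e in entities if e["type"] in rules["entity_types"]]
    kept_ids = {e["id"] for e in node_data}
    # Assumption: drop relations whose endpoints were screened out.
    edge_data = [
        r
        for r in relations
        if r["type"] in rules["relation_types"]
        and r["source"] in kept_ids
        and r["target"] in kept_ids
    ]
    return node_data, edge_data


entities = [
    {"id": "phone_A", "type": "phone"},
    {"id": "phone_B", "type": "phone"},
    {"id": "device_1", "type": "device"},
    {"id": "email_1", "type": "email"},
]
relations = [
    {"source": "phone_A", "target": "phone_B", "type": "invite"},
    {"source": "phone_A", "target": "device_1", "type": "uses_device"},
    {"source": "phone_A", "target": "email_1", "type": "uses_email"},
]

node_data, edge_data = screen_by_scene(entities, relations, "fraud_group_detection")
```

Here the email entity and the relationship that refers to it are dropped, while the phone and device entities and their relationships are retained.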
Further, the step S204 may include: acquiring the scene identifier of the user community detection and the node type of each node in the first graph network; adding a first node weight to each node according to the scene identifier and the node type; and running a preset ranking algorithm on the first graph network with the first node weights, so as to update the first node weight of each node to a second node weight and generate a second graph network.
Specifically, the node weight of each node in the first graph network is calculated through a preset ranking algorithm, and the node weight is the embodiment of the node importance. When the ranking algorithm is used, the first graph network needs to be initialized first, and the first node weight of each node is obtained. The first graph network may have nodes of multiple node types, and in different application scenarios, the importance of each type of node is different.
Therefore, when the first node weight of each node is initialized, the scene identifier of the user community detection and the node type can be acquired, and the first node weight of each node is initialized according to the scene identifier and the node type rather than by assigning equal weights. In one embodiment, a weight setting table may be obtained; the scene identifier and the node type are looked up in the weight setting table to obtain a weight value, and the weight value is then assigned to the corresponding node. In one embodiment, the sum of the weight values of the nodes is 1.
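The table lookup and the sum-to-1 constraint can be sketched as follows. The WEIGHT_TABLE contents, the scene identifier string, and the node names are hypothetical; dividing each looked-up value by the total is one assumed way of making the weights sum to 1, which the description requires in one embodiment without stating how it is achieved.

```python
# Hypothetical weight setting table keyed by (scene identifier, node type).
WEIGHT_TABLE = {
    ("fraud_group_detection", "phone"): 3,
    ("fraud_group_detection", "device"): 2,
    ("fraud_group_detection", "ip"): 2,
    ("fraud_group_detection", "plate"): 1,
}


def initial_node_weights(node_types, scene_id):
    """Look up a raw weight per node, then scale so the weights sum to 1."""
    raw = {
        node: WEIGHT_TABLE[(scene_id, node_type)]
        for node, node_type in node_types.items()
    }
    total = sum(raw.values())
    return {node: value / total for node, value in raw.items()}


# Node types of a small hypothetical first graph network.
node_types = {
    "phone_A": "phone",
    "device_1": "device",
    "ip_1": "ip",
    "plate_1": "plate",
}
first_weight = initial_node_weights(node_types, "fraud_group_detection")
```

With raw values 3, 2, 2 and 1, the scaled first node weights are 0.375, 0.25, 0.25 and 0.125.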
The ranking algorithm used in the present application may be the PageRank algorithm, also known as web page ranking, Google left-side ranking, or Page ranking, a technique that computes rankings from the hyperlinks between web pages. The PageRank algorithm can compute the influence of nodes (e.g., web pages) in a network by a random walk method. Each node in the first graph network is processed by the PageRank algorithm so as to update the first node weight of each node to a second node weight. The result of the PageRank algorithm is a dictionary of key-value pairs, where the key is the node name and the value is the PageRank value of the node, i.e., the weight of the node. For example, the result of the PageRank algorithm is {'134*': 0.032, '196.1.1.*': 0.000212, ...}.
The result output of the PageRank algorithm is the importance of each node, and is used for giving weight to nodes and connecting edges used by the subsequent algorithm.
Continuing the foregoing example, the ratio of the first node weights at initialization is set to mobile phone number : device number : IP address : license plate number : frame number : registration time = 3 : 2 : 2 : 1 : 1 : 1, with the weights scaled so that their sum is 1.
The PageRank parameters are then constructed. In Python, nstart is used to customize the initial PageRank value of each node and is defined as a dictionary (key: value), namely init_nstart = {mobile phone number node: 3, device number node: 2, IP node: 2, license plate number node: 1, frame number node: 1, registration time node: 1}.
Then the PageRank algorithm in the Python networkx library is used, that is, networkx.pagerank(G, alpha=0.85, personalization=None, max_iter=100, tol=1e-06, nstart=init_nstart, weight='weight', dangling=None); after the run is completed, the second node weight of each node is obtained.
After the second node weight of each node is obtained, the first graph network is updated, and a second graph network is obtained.
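The update from first node weights to second node weights can be illustrated with a small pure-Python power iteration. This is a simplified stand-in for the library call named in the text, not the library implementation itself; the function name `pagerank_sketch`, the example nodes, and the edge directions are assumptions made for illustration. Nodes without outgoing edges spread their rank evenly over all nodes, which is one common convention.

```python
def pagerank_sketch(edges, nstart, alpha=0.85, max_iter=100, tol=1e-6):
    """Power-iteration PageRank on a directed graph given as (source, target) pairs.

    nstart maps node -> starting value; the values are normalized to sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    total = sum(nstart[n] for n in nodes)
    rank = {n: nstart[n] / total for n in nodes}
    n_nodes = len(nodes)

    for _ in range(max_iter):
        prev = rank
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(prev[n] for n in nodes if not out_links[n])
        rank = {n: (1 - alpha) / n_nodes + alpha * dangling / n_nodes for n in nodes}
        for src in nodes:
            if out_links[src]:
                share = alpha * prev[src] / len(out_links[src])
                for dst in out_links[src]:
                    rank[dst] += share
        # Stop once the total change falls below the tolerance.
        if sum(abs(rank[n] - prev[n]) for n in nodes) < n_nodes * tol:
            break
    return rank


# Illustrative graph: two phone numbers point to a device, one also points to an IP address.
edges = [("p1", "d1"), ("p2", "d1"), ("p1", "ip1")]
init_nstart = {"p1": 3, "p2": 3, "d1": 2, "ip1": 2}  # first node weights by type
second_weights = pagerank_sketch(edges, init_nstart)
```

The returned dictionary has the same key-value shape as the PageRank result described above, with node names as keys and node weights as values.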
In the embodiment, the first node weights are correspondingly added to the nodes of different types according to the scene identifiers, so that the difference of the importance of each node is reflected during initialization, and the accuracy of the second node weights calculated through the ranking algorithm is ensured.
Further, in another embodiment of the present application, the step S204 may include: acquiring pre-stored activity information corresponding to the user activity information, wherein the pre-stored activity information is provided with a user tag; determining node data in pre-stored activity information according to the first graph network; calculating the characteristic contribution degree of the node data with the user label through a random forest; determining first node weights of all nodes in the first graph network according to the characteristic contribution degrees; and calculating the first graph network with the first node weight through a preset ranking algorithm so as to update the first node weight of each node to obtain a second node weight and generate a second graph network.
Specifically, in another embodiment of the present application, the first node weight of each node may also be determined by a random forest. First, pre-stored activity information is acquired; the pre-stored activity information is the same type of information as the user activity information, but it carries a user tag. Node data is then screened out of the pre-stored activity information according to the first graph network: whichever types of node data the first graph network contains are the types of node data retained from the pre-stored activity information.
The node data extracted from the pre-stored activity information points to a user, and the user has a user tag. Each type of node data can then be used as a user feature: a random forest is generated according to the user tags, and the feature contribution degree of each type of user feature (node data) is calculated. The calculated feature contribution degree may be used as the first node weight of the corresponding node in the first graph network.
And then, calculating the first graph network with the first node weight through a preset ranking algorithm so as to update the first node weight of each node to obtain a second node weight, and obtaining a second graph network based on the obtained second node weight.
When the application scenario is fraud group detection, the user tag of the pre-stored activity information is used for marking whether the user is a fraud user; the user is bound with the mobile phone number, so that whether the mobile phone number exists in the blacklist or not can be used as the label. And taking the six types of node data as feature data, and calculating the feature contribution degrees of various types of nodes through a random forest to obtain the first node weights of various types of nodes in the first graph network.
In the embodiment, the pre-stored activity information which is the same as the user activity information in type and has the label can be obtained, the node data is determined in the pre-stored activity information according to the first graph network, and the characteristic contribution degree of the node data with the label is calculated through the random forest, so that the first node weights of various nodes are obtained, and the determination mode of the first node weights is enriched.
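The idea of deriving per-type weights from feature contribution degrees can be sketched with a deliberately small forest. The sketch below uses depth-one trees (stumps) trained on bootstrap samples with random feature subsets, and scores each feature by its accumulated Gini impurity decrease; a production system would more likely use a full random forest from an established library. The feature encoding (one 0/1 column per node type) and the sample data are assumptions made for illustration.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_stump(rows, labels, features):
    """Return (feature, impurity decrease) of the best single split among features."""
    parent = gini(labels)
    best_feature, best_gain = None, 0.0
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # not a real split
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            gain = parent - weighted
            if gain > best_gain:
                best_feature, best_gain = f, gain
    return best_feature, best_gain


def feature_contributions(rows, labels, n_trees=50, n_sub=2, seed=0):
    """Normalized impurity decrease per feature over a forest of stumps."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    totals = [0.0] * n_features
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        features = sorted(rng.sample(range(n_features), n_sub))  # random feature subset
        f, gain = best_stump([rows[i] for i in idx], [labels[i] for i in idx], features)
        if f is not None:
            totals[f] += gain
    s = sum(totals)
    return [t / s for t in totals] if s else totals


# Hypothetical columns: shares a device, shares an IP, registered on a common day.
rows = [[1, i % 2, 1] for i in range(6)] + [[0, i % 2, 1] for i in range(6)]
labels = [1] * 6 + [0] * 6  # 1 = mobile phone number is on the blacklist
contributions = feature_contributions(rows, labels)
```

The normalized contributions sum to 1 and could be used directly as first node weights for the corresponding node types.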
Further, the step S205 may include: dividing each node in the second graph network into a first node set and a second node set according to each connecting edge in the second graph network, wherein the nodes in the first node set point to the nodes in the second node set through the connecting edges; for each node in the second node set, assigning the node weight of the node to the connecting edge pointing to the node to obtain the edge weight of the connecting edge; and updating the second graph network according to the edge weight to obtain a third graph network.
Specifically, the connection edges in the second graph network have directivity, and one node is connected to another node through the connection edges. According to the directionality of each connection edge, each node in the second graph network is divided into a first node set list1 and a second node set list2, and a node1 in the first node set can be connected to a node2 in the second node set through the directional connection edge.
For node2 in the second node set, node2 has a second node weight calculated by the ranking algorithm, and the second node weight of node2 is assigned to the connecting edge pointing to node2, which obtains the edge weight. And obtaining edge weights of all the connecting edges in the second graph network to generate a third graph network, wherein the third graph network is a weighted directed graph.
In one embodiment, the edge weights of the connecting edges form an edge weight list, weights. A weighted directed graph (i.e., the third graph network) is generated using igraph: G2 = igraph.Graph.TupleList(list(zip(list1, list2)), directed=True). The InfoMap algorithm is then applied for community detection, and the user community result in the third graph network is obtained as G2.community_infomap(edge_weights=weights).
In this embodiment, the weight of the node is assigned to the connection edge pointing to the node, so that the connection edge obtains the edge weight, and the edge weight is added to the calculation by the community discovery algorithm, thereby ensuring the accuracy of the user community detection.
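The assignment of node weights to incoming connecting edges can be sketched as follows. The names `list1`, `list2`, and `weights` follow the text; the function name, the example edges, and the node weights are illustrative assumptions.

```python
def assign_edge_weights(edges, node_weights):
    """Split directed edges into source and target lists and weight each edge.

    Each edge receives the node weight of the node it points to.
    """
    list1 = [src for src, _ in edges]  # first node set: edge sources
    list2 = [dst for _, dst in edges]  # second node set: edge targets
    weights = [node_weights[dst] for dst in list2]
    return list1, list2, weights


# Illustrative second node weights, e.g. as produced by the ranking step.
node_weights = {"p1": 0.1, "p2": 0.1, "d1": 0.4, "ip1": 0.25}
edges = [("p1", "d1"), ("p2", "d1"), ("p1", "ip1")]
list1, list2, weights = assign_edge_weights(edges, node_weights)
```

The resulting pairs and the `weights` list are in the same order, so they can be handed to a graph library that accepts an edge list and a parallel list of edge weights for community detection.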
Further, after step S206, the method may further include: calculating each user community in the user community detection results according to a preset fraud identification algorithm to obtain fraud user identification results; and sending the identification result of the fraudulent user to a terminal logged in by a preset account for displaying.
Specifically, the obtained user community detection result includes the detected user community. Subsequent processing may be performed on the user community, such as product recommendations for the user community, or fraud detection for the user community.
The user community can be processed by a preset fraud identification algorithm to obtain a fraud identification result. The fraud identification algorithm can operate according to set rules: the various nodes in a user community are first counted, and a preliminary judgment is made on the user community according to the counts; when the user community is preliminarily judged to be a candidate fraud group, the users in the user community are examined to obtain the fraudulent user identification result.
Continuing the above example, the various nodes in each user community are first counted, and the user communities containing more than N mobile phone numbers (N is a preset threshold, for example N = 4) are screened out. For the mobile phone numbers in each such user community, the proportions of mobile phone numbers that share the same device number, share the same IP address, share the same license plate number, share the same frame number, and were registered on the same day are determined respectively.
If, in a user community containing more than N mobile phone numbers, the proportion of mobile phone numbers sharing the same device number to the total number of mobile phone numbers in the user community exceeds 70%, or the proportion sharing the same license plate number exceeds 70%, or the proportion sharing the same frame number exceeds 70%, or the proportion sharing the same IP address exceeds 70%, or the proportion registered on the same day exceeds 70%, the user community is judged to be a candidate fraud group. The 70% threshold is only an example and may take other values, and the thresholds for the shared device number, the same IP address, the same license plate number, the same frame number, and registration on the same day may differ from one another. The mobile phone numbers in the user community are output, together with the proportions of mobile phone numbers sharing the same device number, the same IP address, the same license plate number, the same frame number, and the same registration day.
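The rule above can be sketched in Python. The sketch assumes each community is a list of records, one per mobile phone number, and interprets "sharing" as having an attribute value that at least one other phone number in the community also has; both the record layout and this interpretation are assumptions. A single threshold is used for all attributes for brevity.

```python
from collections import Counter

# Hypothetical attribute names for the five checks described in the text.
ATTRIBUTES = ("device", "ip", "plate", "frame", "register_date")


def shared_ratio(values):
    """Fraction of entries whose value also appears on at least one other entry."""
    counts = Counter(v for v in values if v is not None)
    shared = sum(1 for v in values if v is not None and counts[v] > 1)
    return shared / len(values) if values else 0.0


def is_candidate_fraud_group(community, min_phones=4, ratio_threshold=0.7):
    """Return (flag, ratios) for one community of phone-number records."""
    if len(community) <= min_phones:
        return False, {}  # only communities with more than N phone numbers are screened
    ratios = {
        attr: shared_ratio([member.get(attr) for member in community])
        for attr in ATTRIBUTES
    }
    return any(r > ratio_threshold for r in ratios.values()), ratios


# Five phone numbers, four of which share device "d1"; all other attributes are unique.
community = [
    {
        "device": "d1" if i < 4 else "d9",
        "ip": f"ip{i}",
        "plate": f"A{i}",
        "frame": f"F{i}",
        "register_date": f"2021-01-0{i + 1}",
    }
    for i in range(5)
]
flagged, ratios = is_candidate_fraud_group(community)
small_flagged, _ = is_candidate_fraud_group(community[:4])
```

Returning the ratios alongside the flag matches the description that the proportions are output together with the mobile phone numbers of a candidate fraud group.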
Candidate fraud groups may be examined further in combination with existing black and white lists (genuine users are on the white list and fraudulent users are on the black list), so as to identify the black-list and white-list users in a candidate fraud group. If any user in the candidate fraud group is on the white list, the users in the candidate fraud group are tested (e.g., by slider verification); a user who passes the test is added to the white list, and a user who fails the test is added to the black list. If no user in the candidate fraud group is on the white list, all users in the candidate fraud group are added to the black list. Users newly added to the black list are determined to be fraudulent users.
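The list handling described above can be sketched as a single function. The `verify` callback stands in for a check such as slider verification and is hypothetical, as is the choice to skip users who already appear on either list.

```python
def screen_candidate_group(users, white_list, black_list, verify):
    """Update the lists for one candidate fraud group; return newly blacklisted users.

    verify is a hypothetical callback (e.g. a slider challenge) returning True on pass.
    """
    newly_black = []
    if any(u in white_list for u in users):
        # At least one known genuine user: test the remaining users individually.
        for u in users:
            if u in white_list or u in black_list:
                continue  # assumption: already classified users are not retested
            if verify(u):
                white_list.add(u)
            else:
                black_list.add(u)
                newly_black.append(u)
    else:
        # No genuine user in the group: blacklist every member.
        for u in users:
            if u not in black_list:
                black_list.add(u)
                newly_black.append(u)
    return newly_black


white, black = {"u1"}, set()
mixed_result = screen_candidate_group(["u1", "u2", "u3"], white, black, lambda u: u == "u2")
all_black_result = screen_candidate_group(["a", "b"], white, black, lambda u: True)
```

The returned users correspond to those newly added to the black list, who are then treated as fraudulent users.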
The fraud identification algorithm may also be other algorithms implemented based on artificial intelligence techniques.
After the fraud identification algorithm has run, a fraudulent user identification result is obtained, which shows which users are fraudulent users and which are not. The fraudulent user identification result is sent to the terminal logged in with the preset account for display, so that the fraudulent users can be further monitored and controlled.
In this embodiment, fraud identification is performed on the user communities in the user community detection result by the fraud identification algorithm; because the user community detection result is more accurate, the obtained fraudulent user identification result is also more accurate.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Among them, Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence using a digital computer or a machine controlled by a digital computer, senses the environment, acquires knowledge and uses the knowledge to obtain the best result.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
With further reference to fig. 3, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a user community detection apparatus, which corresponds to the method embodiment shown in fig. 2 and can be applied to various electronic devices.
As shown in fig. 3, the user community detection apparatus 300 according to the present embodiment includes: an information acquisition module 301, a data determination module 302, a network construction module 303, a first update module 304, a second update module 305, and a network computation module 306, wherein:
an information obtaining module 301, configured to obtain user activity information.
A data determining module 302, configured to determine node data and connection edge data in the user activity information.
A network constructing module 303, configured to construct the first graph network according to the node data and the connection edge data.
The first updating module 304 is configured to determine a node weight of each node in the first graph network through a preset ranking algorithm, so as to update the first graph network according to the node weight, and obtain a second graph network.
A second updating module 305, configured to determine an edge weight of each connection edge in the second graph network based on the node weight of each node in the second graph network, so as to update the second graph network according to the edge weight to obtain a third graph network.
And the network computing module 306 is configured to compute the third graph network through a community discovery algorithm to obtain a user community detection result.
In this embodiment, after user activity information is acquired, node data and connection edge data are extracted and a first graph network is constructed, and a node weight is added to each node in the first graph network through a preset ranking algorithm to obtain a second graph network; then, based on the node weight of each node in the second graph network, the edge weight of each connecting edge is determined, thereby obtaining a third graph network with edge weights; the edge weight represents the importance of the connecting edge and enriches the input information of the community discovery algorithm, so that the accuracy of the user community detection result calculated by the community discovery algorithm is improved.
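The first stage of this flow, building the first graph network from node data and connection edge data, can be sketched with plain dictionaries. The data layout (node name and type pairs, source and target pairs) and the function name are assumptions chosen for illustration.

```python
from collections import defaultdict


def build_first_graph(node_data, edge_data):
    """Build a directed graph from (name, type) nodes and (source, target) edges."""
    node_types = dict(node_data)
    successors = defaultdict(list)
    for src, dst in edge_data:
        if src not in node_types or dst not in node_types:
            raise ValueError(f"edge ({src!r}, {dst!r}) refers to an unknown node")
        successors[src].append(dst)
    return {"node_types": node_types, "successors": dict(successors)}


node_data = [("p1", "phone"), ("p2", "phone"), ("d1", "device"), ("ip1", "ip")]
edge_data = [("p1", "d1"), ("p2", "d1"), ("p1", "ip1")]
first_graph = build_first_graph(node_data, edge_data)
```

Keeping the node type next to each node makes the later type-dependent weight initialization a simple lookup.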
In some optional implementations of this embodiment, the data determining module 302 may include: the data identification submodule, the identification acquisition submodule, the data screening submodule and the data determination submodule, wherein:
and the data identification submodule is used for identifying the entity data and the relationship data in the user activity information.
And the identifier acquisition submodule is used for acquiring the scene identifier of the user community detection.
And the data screening submodule is used for screening the entity data and the relation data according to the scene identification.
And the data determining submodule is used for determining the entity data obtained after screening as node data and determining the relation data obtained after screening as connection edge data.
In this embodiment, after the entity data and the relationship data in the user activity information are identified, the entity data and the relationship data are screened according to the scene identifier, so that the required node data and connection edge data are obtained and data preparation for user community detection is completed.
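The screening by scene identifier can be sketched as a lookup of allowed entity and relationship types per scene. The scene name, the type names, and the record layout below are hypothetical.

```python
# Hypothetical screening rules: which entity and relationship types each scene keeps.
SCENE_FILTERS = {
    "fraud_detection": {
        "entity_types": {"phone", "device", "ip"},
        "relation_types": {"uses_device", "logs_in_from"},
    },
}


def screen_by_scene(scene_id, entities, relations):
    """Keep only the entities and relations that the scene's rules allow."""
    rule = SCENE_FILTERS[scene_id]
    nodes = [e for e in entities if e["type"] in rule["entity_types"]]
    kept = {e["name"] for e in nodes}
    edges = [
        r for r in relations
        if r["type"] in rule["relation_types"]
        and r["source"] in kept
        and r["target"] in kept
    ]
    return nodes, edges


entities = [
    {"name": "p1", "type": "phone"},
    {"name": "d1", "type": "device"},
    {"name": "addr1", "type": "address"},
]
relations = [
    {"source": "p1", "target": "d1", "type": "uses_device"},
    {"source": "p1", "target": "addr1", "type": "lives_at"},
]
node_data, edge_data = screen_by_scene("fraud_detection", entities, relations)
```

Dropping relations whose endpoints were screened out keeps the resulting connection edge data consistent with the node data.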
In some optional implementations of this embodiment, the first updating module 304 may include: an obtaining submodule, a weight adding submodule and a first updating submodule, wherein:
and the obtaining submodule is used for obtaining the scene identification of the user community detection and the node type of each node in the first graph network.
And the weight adding submodule is used for adding first node weight to each node according to the scene identification and the node type.
And the first updating submodule is used for calculating the first graph network with the first node weight through a preset ranking algorithm so as to update the first node weight of each node to obtain a second node weight and generate a second graph network.
In the embodiment, the first node weights are correspondingly added to the nodes of different types according to the scene identifiers, so that the difference of the importance of each node is reflected during initialization, and the accuracy of the second node weights calculated through the ranking algorithm is ensured.
In other optional implementations of this embodiment, the first updating module 304 may include: an information acquisition submodule, a data determination submodule, a contribution degree calculation submodule, a weight determination submodule and a weight updating submodule, wherein:
and the information acquisition submodule is used for acquiring pre-stored activity information corresponding to the user activity information, and the pre-stored activity information is provided with a user tag.
And the data determining submodule is used for determining node data in the pre-stored activity information according to the first graph network.
And the contribution degree calculation submodule is used for calculating the characteristic contribution degree of the node data with the user label through the random forest.
And the weight determining submodule is used for determining the first node weight of each node in the first graph network according to the characteristic contribution degree.
And the weight updating submodule is used for calculating the first graph network with the first node weight through a preset ranking algorithm so as to update the first node weight of each node to obtain a second node weight and generate a second graph network.
In the embodiment, the pre-stored activity information which is the same as the user activity information in type and has the label can be obtained, the node data is determined in the pre-stored activity information according to the first graph network, and the characteristic contribution degree of the node data with the label is calculated through the random forest, so that the first node weights of various nodes are obtained, and the determination mode of the first node weights is enriched.
In some optional implementations of this embodiment, the second updating module 305 may include: the node division submodule, the assignment submodule and the second updating submodule, wherein:
and the node division submodule is used for dividing each node in the second graph network into a first node set and a second node set according to each connecting edge in the second graph network, wherein the nodes in the first node set point to the nodes in the second node set through the connecting edges.
And the assignment submodule is used for assigning the node weight of the node to the connecting edge pointing to the node for each node in the second node set to obtain the edge weight of the connecting edge.
And the second updating submodule is used for updating the second graph network according to the edge weight to obtain a third graph network.
In this embodiment, the weight of the node is assigned to the connection edge pointing to the node, so that the connection edge obtains the edge weight, and the edge weight is added to the calculation by the community discovery algorithm, thereby ensuring the accuracy of the user community detection.
In some optional implementations of this embodiment, the network computing module 306 may include: a network computation submodule and a community determination submodule, wherein:
and the network calculation sub-module is used for calculating the third graph network with the edge weight through a community discovery algorithm to obtain a user community in the third graph network, wherein the community discovery algorithm is an InfoMap algorithm.
And the community determining submodule is used for determining the obtained user community as a user community detection result.
In the embodiment, the community discovery algorithm is an InfoMap algorithm, and the InfoMap algorithm takes the edge weight of the connecting edge as input information, so that the community division can be performed more accurately, and the accuracy of the generated user community detection result is improved.
In some optional implementations of the present embodiment, the user community detection apparatus 300 may further include: a fraud calculation module and a result sending module, wherein:
and the fraud calculation module is used for calculating each user community in the user community detection results according to a preset fraud identification algorithm to obtain the fraud user identification results.
And the result sending module is used for sending the identification result of the fraudulent user to the terminal logged in by the preset account for displaying.
In the embodiment, the fraud recognition algorithm is used for carrying out fraud recognition on the user community in the user community detection result, and the user community detection result is more accurate, so that the obtained fraud user identification result is more accurate.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4, fig. 4 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43 communicatively connected to each other via a system bus. It is noted that only a computer device 4 having components 41-43 is shown, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both internal and external storage devices of the computer device 4. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as computer readable instructions of a user community detection method. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to execute computer readable instructions stored in the memory 41 or process data, such as computer readable instructions for executing the user community detection method.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The computer device provided in the present embodiment may perform the user community detection method described above. The user community detection method here may be the user community detection method of the above-described embodiments.
In this embodiment, after user activity information is acquired, node data and connection edge data are extracted and a first graph network is constructed, and a node weight is added to each node in the first graph network through a preset ranking algorithm to obtain a second graph network; then, based on the node weight of each node in the second graph network, the edge weight of each connecting edge is determined, thereby obtaining a third graph network with edge weights; the edge weight represents the importance of the connecting edge and enriches the input information of the community discovery algorithm, so that the accuracy of the user community detection result calculated by the community discovery algorithm is improved.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing computer-readable instructions executable by at least one processor to cause the at least one processor to perform the steps of the user community detection method as described above.
In this embodiment, after user activity information is acquired, node data and connection edge data are extracted and a first graph network is constructed, and a node weight is added to each node in the first graph network through a preset ranking algorithm to obtain a second graph network; then, based on the node weight of each node in the second graph network, the edge weight of each connecting edge is determined, thereby obtaining a third graph network with edge weights; the edge weight represents the importance of the connecting edge and enriches the input information of the community discovery algorithm, so that the accuracy of the user community detection result calculated by the community discovery algorithm is improved.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are only some, not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application can be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, are within the protection scope of the present application.

Claims (10)

1. A user community detection method, comprising:
acquiring user activity information;
determining node data and connection edge data in the user activity information;
constructing a first graph network according to the node data and the connection edge data;
determining the node weight of each node in the first graph network through a preset ranking algorithm, and updating the first graph network according to the node weight to obtain a second graph network;
determining edge weights of all connecting edges in the second graph network based on the node weights of all nodes in the second graph network, and updating the second graph network according to the edge weights to obtain a third graph network;
and calculating the third graph network through a community discovery algorithm to obtain a user community detection result.
2. The method of claim 1, wherein the step of determining node data and connection edge data in the user activity information comprises:
identifying entity data and relationship data in the user activity information;
acquiring a scene identifier of the user community detection;
screening the entity data and the relationship data according to the scene identifier;
and determining the entity data obtained after screening as node data, and determining the relationship data obtained after screening as connection edge data.
3. The method of claim 1, wherein the determining node weights of the nodes in the first graph network by a predetermined ranking algorithm to update the first graph network according to the node weights to obtain a second graph network comprises:
acquiring a scene identifier of the user community detection and a node type of each node in the first graph network;
adding a first node weight to each node according to the scene identifier and the node type;
and calculating the first graph network with the first node weight through a preset ranking algorithm so as to update the first node weight of each node to obtain a second node weight and generate a second graph network.
4. The method of claim 1, wherein the determining node weights of the nodes in the first graph network by a predetermined ranking algorithm to update the first graph network according to the node weights to obtain a second graph network comprises:
acquiring pre-stored activity information corresponding to the user activity information, wherein the pre-stored activity information is provided with a user tag;
determining node data in the pre-stored activity information according to the first graph network;
calculating, through a random forest, the feature contribution degree of the node data with respect to the user tag;
determining a first node weight of each node in the first graph network according to the feature contribution degree;
and processing the first graph network carrying the first node weights with a preset ranking algorithm, so as to update the first node weight of each node into a second node weight and generate a second graph network.
5. The method of claim 1, wherein the determining edge weights of connecting edges in the second graph network based on node weights of nodes in the second graph network, so as to update the second graph network according to the edge weights to obtain a third graph network, comprises:
dividing the nodes in the second graph network into a first node set and a second node set according to the connecting edges in the second graph network, wherein the nodes in the first node set point to the nodes in the second node set through the connecting edges;
for each node in the second node set, assigning the node weight of the node to a connecting edge pointing to the node to obtain an edge weight of the connecting edge;
and updating the second graph network according to the edge weight to obtain a third graph network.
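The edge-weighting rule of claim 5 can be sketched in a few lines of Python. The function names and the representation of edges as `(source, target)` tuples are assumptions of this sketch, not part of the claim.

```python
def split_nodes(edges):
    """Split nodes into sources (first set) and targets (second set) of edges."""
    first_set = {src for src, _ in edges}
    second_set = {dst for _, dst in edges}
    return first_set, second_set


def assign_edge_weights(edges, node_weights):
    """Give every edge the node weight of the node it points to."""
    return {(src, dst): node_weights[dst] for src, dst in edges}
```

The resulting mapping from edge to weight, together with the unchanged nodes, plays the role of the third graph network that is passed to the community discovery algorithm.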
6. The method of claim 1, wherein the step of computing the third graph network by a community discovery algorithm to obtain the user community detection result comprises:
processing the third graph network carrying the edge weights with a community discovery algorithm to obtain the user communities in the third graph network, wherein the community discovery algorithm is the InfoMap algorithm;
and determining the obtained user communities as the user community detection result.
7. The method of claim 1, wherein, after the step of computing the third graph network by a community discovery algorithm to obtain the user community detection result, the method further comprises:
processing each user community in the user community detection result with a preset fraud identification algorithm to obtain a fraudulent user identification result;
and sending the fraudulent user identification result to a terminal on which a preset account is logged in, for display.
8. A user community detection apparatus, comprising:
an information acquisition module, configured to acquire user activity information;
a data determining module, configured to determine node data and connection edge data in the user activity information;
a network construction module, configured to construct a first graph network according to the node data and the connection edge data;
a first updating module, configured to determine a node weight of each node in the first graph network through a preset ranking algorithm, so as to update the first graph network according to the node weight to obtain a second graph network;
a second updating module, configured to determine an edge weight of each connection edge in the second graph network based on a node weight of each node in the second graph network, so as to update the second graph network according to the edge weight to obtain a third graph network;
and a network computing module, configured to process the third graph network with a community discovery algorithm to obtain a user community detection result.
9. A computer device comprising a memory and a processor, the memory having computer-readable instructions stored therein, wherein the processor, when executing the computer-readable instructions, implements the steps of the user community detection method of any one of claims 1 to 7.
10. A computer-readable storage medium having computer-readable instructions stored thereon which, when executed by a processor, implement the steps of the user community detection method according to any one of claims 1 to 7.
CN202111432615.8A 2021-11-29 2021-11-29 User community detection method and device, computer equipment and storage medium Pending CN114092268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111432615.8A CN114092268A (en) 2021-11-29 2021-11-29 User community detection method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111432615.8A CN114092268A (en) 2021-11-29 2021-11-29 User community detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114092268A true CN114092268A (en) 2022-02-25

Family

ID=80305457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111432615.8A Pending CN114092268A (en) 2021-11-29 2021-11-29 User community detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114092268A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174450A (en) * 2022-07-05 2022-10-11 中孚信息股份有限公司 Unknown equipment identification method and system based on network node representation
CN115174450B (en) * 2022-07-05 2023-10-03 中孚信息股份有限公司 Unknown equipment identification method and system based on network node characterization

Similar Documents

Publication Publication Date Title
CN110428058B (en) Federated learning model training method, device, terminal equipment and storage medium
US11836643B2 (en) System for secure federated learning
Ghazal et al. DDoS Intrusion Detection with Ensemble Stream Mining for IoT Smart Sensing Devices
CN112116008B (en) Processing method of target detection model based on intelligent decision and related equipment thereof
CN108429718B (en) Account identification method and device
CN109600344B (en) Method and device for identifying risk group and electronic equipment
CN112132676B (en) Method and device for determining contribution degree of joint training target model and terminal equipment
CN111835561A (en) Abnormal user group detection method, device and equipment based on user behavior data
US20190012362A1 (en) Method and apparatus for processing information
CN112381236A (en) Data processing method, device, equipment and storage medium for federated transfer learning
CN111557014B (en) Method and system for providing multiple personal data
CN111641517A (en) Community division method and device for homogeneous network, computer equipment and storage medium
CN109272378A (en) A kind of discovery method and apparatus of risk group
CN114092268A (en) User community detection method and device, computer equipment and storage medium
CN111667018B (en) Object clustering method and device, computer readable medium and electronic equipment
CN117376000A (en) Block chain-based data processing method, device, equipment and storage medium
CN109992960B (en) Counterfeit parameter detection method and device, electronic equipment and storage medium
CN115576837A (en) Batch number making method and device, computer equipment and storage medium
CN109962907B (en) User identity recognition method based on big data and terminal equipment
CN112764923A (en) Computing resource allocation method and device, computer equipment and storage medium
CN112738213A (en) Block chain-based task demand response method, device, system and storage medium
KR20220076765A (en) Method, system, and computer program for setting categories of community
CN111898033A (en) Content pushing method and device and electronic equipment
KR20200009887A (en) Method and system for determining image similarity
CN113726785B (en) Network intrusion detection method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination