WO2022179384A1 - Method for dividing social groups, dividing system and related devices - Google Patents

Info

Publication number
WO2022179384A1
Authority
WO
WIPO (PCT)
Prior art keywords
social
clustering
node
nodes
social network
Prior art date
Application number
PCT/CN2022/074604
Other languages
English (en)
French (fr)
Inventor
张灿
刘伟
牟奇
Original Assignee
山东英信计算机技术有限公司
Priority date 2021-02-26 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date 2022-01-28
Publication date 2022-09-01
Application filed by 山东英信计算机技术有限公司
Publication of WO2022179384A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

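The present application provides a method for dividing social groups, comprising: acquiring social data and a clustering requirement, and determining a network structure and node information corresponding to the social data (S101); performing random walks according to the network structure and the node information to obtain a social network graph (S102); and performing binary clustering on the nodes in the social network graph using a preset clustering method to obtain social groups that meet the clustering requirement (S103). The present application first adopts a simple random walk mechanism, taking every node as a starting node for random walks to form a new social network graph, which increases the credibility of the social network to a certain extent and helps identify groups with greater influence. The application is simple and convenient, the division of different social groups is easy to simulate and implement in software, and it is consistent with real-life social network structures. The present application further provides a system for dividing social groups, a computer-readable storage medium and an electronic device, which have the above beneficial effects.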

Description

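Method for dividing social groups, dividing system and related devices
This application claims priority to Chinese patent application No. 202110218531.8, filed with the Chinese Patent Office on February 26, 2021 and entitled "Method for dividing social groups, dividing system and related devices", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing, and in particular to a method for dividing social groups, a dividing system and related devices.
Background
With the development of Internet technology, online social networks have grown explosively. People's lives have become inseparable from online social networks and are constantly influenced and changed by them. In-depth study of how influence propagates helps to understand the behavior of human groups and individuals, so that people's behavior can be anticipated and reliable grounds and recommendations can be provided for decision-making by governments, enterprises and other organizations.
Cluster analysis of social groups is the process of dividing data samples into groups of similar objects. Each group is called a cluster; data objects within the same cluster are highly similar, while objects in different clusters have low similarity. For real-world social networks, social network clustering can assign nodes to different clusters according to their actual positions in the network, revealing the different organizational clusters hidden in the social network structure and thereby improving the ability to mine and analyze social network data.
Traditional social network division methods describe the propagation and diffusion of information in social networks only to a certain extent. Because they lack preprocessing, they still compute propagation paths of little influence, which reduces the accuracy of the division and makes the clustering effect insignificant.
Summary
The purpose of the present application is to provide a method for dividing social groups, a dividing system, a computer-readable storage medium and an electronic device, which improve the credibility of the social network by clustering a social network graph obtained through random walks.
To solve the above technical problem, the present application provides a method for dividing social groups; the specific technical solution is as follows:
acquiring social data and a clustering requirement, and determining a network structure and node information corresponding to the social data;
performing random walks according to the network structure and the node information to obtain a social network graph;
performing binary clustering on the nodes in the social network graph using a preset clustering method to obtain social groups that meet the clustering requirement.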
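Optionally, performing random walks according to the network structure and the node information to obtain a social network graph comprises:
starting from each node in the network structure, performing random walks a preset number of times with a preset number of steps, and recording the walk paths of the random walks;
selecting frequent itemsets from the walk paths using the Apriori algorithm to obtain the social network graph.
Optionally, before the random walks of a preset number of times and a preset number of steps are performed starting from each node in the network structure, the method further comprises:
determining the random walk probabilities of the nodes using a probability transition matrix;
the probability transition matrix is
$P_{ij} = \frac{W_{ij}}{W_{ig}}$
where $W_{ij}$ is the weight of the edge between node i and node j, and $W_{ig}$ is the sum of the i-th row of the network weight matrix;
performing the random walks of a preset number of times and a preset number of steps starting from each node in the network structure then comprises:
starting from each node in the network structure according to the random walk probabilities, performing random walks a preset number of times with a preset number of steps.
Optionally, before the random walk probabilities of the nodes are determined using the probability transition matrix, the method further comprises:
determining the network weight matrix corresponding to the social network graph according to the node information and the network structure.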
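Optionally, performing binary clustering on the nodes in the social network graph using a preset clustering method comprises:
performing binary clustering on the nodes in the social network graph using the Kernighan-Lin algorithm or the spectral bisection method.
Optionally, if the preset clustering method is the Kernighan-Lin algorithm, performing binary clustering on the nodes in the social network graph using the preset clustering method to obtain social groups that meet the clustering requirement comprises:
randomly dividing the social network graph into two subgraphs, taking one node from each of the two subgraphs for exchange, and calculating the difference in the gain function before and after the exchange; the gain function is the number of edges within the two subgraphs minus the number of edges between the two subgraphs;
exchanging the two nodes for which the difference in the gain function is largest, where each node in the two subgraphs is exchanged at most once per iteration;
repeating the exchange for the remaining nodes until the difference in the gain function is less than zero or all nodes of one of the subgraphs have been exchanged once, obtaining the two subgraphs after the first iteration;
determining whether the current two subgraphs meet the clustering requirement;
if so, taking the current two subgraphs as the social groups that meet the clustering requirement;
if not, repeating the iteration until two subgraphs that meet the clustering requirement are obtained.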
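The present application further provides a system for dividing social groups, comprising:
a data acquisition module, configured to acquire social data and a clustering requirement, and determine the network structure and node information corresponding to the social data;
a social network confirmation module, configured to perform random walks according to the network structure and the node information to obtain a social network graph;
a clustering module, configured to perform binary clustering on the nodes in the social network graph using a preset clustering method to obtain social groups that meet the clustering requirement.
Optionally, the social network confirmation module comprises:
a walking unit, configured to start from each node in the network structure, perform random walks a preset number of times with a preset number of steps, and record the walk paths of the random walks;
a social network generating unit, configured to select frequent itemsets from the walk paths using the Apriori algorithm to obtain the social network graph.
The present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method described above.
The present application further provides an electronic device comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor implements the steps of the method described above when invoking the computer program in the memory.
The present application provides a method for dividing social groups, comprising: acquiring social data and a clustering requirement, and determining a network structure and node information corresponding to the social data; performing random walks according to the network structure and the node information to obtain a social network graph; and performing binary clustering on the nodes in the social network graph using a preset clustering method to obtain social groups that meet the clustering requirement.
The present application first adopts a simple random walk mechanism, taking every node as a starting node for random walks to form a new social network graph, which increases the credibility of the social network to a certain extent and helps identify groups with greater influence. The application is simple and convenient: with the support of current big data technology, the division of different social groups is easy to simulate and implement in software, and it is consistent with real-life social network structures, which gives it practical significance.
The present application further provides a system for dividing social groups, a computer-readable storage medium and an electronic device, which have the above beneficial effects; details are not repeated here.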
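Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for dividing social groups according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a system for dividing social groups according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
Referring to FIG. 1, which is a flowchart of a method for dividing social groups according to an embodiment of the present application, the method comprises:
S101: Acquire social data and a clustering requirement, and determine the network structure and node information corresponding to the social data;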
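This step acquires the social data and the clustering requirement. The social data is the raw social network data; it may include user information and communication information associated with the users, which may be presented in the form of communication records. Communication information usually has a corresponding communication target, forming user-to-user communication, which indicates a connection between the users in the social network. Note that communication between users is directional: communication from user A to user B and communication from user B to user A are two separate communication processes, each with its own communication attributes such as communication frequency, and these attributes can be regarded as the influence of one user on another. In a social network each user is usually regarded as a node, so the influence relationship of one node on another can also be obtained.
The clustering requirement is the clustering criterion for the social network data. Its specific content is not limited here; it may be a parameter such as community density or community quality. A community is each class (cluster) in the social network. A better community division has as many edges as possible within communities and as few edges as possible between communities; that is, the fewer the intersections between classes, the better the clustering effect. Those skilled in the art may determine the clustering requirement according to actual clustering needs; other criteria such as modularity may also be used, which are not enumerated here.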
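The application leaves the clustering requirement open and names modularity as one option. As a hedged illustration, the Python sketch below computes Newman modularity for a partition of an undirected, unweighted graph and wraps it in a threshold check; the function names and the 0.3 threshold are assumptions for illustration, not values taken from the application.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; communities: list of node sets covering
    every node that appears in edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    label = {v: k for k, comm in enumerate(communities) for v in comm}
    intra = defaultdict(int)    # edges with both ends in community k
    deg_sum = defaultdict(int)  # total degree of the nodes in community k
    for u, v in edges:
        deg_sum[label[u]] += 1
        deg_sum[label[v]] += 1
        if label[u] == label[v]:
            intra[label[u]] += 1
    return sum(intra[k] / m - (deg_sum[k] / (2 * m)) ** 2
               for k in range(len(communities)))


def meets_requirement(edges, communities, min_q=0.3):
    """Hypothetical clustering requirement: modularity of at least min_q."""
    return modularity(edges, communities) >= min_q
```

For two triangles joined by a single edge, splitting into the two triangles gives Q = 5/14 (about 0.357), whereas keeping all six nodes in one community gives Q = 0, which matches the intuition above that more edges inside communities and fewer between them is better.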
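Since the social data is social network data, it usually includes a node set and an edge set, from which the network structure can be determined. The node information is the user information of each user in the social data.
In other words, this step obtains a weighted directed graph from the social data. The social network is represented as a weighted directed graph $G=(V,E)$, where $V$ is the set of nodes and
$E \subseteq V \times V$
is the set of directed edges. Each node $v \in V$ represents a user in the social network, and each edge $(u,v) \in E$ represents the influence relationship from node $u$ to node $v$. Edges are directed, that is, influence is directional: node $u$ may influence node $v$ while node $v$ has no influence on node $u$. The weight of an edge represents the magnitude of the influence.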
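A minimal sketch of S101 in Python, assuming (hypothetically) that the communication records arrive as (sender, receiver, count) tuples and that the influence weight of edge (u, v) is the total number of messages from u to v. The application does not fix a record format or a weighting scheme, so `build_influence_graph` and its input layout are illustrative only.

```python
from collections import defaultdict


def build_influence_graph(records):
    """Build a weighted directed graph G = (V, E) from communication records.

    records: iterable of (sender, receiver, count) tuples (hypothetical format).
    Returns (nodes, weights), where weights[u][v] is the influence of u on v.
    Direction matters: A->B and B->A are accumulated separately.
    """
    nodes = set()
    weights = defaultdict(lambda: defaultdict(float))
    for sender, receiver, count in records:
        nodes.update((sender, receiver))
        if sender != receiver:          # ignore self-communication
            weights[sender][receiver] += count
    return nodes, {u: dict(nbrs) for u, nbrs in weights.items()}
```

Keeping the two directions separate mirrors the point above that the influence of A on B and of B on A are two distinct quantities.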
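S102: Perform random walks according to the network structure and the node information to obtain a social network graph;
This step performs random walks to obtain the social network graph. A random walk may start from any node in the network structure determined in the previous step. The number of walks and the number of steps are not limited here; they may be preset before this step, or computed before the walks by means of a matrix, a function or the like.
Optionally, this step may include the following process:
Step 1: starting from each node in the network structure, perform random walks a preset number of times with a preset number of steps, and record the walk paths of the random walks;
Step 2: select frequent itemsets from the walk paths using the Apriori algorithm to obtain the social network graph.
The Apriori algorithm is a commonly used algorithm for mining association rules; it finds sets of items that occur frequently in the data. This embodiment optionally uses the Apriori algorithm to select frequent itemsets; those skilled in the art may also use other algorithms, for example AprioriTid, an optimized variant of Apriori, to obtain the social network graph. Apriori quantifies frequent itemsets and association rules by support and confidence, and mines frequent itemsets in two stages: candidate generation and a downward-closure test. Its mining results are general and convincing, the algorithm is simple, and it places low demands on the social data.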
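The application does not state exactly how frequent itemsets become graph edges. The sketch below is one possible reading, labeled as an assumption: each walk path is treated as a transaction (the set of nodes it visits), a plain Apriori pass with candidate generation and the downward-closure test finds itemsets whose support reaches a threshold, and every frequent 2-itemset becomes an undirected edge of the new graph. `apriori` and `frequent_pair_graph` are illustrative names, not an API from the application.

```python
from itertools import combinations


def apriori(transactions, min_support, max_size=3):
    """Return {itemset: support} for every itemset (frozenset) whose support,
    the fraction of transactions containing it, is at least min_support."""
    txs = [frozenset(t) for t in transactions]
    if not txs:
        return {}

    def support(itemset):
        return sum(1 for t in txs if itemset <= t) / len(txs)

    # Frequent 1-itemsets.
    singles = {frozenset([item]) for t in txs for item in t}
    current = {s: support(s) for s in singles}
    current = {s: sup for s, sup in current.items() if sup >= min_support}
    frequent = dict(current)
    k = 2
    while current and k <= max_size:
        prev = list(current)
        # Candidate generation: unions of two frequent (k-1)-itemsets of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Downward-closure test: every (k-1)-subset must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c: support(c) for c in candidates}
        current = {c: sup for c, sup in current.items() if sup >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def frequent_pair_graph(paths, min_support):
    """Hypothetical graph construction: one undirected edge per frequent
    2-itemset of nodes that co-occur in walk paths."""
    frequent = apriori(paths, min_support, max_size=2)
    return {tuple(sorted(s)) for s in frequent if len(s) == 2}
```

Node pairs that rarely appear together on walks fall below the support threshold and are dropped, which is one way to read the claim that low-influence propagation paths are filtered out before clustering.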
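In addition, the random walk probabilities can be calculated before the random walks of a preset number of times and a preset number of steps are performed from each node in the network structure. Specifically, a probability transition matrix can be used to determine the random walk probability of each node.
The probability transition matrix is
$P_{ij} = \frac{W_{ij}}{W_{ig}}, \qquad W_{ig} = \sum_{j} W_{ij}$
where $W_{ij}$ is the weight of the edge between node i and node j, and $W_{ig}$ is the sum of the i-th row of the network weight matrix. The network weight matrix corresponding to the social network graph can first be determined from the node information and the network structure; it contains the weights of the edges in the social network graph and reflects the communication between nodes.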
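Assuming the network weight matrix is held as a dense list of rows, computing $P_{ij} = W_{ij} / W_{ig}$ is a row normalisation. In this sketch a row with zero total (a node with no outgoing influence) is left as all zeros; that handling is an assumption, since the application does not say what happens to such nodes.

```python
def transition_matrix(weights):
    """Row-normalise the network weight matrix: P[i][j] = W[i][j] / W_ig,
    where W_ig is the sum of row i. Zero rows stay all zeros (assumption)."""
    P = []
    for row in weights:
        w_ig = sum(row)                 # W_ig: total outgoing weight of node i
        P.append([w / w_ig if w_ig else 0.0 for w in row])
    return P
```

For example, a row [0, 3, 1] becomes [0.0, 0.75, 0.25], so a walker at that node moves to the second node three times as often as to the third.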
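If the random walk probabilities are calculated first, random walks of a preset number of times and a preset number of steps can be performed from each node in the network structure according to those probabilities. Note that the preset number of times is the number of random walks started from each node: if each node performs m random walks and the network has n nodes, this step produces m*n paths in total.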
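A sketch of the weighted walks, assuming P is a row-stochastic matrix such as the probability transition matrix described above; `num_walks` plays the role of m and `walk_length` the preset number of steps. Stopping a walk early at a node with no outgoing probability is an assumption of this sketch. With n nodes the function returns exactly m*n paths, matching the count stated above.

```python
import random


def random_walks(P, num_walks, walk_length, seed=None):
    """Start num_walks walks of walk_length steps from every node, moving from
    node i to node j with probability P[i][j]. Returns the list of paths."""
    rng = random.Random(seed)
    n = len(P)
    paths = []
    for start in range(n):
        for _ in range(num_walks):
            path = [start]
            node = start
            for _ in range(walk_length):
                if not any(P[node]):    # dangling node: no outgoing edges
                    break
                node = rng.choices(range(n), weights=P[node])[0]
                path.append(node)
            paths.append(path)
    return paths
```

Passing a fixed `seed` makes the recorded walk paths reproducible, which is convenient when the paths are later mined for frequent itemsets.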
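S103: Perform binary clustering on the nodes in the social network graph using a preset clustering method to obtain social groups that meet the clustering requirement.
After the social network graph is obtained, it can be clustered using the preset clustering method to obtain social groups that meet the clustering requirement.
This embodiment does not limit which clustering method is used; the Kernighan-Lin algorithm or the spectral bisection method may be used to perform binary clustering on the nodes in the social network graph. Whichever binary clustering method is used, because this embodiment takes every node as a starting node for random walks and forms a new social network graph, the credibility of the social network is increased to a certain extent and groups with greater influence are easier to identify.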
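The application names spectral bisection as an alternative to Kernighan-Lin without detailing it. The pure-Python sketch below shows the standard idea under stated assumptions: the graph is undirected with nodes numbered 0..n-1, the Fiedler vector (eigenvector of the second-smallest Laplacian eigenvalue) is approximated by power iteration on cI - L with the constant vector projected out, and nodes are split at the median of that vector into two halves. A production implementation would normally use a sparse eigensolver instead, and the result is ambiguous if the second eigenvalue is repeated.

```python
import random


def spectral_bisection(n, edges, iters=1000, seed=0):
    """Split nodes 0..n-1 of an undirected graph into two halves using an
    approximation of the Fiedler vector of the graph Laplacian L = D - A."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(nbrs) for nbrs in adj]
    # Every Laplacian eigenvalue is at most 2 * max degree, so M = cI - L has
    # positive eigenvalues whose order is the reverse of L's.
    c = 2 * max(deg) + 1
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]     # project out the constant eigenvector
        # y = (cI - L) x, with (L x)_i = deg_i * x_i - sum of neighbour values
        y = [(c - deg[i]) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        if norm == 0.0:
            break
        x = [v / norm for v in y]
    order = sorted(range(n), key=lambda i: x[i])   # sort by Fiedler entry
    half = n // 2
    return set(order[:half]), set(order[half:])
```

On two triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3), the Fiedler vector has one sign on each triangle, so the split recovers the two triangles.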
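The following takes the Kernighan-Lin algorithm as the preset clustering method and describes the specific process of performing binary clustering on the nodes in the social network graph to obtain social groups that meet the clustering requirement:
Step 1: randomly divide the social network graph into two subgraphs, take one node from each of the two subgraphs for exchange, and calculate the difference in the gain function before and after the exchange; the gain function is the number of edges within the two subgraphs minus the number of edges between the two subgraphs;
Step 2: exchange the two nodes for which the difference in the gain function is largest; each node in the two subgraphs is exchanged at most once per iteration;
Step 3: repeat the exchange for the remaining nodes until the difference in the gain function is less than zero or all nodes of one of the subgraphs have been exchanged once, obtaining the two subgraphs after the first iteration;
Step 4: determine whether the current two subgraphs meet the clustering requirement; if so, go to Step 5; if not, go to Step 6;
Step 5: take the current two subgraphs as the social groups that meet the clustering requirement;
Step 6: repeat the iteration until two subgraphs that meet the clustering requirement are obtained.
Specifically, the social network graph is randomly divided into two subgraphs $K_1$ and $K_2$ of known sizes, and the gain function is defined as $Q$ = (number of edges within the two communities) - (number of edges between the communities), each subgraph being treated as a community. One node is taken from each subgraph as an exchange candidate; the exchange is tried and $\Delta Q = Q_{\text{after}} - Q_{\text{before}}$ is computed, and the pair of nodes that maximizes $\Delta Q$ is exchanged. Each node can be exchanged only once.
The previous step is repeated for the remaining nodes until $\Delta Q < 0$ or all nodes of one subgraph have been exchanged once. Each node is then allowed to be exchanged a second time and a new round of iteration begins, until no pair of nodes can be exchanged. At this point the original social network graph is divided into two subgraphs $K'_1$ and $K'_2$; nodes within the same subgraph are highly similar, while nodes in different subgraphs are less similar.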
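A sketch of the Kernighan-Lin variant described above, written directly from the stated gain function Q = (edges inside the two subgraphs) - (edges between them). Two simplifications are assumptions of this sketch rather than details from the application: Q is recomputed from scratch for every trial swap (quadratic per step and only suitable for small graphs), and a swap is applied only when ΔQ > 0, which guarantees that the outer loop of rounds terminates. The initial split is random, as in Step 1. Note that this greedy scheme differs from the classic Kernighan-Lin procedure, which also tentatively accepts negative-gain swaps and keeps the best prefix.

```python
import random


def gain(edges, side):
    """Q = (edges inside the two subgraphs) - (edges between the subgraphs)."""
    inside = sum(1 for u, v in edges if side[u] == side[v])
    return inside - (len(edges) - inside)


def kernighan_lin(nodes, edges, seed=None):
    """Bisect an undirected graph by greedy pairwise swaps that increase Q."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)
    half = len(order) // 2
    # Step 1: random initial split into K1 (side 0) and K2 (side 1).
    side = {v: 0 if i < half else 1 for i, v in enumerate(order)}
    improved = True
    while improved:                     # each round unlocks every node again
        improved = False
        locked = set()
        while True:
            q_before = gain(edges, side)
            free_a = [v for v in side if side[v] == 0 and v not in locked]
            free_b = [v for v in side if side[v] == 1 and v not in locked]
            best_dq, best_pair = 0, None
            for a in free_a:
                for b in free_b:
                    side[a], side[b] = 1, 0             # trial swap
                    dq = gain(edges, side) - q_before
                    side[a], side[b] = 0, 1             # undo
                    if dq > best_dq:
                        best_dq, best_pair = dq, (a, b)
            # End the round when no swap raises Q or one side has no free node.
            if best_pair is None:
                break
            a, b = best_pair
            side[a], side[b] = 1, 0
            locked.update(best_pair)                     # at most one swap per round
            improved = True
    k1 = {v for v, s in side.items() if s == 0}
    k2 = {v for v, s in side.items() if s == 1}
    return k1, k2
```

On two triangles joined by a single bridge edge, every balanced starting split converges to the two triangles, for which Q = 6 - 1 = 5.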
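According to the clustering requirement, the same method is applied to continue dividing the subgraphs $K'_1$ and $K'_2$ until the clustering requirement is met.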
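Continuing the division of $K'_1$ and $K'_2$ is naturally recursive. The sketch below takes the bisection routine and the clustering requirement as parameters (both hypothetical callables), so either Kernighan-Lin or spectral bisection could be plugged in. It stops splitting a group once the group satisfies the requirement, becomes smaller than `min_size`, or cannot be split further; these stopping rules are assumptions of the sketch.

```python
def divide(nodes, edges, bisect, meets_requirement, min_size=2):
    """Recursively bisect a group until every piece meets the requirement.

    bisect(nodes, edges) -> (K1, K2) and meets_requirement(nodes, edges) -> bool
    are supplied by the caller; edges are restricted to the current group.
    Returns a list of node sets (the resulting social groups).
    """
    nodes = set(nodes)
    sub_edges = [(u, v) for u, v in edges if u in nodes and v in nodes]
    if len(nodes) < min_size or meets_requirement(nodes, sub_edges):
        return [nodes]
    k1, k2 = bisect(nodes, sub_edges)
    if not k1 or not k2:                # bisection made no progress: stop here
        return [nodes]
    return (divide(k1, sub_edges, bisect, meets_requirement, min_size)
            + divide(k2, sub_edges, bisect, meets_requirement, min_size))
```

For instance, with a requirement that each group be a complete subgraph, two triangles joined by one edge are split once into the two triangles and then left alone.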
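The following introduces a system for dividing social groups provided by an embodiment of the present application; the dividing system described below and the method for dividing social groups described above may be cross-referenced.
FIG. 2 is a schematic structural diagram of a system for dividing social groups according to an embodiment of the present application. The present application further provides a system for dividing social groups, comprising:
a data acquisition module 100, configured to acquire social data and a clustering requirement, and determine the network structure and node information corresponding to the social data;
a social network confirmation module 200, configured to perform random walks according to the network structure and the node information to obtain a social network graph;
a clustering module 300, configured to perform binary clustering on the nodes in the social network graph using a preset clustering method to obtain social groups that meet the clustering requirement.
Based on the above embodiment, as an optional embodiment, the social network confirmation module 200 comprises:
a walking unit, configured to start from each node in the network structure, perform random walks a preset number of times with a preset number of steps, and record the walk paths of the random walks;
a social network generating unit, configured to select frequent itemsets from the walk paths using the Apriori algorithm to obtain the social network graph.
Based on the above embodiment, as an optional embodiment, the system further comprises:
a probability confirmation module, configured to determine the random walk probabilities of the nodes using a probability transition matrix; the probability transition matrix is
$P_{ij} = \frac{W_{ij}}{W_{ig}}$
where $W_{ij}$ is the weight of the edge between node i and node j, and $W_{ig}$ is the sum of the i-th row of the network weight matrix;
the walking unit is then a unit configured to perform random walks a preset number of times with a preset number of steps from each node in the network structure according to the random walk probabilities.
Based on the above embodiment, as an optional embodiment, the system further comprises:
a weight confirmation module, configured to determine the network weight matrix corresponding to the social network graph according to the node information and the network structure.
Based on the above embodiment, as an optional embodiment, the clustering module 300 comprises:
a clustering unit, configured to perform binary clustering on the nodes in the social network graph using the Kernighan-Lin algorithm or the spectral bisection method.
Based on the above embodiment, as an optional embodiment, if the preset clustering method is the Kernighan-Lin algorithm, the clustering module 300 is a module configured to perform the following steps:
randomly dividing the social network graph into two subgraphs, taking one node from each of the two subgraphs for exchange, and calculating the difference in the gain function before and after the exchange, the gain function being the number of edges within the two subgraphs minus the number of edges between the two subgraphs; exchanging the two nodes for which the difference in the gain function is largest, each node in the two subgraphs being exchanged at most once per iteration; repeating the exchange for the remaining nodes until the difference in the gain function is less than zero or all nodes of one of the subgraphs have been exchanged once, obtaining the two subgraphs after the first iteration; determining whether the current two subgraphs meet the clustering requirement; if so, taking the current two subgraphs as the social groups that meet the clustering requirement; if not, repeating the iteration until two subgraphs that meet the clustering requirement are obtained.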
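The present application further provides a computer-readable storage medium storing a computer program which, when executed, implements the steps provided by the above embodiments. The storage medium may include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc or any other medium capable of storing program code.
The present application further provides an electronic device, which may include a memory and a processor, wherein a computer program is stored in the memory, and the processor implements the steps provided by the above embodiments when invoking the computer program in the memory. The electronic device may of course also include components such as network interfaces and a power supply.
The embodiments in this specification are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be cross-referenced. Since the system provided by the embodiments corresponds to the method provided by the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Specific examples are used herein to explain the principles and implementations of the present application; the above description of the embodiments is only intended to help understand the method and core idea of the present application. It should be noted that a person of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the protection scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.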

Claims (10)

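  1. A method for dividing social groups, characterized by comprising:
    acquiring social data and a clustering requirement, and determining a network structure and node information corresponding to the social data;
    performing random walks according to the network structure and the node information to obtain a social network graph;
    performing binary clustering on the nodes in the social network graph using a preset clustering method to obtain social groups that meet the clustering requirement.
  2. The method for dividing social groups according to claim 1, characterized in that performing random walks according to the network structure and the node information to obtain a social network graph comprises:
    starting from each node in the network structure, performing random walks a preset number of times with a preset number of steps, and recording the walk paths of the random walks;
    selecting frequent itemsets from the walk paths using the Apriori algorithm to obtain the social network graph.
  3. The method for dividing social groups according to claim 2, characterized in that, before the random walks of a preset number of times and a preset number of steps are performed starting from each node in the network structure, the method further comprises:
    determining the random walk probabilities of the nodes using a probability transition matrix;
    the probability transition matrix is
    $P_{ij} = \frac{W_{ij}}{W_{ig}}$
    where $W_{ij}$ is the weight of the edge between node i and node j, and $W_{ig}$ is the sum of the i-th row of the network weight matrix;
    performing the random walks of a preset number of times and a preset number of steps starting from each node in the network structure then comprises:
    starting from each node in the network structure according to the random walk probabilities, performing random walks a preset number of times with a preset number of steps.
  4. The method for dividing social groups according to claim 3, characterized in that, before the random walk probabilities of the nodes are determined using the probability transition matrix, the method further comprises:
    determining the network weight matrix corresponding to the social network graph according to the node information and the network structure.
  5. The method for dividing social groups according to claim 1, characterized in that performing binary clustering on the nodes in the social network graph using a preset clustering method comprises:
    performing binary clustering on the nodes in the social network graph using the Kernighan-Lin algorithm or the spectral bisection method.
  6. The method for dividing social groups according to claim 1, characterized in that, if the preset clustering method is the Kernighan-Lin algorithm, performing binary clustering on the nodes in the social network graph using the preset clustering method to obtain social groups that meet the clustering requirement comprises:
    randomly dividing the social network graph into two subgraphs, taking one node from each of the two subgraphs for exchange, and calculating the difference in the gain function before and after the exchange; the gain function is the number of edges within the two subgraphs minus the number of edges between the two subgraphs;
    exchanging the two nodes for which the difference in the gain function is largest, where each node in the two subgraphs is exchanged at most once per iteration;
    repeating the exchange for the remaining nodes until the difference in the gain function is less than zero or all nodes of one of the subgraphs have been exchanged once, obtaining the two subgraphs after the first iteration;
    determining whether the current two subgraphs meet the clustering requirement;
    if so, taking the current two subgraphs as the social groups that meet the clustering requirement;
    if not, repeating the iteration until two subgraphs that meet the clustering requirement are obtained.
  7. A system for dividing social groups, characterized by comprising:
    a data acquisition module, configured to acquire social data and a clustering requirement, and determine a network structure and node information corresponding to the social data;
    a social network confirmation module, configured to perform random walks according to the network structure and the node information to obtain a social network graph;
    a clustering module, configured to perform binary clustering on the nodes in the social network graph using a preset clustering method to obtain social groups that meet the clustering requirement.
  8. The system for dividing social groups according to claim 7, characterized in that the social network confirmation module comprises:
    a walking unit, configured to start from each node in the network structure, perform random walks a preset number of times with a preset number of steps, and record the walk paths of the random walks;
    a social network generating unit, configured to select frequent itemsets from the walk paths using the Apriori algorithm to obtain the social network graph.
  9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for dividing social groups according to any one of claims 1-6.
  10. An electronic device, characterized by comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor implements the steps of the method for dividing social groups according to any one of claims 1-6 when invoking the computer program in the memory.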
PCT/CN2022/074604 2021-02-26 2022-01-28 Method for dividing social groups, dividing system and related devices WO2022179384A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110218531.8 2021-02-26
CN202110218531.8A CN113011471A (zh) 2021-02-26 2021-02-26 Method for dividing social groups, dividing system and related devices

Publications (1)

Publication Number Publication Date
WO2022179384A1 (zh) 2022-09-01

Family

ID=76386479

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/074604 WO2022179384A1 (zh) 2021-02-26 2022-01-28 Method for dividing social groups, dividing system and related devices

Country Status (2)

Country Link
CN (1) CN113011471A (zh)
WO (1) WO2022179384A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
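CN115271987A (zh) * 2022-09-28 2022-11-01 南京拓界信息技术有限公司 Cross-application group relationship analysis method based on mobile phone data
CN115589605A (zh) * 2022-12-09 2023-01-10 深圳市永达电子信息股份有限公司 Communication equipment debugging method based on a tribe formation mechanism
CN116090525A (zh) * 2022-11-15 2023-05-09 广东工业大学 Embedding vector representation method and system based on a hierarchical random walk sampling strategy
CN117811992A (zh) * 2024-02-29 2024-04-02 山东海量信息技术研究院 Method, apparatus, device and storage medium for suppressing the spread of harmful network information
CN117833374A (zh) * 2023-12-26 2024-04-05 国网江苏省电力有限公司扬州供电分公司 Distributed flexible resource cluster division method and system based on a random walk algorithm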

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011471A (zh) * 2021-02-26 2021-06-22 山东英信计算机技术有限公司 A method for dividing social groups, a dividing system and related devices

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
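CN103793476A (zh) * 2014-01-08 2014-05-14 西安电子科技大学 Collaborative filtering recommendation method based on network communities
CN107392782A (zh) * 2017-06-29 2017-11-24 上海斐讯数据通信技术有限公司 word2Vec-based community construction method, apparatus and computer processing device
CN109934306A (zh) * 2019-04-04 2019-06-25 西南石油大学 Random-walk-based multi-label attribute value partitioning method and apparatus
CN110010251A (zh) * 2019-02-01 2019-07-12 华南师范大学 Traditional Chinese medicine community information generation method, system, apparatus and storage medium
US10771572B1 (en) * 2014-04-30 2020-09-08 Twitter, Inc. Method and system for implementing circle of trust in a social network
CN113011471A (zh) * 2021-02-26 2021-06-22 山东英信计算机技术有限公司 A method for dividing social groups, a dividing system and related devices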

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7962346B2 (en) * 2003-09-24 2011-06-14 Fairnez Inc. Social choice determination systems and methods
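CN102571431B (zh) * 2011-12-02 2014-06-18 北京航空航天大学 Improved Fast-Newman clustering method for complex networks based on the group concept
CN106886524A (zh) * 2015-12-15 2017-06-23 天津科技大学 Random-walk-based social network community division method
CN109242713A (zh) * 2018-09-07 2019-01-18 安徽大学 Three-way decision community division method and apparatus based on random-walk boundary-region processing
CN110263260A (zh) * 2019-05-23 2019-09-20 山西大学 Community detection method for social networks
CN111414486B (zh) * 2020-03-20 2022-11-11 厦门渊亭信息科技有限公司 Knowledge reasoning system based on a path ranking algorithm
CN112364295B (zh) * 2020-11-13 2024-04-19 中国科学院数学与系统科学研究院 Method, apparatus, electronic device and medium for determining the importance of network nodes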

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
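CN115271987A (zh) * 2022-09-28 2022-11-01 南京拓界信息技术有限公司 Cross-application group relationship analysis method based on mobile phone data
CN115271987B (zh) * 2022-09-28 2023-01-10 南京拓界信息技术有限公司 Cross-application group relationship analysis method based on mobile phone data
CN116090525A (zh) * 2022-11-15 2023-05-09 广东工业大学 Embedding vector representation method and system based on a hierarchical random-walk sampling strategy
CN116090525B (zh) * 2022-11-15 2024-02-13 广东工业大学 Embedding vector representation method and system based on a hierarchical random-walk sampling strategy
CN115589605A (zh) * 2022-12-09 2023-01-10 深圳市永达电子信息股份有限公司 Communication device debugging method based on a tribe formation mechanism
CN117833374A (zh) * 2023-12-26 2024-04-05 国网江苏省电力有限公司扬州供电分公司 Distributed flexible resource cluster partitioning method and system based on a random walk algorithm
CN117811992A (zh) * 2024-02-29 2024-04-02 山东海量信息技术研究院 Method, apparatus, device and storage medium for suppressing the propagation of harmful network information
CN117811992B (zh) * 2024-02-29 2024-05-28 山东海量信息技术研究院 Method, apparatus, device and storage medium for suppressing the propagation of harmful network information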

Also Published As

Publication number Publication date
CN113011471A (zh) 2021-06-22

Legal Events

Date Code Title Description

121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 22758749; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 22758749; Country of ref document: EP; Kind code of ref document: A1