WO2023207013A1 - Graph embedding-based relational graph key personnel analysis method and system

Info

Publication number
WO2023207013A1
WO2023207013A1 PCT/CN2022/129009 CN2022129009W WO2023207013A1 WO 2023207013 A1 WO2023207013 A1 WO 2023207013A1 CN 2022129009 W CN2022129009 W CN 2022129009W WO 2023207013 A1 WO2023207013 A1 WO 2023207013A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
graph
embedding
key
relationship graph
Application number
PCT/CN2022/129009
Other languages
French (fr)
Chinese (zh)
Inventor
张暐
郭峰
陈瀚平
曹瑞雪
陈栩琪
Original Assignee
广州广电运通金融电子股份有限公司
Application filed by 广州广电运通金融电子股份有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Definitions

  • the present disclosure relates to the technical field of knowledge graph analysis, and specifically relates to a method and system for analyzing key personnel of a relationship graph based on graph embedding.
  • The personnel relationship graph is a knowledge graph built around "personnel" entities and the social, kinship, and emotional relationships between people. According to the "six degrees of separation" theory, any two strangers can establish a connection through at most five friends, so to some extent everyone in the world is connected through personal networks. Because of the complexity of the real world, more and more types of characters and relationships are involved in constructing a relationship graph. Within the sub-graphs of a relationship graph, often only one character or a few characters play a major role. In public opinion analysis, administrative management, risk control, and recommendation systems in particular, identifying key personnel plays a decisive role in the business and has become an important technique for knowledge graph analysis and application.
  • Chinese patent CN 113032607 A discloses a key personnel analysis method. The method includes: "obtaining the member relationship graph, obtaining member initialization weights, obtaining member interaction information, calculating and updating the member weights based on the interaction information and the initial weights; when, after the update, the sum of the differences between two successive weights for each node person is less than the preset weight threshold, extracting the node person with the largest updated weight as the target node person."
  • This solution has the following shortcomings: 1) The node information, the interaction information values, and the node weight update method in the relationship graph are all set by manual rules and are not learnable. 2) When nodes and relationships are added or deleted, or when the method is migrated across business domains, manual intervention is needed to supply the corresponding business rules, so the method is not scalable. 3) The weight update of node personnel contains only local structural information and personnel information and does not use the global topology, so its accuracy is limited.
  • Chinese patent CN 112269922 A discloses a method for discovering key figures in community public opinion based on network representation learning. The method includes "inputting the social network relationship graph into the community structure and structural hole node discovery model to obtain the community division set and the structural hole nodes; inputting the social network relationship graph and the community division set into a network embedding model containing social influence and community structure to obtain the social influence of the nodes in the community network graph and the node network embedding representation vectors; and performing visual analysis based on the structural hole nodes, the social influence, and the network embedding representation vectors to obtain the key figures of public opinion."
  • This solution still has the following shortcomings: 1) From the direct modularity gain and indirect modularity gain of the relationship graph through to the target matrix of the network embedding vectors obtained by eigenvalue decomposition, the vectorization method throughout the process is given by rules, so it is still manual feature selection rather than adaptive learning. The method relies heavily on the rule definitions of direct and indirect modularity gain. If the rule definitions cannot reflect the network structure, the method is greatly affected and the accuracy of key person discovery is reduced.
  • the technical problem to be solved by this disclosure is the low accuracy of traditional key person mining methods.
  • The purpose of this disclosure is to provide a graph embedding-based relationship graph key personnel analysis method and system that addresses three problems of traditional key personnel mining methods: they are not scalable; the weight update of node personnel includes only local structural information and personnel information, which leads to low accuracy; or they rely on rules for direct and indirect modularity gain, which also leads to low accuracy.
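  • a graph embedding-based key personnel analysis method for relationship graphs, including the following steps:
  • a character relationship graph is constructed based on social media data;
  • a graph embedding algorithm is used to analyze each node in the character relationship graph to obtain the embedding vector of each node;
  • key node seeds of the character relationship graph are generated according to pre-related indicators;
  • according to the embedding vector of each node, a clustering algorithm is used to analyze the key node seeds and identify key personnel nodes.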
  • building a person relationship graph based on social media data includes:
  • mining character entities and relationships from news data that triggers the entire cycle of public opinion events and generating a character relationship graph includes:
  • Use crawler technology to filter news reports and social dynamic data published during the specified public opinion period through keywords on the network platform, and obtain the text and social dynamic content related to the public opinion event in the news reports during the public opinion period, as well as the interactive relationship between entities.
  • Use text structuring technology to generate the corresponding character relationship map.
  • the graph embedding algorithm is used to analyze each node in the character relationship graph to obtain the embedding vector of each node, including:
  • a random walk method is used to obtain neighboring nodes and a set of neighboring nodes is obtained; the skip-gram model is used to train the neighboring node set, and each neighboring node is used to predict the current node so that the probability of the current node being present is maximized.
  • Each neighboring node in the neighboring node set is trained to obtain the embedding vector of each node.
  • generating key node seeds based on pre-related indicators for nodes includes:
  • a clustering algorithm is used to analyze the embedding vectors of each node and identify key personnel nodes, including:
  • a clustering algorithm is used to classify each embedding vector to obtain several clustering categories
  • the clustering algorithm is used to classify each embedding vector to obtain several clustering categories, including:
  • Use the key node seeds as the initial clustering centers, calculate the distance from each embedding vector to each initial clustering center, find the initial clustering center nearest to each embedding vector, and assign each node to the clustering category of that nearest initial clustering center.
  • a graph embedding-based key personnel analysis system for relationship graphs including:
  • a graph construction unit is used to construct a character relationship graph based on social media data
  • the graph analysis unit is used to analyze each node in the character relationship graph using a graph embedding algorithm to obtain the embedding vector of each node;
  • a key node seed generation unit used to generate key node seeds of the character relationship map based on pre-related indicators
  • An identification unit is configured to use a clustering algorithm to analyze the key node seeds according to the embedding vector of each node, and identify key personnel nodes.
  • An electronic device includes: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the graph embedding-based relationship graph key personnel analysis method.
  • a computer storage medium has a computer program stored thereon. When the computer program is executed by a processor, it implements the graph embedding-based relationship graph key personnel analysis method.
  • This disclosure builds a character relationship graph based on social media data and uses a graph embedding algorithm to analyze each node in the graph and obtain the embedding vector of each node, making full use of the topological properties of the relationship graph. The approach is also learnable: the network embedding representation and the node vectorization are determined by random walk control and the corresponding machine learning method, respectively. There is no need to manually set parameter values or specify calculation rules for degree gain, which eliminates the adverse effect of unreasonable manual rule settings on the results. Because the character relationship graph is built from social media data and relies only on the network topology, the network can be trained quickly without additional knowledge injection.
  • Key node seeds of the character relationship graph are generated based on the pre-related indicators; according to the embedding vector of each node, a clustering algorithm is used to analyze the key node seeds and identify key personnel nodes. Because the entire graph is computed when identifying key personnel nodes, the isomorphism and heterogeneity of nodes are both taken into account, making the key personnel analysis results more accurate.
  • a random walk method is used to obtain neighboring nodes, and a neighboring node set is obtained.
  • Each neighboring node is used to predict the current node, so that the probability of the current node appearing is maximized.
  • Each neighboring node in the neighboring node set is trained in sequence to obtain the embedding vector of each node. Because the graph is analyzed with a graph embedding method based on random walks, there is no need to manually set parameter values or specify calculation rules for degree gain, which further improves the accuracy of identifying key personnel nodes.
  • Figure 1 is a schematic flow chart of a key personnel analysis method for a relationship graph based on graph embedding provided by an embodiment of the present disclosure
  • Figure 2 is a schematic diagram of random walk sampling of neighbor nodes provided by an embodiment of the present disclosure
  • Figure 3 is a schematic diagram of a graph embedding-based relationship graph key personnel analysis system provided by an embodiment of the present disclosure
  • FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • Graph Embedding, also called Network Embedding, is a process of mapping graph data (usually high-dimensional dense matrices) into low-dimensional dense vectors. It can well solve the problem of graph data being difficult to efficiently input into machine learning algorithms.
  • Adjacency Matrix is a matrix that represents the adjacent relationship between vertices.
  • the logical structure of the adjacency matrix is divided into two parts: V and E sets, where V is the vertex and E is the edge. Therefore, a one-dimensional array is used to store all vertex data in the graph; a two-dimensional array is used to store data on the relationships (edges or arcs) between vertices. This two-dimensional array is called an adjacency matrix.
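The storage layout just described (a one-dimensional vertex array plus a two-dimensional edge array) can be illustrated with a minimal sketch. The names and edges below are hypothetical toy data, not taken from the disclosure, and the graph is assumed to be undirected and unweighted.

```python
# Hypothetical toy graph: the names and edges are made up for illustration.
vertices = ["Ann", "Bo", "Cy", "Di"]                  # one-dimensional vertex array (V)
edges = [("Ann", "Bo"), ("Ann", "Cy"), ("Cy", "Di")]  # undirected edge set (E)

# Map each vertex name to its row/column position in the matrix.
index = {name: i for i, name in enumerate(vertices)}

# Two-dimensional array: adjacency[i][j] == 1 when vertices i and j are linked.
adjacency = [[0] * len(vertices) for _ in vertices]
for a, b in edges:
    adjacency[index[a]][index[b]] = 1
    adjacency[index[b]][index[a]] = 1  # symmetric because the graph is undirected

for name, row in zip(vertices, adjacency):
    print(name, row)
```

For a weighted relationship graph, the 1 entries would be replaced by edge weights.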
  • Centrality is used to measure the importance of a node in the network.
  • Centrality can be defined for a single node or for a group of multiple nodes.
  • Eigenvector centrality combines the centrality of a node's neighbors as the centrality of the node.
  • the embedding vector of a node refers to the vector representation of the vertex (vertex) in the network obtained through the connection relationship in the network structure, and is used as a basic feature to be applied to tasks such as clustering and classification.
  • Figure 1 illustrates a graph embedding-based relationship graph key personnel analysis method of the present disclosure, which includes the following steps:
  • Step S1 Construct a character relationship graph based on social media data
  • the construction of a character relationship graph based on social media data includes:
  • crawler technology can be used to filter news reports and social dynamic data published during the specified public opinion period through keywords on the network platform, and obtain the text and social dynamic content related to the public opinion event in the news reports during the public opinion period.
  • text structuring technology is used to generate corresponding character relationship maps.
  • in the process of constructing the character relationship graph, the graph can also be constructed through knowledge triple extraction technology, knowledge graph generation technology that dynamically evolves over time, development relationship mining technology, and transfer learning technology based on domain knowledge.
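As a rough illustration of the last stage of Step S1, the sketch below turns already-extracted (person, relation, person) triples into a weighted, undirected relationship graph. The function name `build_relationship_graph`, the relation labels, and the choice of "number of extracted interactions" as the edge weight are all assumptions of this sketch; the disclosure does not specify a weighting scheme or a data structure.

```python
from collections import defaultdict


def build_relationship_graph(triples):
    """Return {person: {neighbor: weight}} from (head, relation, tail) triples.

    Assumption of this sketch: the edge weight is the number of extracted
    interactions between two people, and relationships are undirected.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for head, _relation, tail in triples:
        if head == tail:
            continue  # ignore self-references
        graph[head][tail] += 1
        graph[tail][head] += 1
    return {person: dict(neighbors) for person, neighbors in graph.items()}


# Hypothetical triples, as they might come out of a text structuring step.
triples = [
    ("Ann", "replied_to", "Bo"),
    ("Bo", "mentioned", "Ann"),
    ("Ann", "colleague_of", "Cy"),
]
graph = build_relationship_graph(triples)
print(graph)
```

The crawling and entity extraction stages are deliberately left out, since they depend on the platform and the text structuring tools used.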
  • Step S2 Use a graph embedding algorithm to analyze each node in the character relationship graph to obtain the embedding vector of each node;
  • step S2 includes:
  • a random walk method is used to obtain neighboring nodes and obtain a set of neighboring nodes. Specifically, please refer to Figure 2, which is a schematic diagram of random walk sampling of neighboring nodes provided by an embodiment of the present disclosure. Given the current vertex $v$, the probability of going to vertex $x$ is:
  • $$P(c_i = x \mid c_{i-1} = v) = \begin{cases} \dfrac{\pi_{vx}}{Z}, & (v, x) \in E \\ 0, & \text{otherwise} \end{cases}$$
  • where $\pi_{vx}$ is the unnormalized transition probability between vertices, that is, the probability that a random walk that has passed from node $t$ to node $v$ then walks to node $x$; $Z$ is a normalizing constant;
  • $\pi_{vx} = \alpha_{pq}(t, x) \cdot \omega_{vx}$;
  • $\omega_{vx}$ is the weight of the edge, $p$ is the return parameter, $q$ is the away parameter, and $d_{tx}$ is the shortest path distance between $t$ and $x$; the coefficient $\alpha_{pq}(t, x)$ satisfies the following formula:
  • $$\alpha_{pq}(t, x) = \begin{cases} \dfrac{1}{p}, & d_{tx} = 0 \\ 1, & d_{tx} = 1 \\ \dfrac{1}{q}, & d_{tx} = 2 \end{cases}$$
  • if $q > 1$, the random walk tends to visit nodes close to the previous node; if $q < 1$, the random walk tends to visit nodes far away from the previous node.
  • the present disclosure is based on the random walk vectorization method, which is different from the non-vectorization method of updating the value of interactive information and node weights of the Chinese patent CN 113032607 A, and is also different from the rule method of modularity gain of the Chinese patent CN 112269922 A; it is learnable and adaptive.
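The biased random walk described above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: the graph representation (`{node: {neighbor: weight}}`), the function names, the toy graph, and the rule of sampling the first step by edge weight alone are assumptions of this sketch.

```python
import random


def alpha(p, q, d_tx):
    """Search bias for shortest-path distance d_tx in {0, 1, 2} between t and x."""
    if d_tx == 0:
        return 1.0 / p  # x is the previous node t (step back)
    if d_tx == 1:
        return 1.0      # x is also a neighbor of t
    return 1.0 / q      # x moves away from t


def transition_probs(graph, t, v, p, q):
    """Normalized probability of stepping from v to each neighbor x, given previous node t."""
    unnormalized = {}
    for x, weight_vx in graph[v].items():
        if x == t:
            d_tx = 0
        elif x in graph[t]:
            d_tx = 1
        else:
            d_tx = 2
        unnormalized[x] = alpha(p, q, d_tx) * weight_vx  # unnormalized pi_vx
    z = sum(unnormalized.values())                       # normalizing constant Z
    return {x: pi / z for x, pi in unnormalized.items()}


def biased_walk(graph, start, length, p, q, rng):
    """Sample one walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        if not graph[v]:
            break  # dead end
        if len(walk) == 1:
            # No previous node yet: sample by edge weight only (sketch assumption).
            neighbors = list(graph[v])
            weights = [graph[v][x] for x in neighbors]
        else:
            probs = transition_probs(graph, walk[-2], v, p, q)
            neighbors = list(probs)
            weights = list(probs.values())
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph in {node: {neighbor: edge weight}} form.
graph = {
    "t": {"v": 1.0, "a": 1.0},
    "v": {"t": 1.0, "a": 1.0, "b": 1.0},
    "a": {"t": 1.0, "v": 1.0},
    "b": {"v": 1.0},
}
probs = transition_probs(graph, "t", "v", p=2.0, q=0.5)
walk = biased_walk(graph, "t", 6, 2.0, 0.5, random.Random(0))
print(probs)
print(walk)
```

With p = 2 and q = 0.5 in this toy graph, the unnormalized weights for t, a, and b are 0.5, 1, and 2, so the walk standing at v is most likely to move on to b, the node farthest from t.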
  • the skip-gram model is used to train the neighboring node set, and each neighboring node is used to predict the current node so that the probability of the current node appearing is maximized.
  • Each neighboring node in the neighboring node set is trained in sequence to obtain the embedding vector of each node.
  • a character relationship graph is constructed based on social media data, and a graph embedding algorithm is used to analyze each node in the character relationship graph. For example, character entities and relationships are mined from news data covering the entire cycle of a public opinion event to generate a character relationship graph; a graph embedding machine learning method based on random walks is then used to analyze the graph and obtain node vectors. This vectorizes the entire graph directly and yields more comprehensive feature information. Because the entire graph is computed, the isomorphism and heterogeneity of nodes are integrated, making the key personnel analysis results more accurate.
  • derivative methods of word2vec such as CBOW, as well as training optimization methods based on negative sampling or Huffman trees, can also be used to help predict the current node.
  • the set of neighboring nodes of the current node $u$ is obtained, recorded as $N_s(u)$.
  • the skip-gram model is used to train each neighboring node, and the neighboring nodes are used to predict the current node, so that the probability of the current node appearing is maximized. Each neighboring node is then trained in sequence to obtain the embedding vector.
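To make the training setup concrete, the sketch below only prepares the (neighbor, current node) training pairs from one sampled walk; it does not train a model. The function name `training_pairs` and the `window` parameter are assumptions of this sketch: the disclosure states that neighboring nodes predict the current node but does not give a context window size.

```python
def training_pairs(walk, window):
    """Return (neighbor, current) pairs for one walk.

    Each node within `window` steps of a position in the walk is treated as a
    neighbor that is used as input to predict the node at that position.
    """
    pairs = []
    for i, current in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((walk[j], current))
    return pairs


# Hypothetical walk, as produced by a random walk over the relationship graph.
walk = ["Ann", "Bo", "Cy", "Di"]
pairs = training_pairs(walk, window=1)
print(pairs)
```

Pairs of this form would then be fed to a word2vec-style model; negative sampling or a Huffman-tree (hierarchical softmax) output layer are the optimizations the text mentions for that step.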
  • Step S3 Generate key node seeds of the character relationship map according to pre-related indicators
  • step S3 includes:
  • $Ax = \lambda x$.
  • manual annotation, pre-trained model annotation, remote unsupervised annotation, and other small-sample annotation methods can be used, with annotation performed first.
  • the centrality can also include degree centrality, betweenness centrality, closeness centrality, and other importance metrics.
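One way to obtain seeds from eigenvector centrality is power iteration on the adjacency matrix, followed by picking the highest-scoring nodes. The sketch below is an illustration under assumptions: the function names, the toy star graph, the use of the top-k nodes as seeds, and the iteration on A + I (which has the same eigenvectors as A and avoids oscillation on bipartite graphs) are choices of this sketch, not steps stated in the disclosure.

```python
def eigenvector_centrality(adjacency, iterations=100, tol=1e-9):
    """Approximate the dominant eigenvector of a symmetric adjacency matrix."""
    n = len(adjacency)
    x = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply by (A + I): same eigenvectors as A, but stable on bipartite graphs.
        y = [x[i] + sum(adjacency[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(value * value for value in y) ** 0.5
        y = [value / norm for value in y]
        converged = max(abs(a - b) for a, b in zip(x, y)) < tol
        x = y
        if converged:
            break
    return x


def top_k_seeds(names, scores, k):
    """Return the k node names with the highest centrality scores."""
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical star graph: Ann is connected to everyone else.
names = ["Ann", "Bo", "Cy", "Di"]
adjacency = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
scores = eigenvector_centrality(adjacency)
seeds = top_k_seeds(names, scores, k=1)
print(scores, seeds)
```

Degree, betweenness, or closeness centrality could be substituted for the scoring function without changing the seed selection step.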
  • Step S4 According to the embedding vector of each node, use a clustering algorithm to analyze the key node seeds and identify key personnel nodes.
  • step S4 specifically includes:
  • the key node seeds are used as initial clustering centers.
  • the initial clustering centers are $\mu_1, \mu_2, \ldots, \mu_k$ respectively.
  • a clustering algorithm is used to classify each embedding vector to obtain several clustering categories; the clustering center of each clustering category c i is calculated, and the calculated clustering center is used as a key personnel node.
  • the present disclosure directly classifies the vectorization method of graph embedding without relying on strong assumptions. It is different from the community structure and social influence assumptions of the Chinese patent CN 112269922 A and has universal applicability.
  • the steps of using a clustering algorithm to classify each embedding vector include:
  • the clustering category $c_i$, where $1 \le i \le k$, and $i$ and $k$ are both natural numbers;
  • the calculation method used to calculate the cluster center is:
  • a machine learning method is used to analyze vectorized nodes and identify key nodes.
  • the algorithm used to identify key personnel nodes can use supervised and semi-supervised machine learning classification algorithms.
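Step S4 can be sketched as a K-means style clustering whose initial centers are the seed nodes' embedding vectors. Several points here are assumptions of this sketch rather than statements of the disclosure: the cluster center is taken to be the mean of the member vectors, distances are squared Euclidean, and because a computed center is a vector rather than a node, the member nearest to each center is reported as the key personnel node. The function names and embeddings are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def seeded_kmeans(embeddings, seeds, iterations=20):
    """Cluster {node: vector} embeddings, starting from the seed nodes' vectors."""
    centers = [list(embeddings[seed]) for seed in seeds]
    assignment = {}
    for _ in range(iterations):
        # Assign every node to its nearest current center.
        new_assignment = {
            node: min(range(len(centers)),
                      key=lambda i: squared_distance(vector, centers[i]))
            for node, vector in embeddings.items()
        }
        if new_assignment == assignment:
            break  # assignments are stable
        assignment = new_assignment
        # Recompute each center as the mean of its members (sketch assumption).
        for i in range(len(centers)):
            members = [embeddings[n] for n, c in assignment.items() if c == i]
            if members:
                centers[i] = [sum(column) / len(members) for column in zip(*members)]
    return assignment, centers


def key_nodes(embeddings, assignment, centers):
    """Return, for each cluster, the member node closest to the cluster center."""
    result = []
    for i, center in enumerate(centers):
        members = [n for n, c in assignment.items() if c == i]
        if members:
            result.append(min(members,
                              key=lambda n: squared_distance(embeddings[n], center)))
    return result


# Hypothetical 2-D embeddings; real embedding vectors would have more dimensions.
embeddings = {
    "Ann": (0.0, 0.0), "Bo": (0.2, 0.0), "Cy": (0.1, 0.1),
    "Di": (5.0, 5.0), "Ed": (5.4, 5.0), "Fay": (5.1, 5.3),
}
assignment, centers = seeded_kmeans(embeddings, seeds=["Bo", "Ed"])
print(key_nodes(embeddings, assignment, centers))
```

In this toy data the seeds Bo and Ed are not the final answer: after the centers are recomputed, Cy and Di lie closest to them, which shows how the clustering step can refine the centrality-based seeds.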
  • Figure 3 shows a graph embedding-based relationship graph key personnel analysis system of the present disclosure, including:
  • a graph construction unit is used to construct a character relationship graph based on social media data
  • the graph analysis unit is used to analyze each node in the character relationship graph using a graph embedding algorithm to obtain the embedding vector of each node;
  • a key node seed generation unit used to generate key node seeds of the character relationship map based on pre-related indicators
  • An identification unit is configured to use a clustering algorithm to analyze the key node seeds according to the embedding vector of each node, and identify key personnel nodes.
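The four units can be read as a simple pipeline. The sketch below wires them together as interchangeable callables; the class name `KeyPersonnelSystem`, the field names, and the stub implementations are hypothetical and only show the data flow between units, not how each unit works internally.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

Graph = Dict[str, Dict[str, float]]
Embeddings = Dict[str, Sequence[float]]


@dataclass
class KeyPersonnelSystem:
    build_graph: Callable[[object], Graph]                  # graph construction unit
    embed: Callable[[Graph], Embeddings]                    # graph analysis unit
    make_seeds: Callable[[Graph], List[str]]                # key node seed generation unit
    identify: Callable[[Embeddings, List[str]], List[str]]  # identification unit

    def run(self, social_media_data):
        graph = self.build_graph(social_media_data)
        embeddings = self.embed(graph)
        seeds = self.make_seeds(graph)
        return self.identify(embeddings, seeds)


# Stub units for demonstration only; real units would implement steps S1 to S4.
system = KeyPersonnelSystem(
    build_graph=lambda data: {"Ann": {"Bo": 1.0}, "Bo": {"Ann": 1.0}},
    embed=lambda g: {node: (float(len(nbrs)),) for node, nbrs in g.items()},
    make_seeds=lambda g: [max(g, key=lambda node: len(g[node]))],
    identify=lambda embeddings, seeds: seeds,
)
result = system.run(None)
print(result)
```

Keeping the units as separate callables mirrors the unit-by-unit description and lets, for example, the seed generation unit be swapped for a different centrality measure.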
  • FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the schematic diagram shown in FIG. 4 can be used to describe an electronic device for implementing the graph embedding-based relationship graph key personnel analysis method of the embodiment of the present disclosure.
  • the electronic device 100 includes one or more processors 102 and one or more storage devices 104. These components are interconnected through a bus system and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic device 100 shown in FIG. 4 are only exemplary and not restrictive. According to needs, the electronic device may have only some of the components shown in FIG. 4, or may have other components and structures not shown in FIG. 4.
  • the processor 102 may be a central processing unit (CPU) or other form of processing unit with data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.
  • the storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc.
  • One or more computer program instructions may be stored on the computer-readable storage medium and may be executed by the processor 102 to implement the functions (implemented by the processor) described below in the embodiments of the present disclosure and/or other desired functions.
  • Various application programs and various data such as various data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
  • the present disclosure also provides a computer storage medium with a computer program stored thereon. If the method of the present disclosure is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in the computer storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments can also be completed by instructing relevant hardware through a computer program.
  • the computer program can be stored in a computer storage medium, and when the computer program is executed by a processor, the steps of each of the above method embodiments can be implemented.
  • the computer program includes computer program code, which may be in the form of source code, object code, executable file or some intermediate form.
  • the computer storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, etc.
  • the content contained in the computer storage medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, the computer storage medium does not include electrical carrier signals and telecommunications signals.
  • the graph embedding-based relationship graph key personnel analysis method constructs a character relationship graph based on social media data and uses a graph embedding algorithm to analyze each node in the character relationship graph to obtain the embedding vector of each node, making full use of the topological properties of the relationship graph. According to the embedding vector of each node, it uses a clustering algorithm to analyze the key node seeds and identify key personnel nodes. Because the entire graph is computed, the isomorphism and heterogeneity of nodes are integrated, which makes the key personnel analysis results more accurate and gives the method strong industrial practicability.

Abstract

Disclosed in the present disclosure is a graph embedding-based relational graph key personnel analysis method and system. The method comprises the following steps: constructing a character relationship graph on the basis of social media data; analyzing each node in the character relationship graph by means of a graph embedding algorithm to obtain an embedding vector of each node; generating key node seeds of the character relationship graph according to a pre-correlation index; and according to the embedding vector of each node, analyzing the key node seeds by means of a clustering algorithm to identify a key personnel node. In the present disclosure, topological properties of the relationship graph are fully utilized, learnability is achieved, and there is no need to manually set a parameter value or specify a calculation rule for degree gain, so that an adverse effect of unreasonable setting of a manual rule is eliminated; meanwhile, the whole graph is calculated, and the isomorphism and heterogeneity of the nodes are integrated, so that the obtained analysis result of key personnel is more accurate.

Description

A graph embedding-based relationship graph key personnel analysis method and system
This disclosure claims priority to the Chinese patent application filed with the China Patent Office on April 26, 2022, with application number 202210451803.3 and the invention title "A graph embedding-based relationship graph key personnel analysis method and system", the entire content of which is incorporated by reference into this disclosure.
Technical Field
The present disclosure relates to the technical field of knowledge graph analysis, and specifically relates to a method and system for analyzing key personnel of a relationship graph based on graph embedding.
Background
人员关系图谱是以“人员”实体和人员之间的社会、亲属、情感关系为核心构建的知识图谱。根据“六度分离理论”,在人际交往中,任意两个陌生人最多只要通过五个朋友就能建立联系。从某种程度上来说,世界上所有人都可以通过个人的关系网联系起来。因为现实世界的复杂性,关系图谱的构建过程中涉及到的人物和关系种类也越来越多。在一个关系图谱的若干子图中往往只有一个人物或者几个人物起到主要作用,尤其是在舆情分析、行政管理、风险控制和推荐系统中,对关键人员的挖掘,对业务发挥着决定性作用,已经成为了知识图谱分析和应用的重要技术。The personnel relationship graph is a knowledge graph constructed with the core of "personnel" entities and the social, kinship, and emotional relationships between people. According to the "six degrees of separation theory", in interpersonal communication, any two strangers can establish a connection through at most five friends. To some extent, everyone in the world is connected through personal networks. Because of the complexity of the real world, more and more types of characters and relationships are involved in the construction process of the relationship map. In several sub-graphs of a relationship graph, there is often only one character or a few characters who play a major role. Especially in public opinion analysis, administrative management, risk control and recommendation systems, the identification of key personnel plays a decisive role in the business. , has become an important technology for knowledge graph analysis and application.
在关系图谱上的关键人物挖掘,学习方法较少,还依赖于人工定性或者简单的静态数值计算。例如,中国专利CN 113032607 A公开了一种关键人员分析方法,方法包括:“获取成员关系图谱,获取成员初始化权值,获取成员交互信息,基于交互信息和初始全值计算成员全值并更新,更新后得到与所述各节点人员对应的相邻两次的权值差之和小于预设权值阈值,则提取更新后权值最大的节点人员作为目标节点人员”,该方案存在以下不足:1)关系图谱中的节点信息、交互信 息的值、节点权值的更新方法都是由人工规则设定,不具备可学习性。2)当增删节点和关系、进行跨领域业务迁移时,需要人工干预给出相应的业务规则,不具备可拓展性。3)节点人员的权值更新只包含了局部的结构信息和人员信息,未能利用到全局的拓扑结构,不具备高准确性。这些问题使得关系图谱的关键人员分析无法智能化,有着严重的应用限制。There are few learning methods for key person mining on relationship graphs, and they also rely on manual qualitative or simple static numerical calculations. For example, Chinese patent CN 113032607 A discloses a key personnel analysis method. The method includes: "obtaining the member relationship map, obtaining member initialization weights, obtaining member interaction information, calculating the member full value based on the interaction information and the initial full value and updating, After the update, the sum of the two adjacent weight differences corresponding to each node person is less than the preset weight threshold, then the node person with the largest weight after the update is extracted as the target node person." This solution has the following shortcomings: 1) The node information, interactive information values, and node weight update methods in the relationship graph are all set by manual rules and are not learnable. 2) When adding or deleting nodes and relationships, or performing cross-domain business migration, manual intervention is required to provide corresponding business rules, which is not scalable. 3) The weight update of node personnel only contains local structural information and personnel information, fails to take advantage of the global topological structure, and does not have high accuracy. These problems prevent the key personnel analysis of the relationship map from being intelligent and have serious application limitations.
As another example, Chinese patent CN 112269922 A discloses a method for discovering key figures in community public opinion based on network representation learning. The method comprises: "inputting the social network relationship graph into a community structure and structural hole node discovery model to obtain a community partition set and structural hole nodes; inputting the social network relationship graph and the community partition set into a network embedding model that incorporates social influence and community structure to obtain the social influence and the network embedding vector of each node in the community network graph; and performing visual analysis based on the structural hole nodes, the social influence, and the network embedding vectors to obtain the key figures of public opinion." This solution still has the following shortcoming: 1) From the direct and indirect modularity gains of the relationship graph through to the target matrix from which the network embedding vectors are obtained by eigenvalue decomposition, the vectorization is prescribed by rules throughout, which amounts to manual feature selection rather than adaptive learning. The method depends heavily on how the direct and indirect modularity gains are defined; if those rules do not reflect the network structure, the method is strongly affected and the accuracy of key person discovery drops.
Summary of the invention
(1) Technical problem to be solved
The technical problem to be solved by the present disclosure is the low accuracy of traditional key person mining methods.
(2) Technical solution
In view of the above technical problems, the purpose of the present disclosure is to provide a graph embedding-based method and system for analyzing key personnel in a relationship graph. It addresses the problems that traditional key person mining methods are not scalable, that updating node weights from only local structural and personnel information leads to low accuracy, and that reliance on rules for direct and indirect modularity gain leads to low accuracy.
The present disclosure adopts the following technical solutions:
A graph embedding-based method for analyzing key personnel in a relationship graph, comprising the following steps:
constructing a person relationship graph based on social media data;
analyzing each node in the person relationship graph with a graph embedding algorithm to obtain an embedding vector for each node;
generating key node seeds of the person relationship graph according to preset relevant indicators;
according to the embedding vectors of the nodes, analyzing the key node seeds with a clustering algorithm to identify key personnel nodes.
Optionally, constructing the person relationship graph based on social media data comprises:
mining person entities and relationships from news data covering the entire cycle of the triggered public opinion event, and generating the person relationship graph.
Optionally, mining person entities and relationships from news data covering the entire cycle of the triggered public opinion event and generating the person relationship graph comprises:
using crawler technology to filter, by keyword, the news reports and social feed data published on online platforms during a specified public opinion period, obtaining the text and social feed content related to the public opinion event as well as the interaction relationships between entities, and generating the corresponding person relationship graph with text structuring technology.
Optionally, analyzing each node in the person relationship graph with a graph embedding algorithm to obtain the embedding vector of each node comprises:
for each node, obtaining neighboring nodes by random walk to form a set of neighboring nodes; training on the set of neighboring nodes with a skip-gram model, using each neighboring node to predict the current node so that the probability of the current node appearing is maximized; and training on each neighboring node in the set in turn to obtain the embedding vector of each node.
Optionally, generating the key node seeds according to the preset relevant indicators comprises:
generating a graph adjacency matrix according to the preset relevant indicators, and performing eigendecomposition on the adjacency matrix to obtain eigenvalues and eigenvectors;
obtaining the eigenvector corresponding to the largest eigenvalue, wherein the centrality of the i-th node is the i-th element of that eigenvector, and generating the key node seeds according to the centrality of each node.
Optionally, analyzing the embedding vectors of the nodes with a clustering algorithm according to the key node seeds to identify key personnel nodes comprises:
classifying the embedding vectors with a clustering algorithm according to the key node seeds to obtain several cluster categories;
calculating the cluster center of each cluster category $c_i$, taking the calculated cluster center as the updated cluster center, and taking the updated cluster center as a key personnel node.
Optionally, classifying the embedding vectors with a clustering algorithm to obtain several cluster categories comprises:
taking the key node seeds as initial cluster centers, calculating the distance from each embedding vector to each initial cluster center, finding for each embedding vector the nearest initial cluster center, and assigning each node to the cluster category of its nearest initial cluster center.
A graph embedding-based system for analyzing key personnel in a relationship graph, comprising:
a graph construction unit, configured to construct a person relationship graph based on social media data;
a graph analysis unit, configured to analyze each node in the person relationship graph with a graph embedding algorithm to obtain an embedding vector for each node;
a key node seed generation unit, configured to generate key node seeds of the person relationship graph according to preset relevant indicators;
an identification unit, configured to analyze the key node seeds with a clustering algorithm according to the embedding vectors of the nodes and identify key personnel nodes.
An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the graph embedding-based relationship graph key personnel analysis method described above.
A computer storage medium storing a computer program which, when executed by a processor, implements the graph embedding-based relationship graph key personnel analysis method described above.
(3) Beneficial effects
Compared with the prior art, the beneficial effects of the present disclosure are as follows:
The present disclosure constructs a person relationship graph based on social media data and analyzes each node in the graph with a graph embedding algorithm to obtain an embedding vector for each node, making full use of the topological properties of the relationship graph. The method is also learnable: the network embedding representation and the node vectorization are determined by the random walk control and the corresponding machine learning method, respectively, so there is no need to set parameter values manually or to prescribe rules for computing degree gain, which removes the adverse effect of unreasonable hand-written rules on the results. In addition, because the person relationship graph is built from social media data and the method depends only on the network topology, the network can be trained quickly when nodes and relationships are added or deleted or when the method is migrated across business domains, without additional knowledge injection. Key node seeds of the person relationship graph are generated according to preset relevant indicators, and, according to the embedding vectors of the nodes, a clustering algorithm analyzes the key node seeds to identify key personnel nodes. Because this identification is computed over the whole graph and combines the isomorphism and heterogeneity of nodes, the resulting key personnel analysis is more accurate.
Further, neighboring nodes are obtained by random walk to form a set of neighboring nodes, each neighboring node is used to predict the current node so that the probability of the current node appearing is maximized, and each neighboring node in the set is trained in turn to obtain the embedding vector of each node. Because the analysis uses a random-walk-based graph embedding method, no parameter values need to be set manually and no degree-gain calculation rules need to be prescribed, which further improves the accuracy of identifying key personnel nodes.
Description of the drawings
Figure 1 is a schematic flowchart of a graph embedding-based relationship graph key personnel analysis method provided by an embodiment of the present disclosure;
Figure 2 is a schematic diagram of random walk sampling of neighboring nodes provided by an embodiment of the present disclosure;
Figure 3 is a schematic diagram of a graph embedding-based relationship graph key personnel analysis system provided by an embodiment of the present disclosure;
Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed description
The present disclosure is further described below with reference to the drawings and specific embodiments. It should be noted that, provided there is no conflict, the embodiments or technical features described below may be combined in any way to form new embodiments:
Embodiment 1:
The technical terms used in the present disclosure are first explained:
Graph embedding (also called network embedding) is the process of mapping graph data (usually a high-dimensional dense matrix) to low-dimensional dense vectors. It addresses the difficulty of feeding graph data efficiently into machine learning algorithms.
An adjacency matrix is a matrix that represents the adjacency relationships between vertices. The logical structure of a graph has two parts, the sets V and E, where V is the set of vertices and E is the set of edges. Accordingly, a one-dimensional array stores the data of all vertices in the graph, and a two-dimensional array stores the data on the relationships (edges or arcs) between vertices; this two-dimensional array is called the adjacency matrix.
Centrality measures the importance of a node in a network. It can be defined for a single node or for a group of nodes. Eigenvector centrality defines the centrality of a node in terms of the centralities of its neighbors.
The embedding vector of a node is the vector representation of a vertex in the network, obtained from the connection relationships in the network structure, and is used as a basic feature for tasks such as clustering and classification.
Referring to Figure 1, Figure 1 shows a graph embedding-based relationship graph key personnel analysis method of the present disclosure, comprising the following steps:
Step S1: Construct a person relationship graph based on social media data;
Specifically, constructing the person relationship graph based on social media data comprises:
mining person entities and relationships from news data covering the entire cycle of the triggered public opinion event, and generating the person relationship graph.
In a specific implementation, crawler technology may be used to filter, by keyword, the news reports and social feed data published on online platforms during a specified public opinion period, obtaining the text and social feed content related to the public opinion event as well as the interaction relationships between entities; text structuring technology is then used to generate the corresponding person relationship graph.
In a specific implementation, the person relationship graph may also be constructed using knowledge triple extraction, generation of knowledge graphs that evolve dynamically over time, relation mining techniques, and transfer learning based on domain knowledge.
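As a minimal sketch of the final part of step S1, assuming that entity and relation extraction has already produced (person, relation, person) triples, a weighted undirected relationship graph could be assembled as shown here. The function name `build_relationship_graph`, the sample triples, and the choice of using the number of extracted relations as the edge weight are illustrative assumptions and are not specified by the disclosure.

```python
from collections import defaultdict


def build_relationship_graph(triples):
    """Build an undirected weighted graph from (head, relation, tail) triples.

    Returns a dict mapping each person to {neighbor: weight}. The weight counts
    how many extracted relations connect the two people (an assumption made
    for this sketch only).
    """
    graph = defaultdict(dict)
    for head, _relation, tail in triples:
        if head == tail:
            continue  # ignore self-relations
        graph[head][tail] = graph[head].get(tail, 0) + 1
        graph[tail][head] = graph[tail].get(head, 0) + 1
    return {node: dict(neighbors) for node, neighbors in graph.items()}


# Hypothetical triples, as might come out of a text structuring step.
triples = [
    ("Alice", "replied_to", "Bob"),
    ("Bob", "mentioned", "Alice"),
    ("Bob", "forwarded", "Carol"),
]
graph = build_relationship_graph(triples)
```

The nested-dict form keeps edge weights next to each neighbor, which is convenient for the weighted random walk of step S2.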
Step S2: Analyze each node in the person relationship graph with a graph embedding algorithm to obtain the embedding vector of each node;
Optionally, step S2 comprises:
For each node, neighboring nodes are obtained by random walk to form a set of neighboring nodes. Specifically, refer to Figure 2, which shows a schematic diagram of random walk sampling of neighboring nodes provided by an embodiment of the present disclosure. Given the current vertex $v$, the probability of moving to vertex $x$ is:

$$P(x \mid v) = \begin{cases} \dfrac{\pi_{vx}}{Z}, & (v, x) \in E \\ 0, & \text{otherwise} \end{cases}$$

where $\pi_{vx}$ is the unnormalized transition probability between vertices, that is, the probability that a random walk which has passed through node $t$ and arrived at node $v$ next moves to node $x$, and $Z$ is the normalizing constant;
Specifically, to control the direction of the random walk and express a preference, suppose the current random walk has passed through node $t$ and arrived at node $v$. The probability $\pi_{vx}$ of then walking to $x$ satisfies:

$$\pi_{vx} = \alpha_{pq}(t, x) \cdot \omega_{vx}$$

where $\omega_{vx}$ is the weight of the edge, $p$ is the return parameter, $q$ is the away parameter, and $d_{tx}$ is the shortest-path distance between $t$ and $x$. The coefficient $\alpha_{pq}(t, x)$ satisfies:

$$\alpha_{pq}(t, x) = \begin{cases} \dfrac{1}{p}, & d_{tx} = 0 \\ 1, & d_{tx} = 1 \\ \dfrac{1}{q}, & d_{tx} = 2 \end{cases}$$

If $q > 1$, the random walk tends to visit nodes close to the previous node; if $q < 1$, it tends to visit nodes far from the previous node.
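The biased transition described above can be sketched as follows. This is an illustration under stated assumptions rather than the disclosure's implementation: the graph is a dict of `{node: {neighbor: weight}}`, the function names `alpha`, `transition_probs`, and `biased_walk` are hypothetical, and the first step of a walk (which has no previous node) is sampled by edge weight alone.

```python
import random


def alpha(p, q, d_tx):
    """Search bias alpha_pq(t, x) for a shortest-path distance d_tx in {0, 1, 2}."""
    if d_tx == 0:
        return 1.0 / p
    if d_tx == 1:
        return 1.0
    return 1.0 / q


def transition_probs(graph, t, v, p, q):
    """Normalized probabilities of stepping from v to each neighbor x after arriving from t."""
    unnormalized = {}
    for x, w_vx in graph[v].items():
        if x == t:
            d_tx = 0          # stepping back to the previous node
        elif x in graph[t]:
            d_tx = 1          # x is also a neighbor of t
        else:
            d_tx = 2          # x moves away from t
        unnormalized[x] = alpha(p, q, d_tx) * w_vx   # pi_vx
    z = sum(unnormalized.values())                   # normalizing constant Z
    return {x: value / z for x, value in unnormalized.items()}


def biased_walk(graph, start, length, p, q, rng):
    """Sample one walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        neighbors = list(graph[v])
        if not neighbors:
            break
        if len(walk) == 1:
            # Assumption: with no previous node, sample by edge weight only.
            weights = [graph[v][x] for x in neighbors]
        else:
            probs = transition_probs(graph, walk[-2], v, p, q)
            weights = [probs[x] for x in neighbors]
        walk.append(rng.choices(neighbors, weights=weights)[0])
    return walk


# Small hypothetical graph with unit edge weights.
graph = {
    "t": {"v": 1.0, "a": 1.0},
    "v": {"t": 1.0, "a": 1.0, "b": 1.0},
    "a": {"t": 1.0, "v": 1.0},
    "b": {"v": 1.0},
}
probs = transition_probs(graph, "t", "v", p=2.0, q=0.5)
walk = biased_walk(graph, "t", 6, 2.0, 0.5, random.Random(0))
```

With $p = 2$ and $q = 0.5$, the unnormalized weights for stepping from `v` are 0.5 (back to `t`), 1 (to `a`, a common neighbor), and 2 (to `b`, away from `t`), so the walk favors moving outward.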
In the above implementation, the vectorization method of the present disclosure is based on random walks. It differs both from the non-vectorized approach of Chinese patent CN 113032607 A, which updates interaction-information values and node weights, and from the rule-based modularity-gain method of Chinese patent CN 112269922 A, and it is learnable and adaptive.
Then, a skip-gram model is trained on the set of neighboring nodes, using each neighboring node to predict the current node so that the probability of the current node appearing is maximized; each neighboring node in the set is trained in turn to obtain the embedding vector of each node.
In the above implementation, a person relationship graph is constructed based on social media data and each node in the graph is analyzed with a graph embedding algorithm. For example, person entities and relationships are mined from news data covering the entire cycle of the triggered public opinion event to generate the person relationship graph, and a random-walk-based graph embedding machine learning method is used to analyze the graph and obtain the node vectors. Because the whole graph is vectorized directly, feature information is captured more completely; and because the computation covers the whole graph and combines the isomorphism and heterogeneity of nodes, the resulting key personnel analysis is more accurate.
In a specific implementation, when neighboring nodes are used to predict the current node, word2vec variants such as CBOW, as well as training optimizations based on negative sampling or Huffman trees, may also be used to help predict the current node.
Specifically, the set of neighboring nodes of the current node $u$ is obtained and denoted $N_s(u)$. The skip-gram model is first used to train on the neighboring nodes, using them to predict the current node so that the probability of the current node appearing is maximized. The expression for this maximum probability appears in the original only as an equation image (PCTCN2022129009-appb-000003) and is not reproduced in this text.
Each neighboring node is then trained in turn to obtain the embedding vectors.
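To make the training setup concrete, the sketch below shows one way the sampled walks could be turned into (neighboring node, current node) training pairs, and how a softmax over inner products gives the probability of the current node given one neighbor. The window size, the two embedding tables `in_vecs` and `out_vecs`, and both function names are assumptions for illustration; the disclosure does not give these details, and a real implementation would normally hand the walks to a word2vec-style trainer with negative sampling or hierarchical softmax.

```python
import math


def training_pairs(walks, window):
    """Return (neighbor, current) pairs for every node within `window` steps in a walk."""
    pairs = []
    for walk in walks:
        for i, current in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((walk[j], current))
    return pairs


def softmax_prob(in_vecs, out_vecs, neighbor, current):
    """Probability of `current` given `neighbor` under a softmax over inner products."""
    scores = {
        node: sum(a * b for a, b in zip(in_vecs[neighbor], vec))
        for node, vec in out_vecs.items()
    }
    shift = max(scores.values())  # subtract the max for numerical stability
    exps = {node: math.exp(score - shift) for node, score in scores.items()}
    return exps[current] / sum(exps.values())


pairs = training_pairs([["a", "b", "c"]], window=1)

# With all-zero embeddings every node is equally likely.
zero = {node: [0.0, 0.0] for node in ("a", "b", "c")}
uniform = softmax_prob(zero, zero, "a", "b")
```

Training then adjusts the vectors so that, summed over all pairs, the log of this probability increases.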
Step S3: Generate key node seeds of the person relationship graph according to preset relevant indicators;
Optionally, step S3 comprises:
generating a graph adjacency matrix according to the preset relevant indicators, and performing eigendecomposition on the adjacency matrix to obtain eigenvalues and eigenvectors;
obtaining the eigenvector corresponding to the largest eigenvalue, wherein the centrality of the i-th node is the i-th element of that eigenvector, and generating the key node seeds according to the centrality of each node.
Specifically, the graph adjacency matrix $A$ may be generated according to relevant indicators such as network density, reachability, clustering coefficient, and centrality measures. Eigendecomposition is then performed on the adjacency matrix, i.e. $Ax = \lambda x$. Once the eigenvalues and eigenvectors are obtained, the centrality of the i-th node equals the i-th element of the eigenvector corresponding to the largest eigenvalue.
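The eigenvector centrality in this step can be sketched with power iteration, which converges to the eigenvector of the largest eigenvalue without a full eigendecomposition. The sketch assumes a plain symmetric, non-negative adjacency matrix given as a list of lists, and it assumes that the seeds are simply the `k` nodes with the highest centrality; the disclosure only says the seeds are generated according to centrality. The names `eigenvector_centrality` and `select_seeds` are hypothetical.

```python
import math


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate the leading eigenvector of a symmetric non-negative matrix.

    Iterates with (A + I) instead of A: the eigenvectors are unchanged, but the
    shift avoids oscillation on bipartite graphs such as stars.
    """
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(value * value for value in nxt))
        nxt = [value / norm for value in nxt]
        converged = max(abs(a - b) for a, b in zip(nxt, x)) < tol
        x = nxt
        if converged:
            break
    return x


def select_seeds(centrality, k):
    """Indices of the k nodes with the highest centrality (assumed seed rule)."""
    order = sorted(range(len(centrality)), key=lambda i: centrality[i], reverse=True)
    return order[:k]


# Star graph: node 0 is connected to nodes 1, 2 and 3.
adj = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
centrality = eigenvector_centrality(adj)
seeds = select_seeds(centrality, k=1)
```

For the star graph the hub receives the largest entry of the unit-length eigenvector, so it is chosen as the single seed.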
In a specific implementation, few-shot labeling methods such as manual labeling, labeling with a pre-trained model, and remote unsupervised labeling may also be used to label nodes first. The centrality may also include other importance measures such as degree centrality, betweenness centrality, and closeness centrality.
Step S4: According to the embedding vectors of the nodes, analyze the key node seeds with a clustering algorithm to identify key personnel nodes.
Optionally, step S4 specifically comprises:
taking the key node seeds as initial cluster centers, denoted $\alpha_1, \alpha_2, \ldots, \alpha_k$, which together form the initial cluster center set $\alpha = \{\alpha_1, \alpha_2, \ldots, \alpha_k\}$;
classifying the embedding vectors with a clustering algorithm to obtain several cluster categories; calculating the cluster center of each cluster category $c_i$, and taking the calculated cluster center as a key personnel node.
In the above implementation, the present disclosure clusters the graph-embedding vectors directly and does not rely on strong assumptions. Unlike the community structure and social influence assumptions of Chinese patent CN 112269922 A, it is generally applicable.
The step of classifying the embedding vectors with a clustering algorithm comprises:
calculating the distance from each embedding vector $x_i$ to each initial cluster center, finding the initial cluster center $\alpha_i$ nearest to each embedding vector, and assigning each node to the cluster category $c_i$ of its nearest initial cluster center $\alpha_i$, where $1 \le i \le k$ and $i$ and $k$ are natural numbers;
Specifically, the cluster center is calculated as:

$$\alpha_i = \frac{1}{|c_i|} \sum_{x \in c_i} x$$

where $|c_i|$ denotes the number of nodes in cluster category $c_i$. The cluster center computation is iterated until a stopping condition is reached, and the class containing a key node seed is taken as a key node class.
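The seeded clustering of step S4 can be sketched as a k-means loop whose initial centers are the embedding vectors of the key node seeds. Several details here are assumptions, not statements of the disclosure: Euclidean distance, "assignments no longer change" as the stopping condition, and reporting the node nearest to each final center as the key person (a mean vector is not itself a node). The names `seeded_kmeans` and `nearest_node` are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def seeded_kmeans(vectors, seed_ids, max_iter=100):
    """Cluster embedding vectors, starting from the vectors of the seed nodes.

    vectors:  dict mapping node id -> embedding (sequence of floats)
    seed_ids: node ids whose embeddings serve as the initial centers
    Returns (assignment, centers), where assignment maps node id -> cluster index.
    """
    centers = [list(vectors[s]) for s in seed_ids]
    assignment = {}
    for _ in range(max_iter):
        new_assignment = {
            node: min(range(len(centers)), key=lambda i: euclidean(vec, centers[i]))
            for node, vec in vectors.items()
        }
        if new_assignment == assignment:
            break  # assumed stopping condition: no node changed cluster
        assignment = new_assignment
        for i in range(len(centers)):
            members = [vectors[n] for n, c in assignment.items() if c == i]
            if members:
                # alpha_i = (1 / |c_i|) * sum of the vectors in c_i
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


def nearest_node(vectors, assignment, centers, cluster):
    """Node of `cluster` closest to its center (assumed rule for naming the key person)."""
    members = [n for n, c in assignment.items() if c == cluster]
    return min(members, key=lambda n: euclidean(vectors[n], centers[cluster]))


# Hypothetical 2-D embeddings forming two well-separated groups.
vectors = {
    "a": (0.0, 0.0), "b": (0.0, 2.0), "e": (2.0, 0.0),
    "c": (10.0, 10.0), "d": (10.0, 12.0), "f": (12.0, 10.0),
}
assignment, centers = seeded_kmeans(vectors, seed_ids=["a", "c"])
key_people = [nearest_node(vectors, assignment, centers, i) for i in range(len(centers))]
```

Starting from seeds rather than random centers makes the result deterministic and ties each cluster to a node that the centrality step already ranked as important.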
In this embodiment, a machine learning method is used to analyze the vectorized nodes and identify the key nodes. Specifically, the algorithm used to identify key personnel nodes may be a supervised or semi-supervised machine learning classification algorithm.
Referring to Figure 3, Figure 3 shows a graph embedding-based relationship graph key personnel analysis system of the present disclosure, comprising:
a graph construction unit, configured to construct a person relationship graph based on social media data;
a graph analysis unit, configured to analyze each node in the person relationship graph with a graph embedding algorithm to obtain an embedding vector for each node;
a key node seed generation unit, configured to generate key node seeds of the person relationship graph according to preset relevant indicators;
an identification unit, configured to analyze the key node seeds with a clustering algorithm according to the embedding vectors of the nodes and identify key personnel nodes.
Embodiment 3:
Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. The schematic shown in Figure 4 may be used to describe an electronic device 100 for implementing the graph embedding-based relationship graph key personnel analysis method of the embodiments of the present disclosure.
As shown in the schematic structural diagram of Figure 4, the electronic device 100 includes one or more processors 102 and one or more storage devices 104, which are interconnected through a bus system and/or another form of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in Figure 4 are exemplary rather than limiting; as needed, the electronic device may have only some of the components shown in Figure 4, or may have other components and structures not shown in Figure 4.
The processor 102 may be a central processing unit (CPU) or another form of processing unit with data processing and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 102 may run the program instructions to implement the functions (implemented by the processor) of the embodiments of the present disclosure described below and/or other desired functions. Various application programs and various data, such as data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
The present disclosure also provides a computer storage medium storing a computer program. If the method of the present disclosure is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the computer storage medium. On this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer storage medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, and so on. It should be noted that the content covered by the computer storage medium may be expanded or reduced as required by legislation and patent practice in a given jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer storage media do not include electrical carrier signals and telecommunications signals.
Those skilled in the art may make various other corresponding changes and modifications based on the technical solutions and concepts described above, and all such changes and modifications shall fall within the scope of protection of the claims of the present disclosure.
Industrial applicability
The graph embedding-based relationship graph key personnel analysis method provided by the present disclosure constructs a person relationship graph based on social media data and analyzes each node in the graph with a graph embedding algorithm to obtain the embedding vector of each node, making full use of the topological properties of the relationship graph. According to the embedding vectors of the nodes, a clustering algorithm analyzes the key node seeds to identify key personnel nodes. Because the computation covers the whole graph and combines the isomorphism and heterogeneity of nodes, the resulting key personnel analysis is more accurate, and the method has strong industrial applicability.

Claims (10)

  1. A graph embedding-based method for analyzing key personnel in a relationship graph, characterized by comprising the following steps:
    constructing a person relationship graph based on social media data;
    analyzing each node in the person relationship graph using a graph embedding algorithm to obtain an embedding vector for each node;
    generating key node seeds of the person relationship graph according to preset relevant indicators;
    analyzing the key node seeds using a clustering algorithm according to the embedding vectors of the nodes, to identify key personnel nodes.
  2. The graph embedding-based method for analyzing key personnel in a relationship graph according to claim 1, characterized in that constructing the person relationship graph based on social media data comprises:
    mining person entities and relationships from news data covering the entire cycle of the triggered public opinion event, and generating the person relationship graph.
  3. The graph embedding-based method for analyzing key personnel in a relationship graph according to claim 2, characterized in that mining person entities and relationships from news data covering the entire cycle of the triggered public opinion event and generating the person relationship graph comprises:
    using crawler technology on network platforms to filter, by keyword, news reports and social feed data published during a specified public opinion period, so as to obtain the text and social feed content in the news reports that relate to the public opinion event during that period, together with the interaction relationships between entities; and generating the corresponding person relationship graph using text structuring technology.
  4. The graph embedding-based method for analyzing key personnel in a relationship graph according to claim 1, characterized in that analyzing each node in the person relationship graph using a graph embedding algorithm to obtain the embedding vector of each node comprises:
    for each node, obtaining neighboring nodes by random walk to obtain a neighboring node set; training on the neighboring node set with a skip-gram model, in which each neighboring node is used to predict the current node such that the probability of the current node appearing is maximized; and training on each neighboring node in the neighboring node set in turn to obtain the embedding vector of each node.
  5. The graph embedding-based method for analyzing key personnel in a relationship graph according to claim 1, characterized in that generating the key node seeds according to preset relevant indicators comprises:
    generating a graph adjacency matrix according to the preset relevant indicators, and performing eigendecomposition on the adjacency matrix to obtain eigenvalues and eigenvectors;
    obtaining the eigenvector corresponding to the largest of the eigenvalues, wherein the centrality of the i-th node is the i-th element of the eigenvector corresponding to the largest eigenvalue, and generating the key node seeds according to the centrality of each node.
  6. The graph embedding-based method for analyzing key personnel in a relationship graph according to claim 1, characterized in that analyzing the key node seeds using a clustering algorithm according to the embedding vectors of the nodes to identify key personnel nodes comprises:
    classifying the embedding vectors using a clustering algorithm according to the key node seeds, to obtain a plurality of cluster categories;
    computing the cluster center of each cluster category c_i, taking the computed cluster center as the updated cluster center, and taking the updated cluster center as a key personnel node.
  7. The graph embedding-based method for analyzing key personnel in a relationship graph according to claim 6, characterized in that classifying the embedding vectors using a clustering algorithm to obtain a plurality of cluster categories comprises:
    taking the key node seeds as initial cluster centers, computing the distance from each embedding vector to each initial cluster center, determining for each embedding vector the initial cluster center at the shortest distance, and assigning each node to the cluster category of the initial cluster center nearest to it.
  8. A graph embedding-based system for analyzing key personnel in a relationship graph, characterized by comprising:
    a graph construction unit, configured to construct a person relationship graph based on social media data;
    a graph analysis unit, configured to analyze each node in the person relationship graph using a graph embedding algorithm to obtain an embedding vector for each node;
    a key node seed generation unit, configured to generate key node seeds of the person relationship graph according to preset relevant indicators;
    an identification unit, configured to analyze the key node seeds using a clustering algorithm according to the embedding vectors of the nodes, to identify key personnel nodes.
  9. An electronic device, characterized by comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the graph embedding-based method for analyzing key personnel in a relationship graph according to any one of claims 1-7.
  10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the graph embedding-based method for analyzing key personnel in a relationship graph according to any one of claims 1-7.
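As a rough illustration of the random-walk sampling in claim 4 and the centrality-based seed generation in claim 5, the sketch below uses only the Python standard library. Everything named here (`random_walks`, `eigenvector_centrality`, `top_seeds`, the parameters, and the star-shaped toy graph) is hypothetical and assumed for illustration. The skip-gram training itself is not shown; the sampled walks would be handed to a skip-gram trainer of the reader's choice. Instead of a full eigendecomposition, the sketch approximates the eigenvector of the largest eigenvalue by power iteration on an undirected, unweighted adjacency list, which is a substituted technique rather than the claimed procedure.

```python
import math
import random


def random_walks(adj, walk_length, walks_per_node, seed=0):
    """Sample uniform random walks; adj maps node -> list of neighbors."""
    rng = random.Random(seed)
    walks = []
    for start in adj:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adj[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate the leading eigenvector of the adjacency matrix A.

    Iterates with (A + I): the identity shift keeps the eigenvectors of A
    but avoids the oscillation plain power iteration shows on bipartite graphs.
    """
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        new = {n: score[n] + sum(score[m] for m in adj[n]) for n in adj}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in adj) < tol
        score = new
        if converged:
            break
    return score


def top_seeds(adj, k):
    """Pick the k most central nodes as key node seeds (assumed selection rule)."""
    centrality = eigenvector_centrality(adj)
    return sorted(centrality, key=centrality.get, reverse=True)[:k]


# Toy undirected star graph: "hub" is linked to three leaf nodes.
graph = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
walks = random_walks(graph, walk_length=4, walks_per_node=2)
print(len(walks), top_seeds(graph, 1))
```

Choosing the top-k nodes by centrality is only one possible reading of "generating the key node seeds according to the centrality of each node"; a threshold on the centrality value would fit the claim wording equally well.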
PCT/CN2022/129009 2022-04-26 2022-11-01 Graph embedding-based relational graph key personnel analysis method and system WO2023207013A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210451803.3 2022-04-26
CN202210451803.3A CN114880482A (en) 2022-04-26 2022-04-26 Graph embedding-based relation graph key personnel analysis method and system

Publications (1)

Publication Number Publication Date
WO2023207013A1 (en) 2023-11-02

Family

ID=82671533

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/129009 WO2023207013A1 (en) 2022-04-26 2022-11-01 Graph embedding-based relational graph key personnel analysis method and system

Country Status (2)

Country Link
CN (1) CN114880482A (en)
WO (1) WO2023207013A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117808616A (en) * 2024-02-28 2024-04-02 中国传媒大学 Community discovery method and system based on graph embedding and node affinity

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880482A (en) * 2022-04-26 2022-08-09 广州广电运通金融电子股份有限公司 Graph embedding-based relation graph key personnel analysis method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8312056B1 (en) * 2011-09-13 2012-11-13 Xerox Corporation Method and system for identifying a key influencer in social media utilizing topic modeling and social diffusion analysis
CN106296537A (en) * 2016-08-04 2017-01-04 武汉数为科技有限公司 Colony in a kind of information in public security organs industry finds method
CN111797714A (en) * 2020-06-16 2020-10-20 浙江大学 Multi-view human motion capture method based on key point clustering
CN111813951A (en) * 2020-06-18 2020-10-23 国网上海市电力公司 Key point identification method based on technical map
CN112269922A (en) * 2020-10-14 2021-01-26 西华大学 Community public opinion key character discovery method based on network representation learning
CN114880482A (en) * 2022-04-26 2022-08-09 广州广电运通金融电子股份有限公司 Graph embedding-based relation graph key personnel analysis method and system

Also Published As

Publication number Publication date
CN114880482A (en) 2022-08-09

Similar Documents

Publication Publication Date Title
WO2023207013A1 (en) Graph embedding-based relational graph key personnel analysis method and system
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
Amini et al. On density-based data streams clustering algorithms: A survey
WO2022205833A1 (en) Method and system for constructing and analyzing knowledge graph of wireless network protocol, and device and medium
CN110224987B (en) Method for constructing network intrusion detection model based on transfer learning and detection system
Yang et al. A non-revisiting quantum-behaved particle swarm optimization based multilevel thresholding for image segmentation
CN114332984B (en) Training data processing method, device and storage medium
de Arruda et al. A complex networks approach for data clustering
CN114172688B (en) Method for automatically extracting key nodes of network threat of encrypted traffic based on GCN-DL (generalized traffic channel-DL)
CN110855648A (en) Early warning control method and device for network attack
CN106789149B (en) Intrusion detection method adopting improved self-organizing characteristic neural network clustering algorithm
CN112561031A (en) Model searching method and device based on artificial intelligence and electronic equipment
CN117061322A (en) Internet of things flow pool management method and system
CN116506181A (en) Internet of vehicles intrusion detection method based on different composition attention network
Wang et al. Unsupervised outlier detection for mixed-valued dataset based on the adaptive k-nearest neighbor global network
WO2016106944A1 (en) Method for creating virtual human on mapreduce platform
CN116522988B (en) Federal learning method, system, terminal and medium based on graph structure learning
CN111292062B (en) Network embedding-based crowd-sourced garbage worker detection method, system and storage medium
CN112052399A (en) Data processing method and device and computer readable storage medium
CN112367325B (en) Unknown protocol message clustering method and system based on closed frequent item mining
CN113822412A (en) Graph node marking method, device, equipment and storage medium
CN114528973A (en) Method for generating business processing model, business processing method and device
CN116610820B (en) Knowledge graph entity alignment method, device, equipment and storage medium
Peng et al. TH-SLP: Web Service Link Prediction Based on Topic-aware Heterogeneous Graph Neural Network
WO2023011062A1 (en) Information pushing method and apparatus, device, storage medium, and computer program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22939830

Country of ref document: EP

Kind code of ref document: A1