CN114880482A - Graph embedding-based relation graph key personnel analysis method and system

Graph embedding-based relation graph key personnel analysis method and system

Info

Publication number
CN114880482A
Authority
CN
China
Prior art keywords
node
graph
key
embedding
vector
Legal status
Pending
Application number
CN202210451803.3A
Other languages
Chinese (zh)
Inventor
张暐
郭峰
陈瀚平
曹瑞雪
陈栩琪
Current Assignee
GRG Banking Equipment Co Ltd
Original Assignee
GRG Banking Equipment Co Ltd
Application filed by GRG Banking Equipment Co Ltd
Priority to CN202210451803.3A
Publication of CN114880482A
Priority to PCT/CN2022/129009 (WO2023207013A1)
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Analysis (AREA)
  • Operations Research (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a graph embedding-based method and system for analyzing key personnel in a relationship graph. The method comprises: constructing a person relationship graph based on social media data; analyzing each node in the person relationship graph with a graph embedding algorithm to obtain an embedding vector for each node; generating key node seeds of the person relationship graph according to preset indicators; and, using the key node seeds and the embedding vector of each node, applying a clustering algorithm to identify key personnel nodes. The invention makes full use of the topological properties of the relationship graph and is learnable: it does not require manually setting parameter values or rules for computing modularity gain, which eliminates the adverse effect of unreasonable manual rules on the result. In addition, because the whole graph is used in the computation, the homogeneity and heterogeneity of nodes are both taken into account, and the resulting key personnel analysis is more accurate.

Description

Graph embedding-based relation graph key personnel analysis method and system
Technical Field
The invention relates to the technical field of knowledge graph analysis, and in particular to a graph embedding-based method and system for analyzing key personnel in a relationship graph.
Background
A personnel relationship graph is a knowledge graph built around the social, kinship, and emotional relationships between person entities. According to the six degrees of separation theory, any two strangers can be connected through at most five intermediaries, so to some extent everyone in the world can be linked through a network of personal relationships. Because the real world is complex, the number of people and relationship types involved in building relationship graphs keeps growing. In many subgraphs of a relationship graph, only one or a few people play the main role. In public opinion analysis, administration, risk control, and recommendation systems in particular, mining these key personnel is decisive for the business, and it has become an important technique in knowledge graph analysis and application.
Few learning-based methods exist for mining key people in a relationship graph; most still rely on manual qualitative judgment or simple static numerical calculation. For example, Chinese patent CN113032607A discloses a key personnel analysis method that includes: acquiring a member relationship graph; acquiring initial member weights; acquiring member interaction information; calculating and updating member weights based on the interaction information and the initial weights; and, if the sum of the differences between the two most recent weights of each node person after the update is less than a preset weight threshold, extracting the node person with the largest updated weight as the target node person. This scheme has the following defects: 1) the node information, the values of the interaction information, and the node weight update method in the relationship graph are all set by manual rules and are not learnable; 2) when nodes and relationships are added or deleted, or when the service is migrated across domains, the corresponding business rules must be supplied through manual intervention, so the scheme is not extensible; 3) the weight update for each node person includes only local structural information and personnel information and cannot use the global topology, so accuracy is limited. These problems make key personnel analysis on relationship graphs unintelligent and severely limit its application.
As another example, Chinese patent CN112269922A discloses a method for discovering key figures in community public opinion based on network representation learning. The method inputs a social network relationship graph into a community structure and structural hole node discovery model to obtain a community partition set and structural hole nodes; inputs the social network relationship graph and the community partition set into a network embedding model that incorporates social influence and community structure, obtaining the social influence of nodes in the community network graph and node network embedding vectors; and performs visual analysis based on the structural hole nodes, the social influence, and the network embedding vectors to identify key public opinion figures. This solution still has the following disadvantage: the whole process, from the direct and indirect modularity gains of the relationship graph to the target matrix of network embedding vectors obtained by eigenvalue decomposition, uses a rule-defined vectorization method, which is still manual feature selection rather than adaptive learning. The method therefore depends heavily on how the direct and indirect modularity gains are defined; if those rules fail to reflect the network structure, the method is strongly affected and the accuracy of finding key people decreases.
Disclosure of Invention
In view of the above technical problems, the present invention provides a graph embedding-based method and system for analyzing key personnel in a relationship graph. It addresses three problems of conventional key-person mining methods: lack of extensibility; low accuracy because node weight updates include only local structural information and personnel information; and low accuracy because of dependence on rules for direct and indirect modularity gain.
The invention adopts the following technical scheme:
a graph embedding-based relationship graph key personnel analysis method comprises the following steps:
constructing a person relationship graph based on social media data;
analyzing each node in the person relationship graph with a graph embedding algorithm to obtain an embedding vector for each node;
generating key node seeds of the person relationship graph according to preset indicators;
and analyzing the embedding vectors of the nodes with a clustering algorithm according to the key node seeds, to identify key personnel nodes.
Optionally, constructing the person relationship graph based on social media data includes:
mining person entities and relationships from news data covering the entire lifecycle of a public opinion event to generate the person relationship graph.
Optionally, mining person entities and relationships from news data covering the entire lifecycle of the public opinion event to generate the person relationship graph includes:
using crawler technology to filter, by keyword, news reports and social media posts published on online platforms during a specified public opinion period; obtaining the texts and social media content related to the public opinion event in that period, together with the interaction relationships between entities; and generating the corresponding person relationship graph with text structuring technology.
Optionally, analyzing each node in the person relationship graph with a graph embedding algorithm to obtain an embedding vector for each node includes:
for each node, obtaining neighboring nodes by random walks to form a neighboring node set; training on the neighboring node set with a skip-gram model, in which each neighboring node is used to predict the current node so that the probability of the current node is maximized; and training each neighboring node in the set in turn to obtain the embedding vector of each node.
Optionally, generating the key node seeds of the person relationship graph according to the preset indicators includes:
generating an adjacency matrix of the graph according to the preset indicators, and performing eigendecomposition on the adjacency matrix to obtain eigenvalues and eigenvectors;
taking the eigenvector corresponding to the largest eigenvalue, where the centrality of the i-th node is the i-th element of that eigenvector, and generating the key node seeds according to the centrality of each node.
Optionally, analyzing the embedding vectors of the nodes with a clustering algorithm according to the key node seeds to identify key personnel nodes includes:
classifying the embedding vectors with a clustering algorithm according to the key node seeds to obtain a plurality of cluster categories;
calculating the cluster center of each cluster category c_i, taking it as the updated cluster center, and taking the updated cluster centers as key personnel nodes.
Optionally, classifying the embedding vectors with a clustering algorithm to obtain a plurality of cluster categories includes:
taking the key node seeds as initial cluster centers, calculating the distance from each embedding vector to each initial cluster center, finding the initial cluster center nearest to each embedding vector, and assigning each node to the cluster category of its nearest initial cluster center.
A graph embedding-based relationship graph key personnel analysis system comprises:
a graph construction unit, configured to construct a person relationship graph based on social media data;
a graph analysis unit, configured to analyze each node in the person relationship graph with a graph embedding algorithm to obtain an embedding vector for each node;
a key node seed generation unit, configured to generate key node seeds of the person relationship graph according to preset indicators;
and an identification unit, configured to analyze the embedding vectors of the nodes with a clustering algorithm according to the key node seeds and identify key personnel nodes.
An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the graph-embedding-based relationship graph key personnel analysis method.
A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the graph embedding-based relationship graph key personnel analysis method.
Compared with the prior art, the invention has the beneficial effects that:
the person relationship graph is constructed based on social media data, and each node in it is analyzed with a graph embedding algorithm to obtain an embedding vector for each node. This makes full use of the topological properties of the relationship graph and is learnable: the network embedding representation and the node vectorization are determined by controlled random walks and a corresponding machine learning method, respectively, so there is no need to manually set parameter values or rules for computing modularity gain, which eliminates the adverse effect of unreasonable manual rules on the result. Because the method relies only on the network topology, when nodes and relationships are added or deleted or the service is migrated across domains, the network can be retrained quickly without injecting extra knowledge. Key node seeds of the person relationship graph are generated according to preset indicators, and the embedding vectors of the nodes are then clustered according to the key node seeds to identify key personnel nodes. Because the whole graph is used when identifying key personnel nodes, the homogeneity and heterogeneity of nodes are both taken into account, and the key personnel analysis result is more accurate.
Furthermore, neighboring nodes are obtained by random walks to form a neighboring node set, each neighboring node is used to predict the current node so that the probability of the current node is maximized, and each neighboring node in the set is trained in turn to obtain the embedding vector of each node. Because this random walk-based graph embedding does not require manually setting parameter values or rules for computing modularity gain, it further improves the accuracy of identifying key personnel nodes.
Drawings
Fig. 1 is a schematic flowchart of a graph embedding-based relationship graph key personnel analysis method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of random walk sampling of neighboring nodes according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a graph embedding-based relationship graph key personnel analysis system according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and specific embodiments. It should be noted that, provided there is no conflict, the embodiments or technical features described below may be combined arbitrarily to form new embodiments:
Example one:
the technical terms used in the present invention are explained as follows:
Graph embedding (also called network embedding) is the process of mapping graph data (usually a high-dimensional sparse matrix) to low-dimensional dense vectors. It addresses the difficulty of feeding graph data efficiently into machine learning algorithms.
An adjacency matrix is a matrix that represents the adjacency relationships between vertices. The logical structure of a graph consists of two parts: the vertex set V and the edge set E. Accordingly, a one-dimensional array stores all vertex data of the graph, and a two-dimensional array, called the adjacency matrix, stores the relationships (edges or arcs) between vertices.
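To make the two-part storage concrete, the following minimal Python sketch keeps the vertices in a one-dimensional list and the edge weights in a two-dimensional list. The function name `build_adjacency_matrix` and the (source, target, weight) edge format are illustrative assumptions, not part of the patent.

```python
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[str, str, float]  # (source vertex, target vertex, weight)


def build_adjacency_matrix(vertices: Sequence[str], edges: Sequence[Edge],
                           directed: bool = False) -> List[List[float]]:
    """Vertices live in a 1-D list; edge weights live in a 2-D adjacency matrix."""
    index: Dict[str, int] = {v: i for i, v in enumerate(vertices)}
    n = len(vertices)
    matrix = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        i, j = index[u], index[v]
        matrix[i][j] = w
        if not directed:
            matrix[j][i] = w  # undirected edge: mirror across the diagonal
    return matrix


vertices = ["A", "B", "C"]                   # one-dimensional vertex array
edges = [("A", "B", 1.0), ("B", "C", 2.0)]
A = build_adjacency_matrix(vertices, edges)  # two-dimensional adjacency matrix
print(A)  # [[0.0, 1.0, 0.0], [1.0, 0.0, 2.0], [0.0, 2.0, 0.0]]
```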
Centrality is a measure of a node's importance in a network. Centrality may be defined for a single node or for a group of nodes. Eigenvector centrality scores a node by combining it with the centrality of the node's neighbors, so a node connected to important nodes is itself considered important.
A node embedding vector is the vector representation of a vertex in the network, obtained from the connection relationships in the network structure; it is used as a basic feature for tasks such as clustering and classification.
Referring to Fig. 1, a graph embedding-based relationship graph key personnel analysis method includes the following steps:
Step S1: construct a person relationship graph based on social media data.
Specifically, constructing the person relationship graph based on social media data includes:
mining person entities and relationships from news data covering the entire lifecycle of a public opinion event to generate the person relationship graph.
In a specific implementation, crawler technology may be used to filter, by keyword, news reports and social media posts published on online platforms during a specified public opinion period, obtaining the texts and social media content related to the public opinion event in that period together with the interaction relationships between entities; text structuring technology is then used to generate the corresponding person relationship graph.
In a specific implementation, the person relationship graph may also be constructed using knowledge triple extraction, knowledge graph generation that evolves dynamically over time, relationship evolution mining, transfer learning based on domain knowledge, and similar techniques.
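Assuming the extraction steps above have already produced (person, relation, person) triples, one hypothetical way to turn them into a weighted person relationship graph is sketched below. The relation labels are dropped and repeated mentions of the same pair raise the edge weight; both choices are assumptions made for illustration, since the later steps here use only the graph topology and edge weights.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

Triple = Tuple[str, str, str]        # (head person, relation, tail person), assumed extractor output
Graph = Dict[str, Dict[str, float]]  # person -> {neighbour: edge weight}


def build_person_graph(triples: Iterable[Triple]) -> Graph:
    graph: DefaultDict[str, Dict[str, float]] = defaultdict(dict)
    for head, _relation, tail in triples:
        if head == tail:
            continue  # ignore self-relations
        # undirected edge; each repeated mention of the pair adds 1 to its weight
        graph[head][tail] = graph[head].get(tail, 0.0) + 1.0
        graph[tail][head] = graph[tail].get(head, 0.0) + 1.0
    return dict(graph)


triples = [("Alice", "colleague", "Bob"), ("Bob", "friend", "Carol"),
           ("Alice", "mentions", "Bob")]
g = build_person_graph(triples)
print(g["Bob"])  # {'Alice': 2.0, 'Carol': 1.0}
```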
Step S2: analyze each node in the person relationship graph with a graph embedding algorithm to obtain an embedding vector for each node.
Optionally, step S2 includes:
for each node, obtaining neighboring nodes by random walks to form a neighboring node set. Specifically, referring to Fig. 2, which is a schematic diagram of random walk sampling of neighboring nodes according to an embodiment of the present invention, given the current vertex v, the probability of moving to vertex x is:
$$P(x \mid v) = \begin{cases} \dfrac{\pi_{vx}}{Z}, & \text{if } (v, x) \in E \\ 0, & \text{otherwise} \end{cases}$$
where π_vx is the unnormalized transition probability between vertices v and x, and Z is the normalization constant.
Specifically, to steer the random walk toward a desired search preference, assume the walk has just reached node v from node t. The unnormalized probability π_vx of moving next to node x then satisfies:
π_vx = α_pq(t, x) · ω_vx, where ω_vx is the weight of the edge between v and x, p is the return parameter, q is the distance (in-out) parameter, and d_tx is the shortest-path distance between nodes t and x. The coefficient α_pq(t, x) satisfies:
$$\alpha_{pq}(t, x) = \begin{cases} \dfrac{1}{p}, & d_{tx} = 0 \\ 1, & d_{tx} = 1 \\ \dfrac{1}{q}, & d_{tx} = 2 \end{cases}$$
If q > 1, the random walk tends to visit nodes close to the previous node t; if q < 1, it tends to visit nodes far from t. Likewise, a larger p makes the walk less likely to step straight back to t.
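The transition rule above matches the biased second-order random walk popularized by node2vec. The following is a minimal sketch of it under stated assumptions: the graph is undirected and stored as a dict of neighbour-to-weight dicts, the first step of each walk (which has no previous node t) uses edge weights only, and the names `alpha_pq`, `biased_walk`, and `generate_walks` are hypothetical.

```python
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour x: edge weight w_vx}; undirected


def alpha_pq(graph: Graph, t: str, x: str, p: float, q: float) -> float:
    """Search bias alpha_pq(t, x), chosen by the shortest-path distance d_tx."""
    if x == t:         # d_tx = 0: stepping back to the previous node
        return 1.0 / p
    if x in graph[t]:  # d_tx = 1: x is also adjacent to t
        return 1.0
    return 1.0 / q     # d_tx = 2: moving away from t


def biased_walk(graph: Graph, start: str, length: int, p: float, q: float,
                rng: random.Random) -> List[str]:
    walk = [start]
    while len(walk) < length:
        v = walk[-1]
        neighbours = list(graph[v])
        if not neighbours:
            break  # dead end: stop this walk early
        if len(walk) == 1:
            # the first step has no previous node t, so only edge weights are used
            weights = [graph[v][x] for x in neighbours]
        else:
            t = walk[-2]
            # unnormalised pi_vx = alpha_pq(t, x) * w_vx
            weights = [alpha_pq(graph, t, x, p, q) * graph[v][x] for x in neighbours]
        # sampling proportionally to the weights is equivalent to dividing by Z
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(graph: Graph, num_walks: int, length: int, p: float, q: float,
                   seed: int = 0) -> List[List[str]]:
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order each round
        for node in nodes:
            walks.append(biased_walk(graph, node, length, p, q, rng))
    return walks


g = {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0, "c": 1.0},
     "c": {"a": 1.0, "b": 1.0, "d": 1.0}, "d": {"c": 1.0}}
walks = generate_walks(g, num_walks=2, length=5, p=1.0, q=0.5)
```

Here q = 0.5 gives α = 2 to nodes two hops from t, so the walks lean toward exploring outward, in line with the q < 1 case described above.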
In implementation, the invention uses a random walk-based vectorization method. Unlike the non-vectorized approach of Chinese patent CN113032607A, which updates interaction-information values and node weights, and unlike the rule-based modularity gain method of Chinese patent CN112269922A, it is learnable and adaptive.
Next, the neighboring node set is trained with a skip-gram model: each neighboring node is used to predict the current node so that the probability of the current node is maximized, and each neighboring node in the set is trained in turn to obtain the embedding vector of each node.
In implementation, the person relationship graph is constructed based on social media data and each of its nodes is analyzed with a graph embedding algorithm. For example, person entities and relationships are mined from news data covering the entire lifecycle of a public opinion event to generate the person relationship graph, and the graph is then analyzed with a random walk-based graph embedding machine learning method to obtain node vectors. Vectorizing the whole graph directly captures feature information more comprehensively, and computing over the whole graph takes both the homogeneity and heterogeneity of nodes into account, so the key personnel analysis result is more accurate.
In a specific implementation, predicting the current node from its neighboring nodes may also use word2vec variants such as CBOW, together with training optimizations based on negative sampling or a Huffman tree (hierarchical softmax).
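The difference between the two prediction directions mentioned above shows up in how training examples are cut from a walk. The sketch below (hypothetical helper names; the window size is chosen by the caller) builds skip-gram (centre, context) pairs and CBOW (context, centre) examples from the same walks.

```python
from typing import List, Tuple


def skipgram_pairs(walks: List[List[str]], window: int) -> List[Tuple[str, str]]:
    """(centre, context) pairs: the centre node predicts each node in its window."""
    pairs = []
    for walk in walks:
        for i, centre in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((centre, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


def cbow_examples(walks: List[List[str]], window: int) -> List[Tuple[List[str], str]]:
    """(context, centre) examples: the whole window jointly predicts the centre node."""
    examples = []
    for walk in walks:
        for i, centre in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            context = [walk[j] for j in range(lo, hi) if j != i]
            if context:
                examples.append((context, centre))
    return examples


walks = [["a", "b", "c"]]
print(skipgram_pairs(walks, 1))  # [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
print(cbow_examples(walks, 1))   # [(['b'], 'a'), (['a', 'c'], 'b'), (['b'], 'c')]
```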
Specifically, the neighboring node set of the current node u is obtained and denoted N_S(u). Each neighboring node is trained with the skip-gram model, using the neighboring nodes to predict the current node so as to maximize its probability; the objective is:
$$\max_{f} \sum_{u \in V} \log \Pr\big(N_S(u) \mid f(u)\big)$$
where V is the node set and f(u) is the embedding vector of node u.
Each neighboring node is then trained in turn to obtain the embedding vectors.
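As an illustration of the training step, the following pure-Python sketch trains skip-gram embeddings with negative sampling (SGNS) on random walks. It is a deliberately small, unoptimized stand-in and not the patent's implementation: negatives are drawn uniformly rather than from word2vec's smoothed unigram distribution, and all hyperparameter defaults are assumptions. Because every pair of nodes within a window appears in both orders, predicting the neighbours from u and predicting u from its neighbours produce the same set of training pairs here.

```python
import math
import random
from typing import Dict, List


def train_sgns(walks: List[List[str]], dim: int = 8, window: int = 2, negatives: int = 3,
               epochs: int = 5, lr: float = 0.05, seed: int = 0) -> Dict[str, List[float]]:
    """Skip-gram with negative sampling; returns one embedding vector f(u) per node."""
    rng = random.Random(seed)
    nodes = sorted({n for walk in walks for n in walk})
    emb = {n: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for n in nodes}  # f(u)
    ctx = {n: [0.0] * dim for n in nodes}  # separate "output" vectors for predicted nodes

    def sigmoid(z: float) -> float:
        z = max(-30.0, min(30.0, z))  # clip to keep math.exp in range
        return 1.0 / (1.0 + math.exp(-z))

    for _ in range(epochs):
        for walk in walks:
            for i, u in enumerate(walk):
                lo, hi = max(0, i - window), min(len(walk), i + window + 1)
                for j in range(lo, hi):
                    if j == i:
                        continue
                    # one true neighbour (label 1) plus uniformly sampled negatives (label 0)
                    targets = [(walk[j], 1.0)] + [(rng.choice(nodes), 0.0) for _ in range(negatives)]
                    grad_u = [0.0] * dim
                    for target, label in targets:
                        score = sum(a * b for a, b in zip(emb[u], ctx[target]))
                        g = lr * (label - sigmoid(score))  # step along the log-likelihood gradient
                        for d in range(dim):
                            grad_u[d] += g * ctx[target][d]
                            ctx[target][d] += g * emb[u][d]
                    for d in range(dim):
                        emb[u][d] += grad_u[d]
    return emb


walks = [["a", "b", "c", "b", "a"], ["c", "d", "c", "b"], ["d", "c", "a", "b"]]
embeddings = train_sgns(walks, dim=4, epochs=20)
```

In practice a library implementation, such as gensim's `Word2Vec` with `sg=1`, would normally replace this loop; the sketch only makes the update rule visible.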
Step S3: generate key node seeds of the person relationship graph according to preset indicators.
Optionally, step S3 includes:
generating an adjacency matrix of the graph according to the preset indicators, and performing eigendecomposition on the adjacency matrix to obtain eigenvalues and eigenvectors;
taking the eigenvector corresponding to the largest eigenvalue, where the centrality of the i-th node is the i-th element of that eigenvector, and generating the key node seeds according to the centrality of each node.
Specifically, the adjacency matrix A of the graph may be generated according to relevant indicators such as network density, reachability, clustering coefficient, and centrality measures, and the eigendecomposition Ax = λx is performed on it to obtain the eigenvalues and eigenvectors. The centrality of the i-th node is then the i-th element of the eigenvector x corresponding to the largest eigenvalue λ.
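A minimal way to obtain the eigenvector for the largest eigenvalue without a linear algebra library is power iteration. The sketch below assumes a symmetric, non-negative adjacency matrix of a connected graph, iterates with A + I (which has the same eigenvectors as A but avoids oscillation on bipartite graphs), and takes the top-k nodes as key node seeds; `k` and the helper names are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def eigenvector_centrality(A: Matrix, iters: int = 1000, tol: float = 1e-10) -> List[float]:
    """Unit-norm eigenvector of the largest eigenvalue of A, via power iteration."""
    n = len(A)
    x = [1.0 / n] * n  # positive start vector
    for _ in range(iters):
        # multiply by (A + I): same eigenvectors as A, but no oscillation on bipartite graphs
        y = [x[i] + sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def select_seeds(A: Matrix, k: int) -> List[int]:
    """Indices of the k nodes with the highest eigenvector centrality (the key node seeds)."""
    centrality = eigenvector_centrality(A)
    return sorted(range(len(A)), key=lambda i: centrality[i], reverse=True)[:k]


# star graph: node 0 is linked to nodes 1, 2 and 3
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
print(select_seeds(star, 1))  # [0]
```

Where NumPy is available, `numpy.linalg.eigh(A)` returns the eigenvalues of a symmetric matrix in ascending order, so the last column of the returned eigenvector matrix corresponds to the largest eigenvalue (its overall sign may need to be flipped).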
In a specific implementation, the data may first be labeled using manual labeling, labeling with a pre-trained model, distant supervision, or other few-shot labeling methods. The centrality may also include importance measures such as degree centrality, betweenness centrality, and closeness centrality.
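Of the alternative measures listed above, degree and closeness centrality are simple to compute on an unweighted graph; a sketch follows (betweenness centrality is omitted for brevity). It assumes an adjacency-list dict with at least two nodes and computes closeness within each node's reachable component; the function names are illustrative.

```python
from collections import deque
from typing import Dict, List

Graph = Dict[str, List[str]]  # unweighted, undirected adjacency list


def degree_centrality(graph: Graph) -> Dict[str, float]:
    """Number of neighbours divided by the maximum possible degree n - 1."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph: Graph) -> Dict[str, float]:
    """(reachable nodes - 1) / sum of BFS shortest-path distances; 0.0 for isolated nodes."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search gives hop distances from source
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total > 0 else 0.0
    return result


path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(degree_centrality(path))     # {'a': 0.5, 'b': 1.0, 'c': 0.5}
print(closeness_centrality(path))  # {'a': 0.6666666666666666, 'b': 1.0, 'c': 0.6666666666666666}
```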
Step S4: analyze the embedding vectors of the nodes with a clustering algorithm according to the key node seeds, and identify key personnel nodes.
Optionally, step S4 specifically includes:
taking the key node seeds as initial cluster centers α_1, α_2, ..., α_k, which form the initial cluster center set α = {α_1, α_2, ..., α_k};
classifying the embedding vectors with a clustering algorithm to obtain a plurality of cluster categories; calculating the cluster center of each cluster category c_i, and taking the calculated cluster centers as key personnel nodes.
In implementation, the graph embedding vectors are classified directly, without relying on strong assumptions. Unlike the community structure and social influence assumptions of Chinese patent CN112269922A, the method is therefore general.
Classifying the embedding vectors with a clustering algorithm includes:
computing the distance from each embedding vector x_j to each initial cluster center, finding the initial cluster center α_i nearest to each embedding vector, and assigning each node to the cluster category c_i of its nearest initial cluster center α_i, where 1 ≤ i ≤ k and i and k are natural numbers;
specifically, the cluster center is calculated as follows:
$$\alpha_i = \frac{1}{|c_i|} \sum_{x \in c_i} x$$
where |c_i| is the number of nodes in cluster category c_i. The cluster center computation is iterated until a termination condition is reached, and each category containing a key node seed is taken as a key node category.
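The seeding, assignment, and update steps above amount to k-means with the key node seeds as initial centers. The sketch below assumes Euclidean distance and uses "assignments no longer change" as the termination condition, which the text leaves unspecified. Because a cluster center is a vector rather than a node, the helper `nearest_nodes` (hypothetical) maps each final center back to the closest node, which is one possible reading of taking the centers as key personnel nodes.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def seeded_kmeans(vectors: Sequence[Vector], seed_ids: Sequence[int],
                  max_iter: int = 100) -> Tuple[List[int], List[Vector]]:
    # initial centres alpha_1..alpha_k are the embedding vectors of the key node seeds
    centres = [list(vectors[s]) for s in seed_ids]
    labels: List[int] = []
    for _ in range(max_iter):
        # assignment: each x_j joins the cluster c_i of its nearest centre alpha_i
        new_labels = [min(range(len(centres)), key=lambda i: math.dist(x, centres[i]))
                      for x in vectors]
        if new_labels == labels:
            break  # assumed termination condition: assignments stopped changing
        labels = new_labels
        # update: alpha_i = (1 / |c_i|) * sum of the vectors assigned to c_i
        for i in range(len(centres)):
            members = [vectors[j] for j, lab in enumerate(labels) if lab == i]
            if members:  # keep the old centre if a cluster becomes empty
                centres[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres


def nearest_nodes(vectors: Sequence[Vector], centres: Sequence[Vector]) -> List[int]:
    """Hypothetical mapping of each final centre back to its closest node index."""
    return [min(range(len(vectors)), key=lambda j: math.dist(vectors[j], c)) for c in centres]


vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
seed_ids = [0, 2]                               # key node seeds from step S3
labels, centres = seeded_kmeans(vectors, seed_ids)
key_categories = {labels[s] for s in seed_ids}  # categories containing a seed
print(labels, centres)                  # [0, 0, 1, 1] [[0.0, 0.5], [10.0, 10.5]]
print(nearest_nodes(vectors, centres))  # [0, 2]
```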
In this embodiment, a machine learning method is used to analyze the vectorized nodes and identify key nodes. The algorithm used to identify key personnel nodes may also be a supervised or semi-supervised machine learning classification algorithm.
Referring to Fig. 3, a graph embedding-based relationship graph key personnel analysis system according to the present invention includes:
a graph construction unit, configured to construct a person relationship graph based on social media data;
a graph analysis unit, configured to analyze each node in the person relationship graph with a graph embedding algorithm to obtain an embedding vector for each node;
a key node seed generation unit, configured to generate key node seeds of the person relationship graph according to preset indicators;
and an identification unit, configured to analyze the embedding vectors of the nodes with a clustering algorithm according to the key node seeds and identify key personnel nodes.
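One hypothetical way to wire the four units together is to treat each unit as an injected callable, so that, for example, the graph embedding or clustering step can be swapped without touching the rest. The class and field names below are assumptions for illustration, and the usage example plugs in trivial stand-ins rather than the real units.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence

Graph = Dict[str, Dict[str, float]]
Embeddings = Dict[str, List[float]]


@dataclass
class KeyPersonnelAnalysisSystem:
    """Hypothetical composition of the four units as injected callables."""
    build_graph: Callable[[Sequence[Any]], Graph]           # graph construction unit
    embed_nodes: Callable[[Graph], Embeddings]              # graph analysis unit
    generate_seeds: Callable[[Graph], List[str]]            # key node seed generation unit
    identify: Callable[[Embeddings, List[str]], List[str]]  # identification unit

    def run(self, social_media_data: Sequence[Any]) -> List[str]:
        graph = self.build_graph(social_media_data)
        embeddings = self.embed_nodes(graph)
        seeds = self.generate_seeds(graph)
        return self.identify(embeddings, seeds)


# trivial stand-ins, only to show the data flow between units
system = KeyPersonnelAnalysisSystem(
    build_graph=lambda data: {"a": {"b": 1.0, "c": 1.0}, "b": {"a": 1.0}, "c": {"a": 1.0}},
    embed_nodes=lambda graph: {v: [float(len(nbrs))] for v, nbrs in graph.items()},
    generate_seeds=lambda graph: [max(graph, key=lambda v: len(graph[v]))],
    identify=lambda embeddings, seeds: seeds,
)
print(system.run([]))  # ['a']
```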
Example three:
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application. An electronic device 100 for implementing the graph embedding-based relationship graph key personnel analysis method of the present invention may be described with reference to the schematic diagram shown in Fig. 4.
As shown in fig. 4, an electronic device 100 includes one or more processors 102, one or more memory devices 104, and the like, which are interconnected via a bus system and/or other type of connection mechanism (not shown). It should be noted that the components and structure of the electronic device 100 shown in fig. 4 are only exemplary and not limiting, and the electronic device may have some of the components shown in fig. 4 and may also have other components and structures not shown in fig. 4, as needed.
The processor 102 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.
The storage 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. On which one or more computer program instructions may be stored that may be executed by processor 102 to implement the functions of the embodiments of the application (as implemented by the processor) described below and/or other desired functions. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The invention also provides a computer storage medium storing a computer program. If the method of the invention is implemented as software functional units and sold or used as a standalone product, it can be stored in such a computer storage medium. Based on this understanding, all or part of the flow of the method embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer storage medium and, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, etc. The computer storage medium may include any entity or device capable of carrying the computer program code, such as a recording medium, USB flash drive, removable hard disk, magnetic disk, optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signals, telecommunications signals, and software distribution media. It should be noted that the content of the computer storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a given jurisdiction. For example, in some jurisdictions, legislation and patent practice exclude electrical carrier signals and telecommunications signals from computer storage media.
Those skilled in the art may make various other modifications and changes based on the technical solutions and concepts described above, and all such modifications and changes shall fall within the scope of the claims of the present invention.

Claims (10)

1. A graph-embedding-based relation graph key personnel analysis method, characterized by comprising the following steps:
constructing a person relation graph based on social media data;
analyzing each node in the person relation graph with a graph embedding algorithm to obtain an embedding vector for each node;
generating key node seeds for the person relation graph according to preset correlation indexes;
and analyzing the key node seeds with a clustering algorithm according to the embedding vector of each node, to identify key personnel nodes.
2. The graph-embedding-based relation graph key personnel analysis method according to claim 1, wherein constructing the person relation graph based on social media data comprises:
mining person entities and relations from news data covering the full cycle of the event that triggered public opinion, to generate the person relation graph.
3. The graph-embedding-based relation graph key personnel analysis method according to claim 2, wherein mining person entities and relations from news data covering the full cycle of the event that triggered public opinion, to generate the person relation graph, comprises:
using crawler technology to filter, by keywords, the news reports and social posts published on network platforms within a specified public opinion period, so as to obtain the texts and social posts related to the public opinion event within that period together with the interaction relations among entities; and generating the corresponding person relation graph by text structuring technology.
4. The graph-embedding-based relation graph key personnel analysis method according to claim 1, wherein analyzing each node in the person relation graph with a graph embedding algorithm to obtain an embedding vector for each node comprises:
for each node, acquiring neighboring nodes by random walk to obtain a neighboring node set; and training on the neighboring node set with a skip-gram model, in which each neighboring node is used to predict the current node so that the probability of the current node is maximized, each neighboring node in the set being trained in turn, to obtain the embedding vector of each node.
5. The graph-embedding-based relation graph key personnel analysis method according to claim 1, wherein generating key node seeds according to preset correlation indexes comprises:
generating an adjacency matrix of the graph according to the preset correlation indexes, and performing eigendecomposition on the adjacency matrix to obtain eigenvalues and eigenvectors;
and obtaining the eigenvector corresponding to the largest eigenvalue, wherein the centrality of the i-th node is the i-th element of that eigenvector, and generating key node seeds according to the centrality of each node.
6. The graph-embedding-based relation graph key personnel analysis method according to claim 1, wherein analyzing the embedding vector of each node with a clustering algorithm according to the key node seeds to identify key personnel nodes comprises:
classifying each embedding vector with a clustering algorithm according to the key node seeds to obtain a plurality of cluster categories;
and calculating the cluster center of each cluster category c_i, taking the calculated cluster center as the updated cluster center, and taking the updated cluster center as a key personnel node.
7. The graph-embedding-based relation graph key personnel analysis method according to claim 6, wherein classifying each embedding vector with a clustering algorithm to obtain a plurality of cluster categories comprises:
taking the key node seeds as initial cluster centers, calculating the distance from each embedding vector to each initial cluster center, determining the initial cluster center closest to each embedding vector, and assigning each node to the cluster category of that closest initial cluster center.
8. A graph-embedding-based relation graph key personnel analysis system, characterized by comprising:
a graph building unit, configured to build a person relation graph based on social media data;
a graph analysis unit, configured to analyze each node in the person relation graph with a graph embedding algorithm to obtain an embedding vector for each node;
a key node seed generating unit, configured to generate key node seeds for the person relation graph according to preset correlation indexes;
and an identification unit, configured to analyze the key node seeds with a clustering algorithm according to the embedding vectors of all nodes, and to identify key personnel nodes.
9. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the graph-embedding-based relation graph key personnel analysis method of any one of claims 1-7.
10. A computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the graph-embedding-based relation graph key personnel analysis method of any one of claims 1-7.
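The seed-generation step of claim 5 takes node centrality from the eigenvector of the largest eigenvalue of the adjacency matrix, which is eigenvector centrality. The sketch below swaps the full eigendecomposition for power iteration, which computes only that dominant eigenvector. It assumes a symmetric, non-negative adjacency matrix of a connected graph. It iterates with (A + I) instead of A: adding the identity leaves the eigenvectors unchanged but prevents oscillation on bipartite graphs such as a star. How the preset correlation indexes weight the matrix, and the number of seeds `k`, are hypothetical here.

```python
import math
from typing import List


def eigenvector_centrality(adj: List[List[float]], max_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """Dominant eigenvector of a symmetric non-negative adjacency matrix via power iteration.

    Iterates x <- (A + I) x with unit-length normalization; element i is the centrality of node i.
    """
    n = len(adj)
    x = [1.0] * n
    for _ in range(max_iter):
        # Multiply by (A + I): the x[i] term is the identity shift.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


def top_k_seeds(names: List[str], centrality: List[float], k: int) -> List[str]:
    """Hypothetical seed rule: keep the k nodes with the highest centrality."""
    order = sorted(range(len(names)), key=lambda i: centrality[i], reverse=True)
    return [names[i] for i in order[:k]]


if __name__ == "__main__":
    # Star graph: node 0 ("hub") linked to three leaves.
    star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
    c = eigenvector_centrality(star)
    print(top_k_seeds(["hub", "a", "b", "c"], c, k=1))  # ['hub']
```

For this star the exact normalized centralities are 1/√2 for the hub and 1/√6 for each leaf. A dense eigensolver such as NumPy's `linalg.eigh` would give the same vector, up to sign, for larger matrices.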
CN202210451803.3A 2022-04-26 2022-04-26 Graph embedding-based relation graph key personnel analysis method and system Pending CN114880482A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210451803.3A CN114880482A (en) 2022-04-26 2022-04-26 Graph embedding-based relation graph key personnel analysis method and system
PCT/CN2022/129009 WO2023207013A1 (en) 2022-04-26 2022-11-01 Graph embedding-based relational graph key personnel analysis method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210451803.3A CN114880482A (en) 2022-04-26 2022-04-26 Graph embedding-based relation graph key personnel analysis method and system

Publications (1)

Publication Number Publication Date
CN114880482A true CN114880482A (en) 2022-08-09

Family

ID=82671533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210451803.3A Pending CN114880482A (en) 2022-04-26 2022-04-26 Graph embedding-based relation graph key personnel analysis method and system

Country Status (2)

Country Link
CN (1) CN114880482A (en)
WO (1) WO2023207013A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023207013A1 (en) * 2022-04-26 2023-11-02 广州广电运通金融电子股份有限公司 Graph embedding-based relational graph key personnel analysis method and system

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN117808616A (en) * 2024-02-28 2024-04-02 中国传媒大学 Community discovery method and system based on graph embedding and node affinity

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US8312056B1 (en) * 2011-09-13 2012-11-13 Xerox Corporation Method and system for identifying a key influencer in social media utilizing topic modeling and social diffusion analysis
CN106296537B (en) * 2016-08-04 2019-11-19 武汉数为科技有限公司 A kind of group in information in public security organs industry finds method
CN111797714B (en) * 2020-06-16 2022-04-26 浙江大学 Multi-view human motion capture method based on key point clustering
CN111813951A (en) * 2020-06-18 2020-10-23 国网上海市电力公司 Key point identification method based on technical map
CN112269922B (en) * 2020-10-14 2022-05-31 西华大学 Community public opinion key character discovery method based on network representation learning
CN114880482A (en) * 2022-04-26 2022-08-09 广州广电运通金融电子股份有限公司 Graph embedding-based relation graph key personnel analysis method and system

Also Published As

Publication number Publication date
WO2023207013A1 (en) 2023-11-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination