CN114819152A - Graph embedding expert entity alignment method based on reinforcement learning enhancement - Google Patents

Graph embedding expert entity alignment method based on reinforcement learning enhancement

Info

Publication number
CN114819152A
Authority
CN
China
Prior art keywords
node
entity
expert
reinforcement learning
graph embedding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210060387.4A
Other languages
Chinese (zh)
Inventor
邵健
胡单春
鲁伟明
庄越挺
宗畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210060387.4A priority Critical patent/CN114819152A/en
Publication of CN114819152A publication Critical patent/CN114819152A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation
    • G06N5/022 - Knowledge engineering; Knowledge acquisition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation
    • G06N5/027 - Frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a graph embedding expert entity alignment method based on reinforcement learning enhancement. The method constructs heterogeneous subgraphs and performs message aggregation only over the n-hop neighbors of the entity pair to be aligned, directly reducing the demand on computing resources. A graph embedding learning algorithm based on feature-wise linear modulation introduces the hyper-network idea, so that the computationally expensive message passing and node updating mechanisms are realized with a small number of parameters and the interaction information between nodes is better exploited. In addition, the invention provides a reinforcement-learning-enhanced node selector, in which a reliability measure based on a self-supervision signal is proposed and applied to sample a limited number of reliable edges; this bounds the size of the heterogeneous subgraph, filters out problem edges, and ensures the reliability of the edges participating in node updates. The invention further realizes a reinforcement-learning-based node sampling number updating strategy that dynamically optimizes the number of sampled nodes and strengthens the node selector.

Description

Graph embedding expert entity alignment method based on reinforcement learning enhancement
Technical Field
The invention relates to the technical field of knowledge graphs in natural language processing, in particular to a graph embedding expert entity alignment method based on reinforcement learning enhancement.
Background
"competition in the 21 st century is competition of talents", and talent elements occupy the core position of industrial elements. The talent knowledge base can be used as an upstream data knowledge support of downstream decision tasks such as element scheduling, talent recommendation, accurate project investment and the like, and is a decision basis for various governments to perfect and optimize industrial transformation, reasonably coordinate and mobilize various industrial elements. At present, mass talent related data are widely distributed on internet platforms such as academic institution websites and academic search engines, and the problems of serious islanding, various data types, uneven quality and the like exist. In order to construct a large-scale and high-quality talent knowledge base, it is one of the necessary means to integrate the knowledge of other talent knowledge bases. The expert entity is used as a pivot for linking different knowledge bases and is very important for integrating the talent knowledge bases. The process of identifying expert entities in different talent knowledge bases that represent the same individual in the real world is called expert entity alignment.
Entity alignment is typically performed by comparing features of the entity pair to be aligned, such as entity names, entity attributes and attribute values, and computing a similarity score with a machine learning or representation learning method. For a talent knowledge base, however, the characteristics of the data and the requirements of real application scenarios impose several constraints on existing entity alignment methods. First, less information is available: the relations and attributes in a talent knowledge base are enumerable and do not themselves need to be aligned, so existing entity alignment methods cannot exploit the alignment information of relation predicates and attribute predicates, and model performance degrades. Second, computing resources are limited and running results are unstable: the entity scale of a talent knowledge base is very large, a large number of paper or expert entities are added every day, and the number of publications differs greatly between experts, which causes computational instability in practical application scenarios. Third, problem edges exist in the knowledge base: erroneous triples are common in existing knowledge bases, and these erroneous problem edges inevitably have a negative influence on judging whether an entity pair refers to the same real-world entity.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a graph embedding expert entity alignment method based on reinforcement learning enhancement. The technical scheme of the invention is as follows:
the invention provides a graph embedding expert entity alignment method based on reinforcement learning enhancement, which comprises the following steps:
Step 1: obtain the data of two talent knowledge bases, G1 = (E1, T1, R) and G2 = (E2, T2, R), where E denotes the entity set, the entity types comprising expert entities and paper entities; R denotes the relation set; and T denotes the triple set, a subset of E × R × E;
Step 2: for each expert entity e in one talent knowledge base, generate a candidate expert set C from the other talent knowledge base according to the expert name, through a candidate entity pair generation module based on a regular matching template generator;
Step 3: for each candidate expert c ∈ C, construct a 2-hop heterogeneous subgraph HG = (V, H, T, R) about the entity pair <e, c>, whose nodes, vectors and triples are drawn from the two knowledge bases, where H1 and H2 respectively denote the entity initial vector sets of the two talent knowledge bases; the initial node vector representation is
h^(0) = h_attr ⊕ h_struct
h_struct = LINE(G)
where ⊕ denotes the vector concatenation operation, h_attr is the average of the word vectors obtained by a skip-gram model for each attribute feature of the entity, and h_struct is the structure vector obtained by encoding the structural information of each entity in the knowledge base with the LINE model;
Step 4: in the node vector update module, in each graph embedding layer, calculate the reliability of each edge with a reliability measure based on a self-supervision signal;
Step 5: in each graph embedding layer, apply a top-p sampling strategy to the heterogeneous subgraph HG: for each relation r, sample the p_r most reliable edges in descending order of the reliability calculated in step 4, and update the node sampling number with a reinforcement-learning-based node sampling number updating strategy;
Step 6: in each graph embedding layer, after obtaining the sampled heterogeneous subgraph, update the node vectors with a graph embedding learning algorithm based on feature-wise linear modulation;
Step 7: after the L graph embedding layers, take out the updated node vectors z_e^(L) and z_c^(L) of the entity pair <e, c> to be aligned, and compute the matching probability p_(e,c) through a multi-layer perceptron;
Step 8: according to the matching probabilities of all candidate entities, take the candidate expert with the highest probability as the matching expert.
Compared with the prior art, the invention has the following advantages:
(1) To address the reduction of available information, a graph embedding learning algorithm based on feature-wise linear modulation is used. It introduces the hyper-network idea and can dynamically generate relation weight matrices from the target node vector, so that the computationally expensive message passing and node updating mechanisms are realized with a small number of parameters and the interaction information between nodes is better exploited.
(2) To address problem edges in the knowledge base, a reinforcement-learning-enhanced node selector is proposed. It applies a reliability measure based on a self-supervision signal constructed from the existing edges, so that relation edges can be sampled according to reliability, problem edges are filtered out, and the reliability of the edges participating in node updates is guaranteed. Controlling the size of the sampled heterogeneous subgraph also alleviates the problems of limited computing resources and unstable running results.
(3) To avoid extensive manual tuning of the node sampling number, the node selector also implements a reinforcement-learning-based node sampling number updating strategy that dynamically optimizes the number of sampled nodes.
Drawings
FIG. 1 is an overall framework diagram of the graph embedding expert entity alignment method based on reinforcement learning enhancement.
FIG. 2 is an overall flow diagram of the candidate-entity regular matching template generator.
FIG. 3 is an overall flowchart of the node vector update module, in which the dashed-box portion is executed only during training.
Detailed Description
The invention will be further illustrated and described with reference to specific embodiments. The described embodiments are merely exemplary of the disclosure and are not intended to limit the scope thereof. The technical features of the embodiments of the present invention can be combined correspondingly without mutual conflict.
A talent knowledge base typically contains multiple entity types and relation types. Given two talent knowledge bases G1 = (E1, T1, R) and G2 = (E2, T2, R), E denotes the entity set, the entity types comprising expert entities and paper entities; R denotes the relation set; and T denotes the triple set, a subset of E × R × E. The set of seed entity pairs S ⊆ E1 × E2 denotes the aligned entity pairs used for training. The expert entity alignment task aims to train a model on the known aligned entities and predict the potential expert alignment results M = {(e1, e2) ∈ E1 × E2 | e1 = e2}, where the equality sign denotes that the two expert entities point to the same individual in the real world.
Finding, in another talent knowledge base, the expert entity corresponding to a given expert entity can be regarded as a matching problem: in a certain feature space, compute the matching probability between the given expert entity and all candidate expert entities in the other knowledge base, and take the entity with the highest matching score as the alignment result.
As shown in FIG. 1, the overall framework of the graph embedding expert entity alignment method with reinforcement learning enhancement is as follows. For two talent knowledge bases, candidate entity pairs are first generated according to expert names; a 2-hop heterogeneous subgraph is then constructed for each entity pair. Next, in the node vector update module, each of the L graph embedding layers first uses a node selector based on a self-supervision signal to sample reliable edges from the heterogeneous subgraph, with a reinforcement learning module dynamically updating the number of sampled reliable edges; the heterogeneous subgraph with problem edges filtered out is then fed to a graph embedding layer based on feature-wise linear modulation to complete the node vector update of the current layer. Finally, the final node vector representations of the expert entity pair to be aligned are taken out, multiplied element-wise, and passed through a multi-layer perceptron to compute the matching probability.
The modules (steps) of the present invention are described in detail below.
S1 candidate entity pair generation: to handle Chinese and English names with abbreviations, polyphonic characters, compound surnames and similar situations, a method based on a regular matching template generator is implemented. FIG. 2 details how the regular matching template generator produces the regular matching templates.
First, Chinese and English names are processed separately, and a name parser is used to generate a parsing dictionary. For an English name, the surname and given name are parsed first, whether the order of surname and given name can be determined is judged according to certain rules, and the parsing dictionary is then generated. A Chinese name is converted into its pinyin form, i.e. an English-style name, while polyphonic characters and compound surnames are taken into account; the intermediate parsing result is then processed in the same way as an English name. Table 1 shows the parsing rules for converting a Chinese name into an English name, Table 2 shows the rules for determining the surname and the order of surname and given name for English names and converted pinyin names, and Table 3 shows the parsing dictionary generation method.
TABLE 1 Parsing rules for converting a Chinese name into an English name
TABLE 2 Rules for determining the surname and the order of surname and given name
TABLE 3 Parsing dictionary generation method
The parsing dictionary generated by the name parser is taken as the input of the template generator, which outputs the corresponding regular templates. First, whether the order of surname and given name can be determined is judged; then whether the given name is an abbreviation is judged; finally, 2 or 4 groups of regular templates are generated for the two orders "surname first" and "surname last". Table 4 shows the abbreviation judgment rules and Table 5 shows the rules for generating regular templates from the parsing dictionary.
TABLE 4 Abbreviation judgment rules
TABLE 5 Rules for generating regular templates from the parsing dictionary
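To make the template-generation idea above concrete, the following is a minimal Python sketch that builds regular-expression templates from an already-parsed name. The function name build_templates, the separator handling and the rule details are illustrative assumptions rather than the exact rules of Tables 1-5, and the pinyin conversion of Chinese names is presumed to have been done upstream.

```python
import re

def build_templates(surname, given_tokens, order_known=True):
    """Build regex templates that match the full given name or its initials,
    in 'surname first' and (when the order is uncertain) 'surname last' order."""
    sur = re.escape(surname)
    given_full = r"\s+".join(re.escape(t) for t in given_tokens)
    given_abbr = r"\.?\s*".join(re.escape(t[0]) for t in given_tokens) + r"\.?"
    orders = [(sur, given_full), (sur, given_abbr)]
    if not order_known:                       # also try 'given name first'
        orders += [(given_full, sur), (given_abbr, sur)]
    return [re.compile(rf"^{a}[\s,]+{b}$", re.IGNORECASE) for a, b in orders]

# Usage: match candidate expert names such as "Shao, J." or "Jian Shao".
templates = build_templates("shao", ["jian"], order_known=False)
print(any(t.match("Shao, J.") for t in templates))   # True
print(any(t.match("Jian Shao") for t in templates))  # True
```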
S2 construction of an n-hop heterogeneous subgraph: taking n = 2 as an example, for the expert entity pair <a1, a1'> to be aligned, the first-order neighbor nodes are obtained from G1 and G2 respectively, including published paper nodes, conference/journal nodes and co-author expert nodes, together with the paper nodes and conference/journal nodes of the first-order neighbor expert nodes, so as to construct a subgraph centered on a1 or a1'. Given partial entity alignment results (for example, identical conference/journal names, identical paper titles, and already aligned expert entities), the two originally separate subgraphs can be merged into one heterogeneous subgraph centered on the expert entity pair <a1, a1'> to be aligned.
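A minimal sketch of this subgraph construction is given below, under the assumption that each knowledge base is available as a set of (head, relation, tail) triples; the helper names and the anchor-based merging interface are illustrative, not the patent's exact data structures.

```python
from collections import defaultdict

def two_hop_subgraph(triples, center, hops=2):
    """Collect all triples within `hops` hops of `center` (undirected expansion)."""
    adj = defaultdict(list)
    for h, r, t in triples:
        adj[h].append((h, r, t))
        adj[t].append((h, r, t))
    frontier, seen, sub = {center}, {center}, set()
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            for h, r, t in adj[node]:
                sub.add((h, r, t))
                nxt.update((h, t))
        frontier = nxt - seen
        seen |= nxt
    return sub

def merge_pair_subgraph(triples1, triples2, e, c, anchors):
    """Merge the 2-hop subgraphs around <e, c>; `anchors` maps already aligned
    KB2 entities (identical venue names / paper titles, seed experts) onto
    their KB1 counterparts so that the two subgraphs share nodes."""
    sub1 = two_hop_subgraph(triples1, e)
    sub2 = {(anchors.get(h, h), r, anchors.get(t, t))
            for h, r, t in two_hop_subgraph(triples2, c)}
    return sub1 | sub2
```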
S3 node vector update module: in this step, the node vector representations in the heterogeneous subgraph are updated through L graph embedding layers, for which the initial node vector representation h^(0) must first be obtained. The specific flow of the node vector update in each graph embedding layer is shown in FIG. 3. First, the reliability of each edge is computed with the reliability measure based on a self-supervision signal; then a top-p sampling strategy is adopted and, for each relation r, the p_r most reliable edges are sampled in descending order of the computed reliability; after the sampled heterogeneous subgraph is obtained, the node vectors are updated with the graph embedding learning algorithm based on feature-wise linear modulation; finally, the final node vector representations of the entity pair to be aligned are taken out and the matching probability is computed with a multi-layer perceptron. In the training phase, in order to obtain a better reliability measure, the existing relation edges SPT are sampled and, at a 1:1 ratio, non-existent relation edges SPF are constructed from nodes that have no such relation with the target node; this self-supervision signal is used to compute the average edge sampling loss and thereby train the weight matrices in the reliability measure. In addition, the node sampling number p_r of each relation is dynamically updated: the average unreliability of the sampled edges is computed and the sampling number is updated with the reinforcement-learning-based node sampling number updating strategy.
The initial node vector representation is
h^(0) = h_attr ⊕ h_struct
h_struct = LINE(G)
where ⊕ denotes the vector concatenation operation, h_attr is the average of the word vectors obtained by a skip-gram model for each attribute feature of the entity (paper title, abstract, conference/journal name, etc.), and h_struct is the structure vector obtained by encoding the structural information of each entity in the knowledge base with the LINE model.
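The assembly of h^(0) could look like the sketch below, assuming the skip-gram word vectors are available as a token-to-vector mapping and the LINE structure vector has been precomputed; the function name and the zero-vector fallback for out-of-vocabulary attribute tokens are assumptions.

```python
import numpy as np

def initial_node_vector(attr_tokens, word_vecs, struct_vec):
    """h^(0) = [h_attr ; h_struct]: average the skip-gram word vectors of the
    entity's attribute tokens and concatenate the LINE structure vector.
    `word_vecs` is a token -> np.ndarray mapping from a trained skip-gram model."""
    dim = len(next(iter(word_vecs.values())))
    vecs = [word_vecs[t] for t in attr_tokens if t in word_vecs]
    h_attr = np.mean(vecs, axis=0) if vecs else np.zeros(dim)
    return np.concatenate([h_attr, struct_vec])
```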
Calculating the reliability of each edge: for a triple <u, r, v> (i.e. relation r exists between node u and node v, directed from u to v), the reliability S(u, v) is computed as follows.
First, the feature representation η of each node over the relations is computed:
η_u = σ1(W_τ(u) h_u),  η_v = σ1(W_τ(v) h_v)
where W_τ(u) and W_τ(v) are the weight matrices associated with the corresponding node types, with dimension R^(|R|×d), and h_u and h_v are the node vector representations of nodes u and v at this graph embedding layer, with dimension d.
Then the probability α_(u,v) that relation r exists between the node pair <u, v> is computed:
α_(u,v) = σ2(W_TF · (η_u ⊙ η_v))
where W_TF has dimension R^(2×|R|) and ⊙ is the element-wise vector product.
Finally, the unreliability D(u, v) is computed with the Manhattan distance and the reliability S(u, v) is obtained:
D(u, v) = ||α_(u,v)||_1
S(u, v) = 1 - D(u, v)
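A minimal PyTorch sketch of this reliability score is shown below. The tensor shapes follow the text (W_τ in R^(|R|×d), W_TF in R^(2×|R|)); the choice of sigmoid for σ1/σ2, the interpretation of ⊙ as an element-wise product, and the parameter initialization are assumptions.

```python
import torch
import torch.nn as nn

class EdgeReliability(nn.Module):
    """Sketch of the self-supervision-based reliability score S(u, v)."""
    def __init__(self, num_node_types, num_relations, dim):
        super().__init__()
        # one |R| x d weight matrix per node type, plus the 2 x |R| matrix W_TF
        self.W_tau = nn.Parameter(torch.randn(num_node_types, num_relations, dim) * 0.01)
        self.W_tf = nn.Parameter(torch.randn(2, num_relations) * 0.01)

    def forward(self, h_u, h_v, type_u, type_v):
        eta_u = torch.sigmoid(self.W_tau[type_u] @ h_u)     # (|R|,)
        eta_v = torch.sigmoid(self.W_tau[type_v] @ h_v)     # (|R|,)
        alpha = torch.sigmoid(self.W_tf @ (eta_u * eta_v))  # (2,)
        d_uv = alpha.abs().sum()                            # Manhattan norm ||alpha||_1
        return 1.0 - d_uv                                   # reliability S(u, v)
```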
Top-p sampling strategy: for each node in the heterogeneous subgraph and for each relation r, the p_r edges with the highest computed reliability are sampled, in descending order of reliability.
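A sketch of this per-relation top-p selection, assuming edges are (u, r, v) tuples and reliabilities have already been computed; p_r is passed as a relation-to-count mapping, which is an interface assumption.

```python
from collections import defaultdict

def top_p_sample(edges, reliability, p_r):
    """Keep, for each relation r, the p_r edges with the highest reliability.
    `edges`: iterable of (u, r, v); `reliability`: edge -> S(u, v)."""
    by_rel = defaultdict(list)
    for e in edges:
        by_rel[e[1]].append(e)
    kept = []
    for r, es in by_rel.items():
        es.sort(key=lambda e: reliability[e], reverse=True)  # most reliable first
        kept.extend(es[:p_r.get(r, len(es))])
    return kept
```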
Updating node vectors with the graph embedding learning algorithm based on feature-wise linear modulation: at layer l, the node vector of node v is updated from h_v^(l) to h_v^(l+1), h_v ∈ R^d:
h_v^(l+1) = σ( Σ_{(u,r)∈N(v)} [ γ_r^(l) ⊙ (W_r^(l) h_u^(l)) + β_r^(l) ] )
where W_r^(l) ∈ R^(d×d) is the message transformation weight and σ is the ReLU function. The message weights γ_r^(l), β_r^(l) for the neighbor nodes u are computed by node v in the l-th graph embedding layer as:
γ_r^(l), β_r^(l) = g(h_v^(l); θ_(g,l,r))
where g(h_v; θ_(g,l,r)) is a hyper-network implemented with a single linear layer and θ_(g,l,r) ∈ R^(2d×d) is the parameter of this hyper-network.
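The following is a minimal PyTorch sketch of a feature-wise-linear-modulation graph layer in the spirit of the hyper-network g(h_v; θ_(g,l,r)) described above. The edge-list interface, the degree normalization and the parameter initialization are assumptions of this sketch, not details taken from the patent.

```python
import torch
import torch.nn as nn

class FiLMGraphLayer(nn.Module):
    """One graph embedding layer: a single-linear-layer hyper-network produces
    gamma/beta from the target node vector, which modulate the relation-specific
    message W_r h_u before aggregation."""
    def __init__(self, num_relations, dim):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_relations, dim, dim) * 0.01)
        self.g = nn.ModuleList([nn.Linear(dim, 2 * dim) for _ in range(num_relations)])

    def forward(self, h, edges):
        """h: (N, d) node vectors; edges: list of (u, r, v) meaning a message u -> v."""
        if not edges:
            return h
        msgs, targets = [], []
        for u, r, v in edges:
            gamma, beta = self.g[r](h[v]).chunk(2, dim=-1)   # FiLM parameters from the target node
            msgs.append(gamma * (self.W[r] @ h[u]) + beta)   # modulated relation-specific message
            targets.append(v)
        idx = torch.tensor(targets)
        agg = torch.zeros_like(h).index_add(0, idx, torch.stack(msgs))
        deg = torch.zeros(h.size(0)).index_add(0, idx, torch.ones(len(edges)))
        return torch.relu(agg / deg.clamp(min=1).unsqueeze(-1))
```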
S4 calculating the matching probability: after the L graph embedding layers, the updated node vectors z_e^(L) and z_c^(L) of the entity pair <e, c> to be aligned are taken out and the matching probability p_(e,c) is computed through a multi-layer perceptron:
p_(e,c) = FC(z_e^(L) ⊙ z_c^(L))
where FC is a 3-layer linear network and each layer uses a ReLU nonlinear function.
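A sketch of this matching head is shown below; the hidden sizes (d, 3d, d, 2) mirror the hyper-parameter description later in this text, while the final softmax over {non-match, match} and the placement of the last ReLU are assumptions.

```python
import torch
import torch.nn as nn

class MatchScorer(nn.Module):
    """Element-wise product of the two final node vectors followed by a
    3-layer perceptron with ReLU activations."""
    def __init__(self, d):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(d, 3 * d), nn.ReLU(),
            nn.Linear(3 * d, d), nn.ReLU(),
            nn.Linear(d, 2), nn.ReLU(),
        )

    def forward(self, z_e, z_c):
        logits = self.fc(z_e * z_c)
        return torch.softmax(logits, dim=-1)[..., 1]   # probability of a match
```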
S5 training phase, updating the node sampling number and computing the loss function: (1) compute the average unreliability of the sampled edges and dynamically optimize the node sampling number with the reinforcement learning strategy; (2) construct the self-supervision signal, compute the average edge sampling loss, and combine it with the alignment loss in a weighted sum to obtain the final loss.
Constructing the self-supervision signal: for each relation r, 60% of the existing relation edges in the graph, SPT = {<u, r, v> | <u, r, v> ∈ T}, are sampled as positive examples, and non-existent relation edges SPF = {<u, r, v'> | <u, r, v'> ∉ T}, built from nodes that have no such relation with the target node, are sampled at a 1:1 ratio as negative examples, so that a self-supervision signal based on the existing relation edges is constructed.
Calculating the average edge sampling loss: cross-entropy is used as the loss function, and the average edge sampling loss L_edge over all graph embedding layers is minimized so as to train the weight coefficients used in computing the reliability:
L_edge = -(1/L) Σ_{l=1}^{L} (1/|SPT ∪ SPF|) Σ_{<u,r,v> ∈ SPT ∪ SPF} [ ψ(u, r, v) log S(u, v) + (1 - ψ(u, r, v)) log(1 - S(u, v)) ]
where ψ(u, r, v) = 1 indicates that the edge <u, r, v> exists and ψ(u, r, v) = 0 indicates that it does not.
Calculating the average unreliability of the sampled edges: at each epoch t, for relation r, the average unreliability of all sampled edges E_(r,sampled) is computed:
D̄_r^t = (1/|E_(r,sampled)|) Σ_{<u,r,v> ∈ E_(r,sampled)} D(u, v)
Updating the node sampling number: the node sampling number p_r of relation r is updated according to the trend of the average unreliability between two adjacent epochs. The Action of the reinforcement learning strategy is
Action = {+ε, -ε}
and the Reward and Termination are likewise defined on the change of the average unreliability between two adjacent epochs, where ε is a small fixed integer.
If the average unreliability of the current epoch is lower than that of the previous epoch, the reward function is positive and p_r is greedily increased using real-time reward; otherwise p_r is decreased.
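This greedy, real-time-reward update can be sketched as follows; the lower bound p_min and the dictionary interface are assumptions, and the termination condition is left to the caller.

```python
def update_sample_count(p_r, avg_unrel, prev_avg_unrel, eps=2, p_min=1):
    """Per-relation update of the sampling number p_r: if the average
    unreliability dropped since the previous epoch, the reward is positive
    and p_r grows by eps; otherwise it shrinks by eps."""
    for r in p_r:
        if r in avg_unrel and r in prev_avg_unrel:
            reward = 1 if avg_unrel[r] < prev_avg_unrel[r] else -1
            p_r[r] = max(p_min, p_r[r] + reward * eps)
    return p_r
```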
In the training process, besides the average edge sampling loss of the reliability measure based on the self-supervision signal, the loss of the expert entity alignment task also includes the alignment loss L_align over all entity pairs to be aligned. The goal of this loss is to push the probability obtained by passing the vector representation pair <z_e, z_c> of an aligned expert entity pair through the multi-layer perceptron towards 1, while the score of a non-aligned vector representation pair is pushed towards 0:
L_align = - Σ_{<e,c>} [ y_(e,c) log p_(e,c) + (1 - y_(e,c)) log(1 - p_(e,c)) ]
where y_(e,c) = 1 if <e, c> is an aligned pair and y_(e,c) = 0 otherwise.
The final loss function of the model is the weighted sum of the alignment loss and the average edge sampling loss:
L = λ1 · L_align + λ2 · L_edge + λ3 · ||Θ||_2
where ||Θ||_2 is the L2 regularization penalty term, λ3 is the corresponding penalty coefficient, and λ1 and λ2 are the weighting coefficients of the alignment loss and the average edge sampling loss, respectively.
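A small sketch of combining the loss terms is shown below; whether the L2 penalty uses the squared norm and how the parameters are gathered are assumptions of this sketch.

```python
def total_loss(align_loss, edge_loss, params, lambdas=(2.0, 1.0, 0.001)):
    """Weighted sum of the alignment loss, the average edge sampling loss and
    an L2 penalty over the model parameters, mirroring the final loss above."""
    lam1, lam2, lam3 = lambdas
    l2_penalty = sum((p ** 2).sum() for p in params)
    return lam1 * align_loss + lam2 * edge_loss + lam3 * l2_penalty
```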
In the experiments, this embodiment uses the OAG dataset, which involves the Microsoft Academic Graph (MAG) and the AMiner academic network. The network scale is shown in Table 6, and the basic statistics of the dataset are shown in Table 7.
TABLE 6 OAG dataset network scale
Network | Conferences/journals | Papers | Experts | Relation types
AMiner | 69,397 | 172,209,563 | 113,171,945 | co-author, write, publish_on
MAG | 52,678 | 208,915,369 | 253,144,301 | co-author, write, publish_on
TABLE 7 Basic statistics of the OAG dataset
Evaluation metrics: Precision, Recall and F1-score are used as evaluation metrics. Precision is the proportion of correct predictions among the samples predicted as positive; a larger value means higher prediction accuracy on positive samples. Recall is the proportion of all positive samples that are predicted correctly; a larger value means a higher probability that the model finds the correct positive samples. F1-score combines Precision and Recall; the higher the value, the better the overall prediction performance of the model.
Hyper-parameter settings: in the experiments, the number of epochs is 100, the batch size is 64, and the dimension d of each node vector representation is 300. The optimizer is Adam; the learning rate of the graph embedding learning algorithm based on feature-wise linear modulation is 0.005 and that of the reliability measure based on the self-supervision signal is 0.001. The initial sampling number p_r of each relation is 30, and each action update in the reinforcement learning strategy uses ε = 2. The number of graph embedding layers L in the node vector update module is 3, and the loss coefficients λ1, λ2, λ3 in the loss function are 2, 1 and 0.001, respectively. The weight matrices of the 3 linear layers in the multi-layer perceptron have dimensions R^(3d×d), R^(d×3d) and R^(d×2) in turn.
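For reference, these settings can be collected in a configuration dictionary like the sketch below; the key names are illustrative, not identifiers used by the patent.

```python
config = {
    "epochs": 100, "batch_size": 64, "dim": 300, "optimizer": "Adam",
    "lr_film_gnn": 0.005, "lr_reliability": 0.001,
    "p_r_init": 30, "epsilon": 2, "num_layers": 3,
    "lambda_align": 2.0, "lambda_edge": 1.0, "lambda_l2": 0.001,
}
```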
The method of this embodiment is compared with 5 methods: (1) Exact Name Match, which considers two entities matched if their names match exactly; (2) SVM, which uses character-level 4-gram similarity over the attributes of expert name, affiliation, frequently occurring conferences/journals, paper title keywords and co-author names; (3) COSNET, which treats the attributes of two entities as local factors and the relations between entity pairs as correlation factors, assumes that two aligned entities carry the same label, and propagates label information with a factor graph model; (4) MEgo2Vec, which mines potentially matching entity pairs as nodes of a matching ego network, models the literal semantic features of different attributes via multi-view node embedding, distinguishes the influence of different neighbor nodes with an attention mechanism, and adds the graph topology to obtain a regularized structure embedding; (5) LinKG, which also exploits potentially matching entity pair information, but whose constructed network nodes are still entities rather than entity pairs, and which completes relational-graph-attention-based node vector updates by computing attention weights with a node-type-based multi-head attention mechanism. Two ablations are also designed: (1) Method-no reliability, which only uses the graph embedding learning algorithm based on feature-wise linear modulation; (2) Method-no reinforcement learning, which removes the reinforcement learning dynamic updating strategy from the full method. These verify, respectively, the contributions of the feature-wise-linear-modulation-based graph embedding learning algorithm, the self-supervision-signal-based reliability measure, and the reinforcement-learning-based node sampling number updating strategy. Table 8 compares the performance of the method of this embodiment with the other methods on the OAG dataset.
TABLE 8 Comparative experimental results of each method on the OAG dataset
Model | Precision | Recall | F1-score
Exact Name Match | 44.48 | 80.63 | 57.33
SVM | 84.70 | 92.22 | 88.30
COSNET | 91.73 | 85.33 | 88.42
MEgo2Vec | 91.03 | 90.82 | 90.92
LinKG | 95.37 | 93.48 | 94.42
Method-no reliability | 96.74 | 96.86 | 96.80
Method-no reinforcement learning | 97.18 | 96.82 | 97.00
Method (full) | 96.58 | 97.84 | 97.20
The experimental results show that the proposed method clearly outperforms the 5 comparison methods on all three evaluation metrics, improving the F1 value by at least 2.78%. The Method-no reliability variant already improves the F1 value by at least 2.38% over the 5 comparison methods, demonstrating the effectiveness of the graph embedding learning algorithm based on feature-wise linear modulation. Compared with Method-no reliability, Method-no reinforcement learning improves the F1 value by 0.20%, verifying the effectiveness of the reliability measure based on the self-supervision signal. Compared with Method-no reinforcement learning, the full method improves the F1 value by 0.20%, verifying the effectiveness of the reinforcement-learning-based node sampling number updating strategy.
The above embodiments express only several implementations of the present invention, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications can be made without departing from the spirit and scope of the invention.

Claims (8)

1. A graph embedding expert entity alignment method based on reinforcement learning enhancement is characterized by comprising the following steps:
step 1: obtaining the data of two talent knowledge bases, G1 = (E1, T1, R) and G2 = (E2, T2, R), wherein E denotes the entity set, the entity types comprising expert entities and paper entities; R denotes the relation set; and T denotes the triple set, a subset of E × R × E;
step 2: for each expert entity e in one talent knowledge base, generating a candidate expert set C from the other talent knowledge base according to the expert name, through a candidate entity pair generation module based on a regular matching template generator;
step 3: for each candidate expert c ∈ C, constructing a 2-hop heterogeneous subgraph HG = (V, H, T, R) about the entity pair <e, c>, whose nodes, vectors and triples are drawn from the two knowledge bases, wherein H1 and H2 respectively denote the entity initial vector sets of the two talent knowledge bases; the initial node vector representation is
h^(0) = h_attr ⊕ h_struct
h_struct = LINE(G)
wherein ⊕ denotes the vector concatenation operation, h_attr is the average of the word vectors obtained by a skip-gram model for each attribute feature of the entity, and h_struct is the structure vector obtained by encoding the structural information of each entity in the knowledge base with the LINE model;
step 4: in the node vector update module, in each graph embedding layer, calculating the reliability of each edge with a reliability measure based on a self-supervision signal;
step 5: in each graph embedding layer, applying a top-p sampling strategy to the heterogeneous subgraph HG: for each relation r, sampling the p_r most reliable edges in descending order of the reliability calculated in step 4, and updating the node sampling number with a reinforcement-learning-based node sampling number updating strategy;
step 6: in each graph embedding layer, after obtaining the sampled heterogeneous subgraph, updating the node vectors with a graph embedding learning algorithm based on feature-wise linear modulation;
step 7: after the L graph embedding layers, taking out the updated node vectors z_e^(L) and z_c^(L) of the entity pair <e, c> to be aligned, and computing the matching probability p_(e,c) through a multi-layer perceptron;
step 8: according to the matching probabilities of all candidate entities, taking the candidate expert with the highest probability as the matching expert.
2. The graph embedding expert entity alignment method based on reinforcement learning enhancement according to claim 1, wherein the method is trained with a training set before application and is applied to expert entity alignment after training is completed;
in the training process, besides the average edge sampling loss of the reliability measure based on the self-supervision signal, the loss of the expert entity alignment task also includes the alignment loss over all entity pairs to be aligned; the goal of this loss is to push the probability obtained by passing the vector representation pair <z_e, z_c> of an aligned expert entity pair through the multi-layer perceptron towards 1, while the score of a non-aligned vector representation pair is pushed towards 0.
3. The graph embedding expert entity alignment method based on reinforcement learning enhancement according to claim 1, wherein step 2 specifically comprises the following steps:
step 2-1: processing Chinese and English names separately, and generating a parsing dictionary with a name parser;
step 2-2: taking the parsing dictionary generated by the name parser as the input of the template generator, and outputting the corresponding regular templates.
4. The graph embedding expert entity alignment method based on reinforcement learning enhancement according to claim 1, wherein in step 2-1, for an English name, the surname and given name are parsed first, whether the order of surname and given name is unique is judged, and the parsing dictionary is then generated; for a Chinese name, the Chinese name is converted into its pinyin form, i.e. an English-style name, while polyphonic characters and compound surnames are taken into account, and the intermediate parsing result is then processed in the same way as an English name.
5. The graph embedding expert entity alignment method based on reinforcement learning enhancement according to claim 1, wherein in step 2-2, whether the order of surname and given name can be determined is judged first, then whether the given name is an abbreviation is judged, and finally 2 or 4 groups of regular templates are generated for the two orders "surname first" and "surname last", respectively.
6. The method according to claim 1, wherein in step 4, for a triple <u, r, v>, i.e. relation r exists between node u and node v and is directed from u to v, the calculation of the reliability S(u, v) specifically comprises the following steps:
step 4-1: first, the feature representation η of each node over the relations is computed:
η_u = σ1(W_τ(u) h_u),  η_v = σ1(W_τ(v) h_v)
wherein W_τ(u) and W_τ(v) are the weight matrices associated with the corresponding node types, with dimension R^(|R|×d), h_u and h_v are the node vector representations of nodes u and v at this graph embedding layer, with dimension d, and σ1(·) denotes an activation function;
step 4-2: the probability α_(u,v) that relation r exists between the node pair <u, v> is computed:
α_(u,v) = σ2(W_TF · (η_u ⊙ η_v))
wherein W_TF has dimension R^(2×|R|), ⊙ is the element-wise vector product, and σ2(·) denotes an activation function;
step 4-3: the unreliability D(u, v) is computed with the Manhattan distance and the reliability S(u, v) is obtained:
D(u, v) = ||α_(u,v)||_1
S(u, v) = 1 - D(u, v)
step 4-4: in the training phase, in order to make the proposed reliability measure more effective in computing edge reliability, the reliability measure based on the self-supervision signal samples part of the existing relation edges SPT = {<u, r, v> | <u, r, v> ∈ T} as positive examples, and samples at a 1:1 ratio non-existent relation edges SPF = {<u, r, v'> | <u, r, v'> ∉ T}, constructed from nodes that have no such relation with the target node, as negative examples, thereby constructing a self-supervision signal based on the existing relation edges; cross-entropy is used as the loss function to minimize the average edge sampling loss over all graph embedding layers, so that the weight coefficients used in computing the reliability are trained:
L_edge = -(1/L) Σ_{l=1}^{L} (1/|SPT ∪ SPF|) Σ_{<u,r,v> ∈ SPT ∪ SPF} [ ψ(u, r, v) log S(u, v) + (1 - ψ(u, r, v)) log(1 - S(u, v)) ]
wherein ψ(u, r, v) = 1 indicates that the edge <u, r, v> exists and ψ(u, r, v) = 0 indicates that it does not.
7. The graph embedding expert entity alignment method based on reinforcement learning enhancement according to claim 1, wherein in step 5, the reinforcement-learning-based node sampling number updating strategy specifically comprises the following steps:
step 5-1: at each epoch t, for relation r, the average unreliability of the sampled edges E_(r,sampled) is computed:
D̄_r^t = (1/|E_(r,sampled)|) Σ_{<u,r,v> ∈ E_(r,sampled)} D(u, v)
wherein |E_(r,sampled)| is the number of sampled edges whose relation is r;
step 5-2: the node sampling number p_r of relation r is updated according to the trend of the average unreliability between two adjacent epochs; the Action of the reinforcement learning strategy is
Action = {+ε, -ε}
and the Reward and Termination are likewise defined on the change of the average unreliability between two adjacent epochs, wherein ε is a small fixed integer; if the average unreliability of the current epoch is lower than that of the previous epoch, the reward function is positive and p_r is greedily increased using real-time reward; otherwise p_r is decreased.
8. The graph embedding expert entity alignment method based on reinforcement learning enhancement according to claim 1, wherein step 6 is specifically:
at layer l, the node vector of node v is updated from h_v^(l) to h_v^(l+1), h_v ∈ R^d:
h_v^(l+1) = σ( Σ_{(u,r)∈N(v)} [ γ_r^(l) ⊙ (W_r^(l) h_u^(l)) + β_r^(l) ] )
wherein W_r^(l) ∈ R^(d×d) is the message transformation weight and σ is the ReLU function; the message weights γ_r^(l), β_r^(l) for the neighbor nodes u are computed by node v in the l-th graph embedding layer as:
γ_r^(l), β_r^(l) = g(h_v^(l); θ_(g,l,r))
wherein g(h_v; θ_(g,l,r)) is a hyper-network implemented with a single linear layer and θ_(g,l,r) ∈ R^(2d×d) is the parameter of this hyper-network.
CN202210060387.4A 2022-01-19 2022-01-19 Graph embedding expert entity alignment method based on reinforcement learning enhancement Pending CN114819152A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210060387.4A CN114819152A (en) 2022-01-19 2022-01-19 Graph embedding expert entity alignment method based on reinforcement learning enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210060387.4A CN114819152A (en) 2022-01-19 2022-01-19 Graph embedding expert entity alignment method based on reinforcement learning enhancement

Publications (1)

Publication Number Publication Date
CN114819152A true CN114819152A (en) 2022-07-29

Family

ID=82527359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210060387.4A Pending CN114819152A (en) 2022-01-19 2022-01-19 Graph embedding expert entity alignment method based on reinforcement learning enhancement

Country Status (1)

Country Link
CN (1) CN114819152A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860135A (en) * 2022-11-16 2023-03-28 中国人民解放军总医院 Method, apparatus, and medium for solving heterogeneous federated learning using a super network
CN115860135B (en) * 2022-11-16 2023-08-01 中国人民解放军总医院 Heterogeneous federation learning method, equipment and medium based on super network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination