CN114625881A - Economic field knowledge graph completion algorithm based on a graph attention mechanism - Google Patents
Economic field knowledge graph completion algorithm based on a graph attention mechanism
- Publication number
- CN114625881A (application CN202111471322.0A)
- Authority
- CN
- China
- Prior art keywords
- entity
- erp
- embedding
- gat
- equation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
Abstract
The invention provides an ERP-GAT-based knowledge graph completion algorithm for the economic field. The ERP-GAT algorithm adopts an encoder-decoder structure. A graph attention mechanism is introduced into the encoder: an entity embedding matrix and a relation embedding matrix are taken as input, the attention score of every triple adjacent to each target entity is calculated, and the embedding matrices are updated. In this way the model can capture the multi-hop relations around a given entity or node, obtain the rich semantic information near that entity and the roles it plays in its relations, and consolidate groups of semantically similar relations in the existing knowledge. The decoder uses the ConvKB model, whose convolutional score function analyses the global embedding features in each dimension and generalises the translational characteristics of the ERP-GAT model. Finally, compared with other existing algorithms, five metrics on the standard dataset FB15K-237 and four metrics on NELL-995 are significantly improved, achieving the best results on the knowledge graph completion task.
Description
Technical Field
The invention belongs to the field of natural language processing.
Background
The mainstream approach to knowledge graph completion is to infer new entities, relations, rules and knowledge from those already present, and to predict whether a given triple is valid. Traditional knowledge-embedding models such as convolutional neural networks (CNNs) can learn high-quality embeddings thanks to their parameter efficiency and their ability to model complex relations. However, a CNN considers each triple independently, ignoring the rich semantic information and latent relations near a given entity in the knowledge graph as well as the relations between triples. The R-GCN can collect information from the neighbours of a given entity by convolving over each entity's neighbourhood, but it assigns equal weights to all neighbours, has a bottleneck when processing directed graphs, and cannot handle dynamic graphs.
Disclosure of Invention
The invention provides an economic field knowledge graph completion algorithm based on a graph attention machine mechanism. The contents are as follows:
(1) An ERP-GAT baseline algorithm and an improved algorithm are first presented, together with the corresponding overall framework diagram.
(2) The baseline models and the improved ERP-GAT model are then tested on two public datasets (FB15K-237 and NELL-995).
(3) Finally, the effectiveness of the ERP-GAT algorithm is verified through experimental analysis; the results show that ERP-GAT effectively improves the MR, MRR and Hits@N metrics on the relation prediction task.
Drawings
FIG. 1 is an overall block diagram of the algorithm of the present invention.
FIG. 2 is the graph attention layer network structure of the present invention.
FIG. 3 is a process for calculating attention values for a triplet of interest using the model of the present invention.
FIG. 4 is a data set presentation used in the algorithmic experiments of the present invention.
FIG. 5 shows the relation prediction results on the NELL-995 dataset in accordance with the present invention.
FIG. 6 shows the relation prediction results on the FB15K-237 dataset of the present invention.
Detailed Description
The mainstream approach to knowledge graph completion is to infer new entities, relations, rules and knowledge from those already present, and to predict whether a given triple is valid. Traditional knowledge-embedding models such as convolutional neural networks (CNNs) can learn high-quality embeddings thanks to their parameter efficiency and their ability to model complex relations, but a CNN processes each triple independently and ignores the rich semantic information and latent relations near a given entity in the knowledge graph. The R-GCN can collect information from the neighbours of a given entity by convolving over each entity's neighbourhood, but it assigns equal weights to all neighbours, has a bottleneck when processing directed graphs, and cannot handle dynamic graphs. Existing methods either learn knowledge graph embeddings from entity features alone or process the features of entities and relations separately; the ERP-GAT algorithm proposed here can comprehensively capture the semantic-similarity relations of the single-hop and multi-hop neighbours of any given entity in the knowledge graph.
The idea of the algorithm is described below, followed by its specific steps.
First, the problems left unsolved by CNN- and GCN-based relation prediction algorithms are briefly analysed; a solution is proposed accordingly and the design framework of the ERP-GAT algorithm is introduced (shown in FIG. 1). Then ERP-GAT is described in detail, including how it obtains the multi-hop relations around a given entity or node, captures the rich semantic information near that entity and the roles it plays in its relations, and consolidates groups of semantically similar relations in the existing knowledge. Finally, experiments and result analyses are performed with the baseline models TransE, ConvKB, R-GCN, etc. and the improved ERP-GAT model on two public datasets (FB15K-237 and NELL-995), comparing the MR, MRR and Hits@N metrics. Experimental analysis verifies the effectiveness of the ERP-GAT algorithm: the results show that it effectively improves MR, MRR and Hits@N on the relation prediction task.
As shown in FIG. 1, the knowledge graph completion method based on a graph attention mechanism (ERP-GAT) first takes the two embedding matrices as input and passes them to the graph attention layer, which calculates the attention scores of all triples adjacent to a given target entity and then updates the embedding matrices; a loss function is trained after the graph attention layer. The method then enters the decoder part, where a loss function is trained after a CNN layer, and the knowledge graph completion result is finally obtained.
The specific steps of the method are as follows:
Step one: calculating the attention score of the triples adjacent to each target entity
In a knowledge graph, an entity plays different roles depending on the relation in the current triple. For example, in the triples {Liu Qiangdong, chief executive officer, JD.com} and {Liu Qiangdong, husband, Zhang Zetian}, the entity "Liu Qiangdong" appears in two different triples and, because of the different relations, plays the two roles of "chief executive officer" and "husband". To handle this phenomenon, the ERP-GAT algorithm uses a graph attention layer. The graph attention layer takes two embedding matrices as input: E ∈ R^{N_e×T} is the entity embedding matrix, whose i-th row is the embedding vector of entity e_i, where N_e is the total number of entities and T is the feature dimension of each entity embedding vector; G ∈ R^{N_r×P} is the relation embedding matrix, where N_r is the total number of relations and P is the feature dimension of each relation embedding vector. The graph attention layer produces the two corresponding updated embedding matrices E′ and G′ as output.
To learn each triple associated with entity e_i, the ERP-GAT model applies a linear transformation to the concatenated entity and relation feature vectors of every triple connected to the target entity, as shown in Equation 1:

c_{ijk} = W_1 [e_i ∥ e_j ∥ r_k]  (1)

where c_{ijk} is the vector representation of the triple t_{ijk} = (e_i, r_k, e_j); e_i, e_j and r_k denote the embedding vectors of entities e_i, e_j and relation r_k respectively; W_1 denotes a linear transformation matrix and ∥ denotes concatenation.
To measure the importance of the triples associated with a target entity, the ERP-GAT model defines an importance weight α_{ijk} for each triple: a linear transformation matrix W_2 is applied to c_{ijk}, the result is passed through a LeakyReLU function to obtain the attention score, and the scores are normalised with a softmax function, as shown in Equation 2:

b_{ijk} = LeakyReLU(W_2 c_{ijk}),  α_{ijk} = softmax_{jk}(b_{ijk}) = exp(b_{ijk}) / Σ_{n∈N_i} Σ_{r∈R_{in}} exp(b_{inr})  (2)

where N_i denotes the neighbourhood of entity e_i and R_{in} the set of relations connecting e_i and e_n.
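Step one can be sketched in a few lines of numpy. This is an illustrative reduction only: the function name, the shapes, and the choice of a vector-valued W_2 (so that each triple gets a scalar score) are assumptions for the sketch, not details taken from the patent.

```python
import numpy as np

def attention_scores(E, G, triples, W1, W2):
    """Sketch of Equations 1-2: for each neighbouring triple (i, k, j) of a
    target entity, build c_ijk = W1 [e_i ; e_j ; r_k], score it with
    LeakyReLU(W2 c_ijk), and softmax-normalise over the neighbourhood."""
    def leaky_relu(x, slope=0.2):
        return np.where(x > 0, x, slope * x)

    # Equation 1: one concatenated, linearly transformed vector per triple
    C = np.stack([W1 @ np.concatenate([E[i], E[j], G[k]])
                  for (i, k, j) in triples])        # (n_triples, d)
    # Equation 2: scalar score per triple, then softmax over the neighbourhood
    b = leaky_relu(C @ W2)                          # (n_triples,)
    b = b - b.max()                                 # numerical stability
    alpha = np.exp(b) / np.exp(b).sum()
    return C, alpha
```

The softmax is taken over all triples in the target entity's neighbourhood, so the α values of one entity's adjacent triples sum to 1.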
step two: updating an embedded matrix
To address the problems that a CNN considers each triple independently, ignores the rich semantic information and latent relations near a given entity in the knowledge graph, and disregards the relations between triples, the ERP-GAT model uses a multi-head attention mechanism to stabilise the learning process and encapsulate more information from the triples near a given entity. An updated embedding vector is computed separately for each of the M attention heads, and the vectors are then concatenated to obtain the updated entity embedding vector e_i′, as shown in Equation 3:

e_i′ = ∥_{m=1}^{M} σ( Σ_{j∈N_i} Σ_{k∈R_{ij}} α_{ijk}^m c_{ijk}^m )  (3)

where σ is a nonlinear activation function and ∥ denotes concatenation over the M attention heads.
Specifically, in the last graph attention layer, the ERP-GAT model averages the outputs of the attention heads instead of concatenating them, to obtain the final output entity embedding vector, as shown in Equation 4:

e_i′ = σ( (1/M) Σ_{m=1}^{M} Σ_{j∈N_i} Σ_{k∈R_{ij}} α_{ijk}^m c_{ijk}^m )  (4)
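The multi-head aggregation of step two can be sketched as follows. Function and variable names are assumptions for illustration, as is the choice of tanh for the nonlinearity σ; the patent does not specify them.

```python
import numpy as np

def multi_head_update(C_heads, alpha_heads, final_layer=False):
    """Sketch of Equations 3-4: per head m, sum the attention-weighted
    triple vectors alpha^m * c^m over the neighbourhood.  Intermediate
    layers concatenate the M head outputs (Eq. 3); the final layer
    averages them before the nonlinearity (Eq. 4)."""
    sigma = np.tanh
    per_head = [(a[:, None] * C).sum(axis=0)      # sum over neighbour triples
                for C, a in zip(C_heads, alpha_heads)]
    if final_layer:
        return sigma(np.mean(per_head, axis=0))   # Eq. 4: average the heads
    return np.concatenate([sigma(h) for h in per_head])  # Eq. 3: concatenate
```

With M heads of dimension d, intermediate layers therefore output vectors of dimension M·d, while the final layer keeps dimension d.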
The relation embedding matrix is transformed with a linear transformation matrix W^R ∈ R^{T×T′}, where T′ is the dimension of the output relation embedding matrix, as shown in Equation 5:

G′ = G W^R  (5)
The model is trained with the hinge loss as the loss function, as shown in Equation 6:

L = Σ_{t∈S} Σ_{t′∈S′} max(0, d_t − d_{t′} + γ)  (6)

where γ > 0 is a margin hyperparameter, S denotes the set of correct triples, S′ denotes the set of incorrect triples, and d_t is the model's distance score for triple t.
Step three: decoder
The ERP-GAT model uses ConvKB as the decoder: its convolutional score function analyses the global embedding features in each dimension and generalises the translational characteristics of the ERP-GAT model. The triple t_{ijk} = (e_i, r_k, e_j) is passed through the convolution filters, the output of each filter is passed through a ReLU function and the results are concatenated, and a final linear transformation yields the score, as shown in Equation 7:

f(t_{ijk}) = ( ∥_{m=1}^{Ω} ReLU([e_i, r_k, e_j] ∗ ω^m) ) · W  (7)

where Ω denotes the number of convolution filters, ω^m denotes the m-th convolution filter, ∗ denotes convolution and W is the linear transformation matrix.
The decoder is trained with the soft-margin loss as the loss function, as shown in Equation 8:

L = Σ_{t∈S∪S′} log(1 + exp(l_t · f(t))) + (λ/2)‖W‖₂²  (8)

where l_t = 1 if t ∈ S and l_t = −1 if t ∈ S′, and λ is an L2-regularisation hyperparameter.
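The ConvKB decoder of step three can be sketched as follows. The 1×3 filter shape, the ±1 label convention from the standard ConvKB formulation, and all names are illustrative assumptions for the sketch.

```python
import numpy as np

def convkb_score(e_i, r_k, e_j, filters, W):
    """Sketch of Equation 7: stack the triple into a (T, 3) matrix, slide
    each 1x3 filter over its rows, apply ReLU, concatenate the feature
    maps, and project with W to get the scalar score f(t_ijk)."""
    A = np.stack([e_i, r_k, e_j], axis=1)             # (T, 3)
    maps = [np.maximum(0.0, A @ w) for w in filters]  # one (T,) map per omega^m
    return float(np.concatenate(maps) @ W)

def soft_margin_loss(scores, labels, W, lam=0.01):
    """Sketch of Equation 8: soft-margin loss with labels l_t = +1 for
    valid triples (S) and -1 for corrupted ones (S'), plus L2
    regularisation on the projection weights W."""
    s, l = np.asarray(scores, dtype=float), np.asarray(labels, dtype=float)
    return float(np.log1p(np.exp(l * s)).sum() + lam / 2 * (W ** 2).sum())
```

Under this convention, minimising the loss pushes valid triples toward low scores and corrupted triples toward high ones.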
step four: results and analysis of the experiments
(1) Experimental data set
To verify the effectiveness of the algorithm presented herein, the common standard datasets FB15K-237 and NELL-995, widely used by researchers in the field of knowledge graph completion, are used. The specific information of the datasets is shown in FIG. 4.
(2) Evaluation index
The metrics commonly used in knowledge graph representation learning — MR, MRR, Hits@1, Hits@3 and Hits@10 — are adopted here. Higher MRR and Hits@N values indicate better prediction, while a lower MR indicates better prediction. MRR is the mean of the reciprocal ranks of the correct entities over the triple set Q, as shown in Equation 9:

MRR = (1/|Q|) Σ_{i=1}^{|Q|} 1/rank_i  (9)

where rank_i denotes the rank of the correct entity in the relation prediction for the i-th triple.
MR is calculated as shown in Equation 10:

MR = (1/|Q|) Σ_{i=1}^{|Q|} rank_i  (10)
Hits@N is the average proportion of triples whose rank in the relation prediction is not greater than N, as shown in Equation 11:

Hits@N = (1/|Q|) Σ_{i=1}^{|Q|} II(rank_i ≤ N)  (11)

where II(·) is the indicator function, equal to 1 if the condition holds and 0 otherwise; N is set to 1, 3 and 10 for evaluation.
(3) Experimental setup
The experimental procedure is, for each test triple, to replace its head entity with every valid entity to obtain triples {e′_i, r_k, e_j}, or its tail entity to obtain triples {e_i, r_k, e′_j}, to collect the replaced (invalid) triples together with the single valid triple before replacement into one set, and to evaluate all triples in that set.
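The candidate-set construction just described can be sketched as follows; integer entity ids from 0 to n_entities−1 and the function name are assumptions for illustration.

```python
def corrupt_triples(triple, n_entities):
    """Sketch of the evaluation setup: replace the head (or tail) of a
    valid triple (i, k, j) with every other entity id, yielding the
    invalid candidates that are ranked together with the valid triple."""
    i, k, j = triple
    head_side = [(e, k, j) for e in range(n_entities) if e != i]  # {e'_i, r_k, e_j}
    tail_side = [(i, k, e) for e in range(n_entities) if e != j]  # {e_i, r_k, e'_j}
    return head_side, tail_side
```

The rank of the valid triple within each candidate set is then what feeds the MR, MRR and Hits@N metrics above.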
(4) Analysis of results
Training and testing of the ERP-GAT model are completed through experiments on the standard datasets FB15K-237 and NELL-995; the results are shown in FIG. 5 and FIG. 6. They show that, compared with other existing algorithms, the ERP-GAT algorithm significantly improves five metrics on the FB15K-237 dataset and four metrics on the NELL-995 dataset, achieving the best results on the knowledge graph completion task.
Claims (4)
1. An economic field knowledge graph completion algorithm based on a graph attention mechanism comprises the following steps:
Step one: calculating the attention score of the triples adjacent to each target entity
In a knowledge graph, an entity plays different roles depending on the relation in the current triple; for example, in the triples {Liu Qiangdong, chief executive officer, JD.com} and {Liu Qiangdong, husband, Zhang Zetian}, the entity "Liu Qiangdong" appears in two different triples and, because of the different relations, plays the two roles of chief executive officer and husband; to handle this phenomenon, the ERP-GAT algorithm uses a graph attention layer which takes two embedding matrices as input, wherein E ∈ R^{N_e×T} represents the entity embedding matrix, whose i-th row is the embedding vector of entity e_i, N_e represents the total number of entities and T represents the feature dimension of each entity embedding vector; G ∈ R^{N_r×P} represents the relation embedding matrix, wherein N_r represents the total number of relations and P represents the feature dimension of each relation embedding vector; and the graph attention layer takes the two corresponding updated embedding matrices E′ and G′ as output.
To learn each triple associated with entity e_i, the ERP-GAT model applies a linear transformation to the concatenated entity and relation feature vectors of the triples connected to the target entity, as shown in Equation 1:

c_{ijk} = W_1 [e_i ∥ e_j ∥ r_k]  (1)

wherein c_{ijk} is the vector representation of the triple t_{ijk} = (e_i, r_k, e_j); e_i, e_j and r_k respectively represent the embedding vectors of entities e_i, e_j and relation r_k; and W_1 represents a linear transformation matrix.
To measure the importance of the triples associated with a target entity, the ERP-GAT model defines an importance weight α_{ijk} for each triple: a linear transformation matrix W_2 is applied to c_{ijk}, the result is passed through a LeakyReLU function to obtain the attention score, and the scores are normalised with a softmax function, as shown in Equation 2:

b_{ijk} = LeakyReLU(W_2 c_{ijk}),  α_{ijk} = softmax_{jk}(b_{ijk}) = exp(b_{ijk}) / Σ_{n∈N_i} Σ_{r∈R_{in}} exp(b_{inr})  (2)
step two: updating an embedded matrix
To address the problems that a CNN considers each triple independently, ignores the rich semantic information and latent relations near a given entity in the knowledge graph, and disregards the relations between triples, the ERP-GAT model uses a multi-head attention mechanism to stabilise the learning process and encapsulate more information from the triples near a given entity; an updated embedding vector is computed separately for each of the M attention heads, and the vectors are then concatenated to obtain the updated entity embedding vector e_i′, as shown in Equation 3:

e_i′ = ∥_{m=1}^{M} σ( Σ_{j∈N_i} Σ_{k∈R_{ij}} α_{ijk}^m c_{ijk}^m )  (3)
Specifically, in the last graph attention layer, the ERP-GAT model averages the outputs of the attention heads instead of concatenating them, to obtain the final output entity embedding vector, as shown in Equation 4:

e_i′ = σ( (1/M) Σ_{m=1}^{M} Σ_{j∈N_i} Σ_{k∈R_{ij}} α_{ijk}^m c_{ijk}^m )  (4)
The relation embedding matrix is transformed with a linear transformation matrix W^R ∈ R^{T×T′}, where T′ is the dimension of the output relation embedding matrix, as shown in Equation 5:

G′ = G W^R  (5)
The model is trained with the hinge loss as the loss function, as shown in Equation 6:

L = Σ_{t∈S} Σ_{t′∈S′} max(0, d_t − d_{t′} + γ)  (6)

wherein γ > 0 is a margin hyperparameter, S represents the set of correct triples and S′ represents the set of incorrect triples.
Step three: decoder
The ERP-GAT model uses ConvKB as the decoder: its convolutional score function analyses the global embedding features in each dimension and generalises the translational characteristics of the ERP-GAT model; the triple t_{ijk} = (e_i, r_k, e_j) is passed through the convolution filters, the output of each filter is passed through a ReLU function and concatenated, and a final linear transformation yields the score, as shown in Equation 7:

f(t_{ijk}) = ( ∥_{m=1}^{Ω} ReLU([e_i, r_k, e_j] ∗ ω^m) ) · W  (7)

wherein Ω denotes the number of convolution filters and ω^m denotes the m-th convolution filter.
The decoder is trained with the soft-margin loss as the loss function, as shown in Equation 8:

L = Σ_{t∈S∪S′} log(1 + exp(l_t · f(t))) + (λ/2)‖W‖₂²  (8)

wherein l_t = 1 if t ∈ S and l_t = −1 if t ∈ S′, and λ is an L2-regularisation hyperparameter.
2. the method of claim 1, wherein an attention score mechanism is used in step 1 to calculate and measure the importance of triples associated with a given entity.
3. The method of claim 1, wherein step 2 uses a multi-head attention mechanism to stabilise the learning process and encapsulate more information from the triples around a given entity.
4. The method of claim 1, wherein ConvKB is used as the decoder in step 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111471322.0A CN114625881A (en) | 2021-12-04 | 2021-12-04 | Economic field knowledge graph completion algorithm based on graph attention machine mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114625881A true CN114625881A (en) | 2022-06-14 |
Family
ID=81898858
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111471322.0A Pending CN114625881A (en) | 2021-12-04 | 2021-12-04 | Economic field knowledge graph completion algorithm based on graph attention machine mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114625881A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116306936A (en) * | 2022-11-24 | 2023-06-23 | 北京建筑大学 | Knowledge graph embedding method and model based on hierarchical relation rotation and entity rotation |
CN116629356A (en) * | 2023-05-09 | 2023-08-22 | 华中师范大学 | Encoder and Gaussian mixture model-based small-sample knowledge graph completion method |
CN116629356B (en) * | 2023-05-09 | 2024-01-26 | 华中师范大学 | Encoder and Gaussian mixture model-based small-sample knowledge graph completion method |
CN117540799A (en) * | 2023-10-20 | 2024-02-09 | 上海歆广数据科技有限公司 | Individual case map creation and generation method and system |
CN117540799B (en) * | 2023-10-20 | 2024-04-09 | 上海歆广数据科技有限公司 | Individual case map creation and generation method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||