CN114625881A - Economic field knowledge graph completion algorithm based on graph attention mechanism - Google Patents

Economic field knowledge graph completion algorithm based on graph attention mechanism

Info

Publication number
CN114625881A
CN114625881A CN202111471322.0A CN202111471322A
Authority
CN
China
Prior art keywords
entity
erp
embedding
gat
equation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111471322.0A
Other languages
Chinese (zh)
Inventor
贾海涛
邢增桓
高源�
李家伟
林思远
王树臣
梁晓程
许文波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangtze River Delta Research Institute of UESTC Huzhou
Original Assignee
Yangtze River Delta Research Institute of UESTC Huzhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangtze River Delta Research Institute of UESTC Huzhou filed Critical Yangtze River Delta Research Institute of UESTC Huzhou
Priority to CN202111471322.0A priority Critical patent/CN114625881A/en
Publication of CN114625881A publication Critical patent/CN114625881A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Animal Behavior & Ethology (AREA)
  • Accounting & Taxation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an ERP-GAT-based knowledge graph completion algorithm for the economic field. The ERP-GAT algorithm adopts an encoder-decoder structure. A graph attention mechanism is introduced into the encoder: the entity embedding matrix and the relation embedding matrix are taken as input, the attention scores of the triples adjacent to each target entity are calculated, and the embedding matrices are updated. In this way the model can capture the multi-hop relations around a given entity or node as well as the rich semantic information near the given entity and the roles the entity plays in those relations, and can consolidate relation clusters that are semantically similar in the existing knowledge. The decoder uses the ConvKB model, in which a convolutional layer provides the score function that analyzes the global embedding features in each dimension and generalizes the transitional characteristics of the ERP-GAT model. Finally, compared with other existing algorithms, five indexes on the standard data set FB15K-237 and four indexes on NELL-995 are significantly improved, achieving the best results on the knowledge graph completion task.

Description

Economic field knowledge graph completion algorithm based on graph attention mechanism
Technical Field
The invention belongs to the field of natural language processing.
Background
The mainstream approach to knowledge graph completion is to infer new entities, relations, rules and knowledge from the existing ones and to predict whether a given triple is valid. Traditional knowledge-embedding models based on Convolutional Neural Networks (CNN) can learn higher-quality embeddings thanks to their parameter efficiency and their ability to model complex relations. However, a CNN considers each triple independently, ignores the rich semantic information and potential relations near a given entity in the knowledge graph, and does not model the relations between triples. The R-GCN can collect information from the neighbors of a given entity by convolving over each entity's neighborhood; however, R-GCN assigns equal weights to all neighbors, has a bottleneck when processing directed graphs, and cannot handle dynamic graphs.
Disclosure of Invention
The invention provides an economic field knowledge graph completion algorithm based on a graph attention mechanism. The contents are as follows:
(1) An ERP-GAT baseline algorithm and an improved algorithm are given first, together with the corresponding overall framework diagram.
(2) The baseline models and the improved ERP-GAT model are then tested on two public data sets (FB15K-237 and NELL-995).
(3) Finally, the effectiveness of the ERP-GAT algorithm is verified through experimental analysis; the experimental results show that the ERP-GAT algorithm effectively improves the MR, MRR and Hits@N indexes of the relation prediction task.
Drawings
FIG. 1 is an overall block diagram of the algorithm of the present invention.
FIG. 2 is the network structure of the graph attention layer of the present invention.
FIG. 3 is a process for calculating attention values for a triplet of interest using the model of the present invention.
FIG. 4 shows the data sets used in the experiments on the algorithm of the present invention.
FIG. 5 is a graph of the relation prediction results on the NELL-995 data set according to the present invention.
FIG. 6 is a graph of the relation prediction results on the FB15K-237 data set according to the present invention.
Detailed Description
The mainstream approach to knowledge graph completion is to infer new entities, relations, rules and knowledge from the existing ones and to predict whether a given triple is valid. Traditional knowledge-embedding models based on Convolutional Neural Networks (CNN) can learn higher-quality embeddings thanks to their parameter efficiency and their ability to model complex relations; however, a CNN processes each triple independently and ignores the rich semantic information and potential relations near a given entity in the knowledge graph. The R-GCN can collect information from the neighbors of a given entity by convolving over each entity's neighborhood; however, R-GCN assigns equal weights to all neighbors, has a bottleneck when processing directed graphs, and cannot handle dynamic graphs. Existing methods either learn knowledge graph embeddings from entity features alone or treat entity and relation features separately. The ERP-GAT algorithm proposed here comprehensively captures the semantic similarity relations of the single-hop and multi-hop neighbors of any given entity in the knowledge graph.
The idea of the algorithm is described below, followed by its specific steps.
First, the problems left unsolved by CNN- and GCN-based relation prediction algorithms are briefly analyzed; on that basis a solution is proposed and the design framework of the ERP-GAT algorithm is introduced (shown in FIG. 1). Then ERP-GAT is described in detail, including how it obtains the multi-hop relations around a given entity or node, the rich semantic information near the given entity and the roles played in those relations, and how it consolidates semantically similar relation clusters in the existing knowledge. Finally, experiments and result analyses are performed with the reference models (TransE, ConvKB, R-GCN, etc.) and the improved ERP-GAT model on two public data sets (FB15K-237 and NELL-995), comparing the MR, MRR and Hits@N metrics. The effectiveness of the ERP-GAT algorithm is verified through experimental analysis, and the results show that ERP-GAT effectively improves the MR, MRR and Hits@N indexes of the relation prediction task.
As shown in FIG. 1, the graph-attention-based knowledge graph completion method (ERP-GAT) first takes the two embedding matrices as input. The graph attention layer then calculates the attention scores of all triples adjacent to a given target entity and updates the embedding matrices; after the graph attention layer, the encoder is trained with its loss function. The decoder part follows: after the CNN layer, the decoder is trained with its own loss function, and the knowledge graph completion result is finally obtained.
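To make the data flow of FIG. 1 concrete, the following minimal sketch (in PyTorch) only traces the tensor shapes of the two stages; all sizes and placeholder tensors are illustrative assumptions and do not come from the patent:

```python
import torch

# Illustrative sizes (assumptions, not values from the patent).
N_e, N_r = 1000, 50          # number of entities / relations
T, P, T_out = 100, 100, 200  # input entity/relation dims, encoder output dim

# Encoder input: entity embedding matrix H and relation embedding matrix G.
H = torch.randn(N_e, T)      # row i = embedding vector of entity e_i
G = torch.randn(N_r, P)      # row k = embedding vector of relation r_k

# Stage 1 (graph attention encoder): scores the triples adjacent to each
# target entity, updates the embeddings, and is trained with its own loss.
H_prime = torch.randn(N_e, T_out)   # placeholder for the updated entity matrix
G_prime = torch.randn(N_r, T_out)   # placeholder for the updated relation matrix

# Stage 2 (ConvKB decoder): scores candidate triples from the updated
# embeddings; ranking the scores gives the completion result.
scores = torch.randn(4096)          # placeholder: one score per candidate triple
print(H.shape, G.shape, H_prime.shape, G_prime.shape, scores.shape)
```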
The method comprises the following specific steps:
Step one: calculating the attention scores of the triples adjacent to each target entity
In a knowledge graph, an entity plays different roles depending on the relation in the current triple. For example, in the triple {Liu Qiangdong, chief executive officer, JD.com} and the triple {Liu Qiangdong, husband, Zhang Zetian}, the entity "Liu Qiangdong" appears in two different triples and, because of the different relations, plays the two roles of "chief executive officer" and "husband" respectively. To handle this phenomenon, the ERP-GAT algorithm uses a graph attention layer. The graph attention layer takes two embedding matrices as input, where
H \in \mathbb{R}^{N_e \times T} represents the entity embedding matrix, whose i-th row is the embedding vector of entity e_i, N_e is the total number of entities, and T is the feature dimension of each entity embedding vector; G \in \mathbb{R}^{N_r \times P} represents the relation embedding matrix, where N_r is the total number of relations and P is the feature dimension of each relation embedding vector. The graph attention layer outputs the two corresponding updated embedding matrices, denoted H' and G' respectively.
To obtain the new embedding of an entity e_i, the ERP-GAT model applies a linear transformation to the concatenated entity and relation feature vectors of every triple connected to the target entity, producing a representation c_{ijk} that is used to learn each triple associated with entity e_i, as shown in equation (1):
\vec{c}_{ijk} = W_1 \left[ \vec{h}_i \,\Vert\, \vec{h}_j \,\Vert\, \vec{g}_k \right]    (1)
where \vec{c}_{ijk} is the vector representation of the triple t^k_{ij} = (e_i, r_k, e_j); \vec{h}_i, \vec{h}_j and \vec{g}_k are the embedding vectors of the entities e_i, e_j and the relation r_k respectively; W_1 is a linear transformation matrix.
To measure the importance of the triples associated with the target entity, the ERP-GAT model defines an importance weight \alpha_{ijk} for each triple: a linear transformation matrix W_2 is applied to \vec{c}_{ijk}, the result is passed through a LeakyReLU function to obtain the attention score, and the scores are normalized with a softmax function, as shown in equation (2):
\alpha_{ijk} = \operatorname{softmax}_{jk}\big( \operatorname{LeakyReLU}( W_2\, \vec{c}_{ijk} ) \big)    (2)
where the softmax normalization runs over all triples adjacent to entity e_i.
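As a concrete illustration of equations (1) and (2), the sketch below (PyTorch) computes c_{ijk} and alpha_{ijk} for the triples adjacent to one target entity. The sizes, the random parameters W1 and W2, and the toy neighborhood are assumptions made for the example, not values taken from the patent:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative sizes (assumptions).
N_e, N_r, T, T_c = 6, 3, 8, 16   # entities, relations, input dim, dim of c_ijk

H = torch.randn(N_e, T)          # entity embedding matrix
G = torch.randn(N_r, T)          # relation embedding matrix (P = T here)

# Triples (e_i, r_k, e_j) adjacent to the target entity e_0.
neigh = torch.tensor([[0, 1, 2],
                      [0, 0, 3],
                      [0, 2, 5]])

W1 = torch.randn(3 * T, T_c)     # linear transform of the concatenated triple
W2 = torch.randn(T_c, 1)         # linear transform producing a scalar score

# Equation (1): c_ijk = W1 [h_i || h_j || g_k]
h_i = H[neigh[:, 0]]
g_k = G[neigh[:, 1]]
h_j = H[neigh[:, 2]]
c_ijk = torch.cat([h_i, h_j, g_k], dim=-1) @ W1         # (num_triples, T_c)

# Equation (2): attention = softmax(LeakyReLU(W2 c_ijk)) over e_0's triples
b_ijk = F.leaky_relu(c_ijk @ W2, negative_slope=0.2)    # (num_triples, 1)
alpha_ijk = torch.softmax(b_ijk, dim=0)                 # normalized importance

print(alpha_ijk.squeeze(-1))     # importance of each adjacent triple
```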
step two: updating an embedded matrix
In order to solve the problems that a CNN considers each triple independently, ignores the rich semantic information and potential relations near a given entity in the knowledge graph, and does not model the relations between triples, the ERP-GAT model uses a multi-head attention mechanism to stabilize the learning process and to encapsulate more information from the triples near the given entity. Each of the M attention heads computes an updated embedding vector, and the head outputs are concatenated to obtain the updated entity embedding vector \vec{h}'_i, as shown in equation (3):
\vec{h}'_i = \Big\Vert_{m=1}^{M} \sigma \Big( \sum_{j \in \mathcal{N}_i} \sum_{k \in \mathcal{R}_{ij}} \alpha^{m}_{ijk}\, \vec{c}^{\,m}_{ijk} \Big)    (3)
where \mathcal{N}_i is the set of neighboring entities of e_i, \mathcal{R}_{ij} is the set of relations connecting e_i and e_j, \sigma is a nonlinear activation and M is the number of attention heads. Specifically, in the last graph attention layer the ERP-GAT model averages the embedding vectors of the M heads instead of concatenating them, to obtain the final output entity embedding vector, as shown in equation (4):
\vec{h}'_i = \sigma \Big( \frac{1}{M} \sum_{m=1}^{M} \sum_{j \in \mathcal{N}_i} \sum_{k \in \mathcal{R}_{ij}} \alpha^{m}_{ijk}\, \vec{c}^{\,m}_{ijk} \Big)    (4)
The relation embedding matrix is transformed with a linear transformation matrix W^R \in \mathbb{R}^{T \times T'}, where T' is the feature dimension of the output relation embedding matrix, as shown in equation (5):
G' = G\, W^R    (5)
The encoder is trained using hinge loss as the loss function, as shown in equation (6):
L = \sum_{t_{ij} \in S} \sum_{t'_{ij} \in S'} \max \big\{ d_{t_{ij}} - d_{t'_{ij}} + \gamma,\ 0 \big\}    (6)
where d_{t_{ij}} = \lVert \vec{h}_i + \vec{g}_k - \vec{h}_j \rVert_1 is the translational distance of a triple, \gamma > 0 is a margin hyper-parameter, S represents the set of correct triples and S' the set of incorrect triples.
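The following sketch illustrates equations (3), (4) and (6): the multi-head aggregation with concatenation and averaging, and the hinge loss over one valid/corrupted triple pair. The per-head attention weights and triple representations are random placeholders standing in for the quantities of step one, the activation sigma is taken to be a sigmoid, and the L1 translational distance is an assumption; none of the names come from the patent:

```python
import torch

torch.manual_seed(0)

M, num_triples, T_c = 4, 3, 16   # attention heads, triples adjacent to e_i, dim of c_ijk

# Placeholders for the per-head quantities computed in step one:
# alpha[m, n] = attention weight of the n-th adjacent triple under head m,
# c[m, n, :]  = triple representation c_ijk under head m.
alpha = torch.softmax(torch.randn(M, num_triples), dim=1)
c = torch.randn(M, num_triples, T_c)

# Weighted sum over the neighborhood for every head -> (M, T_c).
head_out = torch.einsum('mn,mnd->md', alpha, c)

# Equation (3): intermediate layers concatenate the M head outputs.
h_concat = torch.sigmoid(head_out).reshape(-1)        # (M * T_c,)

# Equation (4): the last layer averages the head outputs instead.
h_i_new = torch.sigmoid(head_out.mean(dim=0))         # (T_c,)

# Equation (6): hinge loss over a valid triple and a corrupted triple,
# using the L1 translational distance d = ||h_i + g_k - h_j||_1.
def distance(h_head, g_rel, h_tail):
    return (h_head + g_rel - h_tail).abs().sum()

h_i, g_k, h_j = h_i_new, torch.randn(T_c), torch.randn(T_c)  # valid triple (e_i, r_k, e_j)
h_j_bad = torch.randn(T_c)                                   # corrupted tail entity

gamma = 1.0
loss = torch.clamp(distance(h_i, g_k, h_j) - distance(h_i, g_k, h_j_bad) + gamma, min=0.0)
print(h_concat.shape, h_i_new.shape, loss.item())
```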
Step three: decoder
The ERP-GAT model uses ConvKB as the decoder: a convolutional layer provides the score function that analyzes the global embedding features of a triple in each dimension and generalizes the transitional characteristics of the ERP-GAT model. Each triple t^k_{ij} = (e_i, r_k, e_j) is represented as the matrix [\vec{h}_i, \vec{g}_k, \vec{h}_j] and passed through the convolution filters; the filter outputs are passed through a ReLU function and concatenated, and a final linear transformation with the weight vector W produces the score f(t^k_{ij}), as shown in equation (7):
f(t^k_{ij}) = \Big( \Big\Vert_{m=1}^{\Omega} \operatorname{ReLU}\big( [\vec{h}_i, \vec{g}_k, \vec{h}_j] * \omega^m \big) \Big) \cdot W    (7)
where \Omega denotes the number of convolution filters, \omega^m denotes the m-th convolution filter, and * denotes the convolution operation.
The decoder is trained using the soft-margin loss as the loss function, as shown in equation (8):
\mathcal{L} = \sum_{t^k_{ij} \in S \cup S'} \log \Big( 1 + \exp \big( l_{t^k_{ij}} \cdot f(t^k_{ij}) \big) \Big) + \frac{\lambda}{2} \lVert W \rVert_2^2    (8)
where l_{t^k_{ij}} = 1 for t^k_{ij} \in S, l_{t^k_{ij}} = -1 for t^k_{ij} \in S', and \lambda is the L2 regularization coefficient.
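A minimal sketch of the ConvKB score of equation (7) and the soft-margin loss of equation (8) is given below (PyTorch). The filter count, dimensions and labels are assumptions chosen for illustration; the convolution over the 3-column triple matrix [h_i, g_k, h_j] is implemented with torch.nn.Conv2d and 1x3 filters:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

T_out, Omega = 8, 4                              # embedding dim, number of filters
conv = nn.Conv2d(1, Omega, kernel_size=(1, 3))   # the convolution filters omega^m
W = nn.Linear(Omega * T_out, 1, bias=False)      # final linear transformation W

def convkb_score(h_i, g_k, h_j):
    # Stack each triple as a (T_out, 3) matrix [h_i, g_k, h_j].
    x = torch.stack([h_i, g_k, h_j], dim=-1).unsqueeze(1)  # (B, 1, T_out, 3)
    # Equation (7): concatenate the ReLU'd filter outputs, then apply W.
    feat = torch.relu(conv(x)).reshape(x.shape[0], -1)     # (B, Omega * T_out)
    return W(feat).squeeze(-1)                             # (B,) scores f(t)

B = 4
h_i, g_k, h_j = (torch.randn(B, T_out) for _ in range(3))
labels = torch.tensor([1.0, 1.0, -1.0, -1.0])   # +1 for triples in S, -1 for S'

scores = convkb_score(h_i, g_k, h_j)

# Equation (8): soft-margin loss with L2 regularization on W.
lam = 1e-3
loss = torch.log1p(torch.exp(labels * scores)).sum() + lam / 2 * W.weight.norm(p=2) ** 2
print(loss.item())
```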
step four: results and analysis of the experiments
(1) Experimental data set
To verify the effectiveness of the algorithm presented herein, the common standard data sets FB15K-237 and NELL-995, widely used by researchers in the field of knowledge graph completion, are used. The specific information of the data sets is shown in FIG. 4.
(2) Evaluation index
The indicators commonly used in knowledge graph representation learning research, namely MR, MRR, Hits@1, Hits@3 and Hits@10, are used herein. Higher MRR and Hits@N values indicate better prediction, while a lower MR value indicates better prediction. MRR is the average of the reciprocals of the ranks of the correct entities over the set of test triples Q, as shown in equation (9):
\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}    (9)
where \mathrm{rank}_i denotes the relation prediction rank of the i-th triple.
The MR calculation formula is shown in equation (10):
\mathrm{MR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \mathrm{rank}_i    (10)
Hits@N is the average proportion of triples whose rank in the relation prediction is not higher than N, as shown in equation (11):
\mathrm{Hits@}N = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \mathbb{I}\big( \mathrm{rank}_i \le N \big)    (11)
where \mathbb{I}(\cdot) denotes the indicator function, equal to 1 if the condition is true and 0 otherwise; N is set to 1, 3 and 10 for evaluation.
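The three evaluation metrics of equations (9)-(11) can be computed directly from the list of ranks of the correct entities; a plain-Python sketch with made-up example ranks (purely illustrative, not experimental results) follows:

```python
def mr(ranks):
    # Equation (10): mean rank of the correct entity.
    return sum(ranks) / len(ranks)

def mrr(ranks):
    # Equation (9): mean reciprocal rank.
    return sum(1.0 / r for r in ranks) / len(ranks)

def hits_at_n(ranks, n):
    # Equation (11): proportion of test triples whose rank is at most n.
    return sum(1 for r in ranks if r <= n) / len(ranks)

# Example ranks of the correct entity for a handful of test triples
# (illustrative numbers only).
ranks = [1, 3, 2, 15, 7, 1, 120]
print(mr(ranks), mrr(ranks), hits_at_n(ranks, 1), hits_at_n(ranks, 3), hits_at_n(ranks, 10))
```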
(3) Experimental setup
The experimental goal herein is, for each test triple, to replace the head entity with every valid entity to form triples {e'_i, r_k, e_j}, or to replace the tail entity to form triples {e_i, r_k, e'_j}; the replaced (invalid) triples and the unique valid triple before replacement are then gathered into one candidate set, and all triples in the set are scored and ranked.
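The evaluation protocol described above (replacing the head or the tail of each test triple with every valid entity, keeping the unique valid triple in the candidate set, and ranking the whole set) can be sketched as follows. The entity identifiers and the stand-in scoring function are placeholders; in the actual experiments the ConvKB score of equation (7) would be used:

```python
def candidate_sets(test_triple, all_entities):
    """Build the head- and tail-replacement candidate sets for one test triple,
    each containing the unique valid triple plus all replaced variants."""
    e_i, r_k, e_j = test_triple
    head_set = [(e, r_k, e_j) for e in all_entities]   # {e'_i, r_k, e_j}
    tail_set = [(e_i, r_k, e) for e in all_entities]   # {e_i, r_k, e'_j}
    return head_set, tail_set

def rank_of_valid(candidates, valid, score_fn):
    # Rank candidates by score (assuming higher = more plausible) and
    # return the 1-based rank of the valid triple.
    ordered = sorted(candidates, key=score_fn, reverse=True)
    return ordered.index(valid) + 1

# Toy usage with a placeholder scoring function.
entities = ["e0", "e1", "e2", "e3"]
test = ("e0", "r1", "e2")
score_fn = lambda t: 1.0 if t == test else 0.0   # stand-in for the ConvKB score
head_set, tail_set = candidate_sets(test, entities)
print(rank_of_valid(head_set, test, score_fn), rank_of_valid(tail_set, test, score_fn))
```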
(4) Analysis of results
The ERP-GAT model is trained and tested on the standard data sets FB15K-237 and NELL-995, and the obtained experimental results are shown in FIG. 5 and FIG. 6. The results show that, compared with other existing algorithms, the ERP-GAT algorithm significantly improves the five indexes on the FB15K-237 data set and four indexes on the NELL-995 data set, achieving the best results on the knowledge graph completion task.

Claims (4)

1. An economic field knowledge graph completion algorithm based on a graph attention mechanism, comprising the following steps:
Step one: calculating the attention scores of the triples adjacent to each target entity
In a knowledge graph, an entity plays different roles depending on the relation in the current triple; for example, in the triple {Liu Qiangdong, chief executive officer, JD.com} and the triple {Liu Qiangdong, husband, Zhang Zetian}, the entity "Liu Qiangdong" appears in two different triples and, because of the different relations, plays the two roles of chief executive officer and husband respectively; to handle this phenomenon, the ERP-GAT algorithm uses a graph attention layer which takes two embedding matrices as input, wherein
H \in \mathbb{R}^{N_e \times T} represents the entity embedding matrix, whose i-th row is the embedding vector of entity e_i, N_e is the total number of entities, and T is the feature dimension of each entity embedding vector; G \in \mathbb{R}^{N_r \times P} represents the relation embedding matrix, where N_r is the total number of relations and P is the feature dimension of each relation embedding vector; the graph attention layer outputs the two corresponding updated embedding matrices, denoted H' and G' respectively;
to obtain the new embedding of an entity e_i, the ERP-GAT model applies a linear transformation to the concatenated entity and relation feature vectors of every triple connected to the target entity, producing a representation c_{ijk} that is used to learn each triple associated with entity e_i, as shown in equation (1):
\vec{c}_{ijk} = W_1 \left[ \vec{h}_i \,\Vert\, \vec{h}_j \,\Vert\, \vec{g}_k \right]    (1)
where \vec{c}_{ijk} is the vector representation of the triple t^k_{ij} = (e_i, r_k, e_j); \vec{h}_i, \vec{h}_j and \vec{g}_k are the embedding vectors of the entities e_i, e_j and the relation r_k respectively; W_1 is a linear transformation matrix;
to measure the importance of the triples associated with the target entity, the ERP-GAT model defines an importance weight \alpha_{ijk} for each triple: a linear transformation matrix W_2 is applied to \vec{c}_{ijk}, the result is passed through a LeakyReLU function to obtain the attention score, and the scores are normalized with a softmax function, as shown in equation (2):
\alpha_{ijk} = \operatorname{softmax}_{jk}\big( \operatorname{LeakyReLU}( W_2\, \vec{c}_{ijk} ) \big)    (2)
step two: updating an embedded matrix
In order to solve the problems that a CNN considers each triple independently, ignores the rich semantic information and potential relations near a given entity in the knowledge graph, and does not model the relations between triples, the ERP-GAT model uses a multi-head attention mechanism to stabilize the learning process and to encapsulate more information from the triples near the given entity; each of the M attention heads computes an updated embedding vector, and the head outputs are concatenated to obtain the updated entity embedding vector \vec{h}'_i, as shown in equation (3):
\vec{h}'_i = \Big\Vert_{m=1}^{M} \sigma \Big( \sum_{j \in \mathcal{N}_i} \sum_{k \in \mathcal{R}_{ij}} \alpha^{m}_{ijk}\, \vec{c}^{\,m}_{ijk} \Big)    (3)
specifically, in the last graph attention layer the ERP-GAT model averages the embedding vectors of the M heads instead of concatenating them, to obtain the final output entity embedding vector, as shown in equation (4):
\vec{h}'_i = \sigma \Big( \frac{1}{M} \sum_{m=1}^{M} \sum_{j \in \mathcal{N}_i} \sum_{k \in \mathcal{R}_{ij}} \alpha^{m}_{ijk}\, \vec{c}^{\,m}_{ijk} \Big)    (4)
the relation embedding matrix is transformed with a linear transformation matrix W^R \in \mathbb{R}^{T \times T'}, where T' is the feature dimension of the output relation embedding matrix, as shown in equation (5):
G' = G\, W^R    (5)
the model is trained using hinge loss as the loss function, as shown in equation (6):
L = \sum_{t_{ij} \in S} \sum_{t'_{ij} \in S'} \max \big\{ d_{t_{ij}} - d_{t'_{ij}} + \gamma,\ 0 \big\}    (6)
where d_{t_{ij}} = \lVert \vec{h}_i + \vec{g}_k - \vec{h}_j \rVert_1 is the translational distance of a triple, \gamma > 0 is a margin hyper-parameter, S represents the set of correct triples and S' the set of incorrect triples;
Step three: decoder
the ERP-GAT model uses ConvKB as the decoder: a convolutional layer provides the score function that analyzes the global embedding features of a triple in each dimension and generalizes the transitional characteristics of the ERP-GAT model; each triple t^k_{ij} = (e_i, r_k, e_j) is represented as the matrix [\vec{h}_i, \vec{g}_k, \vec{h}_j] and passed through the convolution filters, the filter outputs are passed through a ReLU function and concatenated, and a final linear transformation with the weight vector W produces the score f(t^k_{ij}), as shown in equation (7):
f(t^k_{ij}) = \Big( \Big\Vert_{m=1}^{\Omega} \operatorname{ReLU}\big( [\vec{h}_i, \vec{g}_k, \vec{h}_j] * \omega^m \big) \Big) \cdot W    (7)
where \Omega denotes the number of convolution filters, \omega^m denotes the m-th convolution filter, and * denotes the convolution operation;
the decoder is trained using the soft-margin loss as the loss function, as shown in equation (8):
\mathcal{L} = \sum_{t^k_{ij} \in S \cup S'} \log \Big( 1 + \exp \big( l_{t^k_{ij}} \cdot f(t^k_{ij}) \big) \Big) + \frac{\lambda}{2} \lVert W \rVert_2^2    (8)
where l_{t^k_{ij}} = 1 for t^k_{ij} \in S, l_{t^k_{ij}} = -1 for t^k_{ij} \in S', and \lambda is the L2 regularization coefficient.
2. the method of claim 1, wherein an attention score mechanism is used in step 1 to calculate and measure the importance of triples associated with a given entity.
3. The method of claim 1, wherein step 2 uses a multi-head attention mechanism to stabilize the learning process and to encapsulate more information in the triples around a given entity.
4. The method of claim 1, wherein ConvKB is used as the decoder in step 3.
CN202111471322.0A 2021-12-04 2021-12-04 Economic field knowledge graph completion algorithm based on graph attention machine mechanism Pending CN114625881A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111471322.0A CN114625881A (en) 2021-12-04 2021-12-04 Economic field knowledge graph completion algorithm based on graph attention machine mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111471322.0A CN114625881A (en) 2021-12-04 2021-12-04 Economic field knowledge graph completion algorithm based on graph attention machine mechanism

Publications (1)

Publication Number Publication Date
CN114625881A true CN114625881A (en) 2022-06-14

Family

ID=81898858

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111471322.0A Pending CN114625881A (en) 2021-12-04 2021-12-04 Economic field knowledge graph completion algorithm based on graph attention machine mechanism

Country Status (1)

Country Link
CN (1) CN114625881A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306936A (en) * 2022-11-24 2023-06-23 北京建筑大学 Knowledge graph embedding method and model based on hierarchical relation rotation and entity rotation
CN116629356A (en) * 2023-05-09 2023-08-22 华中师范大学 Encoder and Gaussian mixture model-based small-sample knowledge graph completion method
CN116629356B (en) * 2023-05-09 2024-01-26 华中师范大学 Encoder and Gaussian mixture model-based small-sample knowledge graph completion method
CN117540799A (en) * 2023-10-20 2024-02-09 上海歆广数据科技有限公司 Individual case map creation and generation method and system
CN117540799B (en) * 2023-10-20 2024-04-09 上海歆广数据科技有限公司 Individual case map creation and generation method and system

Similar Documents

Publication Publication Date Title
CN114625881A (en) Economic field knowledge graph completion algorithm based on graph attention machine mechanism
CN108334948B (en) Mechanical bearing fault diagnosis technology based on wide residual error network learning model
CN109620152B (en) MutifacolLoss-densenert-based electrocardiosignal classification method
CN109034194B (en) Transaction fraud behavior deep detection method based on feature differentiation
CN110097755A (en) Freeway traffic flow amount state identification method based on deep neural network
CN107770517A (en) Full reference image quality appraisement method based on image fault type
CN112087447B (en) Rare attack-oriented network intrusion detection method
CN111178319A (en) Video behavior identification method based on compression reward and punishment mechanism
CN114841257A (en) Small sample target detection method based on self-supervision contrast constraint
CN110070116B (en) Segmented selection integration image classification method based on deep tree training strategy
CN116153495A (en) Prognosis survival prediction method for immunotherapy of esophageal cancer patient
CN109214298A (en) A kind of Asia women face value Rating Model method based on depth convolutional network
CN109034062A (en) A kind of Weakly supervised anomaly detection method based on temporal consistency
CN110826702A (en) Abnormal event detection method for multitask deep network
CN114913379B (en) Remote sensing image small sample scene classification method based on multitasking dynamic contrast learning
CN108596044B (en) Pedestrian detection method based on deep convolutional neural network
CN112668809A (en) Method for establishing autism child rehabilitation effect prediction model and method and system for predicting autism child rehabilitation effect
CN113344077A (en) Anti-noise solanaceae disease identification method based on convolution capsule network structure
CN112651360A (en) Skeleton action recognition method under small sample
CN114722216A (en) Entity alignment method based on Chinese electronic medical record knowledge graph
CN114169504B (en) Self-adaptive filtering-based graph convolution neural network pooling method
CN116527346A (en) Threat node perception method based on deep learning graph neural network theory
CN115269861A (en) Reinforced learning knowledge graph reasoning method based on generative confrontation and imitation learning
CN103281555B (en) Half reference assessment-based quality of experience (QoE) objective assessment method for video streaming service
CN112861443B (en) Advanced learning fault diagnosis method integrated with priori knowledge

Legal Events

Date Code Title Description
PB01 Publication