CN112000815A - Knowledge graph complementing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112000815A
Authority
CN
China
Prior art keywords: entity, description, candidate tail, head, layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011168569.0A
Other languages
Chinese (zh)
Other versions
CN112000815B (en)
Inventor
李直旭
陈志刚
何莹
郑新
付晨鹏
Current Assignee
Iflytek Suzhou Technology Co Ltd
Original Assignee
Iflytek Suzhou Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Iflytek Suzhou Technology Co Ltd filed Critical Iflytek Suzhou Technology Co Ltd
Priority to CN202011168569.0A priority Critical patent/CN112000815B/en
Publication of CN112000815A publication Critical patent/CN112000815A/en
Application granted granted Critical
Publication of CN112000815B publication Critical patent/CN112000815B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/36 - Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 - Ontology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/279 - Recognition of textual entities
    • G06F40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 - Named entity recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a knowledge graph completion method and apparatus, an electronic device, and a storage medium. The method includes: determining a head entity and relation to be completed, and the description texts of a plurality of corresponding candidate tail entities; and inputting the head entity, the relation, and the description texts of the candidate tail entities into a knowledge graph completion model to obtain the completion result output by the model. The completion model determines the description features of each candidate tail entity based on the relevance among the candidates' description texts, and determines the completion result corresponding to the head entity and the relation based on those description features. The method, apparatus, electronic device, and storage medium provided by the embodiments of the invention make full use of the information contained in the description texts and improve the accuracy of knowledge graph completion.

Description

Knowledge graph complementing method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a knowledge graph completion method and apparatus, an electronic device, and a storage medium.
Background
To complete missing relationships between entities in a knowledge graph, knowledge can be obtained from resources in the open world, and new entities that do not yet appear in the knowledge graph can be added to it.
In the prior art, the knowledge graph is usually completed using entity description texts from the open world, or using the description texts jointly with the topological structure of the knowledge graph. Because knowledge in the open world is highly complex, existing completion methods make poor use of entity description texts, and the accuracy of the completed knowledge graph is poor.
Disclosure of Invention
Embodiments of the invention provide a knowledge graph completion method and apparatus, an electronic device, and a storage medium, aiming to solve the problems that existing completion methods make poor use of entity description texts and produce completed knowledge graphs of poor accuracy.
In a first aspect, an embodiment of the present invention provides a knowledge graph completion method, including:
determining a head entity and relation to be completed, and description texts of a plurality of corresponding candidate tail entities;
inputting the head entity, the relation, and the description texts of the candidate tail entities into a knowledge graph completion model to obtain the completion result output by the model;
wherein the knowledge graph completion model determines the description features of each candidate tail entity based on the relevance among the candidates' description texts, and determines the completion result corresponding to the head entity and the relation based on those description features;
and wherein the knowledge graph completion model is trained on sample head entities and sample relations, the description texts of a plurality of corresponding sample candidate tail entities, and the sample completion results.
Optionally, inputting the head entity, the relation, and the description texts of the plurality of candidate tail entities into the knowledge graph completion model to obtain the completion result output by the model specifically includes:
inputting the head entity, the relation, and the description texts of the candidate tail entities into a feature coding layer of the completion model to obtain the coding features of the head entity and the relation and the coding features of each candidate tail entity's description text, as output by the feature coding layer;
inputting the coding features of each candidate tail entity's description text into a candidate text association layer of the completion model to obtain the description features of each candidate tail entity output by that layer;
and inputting the coding features of the head entity and the relation and the description features of each candidate tail entity into an entity completion layer of the completion model to obtain the completion result output by that layer.
Optionally, inputting the coding features of each candidate tail entity's description text into the candidate text association layer to obtain the description features of each candidate tail entity output by that layer specifically includes:
inputting the coding features of each candidate tail entity's description text into an associated feature extraction layer of the candidate text association layer to obtain, for any candidate tail entity, the association features between its description text and the description texts of the remaining candidate tail entities, output by the associated feature extraction layer based on an attention mechanism;
and inputting the association features between that candidate tail entity and the remaining candidates, together with the coding features of its description text, into a description feature extraction layer of the candidate text association layer to obtain the candidate's description features output by that layer.
Optionally, obtaining the description features of any candidate tail entity from the description feature extraction layer specifically includes:
inputting the association features and the coding features into the description feature extraction layer, which fuses the coding features of the candidate's description text with the difference between those coding features and the association features, and outputs the fusion result as the candidate's description features.
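The fusion described in this claim can be sketched as follows; plain Python lists and concatenation as the fusion operator are assumptions for illustration, since the claim fixes only that the coding features and their difference from the association features are fused:

```python
def description_feature(enc, assoc):
    """Fuse a candidate's coding feature vector `enc` with the difference
    enc - assoc, where `assoc` is its association feature. The difference
    emphasizes information the candidate does NOT share with the other
    candidates. Concatenation is an assumed fusion choice, not fixed by
    the claim."""
    diff = [e - a for e, a in zip(enc, assoc)]
    return list(enc) + diff
```

Where the two texts agree, the difference term goes to zero, so shared information contributes little to the fused description feature.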
Optionally, inputting the head entity, the relation, and the description texts of the plurality of candidate tail entities into the feature coding layer to obtain the coding features output by that layer specifically includes:
inputting the description text of the head entity, the head entity and the relation, and the description texts of the candidate tail entities into a feature representation layer of the feature coding layer to obtain the feature representation of the head entity's description text, the feature representation of the head entity and the relation, and the feature representations of the candidates' description texts, as output by the feature representation layer;
and inputting these feature representations into a context coding layer of the feature coding layer to obtain the coding features of the head entity's description text, the coding features of the head entity and the relation, and the coding features of each candidate tail entity's description text, as output by the context coding layer.
Optionally, obtaining the feature representations from the feature representation layer specifically includes:
inputting the description text of the head entity, the head entity and the relation, and the description text of any candidate tail entity into the feature representation layer, which performs attention interaction among the vector representation of the head entity's description text, the vector representation of the head entity and the relation, and the vector representation of the candidate's description text, and outputs the corresponding feature representations.
Optionally, the feature representation of the descriptive text of the head entity further includes at least one of a part-of-speech vector, an entity type vector, and a word frequency co-occurrence vector of the head entity.
Optionally, inputting the coding features of the head entity and the relation and the description features of each candidate tail entity into the entity completion layer to obtain the completion result output by that layer specifically includes:
inputting the coding features of the head entity's description text, the coding features of the head entity and the relation, and each candidate tail entity's description features into a feature interaction layer of the entity completion layer; the feature interaction layer applies self-attention to the coding features of the head entity and the relation to obtain question features, and performs attention interaction between the question features and, respectively, the coding features of the head entity's description text and each candidate's description features, outputting interactive coding features for the head entity's description text and for each candidate tail entity's description text;
and inputting these interactive coding features into a result output layer of the entity completion layer, which determines the similarity between the interactive coding features of the head entity's description text and those of each candidate tail entity's description text, and produces the completion result based on the similarity corresponding to each candidate.
In a second aspect, an embodiment of the present invention provides a knowledge graph completion apparatus, including:
a text determining unit, configured to determine a head entity and relation to be completed and description texts of a plurality of corresponding candidate tail entities;
a graph completion unit, configured to input the head entity, the relation, and the description texts of the candidate tail entities into a knowledge graph completion model and obtain the completion result output by the model;
wherein the knowledge graph completion model determines the description features of each candidate tail entity based on the relevance among the candidates' description texts, and determines the completion result corresponding to the head entity and the relation based on those description features;
and wherein the knowledge graph completion model is trained on sample head entities and sample relations, the description texts of a plurality of corresponding sample candidate tail entities, and the sample completion results.
In a third aspect, an embodiment of the present invention provides an electronic device including a processor, a communication interface, a memory, and a bus, where the processor, the communication interface, and the memory communicate with one another through the bus, and the processor can invoke logic instructions in the memory to perform the steps of the method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.
According to the knowledge graph completion method and apparatus, electronic device, and storage medium described above, the description features of each candidate tail entity are determined based on the relevance among the description texts of the candidate tail entities, so that each candidate's description features fully reflect how it stands out from the remaining candidates; the completion result corresponding to the head entity and the relation is then determined from these description features. The information contained in the description texts is thus fully utilized, and the accuracy of knowledge graph completion is improved.
Drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a knowledge graph completion method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the operation flow of a knowledge graph completion model according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a method for determining the description features of candidate tail entities according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the operation flow of a feature coding layer according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a feature interaction and result output method according to an embodiment of the present invention;
FIG. 6 is a block diagram of a knowledge graph completion model according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a knowledge graph completion apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A Knowledge Graph (KG) aims to build a database of structured information: objects in the world (proper nouns such as names of people, places, and organizations) and abstract concepts are represented as entities, and the interactions and connections between entities as relations. The entities and the relations between them form a huge graph, in which entities are the nodes and relations are the edges. In a knowledge graph, the world's vast knowledge is represented as triples of entities connected by relations.
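The triple structure described above can be sketched as a small in-memory store; the entity and relation names below are hypothetical illustrations, not data from the patent:

```python
# Minimal illustrative knowledge graph: entities are nodes, relations are
# edges, and facts are (head, relation, tail) triples. All names here are
# hypothetical examples.
triples = {
    ("Actor_A", "born_in", "City_X"),
    ("Actor_A", "starred_in", "Film_Y"),
    ("Film_Y", "directed_by", "Director_Z"),
}

def tails(head, relation):
    """Return all tail entities linked to `head` by `relation`;
    an empty result marks a missing fact that may need completion."""
    return {t for (h, r, t) in triples if h == head and r == relation}
```

Completion then amounts to choosing the best tail entity for a (head, relation) pair whose tail is missing from the graph.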
Knowledge graph completion aims to find the missing parts of triples (head entity, relation, tail entity) in a knowledge graph, so that the graph becomes more complete. Knowledge from the open world is usually used for this completion; because open-world knowledge is highly complex, existing completion methods make poor use of entity description texts, and completion accuracy is poor.
To overcome these defects in the prior art, FIG. 1 is a schematic flowchart of the knowledge graph completion method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step 110, determining a head entity and relation to be completed, and description texts of a plurality of corresponding candidate tail entities;
Specifically, the description text of an entity is natural-language text that explains the entity. Description texts of open-world entities can come from online encyclopedias or from public corpora such as news corpora, English corpora, and Chinese corpora.
When a triple has only a head entity and a relation and lacks a tail entity, the head entity and relation are taken as the pair to be completed, and the missing tail entity is the part to be filled in. For a head entity and relation to be completed, a plurality of candidate tail entities exist in the open world, each with a corresponding description text. For example, for the head entity "Actor A" and the relation "incumbent wife", two candidate tail entities "Actress B" and "Actress C" exist in the open world, whose description texts are "Actor A and Actress B held their wedding on New Year's Day this year" and "Actor A and Actress C secretly ended their marriage last year", respectively.
Completing a knowledge graph with open-world knowledge can be understood as finding, from the description texts of open-world entities, the tail entity that best satisfies the relation for the given head entity. In the example above, the completion task is to determine, from the candidates' description texts, the tail entity that best answers the question "who is the incumbent wife of Actor A", corresponding to the head entity and relation to be completed.
Step 120, inputting the head entity, the relation, and the description texts of the candidate tail entities into a knowledge graph completion model to obtain the completion result output by the model;
the knowledge graph completion model determines the description features of each candidate tail entity based on the relevance among the candidates' description texts, and determines the completion result corresponding to the head entity and the relation based on those description features;
the knowledge graph completion model is trained on sample head entities and sample relations, the description texts of a plurality of corresponding sample candidate tail entities, and the sample completion results.
Specifically, the description texts of different candidate tail entities are not completely independent; they may partly relate to or overlap one another. The relevance between candidates' description texts can be expressed as the degree of correlation between the semantic information they contain.
After receiving the candidates' description texts, the completion model can capture the relevance among them. On that basis, for the description text of one candidate tail entity, the model can simulate, according to the relevance between its description text and the others', the influence the other candidates exert on it during selection, and thereby obtain description features that fully reflect how that candidate stands out from the others. The description features of any candidate tail entity are thus determined from its own description text and from the associations between the remaining candidates' description texts and its own, and reflect what distinguishes its description text from the others'.
Having obtained the description features of all candidate tail entities, the model can select, based on those features and in combination with the information carried by the head entity and relation to be completed, the tail entity corresponding to the head entity and relation as the completion result.
For example, in the example above, the head entity "Actor A", the relation "incumbent wife", and the two candidates' description texts "Actor A and Actress B held their wedding on New Year's Day this year" and "Actor A and Actress C secretly ended their marriage last year" are input into the completion model. The model captures the correlation between the two description texts, weakens the highly correlated information in them (such as "Actor A"), and highlights the less correlated, more discriminative information (such as "held their wedding" and "ended their marriage"), obtaining description features that respectively highlight "held their wedding" and "ended their marriage". On this basis, combining the head entity "Actor A" and the relation "incumbent wife" with the candidates' description features, the model selects the tail entity "Actress B", whose description features better match "the incumbent wife of Actor A", and outputs it as the completion result.
Before step 120 is executed, the completion model can be obtained by pre-training, for example as follows: first, collect a large number of sample head entities and sample relations, together with the description texts of the several sample candidate tail entities corresponding to each head-entity/relation pair; obtain the sample completion result for each pair through manual annotation; then input the sample pairs, the corresponding candidates' description texts, and the sample completion results into the initial model for training, yielding the knowledge graph completion model.
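The training setup just described can be pictured as samples of the following shape; all field names and values are hypothetical illustrations, not specified by the patent:

```python
# One hypothetical training sample for the completion model: a head
# entity and relation, the candidate tail entities with their description
# texts, and a manually annotated label marking the correct candidate.
sample = {
    "head": "Actor_A",
    "relation": "incumbent_wife",
    "candidates": [
        {"entity": "Actress_B",
         "description": "Actor A and Actress B held their wedding on New Year's Day this year"},
        {"entity": "Actress_C",
         "description": "Actor A and Actress C secretly ended their marriage last year"},
    ],
    "label": 0,  # index of the annotated correct tail entity (Actress_B)
}
```

A training corpus is then a large list of such samples fed to the initial model.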
The knowledge graph completion method provided by the embodiment of the invention determines each candidate tail entity's description features based on the relevance among the candidates' description texts, so that each candidate's description features fully reflect how it stands out from the remaining candidates, and determines the completion result corresponding to the head entity and the relation based on those description features, thereby making full use of the information contained in the description texts and improving the accuracy of knowledge graph completion.
Based on the above embodiment, the knowledge graph completion model includes a feature coding layer, a candidate text association layer, and an entity completion layer. Correspondingly, FIG. 2 is a schematic diagram of the operation flow of the knowledge graph completion model provided by an embodiment of the present invention. As shown in FIG. 2, step 120 specifically includes:
Step 121, inputting the head entity, the relation, and the description texts of the candidate tail entities into the feature coding layer of the knowledge graph completion model to obtain the coding features of the head entity and the relation and the coding features of each candidate tail entity's description text, as output by the feature coding layer;
Specifically, the head entity, the relation, and the candidates' description texts are input to the feature coding layer, which encodes each of them to produce the coding features of the head entity and the relation and the coding features of each candidate tail entity's description text.
Step 122, inputting the coding features of each candidate tail entity's description text into the candidate text association layer of the knowledge graph completion model to obtain the description features of each candidate tail entity output by that layer;
Specifically, from the input coding features, the candidate text association layer can capture the relevance among the candidates' description texts, weaken the highly correlated information in them, and highlight the less correlated, more discriminative information, thereby obtaining each candidate tail entity's description features. Capturing the relevance among the candidates' description texts can be realized through an attention interaction mechanism.
Step 123, inputting the coding features of the head entity and relation and the description features of each candidate tail entity into the entity completion layer of the knowledge graph completion model, to obtain the completion result output by the entity completion layer.
Specifically, the entity completion layer determines candidate tail entities corresponding to the head entities and the relations as completion results output by the entity completion layer according to the input description features of each candidate tail entity and the coding features of the head entities and the relations. For example, the completion result may be determined by calculating the matching degree between each candidate tail entity and the head entity and the relationship according to the description features of each candidate tail entity and the coding features of the head entity and the relationship.
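The three-layer flow above (feature coding, candidate text association, entity completion) can be sketched end to end. Everything below — the layer functions, the random projections, the dimensions — is a hypothetical stand-in for the learned model, intended only to show how the output of one layer feeds the next, not the patent's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def feature_coding_layer(head_rel_vec, cand_desc_vecs, W_enc):
    # encode the head-entity/relation pair and each candidate description
    # (a fixed random projection stands in for the learned encoder)
    q_enc = W_enc @ head_rel_vec
    c_enc = [W_enc @ v for v in cand_desc_vecs]
    return q_enc, c_enc

def candidate_association_layer(c_enc):
    # de-emphasize information shared across candidate descriptions by
    # subtracting the mean of the OTHER candidates' codings, then keep
    # both the original coding and the difference
    out = []
    for i, e in enumerate(c_enc):
        others = [c for j, c in enumerate(c_enc) if j != i]
        out.append(np.concatenate([e, e - np.mean(others, axis=0)]))
    return out

def entity_completion_layer(q_enc, desc_feats):
    # score each candidate against the head/relation coding, pick the best
    scores = [float(d[: q_enc.size] @ q_enc) for d in desc_feats]
    return int(np.argmax(scores)), scores

dim = 8
W_enc = rng.normal(size=(dim, dim))
head_rel = rng.normal(size=dim)
candidates = [rng.normal(size=dim) for _ in range(3)]

q_enc, c_enc = feature_coding_layer(head_rel, candidates, W_enc)
desc_feats = candidate_association_layer(c_enc)
best, scores = entity_completion_layer(q_enc, desc_feats)
```

The index `best` plays the role of the completion result: the candidate tail entity whose description best matches the head entity and relation.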
Based on any embodiment, the candidate text association layer comprises an association feature extraction layer and a description feature extraction layer. Fig. 3 is a flowchart illustrating a method for determining description features of candidate tail entities according to an embodiment of the present invention, and as shown in fig. 3, step 122 specifically includes:
step 1221, inputting the coding features of the description text of each candidate tail entity into the association feature extraction layer of the candidate text association layer, to obtain the association features, output by the association feature extraction layer based on an attention mechanism, between the description text of any candidate tail entity and the description texts of each of the remaining candidate tail entities;
Specifically, since the multiple candidate tail entities are not independent but interrelated, part of the information in the description text of any candidate tail entity may appear repeatedly in the description texts of the remaining candidate tail entities. Such repeated information drowns out the distinctive information in each candidate tail entity's description text and interferes with knowledge graph completion, making the multiple candidate tail entities difficult to distinguish accurately and lowering completion accuracy. For example, in the above example, the description texts of the two candidate tail entities, "actor A and actress B had a wedding this year" and "actor A and actress C secretly married last year", share the same information "actor A" and the highly correlated information "married" and "wedding"; this shared information drowns out the distinctive information in the two description texts that could indicate the beginning and end of a spousal relationship, so that the candidate tail entities "actress B" and "actress C" become indistinguishable.
To address this problem, the association feature extraction layer, based on an attention mechanism, performs attention interaction between the coding features of the description text of any candidate tail entity and the coding features of the description texts of the remaining candidate tail entities, obtaining the association features between the description text of that candidate tail entity and the description texts of each of the remaining candidate tail entities.
Here, the association feature characterizes how the description text of any candidate tail entity is associated with the description texts of the remaining candidate tail entities, and may be used to represent the influence of each remaining candidate tail entity on that candidate tail entity.
Further, let $d_i$ denote the description text of the $i$-th candidate tail entity, $e_i$ the coding feature of the description text of the $i$-th candidate tail entity, and $e_j$ the coding feature of the description text of the $j$-th candidate tail entity. The correlation $s_{ij}$ between the $i$-th candidate tail entity and the $j$-th candidate tail entity can be formulated as:

$s_{ij} = e_i^{\top} M\, e_j$

where $M$ is the correlation calculation matrix, which can be obtained by means of model training.

On this basis, normalizing $s_{ij}$ yields the correlation coefficient $\alpha_{ij}$ of the $j$-th candidate tail entity relative to the $i$-th candidate tail entity:

$\alpha_{ij} = \dfrac{\exp(s_{ij})}{\sum_{k=1}^{m} \exp(s_{ik})}$

where $m$ is the number of candidate tail entities, $k$ is the index of a candidate tail entity, and $\exp(\cdot)$ is the exponential function with the natural constant as its base.

The association feature $a_i$ between the description text of the $i$-th candidate tail entity and the description texts of each of the remaining candidate tail entities can then be formulated as:

$a_i = \sum_{k=1}^{m} \alpha_{ik}\, e_k$

where $k$ is the index of a candidate tail entity.
Step 1222, inputting the association features between any candidate tail entity and each of the other candidate tail entities and the encoding features of the description text of the candidate tail entity into the description feature extraction layer of the candidate text association layer, so as to obtain the description features of the candidate tail entity output by the description feature extraction layer.
Specifically, the description feature extraction layer is used for extracting the description features of any candidate tail entity in combination with the association features between the candidate tail entity and each of the rest candidate tail entities and the coding features of the description text of the candidate tail entity.
Here, the description feature of any candidate tail entity may be obtained by processing the association feature and the coding feature in a summation, a difference calculation or other operation manner, which is not specifically limited in the embodiment of the present invention.
Based on any of the above embodiments, step 1222 specifically includes:
inputting the association features between any candidate tail entity and each of the remaining candidate tail entities, together with the coding features of the description text of that candidate tail entity, into the description feature extraction layer; the description feature extraction layer fuses the coding features of the description text of the candidate tail entity with the difference between the coding features and the association features, and the fusion result output by the description feature extraction layer serves as the description features of the candidate tail entity.
Specifically, the description feature extraction layer fuses the coding feature $e_i$ of the description text of the $i$-th candidate tail entity with the difference $e_i - a_i$ between the coding feature $e_i$ and the association feature $a_i$, and the fusion result $v_i$ is taken as the description feature of the candidate tail entity. Further, the description feature of any candidate tail entity can be directly expressed as the concatenation of the coding feature of the description text of the candidate tail entity and the difference between the coding feature and the association feature:

$v_i = [\,e_i \,;\, e_i - a_i\,]$

Here, the difference $e_i - a_i$ between the coding feature of the description text of any candidate tail entity and the association feature between that candidate tail entity and the remaining candidate tail entities eliminates the influence that the correlation among the candidates' description texts exerts on the candidate tail entity, so that the information distinguishing the candidate tail entity from each of the remaining candidate tail entities is highlighted; in addition, the coding feature of the description text is retained, so that no information of the description text is omitted from the description feature of the candidate tail entity, which facilitates accurate completion by the knowledge graph completion model.
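The association step (bilinear correlation, softmax normalization, attention-weighted sum) and the fusion step (concatenating each coding feature with its difference from the association feature) can be sketched in numpy. The correlation matrix is random here for illustration; in the model it would be learned.

```python
import numpy as np

def association_features(E, M):
    """E: (m, d) coding features of the m candidate description texts.
    M: (d, d) correlation calculation matrix (random stand-in here).
    Returns a_i = sum_k alpha_ik * e_k for each candidate i."""
    S = E @ M @ E.T                       # pairwise correlations s_ij
    S = S - S.max(axis=1, keepdims=True)  # numerical stability for softmax
    A = np.exp(S)
    A = A / A.sum(axis=1, keepdims=True)  # correlation coefficients alpha_ij
    return A @ E                          # attention-weighted sums

def description_features(E, A_feat):
    # fuse each coding feature with its difference from the association
    # feature: v_i = [e_i ; e_i - a_i]
    return np.concatenate([E, E - A_feat], axis=1)

rng = np.random.default_rng(1)
m, d = 4, 6
E = rng.normal(size=(m, d))
M = rng.normal(size=(d, d))
a = association_features(E, M)
v = description_features(E, a)
```

Note that the concatenation doubles the feature dimension: the first half preserves the original description information, the second half emphasizes what distinguishes each candidate from the rest.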
According to any of the above embodiments, the feature encoding layer includes a feature representation layer and a context encoding layer. Fig. 4 is a schematic diagram of an operation flow of a feature encoding layer according to an embodiment of the present invention, as shown in fig. 4, step 121 includes:
step 1211, inputting the description text of the head entity, the head entity and relation, and the description texts of the candidate tail entities into the feature representation layer of the feature coding layer, to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and relation, and the feature representations of the description texts of the candidate tail entities output by the feature representation layer;
specifically, in the knowledge of the open world, the description text of the head entity contains a lot of description information related to the head entity, the description information is likely to appear in the description text of the candidate tail entity, and the evaluation basis of the knowledge-graph completion model on the matching degree between the head entity, the relation and the tail entity can be refined by fully utilizing the description information. Therefore, the embodiment of the invention also takes the description text of the head entity as one input of the knowledge graph completion model, thereby improving the accuracy of the knowledge graph completion.
Further, the combination of the head entity to be completed and the relation may be put in the form of a question; for example, if the head entity is "actor A" and the relation is "incumbent wife", their combination forms the question "who is actor A's incumbent wife". This question-form representation helps the knowledge graph completion model understand what information a candidate tail entity must satisfy for completion. Therefore, in the embodiment of the present invention, the combination of the head entity to be completed and the relation is also used as an input of the knowledge graph completion model.
Assume that the description text of the head entity is $d_h$, the head entity and relation together are $hr$, and the description text of a candidate tail entity is $d_i$. After word segmentation is performed on each text, the word sequence corresponding to each text is obtained, specifically expressed as follows.

The word sequence of the description text of the head entity is represented as $d_h = \{w^{h}_1, w^{h}_2, \dots, w^{h}_z, \dots\}$, where $w^{h}_z$ is a word in the description text $d_h$ of the head entity and $z$ is the index of any word in the description text of the head entity.

The word sequence of the head entity and relation is represented as $hr = \{w^{r}_1, w^{r}_2, \dots, w^{r}_j, \dots\}$, where $w^{r}_j$ is a word in the head entity and relation $hr$ and $j$ is the index of any word in the head entity and relation.

The word sequence of a candidate tail entity is represented as $d_i = \{w^{i}_1, w^{i}_2, \dots, w^{i}_n, \dots\}$, where $w^{i}_n$ is a word in the description text $d_i$ of any candidate tail entity, $n$ is the index of any word in the description text $d_i$, and $i$ is the index of any candidate tail entity.

The feature representation layer performs vector conversion on each word in the input description text $d_h$ of the head entity, the head entity and relation $hr$, and the description texts $d_i$ of the multiple candidate tail entities, obtaining fixed-length semantic vectors as the feature representation $p_h$ of the description text of the head entity, the feature representation $p_{hr}$ of the head entity and relation, and the feature representation $p_i$ of the description text of each candidate tail entity output by the feature representation layer.
Step 1212, inputting the feature representation of the description text of the head entity, the feature representation of the head entity and the relationship, and the feature representation of the description texts of the plurality of candidate tail entities into a context coding layer of the feature coding layer, so as to obtain the coding features of the description text of the head entity, the coding features of the head entity and the relationship, which are output by the context coding layer, and the coding features of the description texts of each candidate tail entity.
Specifically, the feature representations of the description text of the head entity, of the head entity and relation, and of the description texts of the multiple candidate tail entities are all composed of word sequences. Because knowledge in the open world is complex, an entity's description text contains a large number of words, and the word sequences are long. Semantic logical relationships exist among the words in a sequence, i.e., the information expressed by one word is often associated with the information expressed by the several words before and after it in the sequence.
In order to improve the semantic representation capability of the features, the context coding layer carries out semantic coding based on the context on the feature representation of the description text of the input head entity, the feature representation of the head entity and the relationship and the feature representation of the description texts of a plurality of candidate tail entities respectively, so as to obtain the coding features of the description text of the head entity, the coding features of the head entity and the relationship and the coding features of the description text of each candidate tail entity. Here, the context coding layer may employ a Bi-directional Long Short-Term Memory (Bi-LSTM) network.
In the embodiment of the present invention, the description text of the head entity, the head entity and relation, and the description texts of the candidate tail entities are used as inputs of the knowledge graph completion model; the model performs context-based semantic coding on each input text, mining the semantic logical relationships among the words in each text and improving the semantic representation capability of the coding features corresponding to each text.
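The context coding layer is described as a Bi-LSTM network. As a minimal illustration of bidirectional encoding, the sketch below runs a tiny Bi-LSTM forward pass in numpy with random, untrained weights; all function names and dimensions are illustrative, not the patent's actual implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, Wx, Wh, b):
    # one LSTM cell step; gates stacked in the order input, forget, cell, output
    d = h.size
    z = Wx @ x + Wh @ h + b
    i, f = sigmoid(z[:d]), sigmoid(z[d:2 * d])
    g, o = np.tanh(z[2 * d:3 * d]), sigmoid(z[3 * d:])
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def lstm(X, Wx, Wh, b, d):
    # unroll the cell over a (seq_len, in_dim) sequence
    h, c = np.zeros(d), np.zeros(d)
    out = []
    for x in X:
        h, c = lstm_step(x, h, c, Wx, Wh, b)
        out.append(h)
    return np.stack(out)

def bilstm(X, fwd, bwd, d):
    # concatenate a forward pass and a backward pass at each position,
    # so each word's coding reflects both its left and right context
    hf = lstm(X, *fwd, d)
    hb = lstm(X[::-1], *bwd, d)[::-1]
    return np.concatenate([hf, hb], axis=1)

rng = np.random.default_rng(2)
seq_len, in_dim, d = 5, 4, 3
X = rng.normal(size=(seq_len, in_dim))
make = lambda: (rng.normal(size=(4 * d, in_dim)),
                rng.normal(size=(4 * d, d)),
                np.zeros(4 * d))
H = bilstm(X, make(), make(), d)
```

Each of the three inputs (head description, head entity and relation, candidate descriptions) would be encoded by its own independent Bi-LSTM of this shape.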
Based on any of the above embodiments, step 1211 specifically includes:
inputting the description text of the head entity, the head entity and relation, and the description text of any candidate tail entity into the feature representation layer; the feature representation layer performs pairwise attention interaction among the vector representation of the description text of the head entity, the vector representation of the head entity and relation, and the vector representation of the description text of any candidate tail entity, obtaining the feature representation of the description text of the head entity, the feature representation of the head entity and relation, and the feature representations of the description texts of the multiple candidate tail entities output by the feature representation layer.
Specifically, the feature representation layer is used for performing attention interaction on the vector representation of the description text of the head entity, the vector representation of the head entity and the relation, and the vector representation of the description text of any candidate tail entity in pairs, wherein the attention interaction between the vector representation of the description text of the head entity and the vector representation of the head entity and the relation is helpful for highlighting information associated with the head entity and the relation in the description text of the head entity; attention interaction between the vector representation of the description text of any candidate tail entity and the vector representations of the head entity and the relation helps to highlight information related to the head entity and the relation in the description text of the candidate tail entity; the attention interaction between the vector representation of the description text of the head entity and the vector representation of the description text of any candidate tail entity is helpful for highlighting the information in the description text of the candidate tail entity, which is associated with the description text of the head entity.
Optionally, the feature representation of the descriptive text of the head entity obtained thereby may be a representation after combining the attention interaction results with the head entity and the relationship; the feature representation of the head entity and the relationship may be the vector representation of the head entity and the relationship itself; the feature representation of the descriptive text of any candidate tail entity may be a representation combining the attention interaction results with the head entity and the relationship, respectively, and the attention interaction results with the descriptive text of the head entity.
Further, the feature representation layer may employ the same word-level sequence-aligned attention mechanism for each attention interaction. For a given input word vector $x$ and word vector sequence $Y = \{y_1, y_2, \dots, y_p\}$, the attention function is:

$\mathrm{Att}(x, Y) = \sum_{q=1}^{p} \beta_q\, y_q$

where $p$ is the total number of word vectors in the word vector sequence $Y$, $q$ is the index of a word vector in the word vector sequence $Y$, and $\beta_q$ is the attention coefficient between the word vector $x$ and the word vector $y_q$, formulated as:

$\beta_q = \dfrac{\exp\!\big(\mathrm{ReLU}(Wx)^{\top}\,\mathrm{ReLU}(Wy_q)\big)}{\sum_{q'=1}^{p}\exp\!\big(\mathrm{ReLU}(Wx)^{\top}\,\mathrm{ReLU}(Wy_{q'})\big)}$

where $\mathrm{ReLU}(\cdot)$ is the ReLU activation function and $W$ is a linear transformation matrix that can be obtained through model training.

The attention interaction results between each word in the description text of the head entity and each word in the head entity and relation, between each word in the description text of any candidate tail entity and each word in the head entity and relation, and between each word in the description text of any candidate tail entity and each word in the description text of the head entity are computed in turn according to this word-level sequence-aligned attention mechanism.

Before the attention interaction, corresponding word vector sequences may be created with the Global word vector (GloVe) method from the word sequences of the description text of the head entity, the head entity and relation, and the description text of any candidate tail entity, and used as the vector representations. For example, the GloVe word vectors $g_h$ of the description text of the head entity, $g_{hr}$ of the head entity and relation, and $g_i$ of the description text of any candidate tail entity may serve respectively as the vector representations of the description text of the head entity, the head entity and relation, and the description text of any candidate tail entity.
Through the feature representation layer, attention interaction is performed between the vector representation $g_h$ of the description text of the head entity and the word vectors $g_{hr}$ of the head entity and relation, obtaining the attention interaction result $\hat{g}_{h,hr} = \{\mathrm{Att}(x, g_{hr}) : x \in g_h\}$ between the description text of the head entity and the head entity and relation.

Through the feature representation layer, attention interaction is performed between the word vectors $g_{hr}$ of the head entity and relation and the word vectors $g_i$ of the description text of any candidate tail entity, obtaining the attention interaction result $\hat{g}_{i,hr} = \{\mathrm{Att}(x, g_{hr}) : x \in g_i\}$ between the description text of any candidate tail entity and the head entity and relation.

Through the feature representation layer, attention interaction is performed between the word vectors $g_h$ of the description text of the head entity and the word vectors $g_i$ of the description text of any candidate tail entity, obtaining the attention interaction result $\hat{g}_{i,h} = \{\mathrm{Att}(x, g_h) : x \in g_i\}$ between the description text of any candidate tail entity and the description text of the head entity.

Finally, the feature representation layer combines the vector representation of the description text of the head entity with the attention interaction result $\hat{g}_{h,hr}$ between the description text of the head entity and the head entity and relation, as the feature representation of the description text of the head entity; takes the vector representation of the head entity and relation itself as the feature representation of the head entity and relation; and combines the vector representation of the description text of any candidate tail entity with the attention interaction results $\hat{g}_{i,h}$ and $\hat{g}_{i,hr}$ of that description text with the description text of the head entity and with the head entity and relation, respectively, as the feature representation of the description text of the candidate tail entity.
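The word-level sequence-aligned attention described in this section (scores built from ReLU-activated linear transformations of the two word vectors, normalized with softmax, then used to weight the attended sequence) can be sketched as follows. The weight matrix is random here and all names are illustrative; the real $W$ would be learned.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def seq_align_attention(x, Y, W):
    """Attend from word vector x (d,) over word vector sequence Y (p, d)
    with scores ReLU(Wx)^T ReLU(W y_q), softmax-normalized over q."""
    scores = relu(Y @ W.T) @ relu(W @ x)   # one score per word in Y
    scores = scores - scores.max()          # numerically stable softmax
    beta = np.exp(scores) / np.exp(scores).sum()
    return beta @ Y                         # weighted sum over Y's words

def interact(X, Y, W):
    # attention interaction result for every word of sequence X against Y
    return np.stack([seq_align_attention(x, Y, W) for x in X])

rng = np.random.default_rng(3)
d = 5
head_desc = rng.normal(size=(7, d))  # word vectors of the head description
head_rel = rng.normal(size=(3, d))   # word vectors of the head entity + relation
W = rng.normal(size=(d, d))
out = interact(head_desc, head_rel, W)
```

Each output row is a convex combination of the attended sequence's word vectors, aligned to one word of the attending sequence; the same routine serves all three pairwise interactions by swapping its arguments.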
According to any of the above embodiments, the feature representation of the descriptive text of the head entity further includes at least one of a part-of-speech vector, an entity type vector, and a word frequency co-occurrence vector of the head entity.
Specifically, the part-of-speech vector is used for representing the part of speech of each word in the description text of the head entity, and is specifically obtained by performing part-of-speech tagging through spaCy (Python natural language processing kit) or other part-of-speech tagging tools;
the entity type vector is used for representing the entity type described by each word in the description text of the head entity, and can be obtained by carrying out entity type recognition through spaCy (Python natural language processing toolkit) or other entity type recognition tools;
the word frequency co-occurrence vector is used for representing whether a word of the description text of the head entity appears in a word sequence of the head entity and the relation or a word sequence of the description text of the candidate tail entity, and can be obtained by a word frequency statistical tool or self-defined addition.
It should be noted that the vector representation of the head entity description text may also include at least one of a part-of-speech vector, an entity type vector, and a word frequency co-occurrence vector of the head entity, and at least one of the part-of-speech vector, the entity type vector, and the word frequency co-occurrence vector of the head entity may be directly used as a part of the description feature of the head entity without participating in the attention interaction of the feature representation layer.
Further, assume that part-of-speech tagging on the word sequence $d_h$ of the description text of the head entity yields the part-of-speech vector $g^{pos}_h$; that entity type recognition on the word sequence $d_h$ of the description text of the head entity yields the entity type vector $g^{type}_h$; and that marking whether each word of the description text of the head entity appears in the word sequence of the head entity and relation or in the word sequences of the description texts of the candidate tail entities yields the word frequency co-occurrence vector $g^{freq}_h$. When a word of the description text of the head entity appears in the word sequence $hr$ of the head entity and relation or in the word sequence $d_i$ of the description text of a candidate tail entity, the corresponding element of $g^{freq}_h$ is 1, and otherwise 0.

The vectors obtained by the four embedding methods are fused to obtain the vector representation $x_h$ of the description text of the head entity:

$x_h = [\,g_h \,;\, g^{pos}_h \,;\, g^{type}_h \,;\, g^{freq}_h\,]$

In addition, word vector creation and part-of-speech tagging can be applied to the word sequence $hr$ of the head entity and relation to obtain the vector representation $x_{hr}$ of the head entity and relation:

$x_{hr} = [\,g_{hr} \,;\, g^{pos}_{hr}\,]$

For the word sequence $d_i$ of a candidate tail entity, word vector creation yields the vector representation $x_i$ of the description text of the candidate tail entity:

$x_i = g_i$

Accordingly, the context coding layer may apply independent Bi-LSTM networks to extract features from the vector representation $x_h$ of the description text of the head entity, the vector representation $x_{hr}$ of the head entity and relation, and the vector representation $x_i$ of the description text of each candidate tail entity, together with the feature representation $p_h$ of the description text of the head entity after attention interaction, the feature representation $p_{hr}$ of the head entity and relation, and the feature representation $p_i$ of the description text of the candidate tail entity, obtaining the coding feature $H$ of the description text of the head entity, the coding feature $Q$ of the head entity and relation, and the coding feature $e_i$ of the description text of any candidate tail entity:

$H = \mathrm{BiLSTM}([\,x_h \,;\, p_h\,]),\quad Q = \mathrm{BiLSTM}([\,x_{hr} \,;\, p_{hr}\,]),\quad e_i = \mathrm{BiLSTM}([\,x_i \,;\, p_i\,])$

where each $\mathrm{BiLSTM}(\cdot)$ is an independent Bi-LSTM network.
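The fused word representation for the head description (word embedding, part-of-speech vector, entity type vector, and word frequency co-occurrence flag concatenated per word) can be sketched with toy lookup tables. All embeddings, tags, and words below are illustrative placeholders; a real system would use GloVe vectors and a tagger such as spaCy.

```python
import numpy as np

# Toy per-word lookups standing in for GloVe vectors, POS tags, entity types.
word_emb = {"actor": [0.1, 0.2], "a": [0.0, 0.1],
            "married": [0.3, 0.0], "famous": [0.2, 0.2]}
pos_tags = {"actor": [1, 0], "a": [0, 1],
            "married": [0, 1], "famous": [0, 1]}   # noun vs. other
ent_type = {"actor": [1, 0], "a": [0, 1],
            "married": [0, 1], "famous": [0, 1]}   # person vs. none

def head_word_vector(word, head_rel_words, cand_desc_words):
    # co-occurrence flag: 1 if the word also appears in the head-entity/relation
    # word sequence or in a candidate description's word sequence, else 0
    co = 1.0 if word in head_rel_words or word in cand_desc_words else 0.0
    return np.concatenate([word_emb[word], pos_tags[word], ent_type[word], [co]])

head_desc = ["actor", "a", "married", "famous"]
head_rel_words = {"actor", "a", "wife"}
cand_desc_words = {"married", "wedding"}
X_h = np.stack([head_word_vector(w, head_rel_words, cand_desc_words)
                for w in head_desc])
```

The resulting matrix `X_h` has one row per word of the head description; its last column is the co-occurrence flag, which is 0 only for words that appear nowhere else in the inputs.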
based on any embodiment, the entity complementing layer comprises a feature interaction layer and a result output layer. Fig. 5 is a schematic flowchart of a feature interaction and result output method provided in the embodiment of the present invention, and as shown in fig. 5, step 123 specifically includes:
step 1231, inputting the coding features of the description texts of the head entity, the coding features of the head entity and the relation, and the description features of each candidate tail entity into a feature interaction layer of the entity completion layer, performing self-attention transformation on the coding features of the head entity and the relation by the feature interaction layer to obtain problem features, and performing attention interaction on the problem features and the coding features of the description texts of the head entity and the description features of the description texts of each candidate tail entity respectively to obtain interactive coding features of the description texts of the head entity and interactive coding features of the description texts of each candidate tail entity, which are output by the feature interaction layer;
Specifically, the feature interaction layer performs self-attention transformation on the input coding feature $Q$ of the head entity and relation to obtain the problem feature $q$:

$q = \mathrm{SelfAtt}(Q)$

where $\mathrm{SelfAtt}(\cdot)$ denotes the self-attention transformation.

The feature interaction layer then performs attention interaction between the problem feature $q$ and, respectively, the coding feature $H$ of the description text of the head entity and the description feature $v_i$ of the description text of each candidate tail entity (the output of the candidate text association layer), obtaining the interactive coding feature $\tilde{H}$ of the description text of the head entity and the interactive coding feature $\tilde{e}_i$ of the description text of each candidate tail entity output by the feature interaction layer:

$\tilde{H} = \mathrm{Att}(q, H),\quad \tilde{e}_i = \mathrm{Att}(q, v_i)$

where $\mathrm{Att}(\cdot,\cdot)$ is the word-level sequence-aligned attention interaction.
Here, the interactive coding feature $\tilde{H}$ represents the correlation between the problem feature $q$ and the coding feature $H$ of the description text of the head entity, i.e., the coding feature obtained after the problem feature corresponding to the head entity and relation interacts with the coding feature of the description text of the head entity through the word-level sequence-aligned attention mechanism.

The interactive coding feature $\tilde{e}_i$ represents the correlation between the problem feature $q$ and the description feature $v_i$ of the description text of each candidate tail entity, i.e., the coding feature obtained after the problem feature corresponding to the head entity and relation interacts with the description feature of the description text of each candidate tail entity through the word-level sequence-aligned attention mechanism.

Since the coding feature $Q$ of the head entity and relation used to generate the problem feature $q$, the coding feature $H$ of the description text of the head entity, and the description feature $v_i$ of the description text of each candidate tail entity are all obtained after processing by the context coding layer, the semantic logical relationships among the words in each text have been mined and the semantic representation is more accurate. The interactive coding features obtained on this basis therefore exploit the information in the description texts down to the word level, which greatly improves the accuracy of knowledge graph completion.
Step 1232, inputting the interactive coding features of the description text of the head entity and the interactive coding features of the description text of each candidate tail entity into the result output layer of the entity completion layer; the result output layer determines the similarity between the interactive coding features of the description text of the head entity and the interactive coding features of the description text of each candidate tail entity, and performs completion based on the similarity corresponding to each candidate tail entity to obtain the completion result output by the result output layer.
Specifically, the similarity between the interactive coding features of the description text of the head entity and those of any candidate tail entity reflects how much of the information associated with the head entity and relation to be completed coincides between the two description texts: the higher the similarity, the more coinciding information there is, and the higher the probability that the candidate tail entity is the completion result.
For example, the result output layer may perform a similarity calculation between the interactive coding feature of the head entity's description text and the interactive coding feature of any candidate tail entity's description text. Denoting the head-side feature by c_h and the i-th candidate's feature by c_i, the formula for the similarity calculation can be expressed as:

s_i = c_h^T · W · c_i

where s_i is the similarity corresponding to the i-th candidate tail entity, and W is a similarity linear transformation that can be set through custom configuration or obtained through training.
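As an illustrative sketch of such a similarity with a learned linear transformation, using NumPy (the symbol names, the feature dimension, and the bilinear form itself are assumptions for illustration; the patent does not fix an implementation):

```python
import numpy as np

def bilinear_similarity(c_h, c_t, W):
    """Similarity between the interactive coding feature of the head
    entity's description text (c_h) and that of one candidate tail
    entity's description text (c_t), via a similarity linear
    transformation W that may be set manually or learned in training."""
    return float(c_h @ W @ c_t)

rng = np.random.default_rng(0)
d = 8                                # feature dimension (illustrative)
W = rng.standard_normal((d, d))      # similarity linear transformation
c_h = rng.standard_normal(d)         # head-side interactive coding feature
c_t = rng.standard_normal(d)         # one candidate's interactive coding feature
s = bilinear_similarity(c_h, c_t, W)
```

With W as the identity matrix the score reduces to a plain dot product; training W lets the layer weight feature dimensions differently.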
Because the similarity corresponding to each candidate tail entity is an absolute value calculated independently, the similarities of the candidate tail entities lack mutual comparison. The result output layer can therefore apply softmax processing to the similarities corresponding to the candidate tail entities, so that the candidate tail entity with the highest similarity is selected from among the candidates as the completion result. The similarity comparison result can be represented by the comparison sequence y:

y = softmax(s_1, s_2, …, s_m)

where s_i is the similarity corresponding to the i-th candidate tail entity and m is the number of candidate tail entities.
After the softmax processing, the similarity corresponding to each candidate tail entity is mapped into the (0, 1) interval, so that the similarities of the candidate tail entities are mutually comparable and conform to a probability distribution; the candidate tail entity with the largest probability value has the highest similarity.
And finally, the result output layer outputs the candidate tail entity with the highest similarity as a completion result, so that the knowledge graph is completed.
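The softmax-and-select step described above can be sketched as follows (NumPy is used for illustration; the patent prescribes only the softmax mapping and the highest-similarity selection, not an implementation):

```python
import numpy as np

def select_tail_entity(similarities):
    """Map the independently computed similarities into the (0, 1)
    interval with softmax so they form a probability distribution,
    then pick the candidate tail entity with the largest value."""
    s = np.asarray(similarities, dtype=float)
    e = np.exp(s - s.max())           # subtract the max for numerical stability
    y = e / e.sum()                   # comparison sequence y, sums to 1
    return int(np.argmax(y)), y

best, probs = select_tail_entity([0.2, 1.7, 0.5])   # best is index 1
```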
According to the knowledge graph completion method provided by the embodiment of the invention, self-attention transformation is applied to the coding features of the head entity and the relation, paying more attention to their internal correlation and improving the completion efficiency of the knowledge graph completion model.
Based on any of the above embodiments, fig. 6 is a schematic diagram of the framework of a knowledge graph completion model provided by an embodiment of the present invention. As shown in fig. 6, the knowledge graph completion model may include a feature coding layer, a candidate text association layer, and an entity completion layer. The feature coding layer comprises a feature representation layer and a context coding layer, the candidate text association layer comprises an association feature extraction layer and a description feature extraction layer, and the entity completion layer comprises a feature interaction layer and a result output layer. The description text of the head entity, the head entity and the relation to be completed in the knowledge graph, and the description texts of a plurality of candidate tail entities are input into the feature representation layer of the knowledge graph completion model; the model determines the description feature of each candidate tail entity based on the relevance between the description texts of the candidate tail entities, determines the completion result corresponding to the head entity and the relation based on these description features, and outputs the completion result from the result output layer, thereby completing the knowledge graph.
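The data flow through the three layers of fig. 6 can be sketched as a skeleton, with the trained sub-layers stubbed out as callables (the class and parameter names are illustrative assumptions, not the patent's own):

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class KGCompletionModel:
    """Fig. 6 skeleton: feature coding layer -> candidate text
    association layer -> entity completion layer."""
    feature_encoding: Callable       # feature representation + context coding
    candidate_association: Callable  # association + description feature extraction
    entity_completion: Callable      # feature interaction + result output

    def complete(self, head_text: str, head: str, relation: str,
                 candidate_texts: Sequence[str]) -> int:
        hr_enc, cand_encs = self.feature_encoding(head_text, head, relation, candidate_texts)
        desc_feats = self.candidate_association(cand_encs)
        return self.entity_completion(hr_enc, desc_feats)   # index of the chosen tail entity

# Toy stubs: "encode" each candidate by text length, pick the largest.
model = KGCompletionModel(
    feature_encoding=lambda ht, h, r, cts: ((ht, h, r), [len(t) for t in cts]),
    candidate_association=lambda encs: encs,
    entity_completion=lambda hr, feats: max(range(len(feats)), key=feats.__getitem__),
)
choice = model.complete("head description", "h", "r",
                        ["short text", "a much longer candidate description"])
```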
Based on any of the above embodiments, fig. 7 is a schematic structural diagram of a knowledge-graph completing device provided in an embodiment of the present invention, as shown in fig. 7, the device includes:
a text determining unit 710, configured to determine description texts of a plurality of candidate tail entities corresponding to the head entity and the relationship to be complemented;
the map completion unit 720 is used for inputting the head entity, the relation, and the description texts of the plurality of candidate tail entities into the knowledge graph completion model to obtain a completion result output by the knowledge graph completion model;
the knowledge graph completion model determines the description characteristics of each candidate tail entity based on the relevance between the description texts of each candidate tail entity, and determines the completion result corresponding to the head entity and the relation based on the description characteristics of each candidate tail entity;
the knowledge graph completion model is obtained by training based on a sample head entity and a sample relation, the description texts of a plurality of corresponding sample candidate tail entities, and a sample completion result.
Specifically, the text determining unit 710 is configured to determine the description texts of a plurality of candidate tail entities corresponding to the head entity and the relation to be completed. The map completion unit 720 is configured to input the head entity and relation determined by the text determining unit 710, together with the description texts of the multiple candidate tail entities, into the knowledge graph completion model to obtain the completion result output by the model.
The knowledge graph completion model determines the relevance among the description texts of the candidate tail entities by utilizing the effective information in the description texts of the candidate tail entities based on an attention mechanism, determines the description characteristics of each candidate tail entity based on the relevance among the description texts of each candidate tail entity, determines the similarity between each candidate tail entity and the head entity and the relation based on the description characteristics of each candidate tail entity, and finally determines the completion result corresponding to the head entity and the relation.
The knowledge graph completion device provided by the embodiment of the invention determines the description features of each candidate tail entity based on the relevance among the description texts of a plurality of candidate tail entities, so that the description features of each candidate tail entity can fully reflect the distinguishing differences of the corresponding candidate tail entity compared with the remaining candidate tail entities; the completion result corresponding to the head entity and the relation is then determined based on these description features, thereby making full use of the information contained in the description texts and improving the accuracy of knowledge graph completion.
Based on any of the above embodiments, the map completion unit 720 specifically includes:
the feature coding subunit is used for inputting the head entity and the relation and the description texts of the multiple candidate tail entities into a feature coding layer of the knowledge graph completion model to obtain the coding features of the head entity and the relation output by the feature coding layer and the coding features of the description text of each candidate tail entity;
the candidate text association subunit is used for inputting the coding features of the description text of each candidate tail entity to a candidate text association layer of the knowledge graph completion model to obtain the description features of each candidate tail entity output by the candidate text association layer;
and the entity completion subunit is used for inputting the coding features of the head entity and the relation and the description features of each candidate tail entity into an entity completion layer of the knowledge graph completion model to obtain a completion result output by the entity completion layer.
Based on any of the above embodiments, the candidate text association subunit specifically includes:
the associated feature extraction module is used for inputting the coding features of the description texts of each candidate tail entity into an associated feature extraction layer of a candidate text association layer to obtain associated features between the description texts of each candidate tail entity and the description texts of the rest candidate tail entities, which are output by the associated feature extraction layer based on an attention mechanism;
and the description feature extraction module is used for inputting the association features between any candidate tail entity and each of the rest candidate tail entities and the coding features of the description text of any candidate tail entity into a description feature extraction layer of a candidate text association layer to obtain the description features of any candidate tail entity output by the description feature extraction layer.
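One concrete realization of the attention mechanism in the associated feature extraction layer — hedged as an assumption, since the patent names the mechanism but not the exact formula — is scaled dot-product attention of each candidate's encoding over the encodings of the remaining candidates:

```python
import numpy as np

def association_features(encodings):
    """For each of the m candidate tail entities, attend over the
    coding features of the other candidates and return the
    attention-weighted mixture as its association feature.

    encodings: (m, d) array, one row per candidate's description text.
    """
    E = np.asarray(encodings, dtype=float)
    m, d = E.shape
    scores = E @ E.T / np.sqrt(d)            # pairwise attention scores
    np.fill_diagonal(scores, -np.inf)        # a candidate does not attend to itself
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)        # normalize over the other candidates
    return w @ E                             # (m, d) association features

A = association_features(np.eye(3))          # three orthogonal toy encodings
```

With orthogonal toy encodings, each candidate's association feature is simply the uniform average of the other candidates' rows.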
Based on any of the embodiments described above, the description feature extraction module is specifically configured to:
and inputting the association characteristics between any candidate tail entity and each of the rest candidate tail entities and the coding characteristics of the description text of any candidate tail entity into a description characteristic extraction layer, and fusing the coding characteristics of the description text of any candidate tail entity and the difference between the coding characteristics and the association characteristics by the description characteristic extraction layer to obtain a fusion result output by the description characteristic extraction layer as the description characteristics of any candidate tail entity.
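A hedged sketch of that fusion: combine the candidate's own coding feature with its difference from the association feature, which emphasizes what distinguishes this candidate from the rest. Concatenation, optionally followed by a learned projection W, is an assumption here; the patent does not name the fusion operator:

```python
import numpy as np

def description_feature(encoding, association, W=None):
    """Fuse a candidate's coding feature with the difference between
    that coding feature and its association feature over the other
    candidates; the difference carries the distinguishing information."""
    enc = np.asarray(encoding, dtype=float)
    diff = enc - np.asarray(association, dtype=float)   # distinguishing part
    fused = np.concatenate([enc, diff])                 # fusion by concatenation (assumed)
    if W is not None:
        fused = W @ fused                               # optional learned projection
    return fused

f = description_feature([1.0, 2.0], [0.5, 0.5])
```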
Based on any of the above embodiments, the feature encoding subunit includes:
the feature representation module is used for inputting the description text of the head entity, the head entity and the relation, and the description texts of the plurality of candidate tail entities into the feature representation layer of the feature coding layer to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the plurality of candidate tail entities output by the feature representation layer;
and the context coding module is used for inputting the feature representation of the description text of the head entity, the feature representation of the head entity and the relationship, and the feature representation of the description texts of a plurality of candidate tail entities into a context coding layer of the feature coding layer to obtain the coding features of the description text of the head entity, the coding features of the head entity and the relationship, and the coding features of the description text of each candidate tail entity, which are output by the context coding layer.
Based on any of the embodiments described above, the feature representation module is specifically configured to:
inputting the description text of the head entity, the head entity and the relation, and the description text of any candidate tail entity into the feature representation layer, and performing pairwise attention interaction among the vector representation of the description text of the head entity, the vector representation of the head entity and the relation, and the vector representation of the description text of the candidate tail entity, so as to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the plurality of candidate tail entities output by the feature representation layer.
According to any of the above embodiments, the feature representation of the descriptive text of the head entity further includes at least one of a part-of-speech vector, an entity type vector, and a word frequency co-occurrence vector of the head entity.
Based on any of the above embodiments, the entity complementing subunit specifically includes:
the feature interaction module is used for inputting the coding features of the description texts of the head entities, the coding features of the head entities and the relation and the description features of each candidate tail entity into a feature interaction layer of the entity completion layer, performing self-attention transformation on the coding features of the head entities and the relation by the feature interaction layer to obtain problem features, and performing attention interaction on the problem features and the coding features of the description texts of the head entities and the description features of the description texts of each candidate tail entity respectively to obtain interactive coding features of the description texts of the head entities and interactive coding features of the description texts of each candidate tail entity, wherein the interactive coding features of the description texts of the head entities and the interactive coding features of the description texts of each candidate tail entity are output by the feature interaction layer;
and the result output module is used for inputting the interactive coding features of the description texts of the head entity and the interactive coding features of the description texts of each candidate tail entity into a result output layer of the entity completion layer, determining the similarity between the interactive coding features of the description texts of the head entity and the interactive coding features of the description texts of each candidate tail entity by the result output layer, and completing based on the similarity corresponding to each candidate tail entity to obtain a completion result output by the result output layer.
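The feature interaction module's two steps — self-attention over the head-entity-and-relation coding features to form the problem features, then attention interaction of those features with each description's features — can be sketched with single-head dot-product attention (the attention variant and all tensor shapes are assumptions for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attend(X):
    """Self-attention over the head-entity-and-relation coding
    features X: (n, d), yielding the problem features."""
    return softmax(X @ X.T / np.sqrt(X.shape[1])) @ X

def interact(question, Y):
    """Attention interaction of the problem features (n, d) with one
    description's features Y: (k, d), yielding interactive features."""
    return softmax(question @ Y.T / np.sqrt(Y.shape[1])) @ Y

rng = np.random.default_rng(1)
hr_enc = rng.standard_normal((4, 8))     # coding features of head entity and relation
q = self_attend(hr_enc)                  # problem features
head_desc = rng.standard_normal((6, 8))  # coding features of head description text
c_h = interact(q, head_desc)             # head-side interactive coding features
```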
Based on any of the above embodiments, fig. 8 is a schematic structural diagram of an electronic device provided in an embodiment of the present invention, and as shown in fig. 8, the electronic device may include: a Processor (Processor) 810, a communication Interface (Communications Interface) 820, a Memory (Memory) 830 and a communication Bus (Communications Bus) 840, wherein the Processor 810, the communication Interface 820 and the Memory 830 communicate with each other via the communication Bus 840. The processor 810 may call logical commands in the memory 830 to perform the following method:
determining description texts of a plurality of candidate tail entities corresponding to a head entity and a relation to be completed; inputting the head entity, the relation, and the description texts of the candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model; the knowledge graph completion model determines the description characteristics of each candidate tail entity based on the relevance between the description texts of each candidate tail entity, and determines the completion result corresponding to the head entity and the relation based on the description characteristics of each candidate tail entity; the knowledge graph completion model is obtained by training based on a sample head entity and a sample relation, the description texts of a plurality of corresponding sample candidate tail entities, and a sample completion result.
In addition, the logic instructions in the memory 830 may be implemented in the form of software functional units and stored in a computer-readable storage medium when sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method provided in the foregoing embodiments, the method including:
determining description texts of a plurality of candidate tail entities corresponding to a head entity and a relation to be completed; inputting the head entity, the relation, and the description texts of the candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model; the knowledge graph completion model determines the description characteristics of each candidate tail entity based on the relevance between the description texts of each candidate tail entity, and determines the completion result corresponding to the head entity and the relation based on the description characteristics of each candidate tail entity; the knowledge graph completion model is obtained by training based on a sample head entity and a sample relation, the description texts of a plurality of corresponding sample candidate tail entities, and a sample completion result.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes commands for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for supplementing a knowledge graph, comprising:
determining description texts of a plurality of candidate tail entities corresponding to a head entity and a relation to be completed;
inputting the head entity and the relation and the description texts of the plurality of candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model;
the knowledge graph completion model determines the description characteristics of each candidate tail entity based on the relevance between the description texts of each candidate tail entity, and determines the completion results corresponding to the head entity and the relation based on the description characteristics of each candidate tail entity;
the knowledge graph completion model is obtained by training based on a sample head entity and a sample relation, the description texts of a plurality of corresponding sample candidate tail entities, and a sample completion result.
2. The method for completing the knowledge graph according to claim 1, wherein the inputting the head entity and the relation and the description texts of the plurality of candidate tail entities into a knowledge graph completion model to obtain the completion result output by the knowledge graph completion model specifically comprises:
inputting the head entity and the relation and description texts of a plurality of candidate tail entities into a feature coding layer of the knowledge graph complementing model to obtain the coding features of the head entity and the relation output by the feature coding layer and the coding features of the description texts of each candidate tail entity;
inputting the coding features of the description text of each candidate tail entity into a candidate text association layer of the knowledge graph completion model to obtain the description features of each candidate tail entity output by the candidate text association layer;
and inputting the coding characteristics of the head entity and the relation and the description characteristics of each candidate tail entity into an entity completion layer of the knowledge graph completion model to obtain the completion result output by the entity completion layer.
3. The method according to claim 2, wherein the inputting the encoding features of the description text of each candidate tail entity into the candidate text association layer of the knowledge-graph completion model to obtain the description features of each candidate tail entity output by the candidate text association layer specifically comprises:
inputting the coding features of the description texts of each candidate tail entity into an associated feature extraction layer of the candidate text association layer to obtain associated features between the description texts of any candidate tail entity and the description texts of each other candidate tail entity, which are output by the associated feature extraction layer based on an attention mechanism;
and inputting the association characteristics between any candidate tail entity and each of the rest candidate tail entities and the coding characteristics of the description text of any candidate tail entity into a description characteristic extraction layer of the candidate text association layer to obtain the description characteristics of any candidate tail entity output by the description characteristic extraction layer.
4. The method for completing a knowledge graph according to claim 3, wherein the inputting the association features between any candidate tail entity and each of the remaining candidate tail entities and the encoding features of the description text of any candidate tail entity into a description feature extraction layer of the candidate text association layer to obtain the description features of any candidate tail entity output by the description feature extraction layer specifically comprises:
inputting the association features between any candidate tail entity and each of the rest candidate tail entities and the coding features of the description text of any candidate tail entity into the description feature extraction layer, and fusing the coding features of the description text of any candidate tail entity and the difference between the coding features and the association features by the description feature extraction layer to obtain a fusion result output by the description feature extraction layer as the description features of any candidate tail entity.
5. The method according to claim 2, wherein the inputting the head entity and the relationship and the description texts of a plurality of candidate tail entities into a feature coding layer of the knowledge graph completion model to obtain the coding features of the head entity and the relationship and the coding features of the description texts of each candidate tail entity output by the feature coding layer specifically comprises:
inputting the description text of the head entity, the head entity and the relation, and the description texts of a plurality of candidate tail entities into a feature representation layer of the feature coding layer to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the plurality of candidate tail entities, which are output by the feature representation layer;
inputting the feature representation of the description text of the head entity, the feature representation of the head entity and the relationship, and the feature representation of the description texts of a plurality of candidate tail entities into a context coding layer of the feature coding layer, and obtaining the coding features of the description text of the head entity, the coding features of the head entity and the relationship, and the coding features of the description texts of each candidate tail entity, which are output by the context coding layer.
6. The method for completing a knowledge graph according to claim 5, wherein the inputting the description text of the head entity, the head entity and the relation, and the description texts of the candidate tail entities into the feature representation layer of the feature coding layer, and obtaining the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the candidate tail entities output by the feature representation layer specifically comprises:
inputting the description text of the head entity, the head entity and the relation, and the description text of any candidate tail entity into the feature representation layer, and performing attention interaction on the vector representation of the description text of the head entity, the vector representation of the head entity and the relation, and the vector representation of the description text of any candidate tail entity by the feature representation layer to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representation of the description texts of a plurality of candidate tail entities, which are output by the feature representation layer.
7. The method of knowledgegraph completion according to claim 5 or 6, wherein the feature representation of the descriptive text of the head entity further comprises at least one of a part-of-speech vector, an entity type vector, and a word frequency co-occurrence vector of the head entity.
8. The method according to claim 5, wherein the inputting the coding features of the head entity and the relation and the description features of each candidate tail entity into an entity completion layer of the knowledge-graph completion model to obtain the completion result output by the entity completion layer specifically comprises:
inputting the coding features of the description text of the head entity, the coding features of the head entity and the relation, and the description features of each candidate tail entity into a feature interaction layer of the entity completion layer, performing self-attention transformation on the coding features of the head entity and the relation by the feature interaction layer to obtain problem features, and performing attention interaction on the problem features with the coding features of the description text of the head entity and the description features of the description text of each candidate tail entity respectively to obtain interactive coding features of the description text of the head entity and interactive coding features of the description text of each candidate tail entity, which are output by the feature interaction layer;
inputting the interactive coding features of the description texts of the head entity and the interactive coding features of the description texts of each candidate tail entity into a result output layer of the entity completion layer, determining the similarity between the interactive coding features of the description texts of the head entity and the interactive coding features of the description texts of each candidate tail entity by the result output layer, and completing based on the similarity corresponding to each candidate tail entity to obtain the completion result output by the result output layer.
9. A knowledge graph complementing device, comprising:
the text determining unit is used for determining description texts of a plurality of candidate tail entities corresponding to a head entity and a relation to be completed;
the map completion unit is used for inputting the head entity, the relation, and the description texts of the candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model;
the knowledge graph completion model determines the description characteristics of each candidate tail entity based on the relevance between the description texts of each candidate tail entity, and determines the completion results corresponding to the head entity and the relation based on the description characteristics of each candidate tail entity;
the knowledge graph completion model is obtained by training based on a sample head entity and a sample relation, the description texts of a plurality of corresponding sample candidate tail entities, and a sample completion result.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the knowledge-graph complementing method of any one of claims 1 to 8 when executing the computer program.
11. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the steps of the knowledge-graph complementation method of any of claims 1 to 8.
CN202011168569.0A 2020-10-28 2020-10-28 Knowledge graph complementing method and device, electronic equipment and storage medium Active CN112000815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011168569.0A CN112000815B (en) 2020-10-28 2020-10-28 Knowledge graph complementing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011168569.0A CN112000815B (en) 2020-10-28 2020-10-28 Knowledge graph complementing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112000815A true CN112000815A (en) 2020-11-27
CN112000815B CN112000815B (en) 2021-03-02

Family

ID=73475193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011168569.0A Active CN112000815B (en) 2020-10-28 2020-10-28 Knowledge graph complementing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112000815B (en)

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560477A (en) * 2020-12-09 2021-03-26 中科讯飞互联(北京)信息科技有限公司 Text completion method, electronic device and storage device
CN112560477B (en) * 2020-12-09 2024-04-16 科大讯飞(北京)有限公司 Text completion method, electronic equipment and storage device
CN113360664A (en) * 2021-05-31 2021-09-07 电子科技大学 Knowledge graph complementing method
CN113360664B (en) * 2021-05-31 2022-03-25 电子科技大学 Knowledge graph complementing method
CN113360675B (en) * 2021-06-25 2024-02-13 中关村智慧城市产业技术创新战略联盟 Knowledge graph specific relationship completion method based on Internet open world
CN113360675A (en) * 2021-06-25 2021-09-07 中关村智慧城市产业技术创新战略联盟 Knowledge graph specific relation completion method based on Internet open world
CN113569056A (en) * 2021-07-27 2021-10-29 科大讯飞(苏州)科技有限公司 Knowledge graph complementing method and device, electronic equipment and storage medium
CN113626610A (en) * 2021-08-10 2021-11-09 南方电网数字电网研究院有限公司 Knowledge graph embedding method and device, computer equipment and storage medium
CN114579762A (en) * 2022-03-04 2022-06-03 腾讯科技(深圳)有限公司 Knowledge graph alignment method, device, equipment, storage medium and program product
CN114579762B (en) * 2022-03-04 2024-03-22 腾讯科技(深圳)有限公司 Knowledge graph alignment method, device, equipment, storage medium and program product
CN116842958B (en) * 2023-09-01 2024-02-06 北京邮电大学 Time sequence knowledge graph completion method, entity prediction method based on time sequence knowledge graph completion method and device thereof
CN116842958A (en) * 2023-09-01 2023-10-03 北京邮电大学 Time sequence knowledge graph completion method, entity prediction method based on time sequence knowledge graph completion method and device thereof
CN117094395B (en) * 2023-10-19 2024-02-09 腾讯科技(深圳)有限公司 Method, device and computer storage medium for complementing knowledge graph
CN117094395A (en) * 2023-10-19 2023-11-21 腾讯科技(深圳)有限公司 Method, device and computer storage medium for complementing knowledge graph

Also Published As

Publication number Publication date
CN112000815B (en) 2021-03-02

Similar Documents

Publication Publication Date Title
CN112000815B (en) Knowledge graph complementing method and device, electronic equipment and storage medium
EP3964998A1 (en) Text processing method and model training method and apparatus
Chen et al. Semantically conditioned dialog response generation via hierarchical disentangled self-attention
RU2619193C1 (en) Multi stage recognition of the represent essentials in texts on the natural language on the basis of morphological and semantic signs
CN105608218B (en) The method for building up of intelligent answer knowledge base establishes device and establishes system
CN110121705A (en) Pragmatics principle is applied to the system and method interacted with visual analysis
CN112270196B (en) Entity relationship identification method and device and electronic equipment
CN110019732B (en) Intelligent question answering method and related device
JP2020027649A (en) Method, apparatus, device and storage medium for generating entity relationship data
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
US20220138193A1 (en) Conversion method and systems from natural language to structured query language
CN115470338B (en) Multi-scenario intelligent question answering method and system based on multi-path recall
CN113094512B (en) Fault analysis system and method in industrial production and manufacturing
WO2024098524A1 (en) Text and video cross-searching method and apparatus, model training method and apparatus, device, and medium
CN114997181A (en) Intelligent question-answering method and system based on user feedback correction
CN114612921A (en) Form recognition method and device, electronic equipment and computer readable medium
CN110347812A (en) A kind of search ordering method and system towards judicial style
CN110929022A (en) Text abstract generation method and system
CN106484660A (en) Title treating method and apparatus
Pinheiro et al. ChartText: Linking Text with Charts in Documents
CN116226638A (en) Model training method, data benchmarking method, device and computer storage medium
CN113987536A (en) Method and device for determining security level of field in data table, electronic equipment and medium
WO2022141855A1 (en) Text regularization method and apparatus, and electronic device and storage medium
CN117371534B (en) Knowledge graph construction method and system based on BERT
Liu et al. An Enhanced ESIM Model for Sentence Pair Matching with Self-Attention.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant