CN112000815A - Knowledge graph complementing method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN112000815A CN112000815A CN202011168569.0A CN202011168569A CN112000815A CN 112000815 A CN112000815 A CN 112000815A CN 202011168569 A CN202011168569 A CN 202011168569A CN 112000815 A CN112000815 A CN 112000815A
- Authority
- CN
- China
- Prior art keywords
- entity
- description
- candidate tail
- head
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Animal Behavior & Ethology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the invention provides a knowledge graph completion method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: determining description texts of a plurality of candidate tail entities corresponding to a head entity and a relation to be complemented; and inputting the head entity and the relation and the description texts of the plurality of candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model. The knowledge graph completion model determines the description feature of each candidate tail entity based on the relevance between the description texts of the candidate tail entities, and determines the completion result corresponding to the head entity and the relation based on the description feature of each candidate tail entity. The method, the apparatus, the electronic device, and the storage medium provided by the embodiments of the invention make full use of the information contained in the description texts and improve the accuracy of knowledge graph completion.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a knowledge graph completion method and apparatus, an electronic device, and a storage medium.
Background
In order to complement missing relationships between entities in a knowledge graph, knowledge can be obtained from resources in the open world, and new entities that do not yet appear in the knowledge graph can be added to it.
In the prior art, the knowledge graph is usually complemented by using entity description texts from the open world, or by jointly using the entity description texts and the topological structure of the knowledge graph. Because knowledge in the open world is highly complex, existing completion methods make poor use of the entity description texts, and the accuracy of the complemented knowledge graph is poor.
Disclosure of Invention
Embodiments of the invention provide a knowledge graph completion method and apparatus, an electronic device, and a storage medium, aiming to solve the problems that existing knowledge graph completion methods make poor use of entity description texts and yield complemented knowledge graphs of poor accuracy.
In a first aspect, an embodiment of the present invention provides a knowledge graph completion method, including:
determining description texts of a plurality of candidate tail entities corresponding to a head entity and a relation to be complemented;
inputting the head entity and the relation and the description texts of the plurality of candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model;
wherein the knowledge graph completion model determines the description feature of each candidate tail entity based on the relevance between the description texts of the candidate tail entities, and determines the completion result corresponding to the head entity and the relation based on the description feature of each candidate tail entity;
and the knowledge graph completion model is obtained by training based on sample head entities and sample relations, the description texts of a plurality of corresponding sample candidate tail entities, and sample completion results.
Optionally, the inputting the head entity and the relation and the description texts of the plurality of candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model specifically includes:
inputting the head entity and the relation and the description texts of the plurality of candidate tail entities into a feature coding layer of the knowledge graph completion model to obtain the coding features of the head entity and the relation and the coding features of the description text of each candidate tail entity output by the feature coding layer;
inputting the coding features of the description text of each candidate tail entity into a candidate text association layer of the knowledge graph completion model to obtain the description features of each candidate tail entity output by the candidate text association layer;
and inputting the coding characteristics of the head entity and the relation and the description characteristics of each candidate tail entity into an entity completion layer of the knowledge graph completion model to obtain the completion result output by the entity completion layer.
Optionally, the inputting the coding feature of the description text of each candidate tail entity into the candidate text association layer of the knowledge graph completion model to obtain the description feature of each candidate tail entity output by the candidate text association layer specifically includes:
inputting the coding features of the description texts of each candidate tail entity into an associated feature extraction layer of the candidate text association layer to obtain associated features between the description texts of any candidate tail entity and the description texts of each other candidate tail entity, which are output by the associated feature extraction layer based on an attention mechanism;
and inputting the association features between any candidate tail entity and each of the remaining candidate tail entities, together with the coding features of the description text of that candidate tail entity, into a description feature extraction layer of the candidate text association layer to obtain the description features of that candidate tail entity output by the description feature extraction layer.
Optionally, the inputting the association feature between any candidate tail entity and each of the remaining candidate tail entities and the coding feature of the description text of any candidate tail entity into a description feature extraction layer of the candidate text association layer to obtain the description feature of any candidate tail entity output by the description feature extraction layer specifically includes:
inputting the association features between any candidate tail entity and each of the remaining candidate tail entities, together with the coding features of the description text of that candidate tail entity, into the description feature extraction layer; and fusing, by the description feature extraction layer, the coding features of the description text of that candidate tail entity with the difference between the coding features and the association features, to obtain a fusion result output by the description feature extraction layer as the description features of that candidate tail entity.
Optionally, the inputting the head entity and the relationship and the description texts of a plurality of candidate tail entities into a feature coding layer of the knowledge graph completion model to obtain the coding features of the head entity and the relationship output by the feature coding layer and the coding features of the description texts of each candidate tail entity specifically includes:
inputting the description text of the head entity, the head entity and the relation, and the description texts of the plurality of candidate tail entities into a feature representation layer of the feature coding layer to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the plurality of candidate tail entities output by the feature representation layer;
inputting the feature representation of the description text of the head entity, the feature representation of the head entity and the relationship, and the feature representation of the description texts of a plurality of candidate tail entities into a context coding layer of the feature coding layer, and obtaining the coding features of the description text of the head entity, the coding features of the head entity and the relationship, and the coding features of the description texts of each candidate tail entity, which are output by the context coding layer.
Optionally, the inputting the description text of the head entity, the head entity and the relationship, and the description texts of a plurality of candidate tail entities into the feature representation layer of the feature coding layer to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relationship, and the feature representation of the description texts of the plurality of candidate tail entities, which are output by the feature representation layer specifically includes:
inputting the description text of the head entity, the head entity and the relation, and the description text of any candidate tail entity into the feature representation layer, and performing attention interaction on the vector representation of the description text of the head entity, the vector representation of the head entity and the relation, and the vector representation of the description text of any candidate tail entity by the feature representation layer to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representation of the description texts of a plurality of candidate tail entities, which are output by the feature representation layer.
Optionally, the feature representation of the descriptive text of the head entity further includes at least one of a part-of-speech vector, an entity type vector, and a word frequency co-occurrence vector of the head entity.
Optionally, the inputting the coding features of the head entity and the relationship and the description features of each candidate tail entity into an entity completion layer of the knowledge graph completion model to obtain the completion result output by the entity completion layer specifically includes:
inputting the coding features of the description text of the head entity, the coding features of the head entity and the relation, and the description features of each candidate tail entity into a feature interaction layer of the entity completion layer; performing, by the feature interaction layer, self-attention transformation on the coding features of the head entity and the relation to obtain question features, and performing attention interaction between the question features and the coding features of the description text of the head entity and the description features of the description text of each candidate tail entity, respectively, to obtain the interactive coding features of the description text of the head entity and the interactive coding features of the description text of each candidate tail entity output by the feature interaction layer;
inputting the interactive coding features of the description text of the head entity and the interactive coding features of the description text of each candidate tail entity into a result output layer of the entity completion layer; determining, by the result output layer, the similarity between the interactive coding features of the description text of the head entity and the interactive coding features of the description text of each candidate tail entity; and performing completion based on the similarity corresponding to each candidate tail entity to obtain the completion result output by the result output layer.
In a second aspect, an embodiment of the present invention provides a knowledge graph completion apparatus, including:
a text determining unit, configured to determine description texts of a plurality of candidate tail entities corresponding to a head entity and a relation to be complemented;
a graph completion unit, configured to input the head entity and the relation and the description texts of the plurality of candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model;
wherein the knowledge graph completion model determines the description feature of each candidate tail entity based on the relevance between the description texts of the candidate tail entities, and determines the completion result corresponding to the head entity and the relation based on the description feature of each candidate tail entity;
and the knowledge graph completion model is obtained by training based on sample head entities and sample relations, the description texts of a plurality of corresponding sample candidate tail entities, and sample completion results.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a bus, where the processor, the communication interface, and the memory communicate with one another through the bus, and the processor can invoke logic instructions in the memory to perform the steps of the method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first aspect.
According to the knowledge graph completion method and apparatus, the electronic device, and the storage medium provided by the embodiments of the invention, the description feature of each candidate tail entity is determined based on the relevance among the description texts of the plurality of candidate tail entities, so that the description feature of each candidate tail entity can fully reflect how the corresponding candidate tail entity stands out from the remaining candidate tail entities. The completion result corresponding to the head entity and the relation is then determined based on these description features, which makes full use of the information contained in the description texts and improves the accuracy of knowledge graph completion.
Drawings
To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a knowledge graph completion method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the operation flow of a knowledge graph completion model according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a method for determining description features of candidate tail entities according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the operation flow of a feature coding layer according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a feature interaction and result output method according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a knowledge graph completion model according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a knowledge graph completion apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A Knowledge Graph (KG) aims to build a database of structured information: objects in the world (proper nouns such as names of people, places, and organizations) and abstract concepts are represented as entities, and the interactions and connections between entities are represented as relations. The entities and the relations between them form a huge graph, in which the entities are nodes and the relations are edges. In a knowledge graph, the world's vast knowledge is represented as triples connecting entities through relations.
Knowledge graph completion aims to find the missing parts of triples (head entity, relation, tail entity) in the knowledge graph, so that the knowledge graph becomes more complete. Knowledge from the open world is usually used to complement the knowledge graph; because this knowledge is highly complex, existing completion methods make poor use of entity description texts, and the accuracy of knowledge graph completion is poor.
To overcome the above defects in the prior art, FIG. 1 is a schematic flowchart of a knowledge graph completion method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step 110, determining description texts of a plurality of candidate tail entities corresponding to a head entity and a relation to be complemented.
specifically, the description text of the entity is a text for interpreting the entity using a natural language. Description texts of entities in the open world can be from online encyclopedia or from public corpora, such as news corpora, english corpora, chinese corpora, and the like.
When a triple has only a head entity and a relation but no tail entity, the head entity and the relation can be taken as the head entity and the relation to be complemented, and the missing tail entity in the triple is the part that needs to be completed. For a head entity and a relation to be complemented, a plurality of candidate tail entities exist in the open world, and each candidate tail entity corresponds to a description text. For example, for the head entity "actor A" and the relation "incumbent wife" to be complemented, two candidate tail entities "actress B" and "actress C" exist in the open world, where the description texts of "actress B" and "actress C" are "actor A and actress B held their wedding on New Year's Day this year" and "actor A and actress C secretly ended their marriage last year", respectively.
The process of complementing the knowledge graph with open-world knowledge can be understood as finding, from the description texts of entities in the open world, the tail entity that best satisfies the relation, given the head entity and the relation in the knowledge graph. In the above example, the task of knowledge graph completion is to determine, from the description texts of the candidate tail entities, the tail entity that best answers the question "who is the incumbent wife of actor A" corresponding to the head entity and the relation to be complemented.
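As a concrete illustration, the selection task described above can be sketched as follows. The entity names and description texts come from the example, while the word-overlap scorer is a toy stand-in for the learned matching of the embodiments, not the actual model:

```python
# Hypothetical sketch of the open-world completion task from the example above.
query = ("actor A", "incumbent wife")  # (head entity, relation) to be complemented

candidate_tail_entities = {
    "actress B": "actor A and actress B held their wedding on New Year's Day this year",
    "actress C": "actor A and actress C secretly ended their marriage last year",
}

def complete(query, candidates, score):
    """Return the candidate tail entity whose description best matches the query."""
    return max(candidates, key=lambda tail: score(query, candidates[tail]))

def toy_score(query, description):
    """Toy matching degree: count query words shared with the description."""
    words = set(" ".join(query).lower().split())
    return len(words & set(description.lower().split()))

best = complete(query, candidate_tail_entities, toy_score)
```

A trained completion model replaces `toy_score` with a learned scoring function over encoded features; the selection logic itself stays the same.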
Step 120, inputting the head entity and the relation and the description texts of the candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model;
wherein the knowledge graph completion model determines the description feature of each candidate tail entity based on the relevance between the description texts of the candidate tail entities, and determines the completion result corresponding to the head entity and the relation based on the description feature of each candidate tail entity;
and the knowledge graph completion model is obtained by training based on sample head entities and sample relations, the description texts of a plurality of corresponding sample candidate tail entities, and sample completion results.
Specifically, the description texts of different candidate tail entities are not completely independent; there may be some mutual association or overlap between them. The relevance between the description texts of candidate tail entities can be expressed as the degree of correlation of the semantic information in those description texts.
After the knowledge graph completion model obtains the description texts of the candidate tail entities, it can capture the relevance between those description texts. On this basis, for the description text of one candidate tail entity, the model can simulate, according to the relevance between that description text and the description texts of the other candidate tail entities, the influence that each of the other candidate tail entities exerts on this candidate tail entity during selection, and thereby obtain a description feature that fully reflects how this candidate tail entity stands out from the others. Here, the description feature of any candidate tail entity is determined based on the description text of that candidate tail entity and the association between that description text and the description texts of the remaining candidate tail entities, and reflects the features that distinguish the description text of that candidate tail entity from those of the others.
Having obtained the description feature of each candidate tail entity, the knowledge graph completion model can select, from all candidate tail entities, the tail entity corresponding to the head entity and the relation as the completion result, based on the description features of the candidate tail entities combined with the information carried by the head entity and the relation to be complemented.
For example, in the above example, the head entity "actor A" and the relation "incumbent wife", together with the description texts of the two candidate tail entities, "actor A and actress B held their wedding on New Year's Day this year" and "actor A and actress C secretly ended their marriage last year", are input into the knowledge graph completion model. The model captures the correlation between the two description texts, weakens the highly correlated information in them, such as "actor A", and highlights the less correlated, more discriminative information, such as "held their wedding" and "ended their marriage", thereby obtaining description features in which the two candidate tail entities respectively highlight "held their wedding" and "ended their marriage". On this basis, the knowledge graph completion model combines the head entity "actor A" and the relation "incumbent wife" to be complemented with the description features of the two candidate tail entities, selects the tail entity "actress B", whose description feature better matches "the incumbent wife of actor A", and outputs it as the completion result.
Before step 120 is executed, the knowledge-graph completion model may be obtained by pre-training, and specifically, the knowledge-graph completion model may be obtained by the following training methods: firstly, a large number of sample head entities and sample relations and description texts of a plurality of sample candidate tail entities corresponding to each sample head entity and sample relation are collected. And obtaining a sample completion result corresponding to each sample head entity and the sample relation through manual marking. And then inputting a large number of description texts of the sample head entities and the sample relations, a plurality of sample candidate tail entities corresponding to each sample head entity and each sample relation, and a sample completion result corresponding to each sample head entity and each sample relation into the initial model for training, thereby obtaining a knowledge graph completion model.
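The sample-collection step above can be sketched as a simple data layout; the field names and the single example record are hypothetical placeholders for the large manually labeled sample set the embodiment describes:

```python
# Hedged sketch of the labeled training data assembled before step 120.
training_samples = [
    {
        "head": "actor A",
        "relation": "incumbent wife",
        "candidate_descriptions": [
            "actor A and actress B held their wedding on New Year's Day this year",
            "actor A and actress C secretly ended their marriage last year",
        ],
        # Manually labeled sample completion result: index of the correct candidate.
        "label": 0,
    },
    # ... many more (sample head entity, sample relation) records collected the same way
]

def check_sample(sample):
    """Validate one labeled sample before feeding it to the initial model."""
    assert 0 <= sample["label"] < len(sample["candidate_descriptions"])
    return sample

validated = [check_sample(s) for s in training_samples]
```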
The knowledge graph completion method provided by the embodiment of the present invention determines the description feature of each candidate tail entity based on the relevance among the description texts of the plurality of candidate tail entities, so that the description feature of each candidate tail entity can fully reflect how the corresponding candidate tail entity stands out from the remaining candidate tail entities, and determines the completion result corresponding to the head entity and the relation based on these description features, thereby making full use of the information contained in the description texts and improving the accuracy of knowledge graph completion.
Based on the above embodiment, the knowledge graph completion model includes a feature coding layer, a candidate text association layer, and an entity completion layer. Correspondingly, FIG. 2 is a schematic diagram of the operation flow of the knowledge graph completion model provided in an embodiment of the present invention. As shown in FIG. 2, step 120 specifically includes:
Step 121, inputting the head entity and the relation and the description texts of the plurality of candidate tail entities into the feature coding layer of the knowledge graph completion model to obtain the coding features of the head entity and the relation and the coding features of the description text of each candidate tail entity output by the feature coding layer.
specifically, a head entity and a relationship and description texts of a plurality of candidate tail entities are input to a feature coding layer, and feature coding is respectively carried out on the input head entity and relationship and the description texts of the plurality of candidate tail entities by the feature coding layer, so that coding features of the head entity and the relationship and coding features of the description texts of each candidate tail entity are obtained.
Step 122, inputting the coding features of the description text of each candidate tail entity into the candidate text association layer of the knowledge graph completion model to obtain the description feature of each candidate tail entity output by the candidate text association layer.
Specifically, the candidate text association layer can capture the associations between the description texts of the candidate tail entities from their input coding features, weaken the highly correlated information in the description texts, and highlight the less correlated, more discriminative information, thereby obtaining the description feature of each candidate tail entity. It should be noted that capturing the relevance between the description texts of the candidate tail entities can be realized through an attention interaction mechanism.
Step 123, inputting the coding features of the head entity and the relation and the description features of the candidate tail entities into the entity completion layer of the knowledge graph completion model to obtain the completion result output by the entity completion layer.
Specifically, the entity completion layer determines, from the input description features of the candidate tail entities and the coding features of the head entity and the relation, the candidate tail entity corresponding to the head entity and the relation as the completion result output by the entity completion layer. For example, the completion result may be determined by calculating, from the description features of the candidate tail entities and the coding features of the head entity and the relation, the degree of matching between each candidate tail entity and the head entity and the relation.
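The matching step can be sketched as below; cosine similarity is one plausible choice of matching degree (the embodiment leaves the exact measure open), and the feature vectors are illustrative:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity as a stand-in for the matching degree."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def completion_result(query_feature, candidate_features):
    """Score each candidate's description feature against the (head, relation)
    coding feature and return the index of the best match plus all scores."""
    scores = [cosine(query_feature, f) for f in candidate_features]
    return int(np.argmax(scores)), scores

# Illustrative features: candidate 0 points in nearly the same direction as the query.
q = np.array([1.0, 0.0, 1.0])
cands = [np.array([1.0, 0.0, 0.9]), np.array([0.0, 1.0, 0.1])]
best, scores = completion_result(q, cands)
```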
Based on any of the above embodiments, the candidate text association layer includes an association feature extraction layer and a description feature extraction layer. FIG. 3 is a schematic flowchart of a method for determining the description features of candidate tail entities according to an embodiment of the present invention. As shown in FIG. 3, step 122 specifically includes: inputting the coding features of the description text of each candidate tail entity into the association feature extraction layer to obtain the association features, output by the association feature extraction layer based on an attention mechanism, between the description text of any candidate tail entity and the description texts of the other candidate tail entities.
specifically, since a plurality of candidate tail entities are not independent and are in mutual connection, part of information in the description text of any one candidate tail entity may appear in the description text of each remaining candidate tail entity multiple times. The repeated occurrence of the partial information weakens the information with prominent difference in the description text of each candidate tail entity, and the interference is brought to the completion of the knowledge graph, so that a plurality of candidate tail entities are difficult to be accurately distinguished, and the completion accuracy of the knowledge graph is low. For example, in the above example, the description texts "actor a and actress B have a wedding this year and" actor a and actress C have a confidential marriage in the last year "of the two candidate tail entities have the same information" actor a "and highly correlated information" marriage "and" wedding ", which weaken the information that can represent the beginning and end of a spouse relationship with a prominent distinction in the two description texts, so that the candidate tail entities" actress B "and" actress C "become indistinguishable.
To address this problem, the association feature extraction layer, based on an attention mechanism, performs attention interaction between the encoding features of the description text of any candidate tail entity and the encoding features of the description texts of the remaining candidate tail entities, obtaining the association features between the description text of that candidate tail entity and the description texts of the others.
Here, the association feature is a feature that any candidate tail entity is associated with the remaining candidate tail entities in the description text, and may be used to characterize influence factors of each remaining candidate tail entity on the candidate tail entity.
Further, let $d_i$ denote the description text of the $i$-th candidate tail entity, $H^t_i$ the encoding feature of the description text of the $i$-th candidate tail entity, and $H^t_j$ the encoding feature of the description text of the $j$-th candidate tail entity. The correlation $s_{ij}$ between the $i$-th and the $j$-th candidate tail entity can be formulated as:

$$s_{ij} = (H^t_i)^{\top} W_s\, H^t_j$$

where $W_s$ is the correlation calculation matrix, which can be obtained by model training.

On this basis, normalizing $s_{ij}$ yields the correlation coefficient $\alpha_{ij}$ of the $j$-th candidate tail entity relative to the $i$-th candidate tail entity, formulated as:

$$\alpha_{ij} = \frac{\exp(s_{ij})}{\sum_{k=1}^{m} \exp(s_{ik})}$$

where $m$ is the number of candidate tail entities, $k$ is the index of a candidate tail entity, and $\exp(\cdot)$ is the exponential function with the natural constant $e$ as its base.

The association feature $C_i$ between the description text of the $i$-th candidate tail entity and the description texts of the remaining candidate tail entities can then be formulated as:

$$C_i = \sum_{k=1}^{m} \alpha_{ik}\, H^t_k$$

where $k$ is the index of a candidate tail entity.
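The association-feature computation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the matrix `W_s` is a randomly initialized stand-in for the trained correlation calculation matrix, and all dimensions are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

m, d = 4, 8                        # m candidate tail entities, feature dim d (illustrative)
H = rng.standard_normal((m, d))    # encoding features of the m description texts
W_s = rng.standard_normal((d, d))  # correlation calculation matrix (trained in practice)

# Pairwise correlation between candidates i and j: s_ij = H_i^T W_s H_j
S = H @ W_s @ H.T                  # shape (m, m)

# Row-wise softmax: correlation coefficient of candidate j relative to candidate i
A = np.exp(S - S.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)

# Association feature: attention-weighted sum over all candidate encodings
C = A @ H                          # shape (m, d)

print(C.shape)        # (4, 8)
print(A.sum(axis=1))  # each row of coefficients sums to 1
```

Each row of `A` is a probability distribution over the candidates, so `C[i]` summarizes what the other description texts share with candidate `i`.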
Specifically, the description feature extraction layer is used for extracting the description features of any candidate tail entity in combination with the association features between the candidate tail entity and each of the rest candidate tail entities and the coding features of the description text of the candidate tail entity.
Here, the description feature of any candidate tail entity may be obtained by processing the association feature and the coding feature in a summation, a difference calculation or other operation manner, which is not specifically limited in the embodiment of the present invention.
Based on any of the above embodiments, step 1222 specifically includes:
and inputting the association characteristics between any candidate tail entity and each of the rest candidate tail entities and the coding characteristics of the description text of the candidate tail entity into a description characteristic extraction layer, and fusing the coding characteristics of the description text of the candidate tail entity and the difference between the coding characteristics and the association characteristics by the description characteristic extraction layer to obtain a fusion result output by the description characteristic extraction layer as the description characteristics of the candidate tail entity.
Specifically, the description feature extraction layer fuses the encoding feature $H^t_i$ of the description text of the $i$-th candidate tail entity with the difference between that encoding feature and the association feature $C_i$, and the fusion result $D_i$ serves as the description feature of the candidate tail entity. Further, the description feature can be expressed directly as the concatenation of the encoding feature of the description text with the difference between the encoding feature and the association feature, formulated as:

$$D_i = \big[\, H^t_i \,;\, H^t_i - C_i \,\big]$$

Here, the difference $H^t_i - C_i$ between the encoding feature of the description text of any candidate tail entity and its association feature with the remaining candidate tail entities removes the influence of the relatedness among the description texts, thereby highlighting the information that distinguishes the candidate tail entity from each of the others. Meanwhile, retaining the encoding feature $H^t_i$ itself ensures that no information in the description text is omitted from the description feature, which helps the knowledge graph completion model complete the graph accurately.
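The fusion step described above reduces to a single concatenation. The sketch below assumes the same illustrative shapes as before, with random arrays standing in for the layer outputs.

```python
import numpy as np

rng = np.random.default_rng(1)
m, d = 4, 8
H = rng.standard_normal((m, d))  # encoding features of candidate description texts
C = rng.standard_normal((m, d))  # association features from the previous layer

# Description feature: concatenate the encoding feature with its difference from
# the association feature. The difference suppresses information shared across
# candidates, while keeping H ensures nothing from the text itself is dropped.
D = np.concatenate([H, H - C], axis=1)

print(D.shape)  # (4, 16)
```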
According to any of the above embodiments, the feature encoding layer includes a feature representation layer and a context encoding layer. Fig. 4 is a schematic diagram of an operation flow of a feature encoding layer according to an embodiment of the present invention, as shown in fig. 4, step 121 includes:
specifically, in the knowledge of the open world, the description text of the head entity contains a lot of description information related to the head entity, the description information is likely to appear in the description text of the candidate tail entity, and the evaluation basis of the knowledge-graph completion model on the matching degree between the head entity, the relation and the tail entity can be refined by fully utilizing the description information. Therefore, the embodiment of the invention also takes the description text of the head entity as one input of the knowledge graph completion model, thereby improving the accuracy of the knowledge graph completion.
Further, the combination of the head entity to be complemented and the relationship may constitute a form of a question, for example, the head entity is "actor a", the relationship is "incumbent wife", and the question that the two are combined is "who the incumbent wife of actor a is". The representation mode of the problem form is more helpful for the knowledge graph completion model to understand the information required to be satisfied by the candidate tail entity for completion. Therefore, in the embodiment of the invention, the combination of the head entity to be complemented and the relation is also used as one input of the knowledge graph complementing model.
Assume the description text of the head entity is $d^h$, the head entity and relation together are $q$, and the description texts of the candidate tail entities are $d^t_1, \dots, d^t_m$. After word segmentation is performed on each text, the corresponding word sequences are obtained, specifically:

The word sequence of the description text of the head entity is $d^h = \{w^h_1, \dots, w^h_Z\}$, where $w^h_z$ is the $z$-th word of the description text of the head entity and $z$ is the index of any word in it.

The word sequence of the head entity and relation is $q = \{w^q_1, \dots, w^q_J\}$, where $w^q_j$ is the $j$-th word of the head entity and relation and $j$ is the index of any word in it.

The word sequence of the description text of the $i$-th candidate tail entity is $d^t_i = \{w^{t_i}_1, \dots, w^{t_i}_N\}$, where $w^{t_i}_n$ is the $n$-th word of that description text, $n$ is the index of any word in it, and $i$ is the index of the candidate tail entity.

The feature representation layer performs vector conversion on each word in the input description text $d^h$ of the head entity, the head entity and relation $q$, and the description texts $d^t_1, \dots, d^t_m$ of the candidate tail entities, obtaining fixed-length semantic vectors as the feature representation $F^h$ of the description text of the head entity, the feature representation $F^q$ of the head entity and relation, and the feature representation $F^t_i$ of the description text of each candidate tail entity output by the feature representation layer.
Specifically, the descriptive text of the head entity, the head entity and the relationship, and the feature representations of the descriptive text of the plurality of candidate tail entities are all composed of word sequences. Due to the fact that knowledge in the open world is complex, the description text of the entity contains a large number of words, and word sequences are long. There is a semantic logical relationship between the words in the sequence, i.e. information expressed by one of the words is often associated with information expressed by several words before and after the word in the sequence.
In order to improve the semantic representation capability of the features, the context coding layer carries out semantic coding based on the context on the feature representation of the description text of the input head entity, the feature representation of the head entity and the relationship and the feature representation of the description texts of a plurality of candidate tail entities respectively, so as to obtain the coding features of the description text of the head entity, the coding features of the head entity and the relationship and the coding features of the description text of each candidate tail entity. Here, the context coding layer may employ a Bi-directional Long Short-Term Memory (Bi-LSTM) network.
In the embodiment of the invention, the description text, the head entity and relationship of the head entity and the description texts of the candidate tail entities are used as the input of the knowledge graph complementing model, the knowledge graph complementing model carries out context-based semantic coding on each input text, the semantic logical relationship existing in the semanteme among all words in each text is mined, and the semantic representation capability of the coding features corresponding to each text is improved.
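As a concrete illustration of the context-based semantic coding described above, the sketch below implements one Bi-LSTM pass in plain NumPy: a forward pass, a backward pass over the reversed sequence, and a concatenation of the two. Weights are randomly initialized and all dimensions are illustrative assumptions; a real model would use a trained deep-learning-framework implementation.

```python
import numpy as np

def lstm(X, W, U, b):
    """Run a unidirectional LSTM over X of shape (seq_len, d_in)."""
    d_h = U.shape[1]
    h, c = np.zeros(d_h), np.zeros(d_h)
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    out = []
    for x in X:
        z = W @ x + U @ h + b                # all four gates at once, shape (4*d_h,)
        i, f, o, g = np.split(z, 4)
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                    # cell state carries long-range context
        h = o * np.tanh(c)
        out.append(h)
    return np.stack(out)                     # (seq_len, d_h)

rng = np.random.default_rng(2)
seq_len, d_in, d_h = 5, 6, 4
X = rng.standard_normal((seq_len, d_in))     # feature representation of one text

def init():
    """Random stand-ins for one direction's trained LSTM parameters."""
    return (rng.standard_normal((4 * d_h, d_in)) * 0.1,
            rng.standard_normal((4 * d_h, d_h)) * 0.1,
            np.zeros(4 * d_h))

# Bi-LSTM: each position sees both its left and right context
fwd = lstm(X, *init())
bwd = lstm(X[::-1], *init())[::-1]
H = np.concatenate([fwd, bwd], axis=1)       # context-aware encoding, (5, 8)
print(H.shape)
```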
Based on any of the above embodiments, step 1211 specifically includes:
inputting the description texts, the head entities and the relations of the head entities and the description texts of any candidate tail entities into a feature representation layer, and performing attention interaction on the vector representation of the description texts, the vector representation of the head entities and the relations of the head entities and the vector representation of the description texts of any candidate tail entities by the feature representation layer in a pairwise manner to obtain the feature representation of the description texts, the feature representation of the head entities and the relations of the head entities and the feature representation of the description texts of a plurality of candidate tail entities output by the feature representation layer.
Specifically, the feature representation layer is used for performing attention interaction on the vector representation of the description text of the head entity, the vector representation of the head entity and the relation, and the vector representation of the description text of any candidate tail entity in pairs, wherein the attention interaction between the vector representation of the description text of the head entity and the vector representation of the head entity and the relation is helpful for highlighting information associated with the head entity and the relation in the description text of the head entity; attention interaction between the vector representation of the description text of any candidate tail entity and the vector representations of the head entity and the relation helps to highlight information related to the head entity and the relation in the description text of the candidate tail entity; the attention interaction between the vector representation of the description text of the head entity and the vector representation of the description text of any candidate tail entity is helpful for highlighting the information in the description text of the candidate tail entity, which is associated with the description text of the head entity.
Optionally, the feature representation of the descriptive text of the head entity obtained thereby may be a representation after combining the attention interaction results with the head entity and the relationship; the feature representation of the head entity and the relationship may be the vector representation of the head entity and the relationship itself; the feature representation of the descriptive text of any candidate tail entity may be a representation combining the attention interaction results with the head entity and the relationship, respectively, and the attention interaction results with the descriptive text of the head entity.
Further, the feature representation layer may employ the same word-level sequence-aligned attention mechanism for every attention interaction. For a given input word vector $X$ and word vector sequence $Y = \{Y_1, \dots, Y_p\}$, the attention function is:

$$\mathrm{Att}(X, Y) = \sum_{q=1}^{p} a_q\, Y_q$$

where $p$ is the total number of word vectors in the sequence $Y$, $q$ is the index of a word vector in $Y$, and $a_q$ is the attention coefficient between the word vector $X$ and the word vector $Y_q$, which can be formulated as:

$$a_q = \frac{\exp\!\big(\sigma(WX)^{\top}\,\sigma(WY_q)\big)}{\sum_{q'=1}^{p} \exp\!\big(\sigma(WX)^{\top}\,\sigma(WY_{q'})\big)}$$

where $\sigma(\cdot)$ is the ReLU activation function and $W$ is a linear transformation matrix that can be obtained by model training.
The attention interaction results between each word in the description text of the head entity and each word in the head entity and the relationship, between each word in the description text of any candidate tail entity and each word in the head entity and the relationship, and between each word in the description text of any candidate tail entity and each word in the description text of the head entity are solved in turn according to the word-level sequence alignment attention mechanism.
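A minimal sketch of this word-level sequence-aligned attention function, with a randomly initialized stand-in for the trained transformation matrix `W` and illustrative dimensions:

```python
import numpy as np

def seq_align_attention(x, Y, W):
    """Attend from a single word vector x over a word-vector sequence Y (p, d)."""
    relu = lambda z: np.maximum(z, 0.0)
    scores = relu(W @ x) @ relu(Y @ W.T).T   # ReLU(Wx)^T ReLU(WY_q) for each q
    a = np.exp(scores - scores.max())
    a /= a.sum()                             # softmax attention coefficients over Y
    return a @ Y                             # weighted sum of the sequence

rng = np.random.default_rng(3)
d, p = 6, 5
W = rng.standard_normal((d, d)) * 0.5  # linear transformation (trained in practice)
x = rng.standard_normal(d)             # one word of, e.g., the head description
Y = rng.standard_normal((p, d))        # word vectors of, e.g., the head entity + relation

att = seq_align_attention(x, Y, W)
print(att.shape)  # (6,)
```

In the model this function is applied once per word of the attending text, yielding an aligned sequence of the same length.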
Before the attention interaction, the corresponding word vector sequences may be created from the word sequences of the description text of the head entity, the head entity and relation, and the description text of any candidate tail entity using the Global word vector (GloVe) method, and used as the vector representations. For example, the word vectors $E^h$ of the description text of the head entity, the word vectors $E^q$ of the head entity and relation, and the word vectors $E^t_i$ of the description text of any candidate tail entity may serve respectively as the vector representations of those three inputs.

The feature representation layer performs attention interaction between the vector representation $E^h$ of the description text of the head entity and the word vectors $E^q$ of the head entity and relation, obtaining the attention interaction result $A^{hq}$ between the description text of the head entity and the head entity and relation, with $A^{hq}_z = \mathrm{Att}(E^h_z, E^q)$.

The feature representation layer performs attention interaction between the word vectors $E^q$ of the head entity and relation and the word vectors $E^t_i$ of the description text of any candidate tail entity, obtaining the attention interaction result $A^{tq}_i$ between the description text of that candidate tail entity and the head entity and relation, with $A^{tq}_{i,n} = \mathrm{Att}(E^{t_i}_n, E^q)$.

The feature representation layer performs attention interaction between the word vectors $E^h$ of the description text of the head entity and the word vectors $E^t_i$ of the description text of any candidate tail entity, obtaining the attention interaction result $A^{th}_i$ between the description text of that candidate tail entity and the description text of the head entity, with $A^{th}_{i,n} = \mathrm{Att}(E^{t_i}_n, E^h)$.

Finally, the feature representation layer combines the vector representation of the description text of the head entity with its attention interaction result, $F^h = [E^h ; A^{hq}]$, as the feature representation of the description text of the head entity; takes the vector representation of the head entity and relation as its feature representation, $F^q = E^q$; and combines the vector representation of the description text of any candidate tail entity with its attention interaction results with the head entity and relation and with the description text of the head entity, $F^t_i = [E^t_i ; A^{tq}_i ; A^{th}_i]$, as the feature representation of the description text of that candidate tail entity.
According to any of the above embodiments, the feature representation of the descriptive text of the head entity further includes at least one of a part-of-speech vector, an entity type vector, and a word frequency co-occurrence vector of the head entity.
Specifically, the part-of-speech vector is used for representing the part of speech of each word in the description text of the head entity, and is specifically obtained by performing part-of-speech tagging through spaCy (Python natural language processing kit) or other part-of-speech tagging tools;
the entity type vector is used for representing the entity type described by each word in the description text of the head entity, and can be obtained by carrying out entity type recognition through spaCy (Python natural language processing toolkit) or other entity type recognition tools;
the word frequency co-occurrence vector is used for representing whether a word of the description text of the head entity appears in a word sequence of the head entity and the relation or a word sequence of the description text of the candidate tail entity, and can be obtained by a word frequency statistical tool or self-defined addition.
It should be noted that the vector representation of the head entity description text may also include at least one of a part-of-speech vector, an entity type vector, and a word frequency co-occurrence vector of the head entity, and at least one of the part-of-speech vector, the entity type vector, and the word frequency co-occurrence vector of the head entity may be directly used as a part of the description feature of the head entity without participating in the attention interaction of the feature representation layer.
Further, assume part-of-speech tagging of the word sequence $d^h$ of the description text of the head entity yields the part-of-speech vector $P^h$;

entity type recognition on the word sequence $d^h$ of the description text of the head entity yields the entity type vector $T^h$;

and checking whether each word of the description text of the head entity appears in the word sequence of the head entity and relation or in the word sequences of the description texts of the candidate tail entities yields the word-frequency co-occurrence vector $O^h$: when the word $w^h_z$ of the description text of the head entity appears in the word sequence $q$ of the head entity and relation or in a word sequence $d^t_i$ of the description text of a candidate tail entity, the co-occurrence indicator $O^h_z$ is 1, and otherwise 0.
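The word-frequency co-occurrence indicator is simple enough to state directly in code. The tokenized sequences below are hypothetical stand-ins loosely modeled on the patent's "actor A" example, not data from the source:

```python
# Hypothetical tokenized inputs standing in for the example texts
head_desc = ["actor", "a", "is", "a", "film", "actor"]
head_and_relation = ["actor", "a", "incumbent", "wife"]
candidate_descs = [
    ["actor", "a", "and", "actress", "b", "have", "a", "wedding", "this", "year"],
    ["actor", "a", "and", "actress", "c", "married", "secretly", "last", "year"],
]

# Indicator is 1 if the z-th word of the head description appears in the
# head-entity/relation word sequence or in any candidate description, else 0
vocab = set(head_and_relation).union(*map(set, candidate_descs))
co_occurrence = [1 if w in vocab else 0 for w in head_desc]

print(co_occurrence)  # [1, 1, 0, 1, 0, 1]
```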
Fusing the vectors obtained by the four embedding methods yields the vector representation of the description text of the head entity:

$$E^h = \big[\, G^h \,;\, P^h \,;\, T^h \,;\, O^h \,\big]$$

where $G^h$ denotes the GloVe word vectors of the description text of the head entity.

In addition, applying word vector creation and part-of-speech tagging to the word sequence $q$ of the head entity and relation yields the vector representation of the head entity and relation:

$$E^q = \big[\, G^q \,;\, P^q \,\big]$$

For the word sequence of a candidate tail entity, word vector creation alone yields the vector representation of its description text:

$$E^t_i = G^t_i$$

Accordingly, the context coding layer applies an independent Bi-LSTM network for feature extraction to each of the feature representations $F^h$, $F^q$, and $F^t_i$ obtained after the attention interaction on the vector representations $E^h$, $E^q$, and $E^t_i$, yielding the encoding feature of the description text of the head entity, the encoding feature of the head entity and relation, and the encoding feature of the description text of any candidate tail entity, formulated as:

$$H^h = \mathrm{BiLSTM}(F^h), \qquad H^q = \mathrm{BiLSTM}(F^q), \qquad H^t_i = \mathrm{BiLSTM}(F^t_i)$$
based on any embodiment, the entity complementing layer comprises a feature interaction layer and a result output layer. Fig. 5 is a schematic flowchart of a feature interaction and result output method provided in the embodiment of the present invention, and as shown in fig. 5, step 123 specifically includes:
Specifically, the feature interaction layer performs a self-attention transformation on the input encoding features $H^q$ of the head entity and relation, obtaining the question feature $u$, formulated as:

$$u = \sum_{j=1}^{J} b_j\, H^q_j, \qquad b_j = \frac{\exp\!\big(w^{\top} H^q_j\big)}{\sum_{j'=1}^{J} \exp\!\big(w^{\top} H^q_{j'}\big)}$$

where $J$ is the length of the word sequence of the head entity and relation and $w$ is a parameter vector that can be obtained by model training.

The feature interaction layer then performs attention interaction between the question feature $u$ and, respectively, the encoding feature $H^h$ of the description text of the head entity and the description feature $D_i$ of the description text of each candidate tail entity, obtaining the inter-coding feature of the description text of the head entity and the inter-coding feature of the description text of each candidate tail entity output by the feature interaction layer, formulated as:

$$V^h = \mathrm{Att}(u, H^h), \qquad V^t_i = \mathrm{Att}(u, D_i)$$

Here the inter-coding feature $V^h$ represents the association between the question feature $u$ and the encoding feature $H^h$ of the description text of the head entity, i.e., the association between the question feature corresponding to the head entity and relation and the encoding feature of the description text of the head entity, encoded after interaction through the word-level sequence-aligned attention mechanism.

Likewise, the inter-coding feature $V^t_i$ represents the association between the question feature $u$ and the description feature $D_i$ of the description text of each candidate tail entity, encoded after interaction through the same attention mechanism.

Because the encoding feature $H^h$ of the description text of the head entity and the description features $D_i$ of the description texts of the candidate tail entities used to generate the inter-coding features are produced by the context coding layer, the semantic logical relationships among the words in each text have already been mined and their semantic expression is more accurate. The inter-coding features obtained on this basis therefore exploit the information in the description texts down to the word level, greatly improving the accuracy of knowledge graph completion.
And step 1232, inputting the interactive coding features of the description texts of the head entity and the interactive coding features of the description texts of each candidate tail entity into a result output layer of the entity completion layer, determining the similarity between the interactive coding features of the description texts of the head entity and the interactive coding features of the description texts of each candidate tail entity by the result output layer, and completing based on the similarity corresponding to each candidate tail entity to obtain a completion result output by the result output layer.
Specifically, the similarity between the inter-coding feature of the description text of the head entity and the inter-coding feature of the description text of any candidate tail entity reflects how much of the information associated with the head entity and relation to be completed coincides between the two description texts: the higher the similarity, the more coinciding information there is, and the higher the probability that the candidate tail entity is output as the completion result.
For example, the result output layer may compute the similarity between the inter-coding feature $V^h$ of the description text of the head entity and the inter-coding feature $V^t_i$ of the description text of any candidate tail entity, formulated as:

$$s_i = (V^h)^{\top}\, W_{sim}\, V^t_i$$

where $s_i$ is the similarity corresponding to the $i$-th candidate tail entity and $W_{sim}$ is a similarity linear transformation matrix, which can be obtained through custom setting or through model training.
Because the similarity corresponding to each candidate tail entity is an absolute value computed independently, the similarities lack mutual comparability. The result output layer may therefore apply softmax processing to the similarities so as to select the candidate tail entity with the highest similarity as the completion result. The comparison result can be represented by the comparison sequence $y$:

$$y = \mathrm{softmax}\big([\, s_1, s_2, \dots, s_m \,]\big)$$

where $m$ is the number of candidate tail entities.

After the softmax processing, the similarity corresponding to each candidate tail entity is mapped into the interval (0, 1) by the comparison sequence, so that the similarities of the candidate tail entities become mutually comparable and form a probability distribution, in which the candidate tail entity with the largest probability value has the highest similarity.
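Putting the result output layer together: a bilinear similarity per candidate followed by a softmax comparison. The weights and feature vectors below are random stand-ins for the trained similarity transformation and the upstream inter-coding features, with illustrative dimensions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, d = 3, 8
v_head = rng.standard_normal(d)       # inter-coding feature of the head description
V_cand = rng.standard_normal((m, d))  # inter-coding features of candidate descriptions
W_sim = rng.standard_normal((d, d))   # similarity linear transformation (trained)

# Bilinear similarity s_i between the head text and each candidate text
s = V_cand @ W_sim @ v_head           # shape (m,)

# softmax maps the raw similarities into (0, 1) and makes them comparable
y = np.exp(s - s.max())
y /= y.sum()

best = int(np.argmax(y))              # candidate tail entity output as completion result
print(best, y.sum())
```

Since softmax is monotone, `best` is the same candidate that maximizes the raw similarity; the softmax step only makes the scores interpretable as a probability distribution.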
And finally, the result output layer outputs the candidate tail entity with the highest similarity as a completion result, so that the knowledge graph is completed.
According to the knowledge graph completion method provided by the embodiment of the invention, performing a self-attention transformation on the encoding features of the head entity and relation focuses more attention on their internal correlations and improves the completion efficiency of the knowledge graph completion model.
Based on any of the above embodiments, fig. 6 is a schematic diagram of the framework of a knowledge graph completion model provided by an embodiment of the present invention. As shown in fig. 6, the knowledge graph completion model may include a feature coding layer, a candidate text association layer, and an entity completion layer. The feature coding layer comprises a feature representation layer and a context coding layer, the candidate text association layer comprises an association feature extraction layer and a description feature extraction layer, and the entity completion layer comprises a feature interaction layer and a result output layer. The description text of the head entity, the head entity and relation to be completed, and the description texts of a plurality of candidate tail entities are input into the feature representation layer of the knowledge graph completion model; the model determines the description feature of each candidate tail entity based on the relatedness among the description texts of the candidate tail entities, determines the completion result corresponding to the head entity and relation based on those description features, and outputs the completion result from the result output layer, thereby completing the knowledge graph.
Based on any of the above embodiments, fig. 7 is a schematic structural diagram of a knowledge-graph completing device provided in an embodiment of the present invention, as shown in fig. 7, the device includes:
a text determining unit 710, configured to determine description texts of a plurality of candidate tail entities corresponding to the head entity and the relationship to be complemented;
the map completion unit 720 is used for inputting the description texts of the head entity, the relation and the plurality of candidate tail entities into the knowledge map completion model to obtain a completion result output by the knowledge map completion model;
the knowledge graph completion model determines the description feature of each candidate tail entity based on the relatedness among the description texts of the candidate tail entities, and determines the completion result corresponding to the head entity and relation based on the description feature of each candidate tail entity;
the knowledge graph completion model is obtained by training based on a sample head entity and sample relation, the description texts of a plurality of corresponding sample candidate tail entities, and a sample completion result.
Specifically, the text determining unit 710 is configured to determine description texts of a plurality of candidate tail entities corresponding to the head entity and the relationship to be complemented. The atlas complementing unit 720 is configured to input the head entities and the relations determined by the text determining unit 710 and the description texts of the multiple candidate tail entities into the knowledge atlas complementing model to obtain a complementing result output by the knowledge atlas complementing model.
The knowledge graph completion model determines the relevance among the description texts of the candidate tail entities by utilizing the effective information in the description texts of the candidate tail entities based on an attention mechanism, determines the description characteristics of each candidate tail entity based on the relevance among the description texts of each candidate tail entity, determines the similarity between each candidate tail entity and the head entity and the relation based on the description characteristics of each candidate tail entity, and finally determines the completion result corresponding to the head entity and the relation.
The knowledge graph completion device provided by the embodiment of the invention determines the description characteristics of each candidate tail entity based on the relevance among the description texts of a plurality of candidate tail entities, so that the description characteristics of each candidate tail entity can fully reflect the outstanding difference of the corresponding candidate tail entity compared with the rest candidate tail entities, and the completion result corresponding to the head entity and the relation is determined based on the description characteristics of each candidate tail entity, thereby realizing the full utilization of the information contained in the description texts and improving the accuracy of knowledge graph completion.
Based on any of the above embodiments, the atlas complementing unit 720 specifically includes:
the feature coding subunit is used for inputting the head entity and relation and the description texts of the multiple candidate tail entities into a feature coding layer of the knowledge graph completion model to obtain the coding features of the head entity and relation output by the feature coding layer and the coding features of the description text of each candidate tail entity;
the candidate text association subunit is used for inputting the coding features of the description text of each candidate tail entity into a candidate text association layer of the knowledge graph completion model to obtain the description features of each candidate tail entity output by the candidate text association layer;
and the entity completion subunit is used for inputting the coding features of the head entity and relation and the description features of each candidate tail entity into an entity completion layer of the knowledge graph completion model to obtain a completion result output by the entity completion layer.
Based on any of the above embodiments, the candidate text association subunit specifically includes:
an associated feature extraction module, configured to input the coding features of the description text of each candidate tail entity into an associated feature extraction layer of the candidate text association layer, to obtain, based on an attention mechanism, the association features between the description text of each candidate tail entity and the description texts of the remaining candidate tail entities output by the associated feature extraction layer;
and a description feature extraction module, configured to input the association features between any candidate tail entity and each of the remaining candidate tail entities, together with the coding features of the description text of that candidate tail entity, into a description feature extraction layer of the candidate text association layer, to obtain the description feature of that candidate tail entity output by the description feature extraction layer.
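A minimal sketch of an attention-based association step of the kind the associated feature extraction layer performs: each candidate's encoding attends over the other candidates' encodings. The scaled dot-product scoring and the one-vector-per-candidate simplification are assumptions made for this example, not details taken from the patent.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def association_features(cand_vecs):
    # For each candidate description encoding, compute attention weights
    # over the OTHER candidates' encodings and return the weighted
    # mixture as that candidate's association feature.
    d = len(cand_vecs[0])
    feats = []
    for i, q in enumerate(cand_vecs):
        keys = [v for j, v in enumerate(cand_vecs) if j != i]
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        mixed = [sum(w * k[t] for w, k in zip(weights, keys)) for t in range(d)]
        feats.append(mixed)
    return feats
```

With only two candidates each attention distribution collapses to a single weight of 1.0, so each association feature is exactly the other candidate's encoding.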
Based on any of the above embodiments, the description feature extraction module is specifically configured to:
input the association features between any candidate tail entity and each of the remaining candidate tail entities, together with the coding features of the description text of that candidate tail entity, into the description feature extraction layer; the description feature extraction layer fuses the coding features of the description text with the difference between the coding features and the association features, and the fusion result output by the description feature extraction layer is taken as the description feature of that candidate tail entity.
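The fusion described above can be sketched as follows. The difference term captures what makes a candidate stand out from the rest; the equal 0.5 fusion weights are an illustrative assumption, since the patent does not specify the fusion function.

```python
def fuse(enc, assoc):
    # Description feature extraction layer (sketch): combine the
    # candidate's own encoding with (encoding - association feature).
    # The difference emphasizes content not shared with other candidates;
    # the 0.5 weights are illustrative, not from the patent.
    diff = [e - a for e, a in zip(enc, assoc)]
    return [0.5 * e + 0.5 * d for e, d in zip(enc, diff)]
```

For example, dimensions where the association feature is large (content shared with other candidates) are damped, while candidate-specific dimensions are preserved.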
Based on any of the above embodiments, the feature encoding subunit includes:
a feature representation module, configured to input the description text of the head entity, the head entity and the relation, and the description texts of the plurality of candidate tail entities into a feature representation layer of the feature coding layer, to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the candidate tail entities output by the feature representation layer;
and a context coding module, configured to input the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the plurality of candidate tail entities into a context coding layer of the feature coding layer, to obtain the coding features of the description text of the head entity, the coding features of the head entity and the relation, and the coding features of the description text of each candidate tail entity output by the context coding layer.
Based on any of the above embodiments, the feature representation module is specifically configured to:
input the description text of the head entity, the head entity and the relation, and the description text of any candidate tail entity into the feature representation layer; the feature representation layer performs pairwise attention interaction among the vector representation of the description text of the head entity, the vector representation of the head entity and the relation, and the vector representation of the description text of that candidate tail entity, to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the plurality of candidate tail entities output by the feature representation layer.
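One direction of such a pairwise attention interaction can be sketched as below: each token vector in one sequence is augmented with an attention-weighted summary of the other sequence. Applying this to every ordered pair of the three inputs would yield interaction-aware representations; the residual (additive) fusion is an assumption for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attend(seq_a, seq_b):
    # For each token vector q in seq_a, attend over seq_b with scaled
    # dot-product scores, then add the attended context back to q.
    d = len(seq_a[0])
    out = []
    for q in seq_a:
        w = softmax([sum(x * y for x, y in zip(q, k)) / math.sqrt(d)
                     for k in seq_b])
        ctx = [sum(wi * k[t] for wi, k in zip(w, seq_b)) for t in range(d)]
        out.append([x + c for x, c in zip(q, ctx)])  # residual fusion (illustrative)
    return out
```

With a single key vector the attention weight is 1.0, so the output is simply the query plus that key.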
Based on any of the above embodiments, the feature representation of the description text of the head entity further includes at least one of a part-of-speech vector, an entity type vector, and a word-frequency co-occurrence vector of the head entity.
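A common way to realize such optional auxiliary features is simple concatenation onto each token's base vector, sketched below. The helper name and the idea that the auxiliary vectors are one-hot encodings are assumptions for illustration; the patent only states that the representation "includes" them.

```python
def augment_token(word_vec, pos_vec=None, type_vec=None, freq_vec=None):
    # Concatenate whichever auxiliary vectors are supplied
    # (part-of-speech, entity type, word-frequency co-occurrence)
    # onto the base word vector.
    out = list(word_vec)
    for extra in (pos_vec, type_vec, freq_vec):
        if extra is not None:
            out.extend(extra)
    return out
```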
Based on any of the above embodiments, the entity completion subunit specifically includes:
a feature interaction module, configured to input the coding features of the description text of the head entity, the coding features of the head entity and the relation, and the description feature of each candidate tail entity into a feature interaction layer of the entity completion layer; the feature interaction layer performs a self-attention transformation on the coding features of the head entity and the relation to obtain a question feature, and performs attention interaction between the question feature and, respectively, the coding features of the description text of the head entity and the description feature of each candidate tail entity, to obtain the interactive coding features of the description text of the head entity and the interactive coding features of the description text of each candidate tail entity output by the feature interaction layer;
and a result output module, configured to input the interactive coding features of the description text of the head entity and the interactive coding features of the description text of each candidate tail entity into a result output layer of the entity completion layer; the result output layer determines the similarity between the interactive coding features of the description text of the head entity and those of each candidate tail entity, and performs completion based on the similarity corresponding to each candidate tail entity to obtain the completion result output by the result output layer.
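The two modules above (pool the head-and-relation encodings into a question feature, then score candidates by similarity) can be sketched as follows. The mean-query self-attention pooling and the cosine similarity are stand-ins chosen for this example; the patent does not fix these particular functions.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_pool(seq):
    # Self-attention pooling (sketch): score each position against the
    # sequence mean, then return the attention-weighted sum -- a stand-in
    # for the self-attention transform producing the question feature.
    d = len(seq[0])
    mean = [sum(v[t] for v in seq) / len(seq) for t in range(d)]
    w = softmax([sum(x * y for x, y in zip(v, mean)) / math.sqrt(d)
                 for v in seq])
    return [sum(wi * v[t] for wi, v in zip(w, seq)) for t in range(d)]

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

def rank_candidates(head_rel_seq, cand_feats):
    # Result output layer (sketch): build the question feature, score
    # each candidate's description feature, return the best index.
    question = self_pool(head_rel_seq)
    scores = [cosine(question, f) for f in cand_feats]
    return scores.index(max(scores)), scores
```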
Based on any of the above embodiments, fig. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in fig. 8, the electronic device may include: a processor 810, a communications interface 820, a memory 830 and a communications bus 840, where the processor 810, the communications interface 820 and the memory 830 communicate with one another via the communications bus 840. The processor 810 may invoke logic instructions in the memory 830 to perform the following method:
determining a head entity and a relation to be completed, and the description texts of a plurality of candidate tail entities corresponding to the head entity and the relation; inputting the head entity, the relation and the description texts of the candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model; where the knowledge graph completion model determines the description feature of each candidate tail entity based on the relevance between the description texts of the candidate tail entities, and determines the completion result corresponding to the head entity and the relation based on the description feature of each candidate tail entity; and the knowledge graph completion model is obtained by training on sample head entities and sample relations, the description texts of the corresponding sample candidate tail entities, and the sample completion results.
In addition, the logic instructions in the memory 830 may be implemented as software functional units and, when sold or used as an independent product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium, including a number of instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Embodiments of the present invention further provide a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the method provided in the foregoing embodiments, including:
determining a head entity and a relation to be completed, and the description texts of a plurality of candidate tail entities corresponding to the head entity and the relation; inputting the head entity, the relation and the description texts of the candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model; where the knowledge graph completion model determines the description feature of each candidate tail entity based on the relevance between the description texts of the candidate tail entities, and determines the completion result corresponding to the head entity and the relation based on the description feature of each candidate tail entity; and the knowledge graph completion model is obtained by training on sample head entities and sample relations, the description texts of the corresponding sample candidate tail entities, and the sample completion results.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes commands for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (11)
1. A knowledge graph completion method, comprising:
determining a head entity and a relation to be completed, and the description texts of a plurality of candidate tail entities corresponding to the head entity and the relation;
inputting the head entity, the relation and the description texts of the plurality of candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model;
wherein the knowledge graph completion model determines the description feature of each candidate tail entity based on the relevance between the description texts of the candidate tail entities, and determines the completion result corresponding to the head entity and the relation based on the description feature of each candidate tail entity;
and the knowledge graph completion model is obtained by training on sample head entities and sample relations, the description texts of a plurality of corresponding sample candidate tail entities, and the sample completion results.
2. The knowledge graph completion method according to claim 1, wherein the inputting of the head entity, the relation and the description texts of the plurality of candidate tail entities into the knowledge graph completion model to obtain the completion result output by the knowledge graph completion model specifically comprises:
inputting the head entity, the relation and the description texts of the plurality of candidate tail entities into a feature coding layer of the knowledge graph completion model to obtain the coding features of the head entity and the relation and the coding features of the description text of each candidate tail entity output by the feature coding layer;
inputting the coding features of the description text of each candidate tail entity into a candidate text association layer of the knowledge graph completion model to obtain the description feature of each candidate tail entity output by the candidate text association layer;
and inputting the coding features of the head entity and the relation and the description feature of each candidate tail entity into an entity completion layer of the knowledge graph completion model to obtain the completion result output by the entity completion layer.
3. The knowledge graph completion method according to claim 2, wherein the inputting of the coding features of the description text of each candidate tail entity into the candidate text association layer of the knowledge graph completion model to obtain the description feature of each candidate tail entity output by the candidate text association layer specifically comprises:
inputting the coding features of the description text of each candidate tail entity into an associated feature extraction layer of the candidate text association layer to obtain, based on an attention mechanism, the association features between the description text of any candidate tail entity and the description text of each other candidate tail entity output by the associated feature extraction layer;
and inputting the association features between any candidate tail entity and each of the remaining candidate tail entities and the coding features of the description text of that candidate tail entity into a description feature extraction layer of the candidate text association layer to obtain the description feature of that candidate tail entity output by the description feature extraction layer.
4. The knowledge graph completion method according to claim 3, wherein the inputting of the association features between any candidate tail entity and each of the remaining candidate tail entities and the coding features of the description text of that candidate tail entity into the description feature extraction layer of the candidate text association layer to obtain the description feature of that candidate tail entity output by the description feature extraction layer specifically comprises:
inputting the association features between any candidate tail entity and each of the remaining candidate tail entities and the coding features of the description text of that candidate tail entity into the description feature extraction layer; the description feature extraction layer fuses the coding features of the description text with the difference between the coding features and the association features, and the fusion result output by the description feature extraction layer is taken as the description feature of that candidate tail entity.
5. The knowledge graph completion method according to claim 2, wherein the inputting of the head entity, the relation and the description texts of the plurality of candidate tail entities into the feature coding layer of the knowledge graph completion model to obtain the coding features of the head entity and the relation and the coding features of the description text of each candidate tail entity output by the feature coding layer specifically comprises:
inputting the description text of the head entity, the head entity and the relation, and the description texts of the plurality of candidate tail entities into a feature representation layer of the feature coding layer to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the plurality of candidate tail entities output by the feature representation layer;
and inputting the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the plurality of candidate tail entities into a context coding layer of the feature coding layer to obtain the coding features of the description text of the head entity, the coding features of the head entity and the relation, and the coding features of the description text of each candidate tail entity output by the context coding layer.
6. The knowledge graph completion method according to claim 5, wherein the inputting of the description text of the head entity, the head entity and the relation, and the description texts of the candidate tail entities into the feature representation layer of the feature coding layer to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the candidate tail entities output by the feature representation layer specifically comprises:
inputting the description text of the head entity, the head entity and the relation, and the description text of any candidate tail entity into the feature representation layer; the feature representation layer performs pairwise attention interaction among the vector representation of the description text of the head entity, the vector representation of the head entity and the relation, and the vector representation of the description text of that candidate tail entity, to obtain the feature representation of the description text of the head entity, the feature representation of the head entity and the relation, and the feature representations of the description texts of the plurality of candidate tail entities output by the feature representation layer.
7. The knowledge graph completion method according to claim 5 or 6, wherein the feature representation of the description text of the head entity further comprises at least one of a part-of-speech vector, an entity type vector, and a word-frequency co-occurrence vector of the head entity.
8. The knowledge graph completion method according to claim 5, wherein the inputting of the coding features of the head entity and the relation and the description feature of each candidate tail entity into the entity completion layer of the knowledge graph completion model to obtain the completion result output by the entity completion layer specifically comprises:
inputting the coding features of the description text of the head entity, the coding features of the head entity and the relation, and the description feature of each candidate tail entity into a feature interaction layer of the entity completion layer; the feature interaction layer performs a self-attention transformation on the coding features of the head entity and the relation to obtain a question feature, and performs attention interaction between the question feature and, respectively, the coding features of the description text of the head entity and the description feature of each candidate tail entity, to obtain the interactive coding features of the description text of the head entity and the interactive coding features of the description text of each candidate tail entity output by the feature interaction layer;
and inputting the interactive coding features of the description text of the head entity and the interactive coding features of the description text of each candidate tail entity into a result output layer of the entity completion layer; the result output layer determines the similarity between the interactive coding features of the description text of the head entity and those of each candidate tail entity, and performs completion based on the similarity corresponding to each candidate tail entity to obtain the completion result output by the result output layer.
9. A knowledge graph completion device, comprising:
a text determining unit, configured to determine a head entity and a relation to be completed, and the description texts of a plurality of candidate tail entities corresponding to the head entity and the relation;
a graph completion unit, configured to input the head entity, the relation and the description texts of the candidate tail entities into a knowledge graph completion model to obtain a completion result output by the knowledge graph completion model;
wherein the knowledge graph completion model determines the description feature of each candidate tail entity based on the relevance between the description texts of the candidate tail entities, and determines the completion result corresponding to the head entity and the relation based on the description feature of each candidate tail entity;
and the knowledge graph completion model is obtained by training on sample head entities and sample relations, the description texts of a plurality of corresponding sample candidate tail entities, and the sample completion results.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the knowledge graph completion method of any one of claims 1 to 8 when executing the computer program.
11. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the knowledge graph completion method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011168569.0A CN112000815B (en) | 2020-10-28 | 2020-10-28 | Knowledge graph complementing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112000815A true CN112000815A (en) | 2020-11-27 |
CN112000815B CN112000815B (en) | 2021-03-02 |
Family
ID=73475193
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112560477A (en) * | 2020-12-09 | 2021-03-26 | 中科讯飞互联(北京)信息科技有限公司 | Text completion method, electronic device and storage device |
CN113360675A (en) * | 2021-06-25 | 2021-09-07 | 中关村智慧城市产业技术创新战略联盟 | Knowledge graph specific relation completion method based on Internet open world |
CN113360664A (en) * | 2021-05-31 | 2021-09-07 | 电子科技大学 | Knowledge graph complementing method |
CN113569056A (en) * | 2021-07-27 | 2021-10-29 | 科大讯飞(苏州)科技有限公司 | Knowledge graph complementing method and device, electronic equipment and storage medium |
CN113626610A (en) * | 2021-08-10 | 2021-11-09 | 南方电网数字电网研究院有限公司 | Knowledge graph embedding method and device, computer equipment and storage medium |
CN114579762A (en) * | 2022-03-04 | 2022-06-03 | 腾讯科技(深圳)有限公司 | Knowledge graph alignment method, device, equipment, storage medium and program product |
CN116842958A (en) * | 2023-09-01 | 2023-10-03 | 北京邮电大学 | Time sequence knowledge graph completion method, entity prediction method based on time sequence knowledge graph completion method and device thereof |
CN117094395A (en) * | 2023-10-19 | 2023-11-21 | 腾讯科技(深圳)有限公司 | Method, device and computer storage medium for complementing knowledge graph |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||