CN116383357A - Knowledge graph-oriented query graph generation method and system - Google Patents
- Publication number
- CN116383357A (application CN202310363988.7A)
- Authority
- CN
- China
- Prior art keywords
- query graph
- path
- relation
- query
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/3329—Natural language query formulation or dialogue systems
- G06F16/3322—Query formulation using system suggestions
- G06F16/367—Ontology
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a knowledge-graph-oriented query graph generation method and system. The method comprises the following steps: constructing a relation detection model and selecting, from the candidate relation set, the main relation path that best matches the question; serializing the query graph and uniformly encoding the question and the query graph sequence; and constructing a query graph ranking model that ranks the query graphs in the candidate set according to their semantic similarity scores with the question and selects the best query graph. The invention provides a new approach to the query graph generation task in knowledge graph question answering and improves overall question-answering performance by completing each subtask of the query graph generation process, including relation detection, query graph serialization and query graph ranking. Compared with the prior art, the proposed model is based on end-to-end matching, introduces no hand-designed features, and is simple to implement.
Description
Technical Field
The invention relates to a query graph generation technology, in particular to a knowledge graph-oriented query graph generation method and system.
Background
At present, in the Internet age, people are accustomed to acquiring information through the network: one only needs to enter keywords and a search engine returns various information related to them, which greatly facilitates daily work and life. However, faced with a natural language question posed by a user, a conventional search engine can only match and combine the keywords in the question, so it is difficult for it to accurately understand the complex logical relations in the question; what the search returns is a list of web pages related to the question's keywords rather than the final answer, which the user must screen further, reducing search efficiency.
In recent years, with the development of knowledge graphs, information retrieval, deep learning and other technologies, knowledge graph question answering has become a new technique for question-answering tasks. It uses the rich semantic association information in the knowledge graph to analyze the natural language question posed by the user, fully understand the user's intent, retrieve over the knowledge graph and return the answer to the user. The query graph generation method simplifies the semantic parsing of the question into a query graph generation process: by mapping the natural language question to a query graph, the logical form of the question can be expressed clearly and intuitively, which facilitates machine understanding and further improves the efficiency and performance of knowledge graph question answering.
In order for the query graph to express the semantic information of the question as accurately as possible, researchers decompose the mapping from question to query graph into different subtasks and complete each subtask based on either rule templates or neural networks. Neural-network-based approaches are currently dominant because predefined rule templates require too much human intervention. Bao et al. proposed a query graph generation method for multi-constraint questions (Bao J, Duan N, Yan Z, et al. Constraint-Based Question Answering with Knowledge Graph [C] // Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 2016: 2503-2514.), which transforms a multi-constraint question into a multi-constraint query graph, encodes candidate query graphs with two CNN models and, in addition to the CNN-encoded question, entity and relation features, manually designs a series of constraint-related features such as the number of each constraint and the sum of entity-linking scores of constant vertices in entity constraints. Luo et al. proposed a semantic matching model for matching natural language questions to query graphs (Luo K, Lin F, Luo X, et al. Knowledge Base Question Answering via Encoding of Complex Query Graphs [C] // Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 2185-2194.), implemented on an "encode-compare" framework that encodes the natural language question and the query graph separately. First, Bi-GRU models encode the question in a global and a local manner, and the sum of the two is taken as the question encoding; then the query graph is represented by the predicate sequence on its relation path, and the predicate id sequence and predicate name sequence are used to obtain the query graph encoding; finally, the similarity between the question and the query graph is computed with the cosine distance.
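To make the "encode-compare" idea concrete, the following is a minimal PyTorch sketch of that generic framework, not Luo et al.'s exact model: both inputs are compressed into single vectors by an aggregation step and compared with cosine similarity. The GRU encoder, layer sizes and max-pooling aggregation are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EncodeCompare(nn.Module):
    """Generic encode-compare sketch: encode, aggregate to one vector, compare."""
    def __init__(self, vocab_size: int, emb_dim: int = 100, hidden: int = 128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        # One bidirectional GRU encoder shared by the question and the query graph sequence.
        self.encoder = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)

    def encode(self, ids: torch.Tensor) -> torch.Tensor:
        out, _ = self.encoder(self.emb(ids))      # (B, L, 2*hidden)
        return out.max(dim=1).values              # aggregate the whole sequence into one vector

    def forward(self, question_ids: torch.Tensor, graph_ids: torch.Tensor) -> torch.Tensor:
        q = self.encode(question_ids)
        g = self.encode(graph_ids)
        return F.cosine_similarity(q, g, dim=-1)  # similarity score per pair in the batch

# Usage (hypothetical ids): scores = EncodeCompare(vocab_size=30000)(q_batch, g_batch)
```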
The method proposed by Bao et al. decomposes the query graph into several semantic components and then encodes these components jointly; this ignores the structural information of the query graph, introduces hand-designed features and increases encoding complexity.
The method proposed by Luo et al., based on the "encode-compare" framework, directly encodes the question and the query graph's predicate sequence into vector sequences, compresses each vector sequence into a single vector through an aggregation operation, and finally compares the similarity of the question vector and the query graph vector. This abstracts the question and the query graph too strongly, does not consider how their internal information interacts, and the vector aggregation easily loses key information needed for matching.
Because of the structural difference between the query graph and the question, these methods encode the query graph independently and cannot realize internal information interaction between the query graph and the question.
Disclosure of Invention
The invention aims to provide a knowledge-graph-oriented query graph generation method and system that use a bidirectional attention mechanism to realize information interaction between the question and the relation, solving the problem that the "encode-compare" framework ignores such interaction; use local information extracted through local interaction as a supplement to the global information obtained by aggregation, alleviating the loss of semantic information in the aggregation operation; and convert the graph structure into a linear structure through query graph serialization so that the query graph and the question can be encoded uniformly, overcoming the structural difference between them. By solving these key problems, the accuracy of question answering is improved.
The technical scheme for realizing the purpose of the invention is as follows. In a first aspect, the invention provides a knowledge-graph-oriented query graph generation method, comprising the following steps:
constructing a relation detection model and selecting, from the candidate relation set, the main relation path that best matches the question;
serializing the query graph and uniformly encoding the question and the query graph sequence; and
constructing a query graph ranking model, ranking the query graphs in the candidate set according to their semantic similarity scores with the question, and selecting the best query graph.
In a second aspect, the invention provides a knowledge-graph-oriented query graph generation system, comprising:
a relation detection model construction module for selecting, from the candidate relation set, the main relation path that best matches the question;
a query graph serialization module for serializing the query graph and uniformly encoding the question and the query graph sequence; and
a query graph ranking model construction module for ranking the query graphs in the candidate set according to their semantic similarity scores with the question and selecting the best query graph.
In a third aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the computer program is executed.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
Compared with the prior art, the invention has the following significant advantages:
(1) In the relation detection model, a bidirectional attention mechanism realizes attention interaction between the question and the relation. Considering their mutual influence makes the model focus on the parts of the question and the relation that are related to each other, which helps extract the key information for question-relation matching.
(2) In the relation detection model, global features and local features are considered together; extracting local features through local interaction effectively supplements the semantic information lost in the aggregation operation.
(3) In the query graph ranking task, the query graph is serialized according to the main relation path and the different constraint sub-paths, so that certain structural characteristics of the query graph are retained. This overcomes the drawback caused by the structural difference between the query graph and the question and allows the question and the query graph to be encoded uniformly.
(4) In the query graph ranking model, BERT encodes character-level representations of the entity mention and the entity name, which enhances the model's robustness to out-of-vocabulary words and improves its overall performance.
Drawings
FIG. 1 is a structural diagram of the relation detection model.
FIG. 2 is a diagram of the query graph serialization process.
FIG. 3 is a structural diagram of the query graph ranking model.
Detailed Description
The technical scheme of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by one of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the scope of the present invention.
On the basis of having completed the linking of the natural language question to knowledge graph entities, the invention proposes two models, one for the relation detection task and one for the query graph ranking task, so as to improve the accuracy and efficiency of knowledge graph question answering.
FIG. 1 shows the structure of the relation detection model, whose purpose is to select, from the candidate relation set, the main relation path that best matches the question. The model comprises the following steps:
(1) Question and relation preprocessing. The question is represented at word level: first, using the result of entity linking, the entity mention in the question is replaced with the generic mark <e>; the question is then segmented into a word sequence W_q. A candidate relation is represented at two granularities, word level and relation level: the relations forming a relation path are split into a word sequence and a relation-name sequence, and the union of the two, W_r, is taken as the relation representation.
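As a concrete illustration of step (1), the following sketch shows one plausible preprocessing, assuming whitespace tokenization and an entity-linking result given as a mention string; the helper names and the underscore-splitting of relation names are assumptions, not details from the patent.

```python
from typing import List

def preprocess_question(question: str, entity_mention: str) -> List[str]:
    """Replace the linked entity mention with the generic mark <e>, then segment into W_q."""
    masked = question.replace(entity_mention, "<e>")
    return masked.split()                         # word sequence W_q

def preprocess_relation_path(relations: List[str]) -> List[str]:
    """Represent a candidate relation path at word level and relation level, then merge into W_r."""
    word_level = [w for rel in relations for w in rel.replace("_", " ").split()]
    relation_level = list(relations)              # relation-name tokens kept whole
    return word_level + relation_level            # combined sequence W_r

# Hypothetical example:
# preprocess_question("who is the captain of team X", "team X")
# preprocess_relation_path(["sports_team.captain", "person.name"])
```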
(2) Question and relation encoding. First, a word embedding layer maps the question sequence W_q and the relation sequence W_r to their corresponding word embedding sequences E_q and E_r; the word embedding layer is initialized with GloVe pre-trained word vectors. Then two Bi-LSTMs semantically encode E_q and E_r respectively, yielding the semantic vector representations Q and R of the question and the candidate relation.
(3) Bidirectional attention mechanism. First, the attention weights with which the question and the relation influence each other are computed for each pair of word vectors q_i ∈ Q and r_j ∈ R using a trainable parameter matrix W_a. Then the attention-weighted sums of the question and the relation under the attention mechanism are computed, giving the attention representations of the question and the relation, where n denotes the number of words in the question and m denotes the number of words in the relation sequence.
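The exact attention formulas in step (3) are rendered as images in the source, so the sketch below shows one plausible realization of the bidirectional attention: a bilinear score Q·W_a·Rᵀ normalized by softmax in both directions. The bilinear form and the softmax directions are assumptions.

```python
import torch
import torch.nn as nn

class BiAttention(nn.Module):
    """One plausible bidirectional attention between question vectors Q and relation vectors R."""
    def __init__(self, dim: int):
        super().__init__()
        self.W_a = nn.Parameter(torch.empty(dim, dim))   # trainable parameter matrix W_a
        nn.init.xavier_uniform_(self.W_a)

    def forward(self, Q: torch.Tensor, R: torch.Tensor):
        # Q: (n, dim) question word vectors, R: (m, dim) relation word vectors
        scores = Q @ self.W_a @ R.T                      # (n, m) pairwise interaction scores
        attn_q2r = torch.softmax(scores, dim=1)          # each question word attends over relation words
        attn_r2q = torch.softmax(scores, dim=0)          # each relation word attends over question words
        R_tilde = attn_q2r @ R                           # (n, dim) relation summary aligned to the question
        Q_tilde = attn_r2q.T @ Q                         # (m, dim) question summary aligned to the relation
        return Q_tilde, R_tilde
```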
(4) Global comparison score. Max pooling is applied to the question and relation representations respectively to obtain the global semantic vectors q_g and r_g, and their cosine similarity is computed as the global comparison score of the question and the relation.
(5) Local interaction score. First, the vectors at corresponding positions of Q and the attention representation of the relation are spliced to obtain the interaction vector sequence C; then a new Bi-LSTM extracts local semantic features from C to obtain the local feature representation T; finally, a feed-forward neural network and max pooling reduce the dimension of T, giving the local interaction score of the question and the relation.
(6) Semantic similarity. The global comparison score and the local interaction score are added to obtain the total semantic similarity score of the question and the relation.
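Building on the BiAttention sketch above, the following illustrates steps (4) to (6): a global cosine score from max-pooled vectors, a local interaction score from a second Bi-LSTM over the spliced sequence, and their sum. Whether the raw encodings or the attention representations are pooled in step (4), as well as the hidden size and the shape of the feed-forward layer, are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MatchScorer(nn.Module):
    """Global comparison score + local interaction score = total semantic similarity."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.local_lstm = nn.LSTM(2 * dim, hidden, batch_first=True, bidirectional=True)
        self.ffn = nn.Linear(2 * hidden, 1)              # feed-forward reduction of T

    def forward(self, Q: torch.Tensor, R: torch.Tensor, R_tilde: torch.Tensor) -> torch.Tensor:
        # Step (4): global score. Pooling the raw Bi-LSTM outputs Q and R here is an assumption;
        # the patent's figures may pool the attention representations instead.
        q_g = Q.max(dim=0).values
        r_g = R.max(dim=0).values
        global_score = F.cosine_similarity(q_g, r_g, dim=0)

        # Step (5): splice Q with its aligned relation summary R_tilde, re-encode with a new Bi-LSTM,
        # then reduce with a feed-forward layer and max pooling.
        C = torch.cat([Q, R_tilde], dim=-1).unsqueeze(0)  # (1, n, 2*dim)
        T, _ = self.local_lstm(C)                         # (1, n, 2*hidden)
        local_score = self.ffn(T).max(dim=1).values.squeeze()

        # Step (6): sum of the two scores.
        return global_score + local_score
```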
To overcome the encoding difficulty caused by the structural difference between the query graph and the question, the invention serializes the query graph before encoding and then uniformly encodes the question and the query graph sequence. FIG. 2 illustrates an example of the query graph serialization process, which comprises the following steps:
(1) Query graph splitting. The query graph is first split into five parts according to the constraint types defined in the query graph expansion method proposed by Luo et al.: the main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path and ordinal constraint sub-path. Then the entity name in the main relation path is replaced with the generic mark [unused1], and the literal value of the answer node in the main relation path is replaced with the answer type.
In question "who is the first team leader of chinese men after 2012? "for example, the splitting result of the corresponding query graph is shown in fig. 2, and because the problem is a multi-constraint complex problem, the query graph further comprises four types of constraints besides the main relationship path. The main relation path { athlete, basketball athlete } indicates the main topic of the question, the main relation path { title, captain } is used for restricting the answering entity, the type restriction sub-path { man } indicates the type of the answer, the time restriction sub-path { >,2012} restricts the time range of the question, and the ordinal restriction sub-path { incumbent time, earliest } corresponds to the constraint of 'first incumbent' in the question. The information of various sub-paths corresponds to the constraint in the problem one by one, and the semantic information of the problem is completely represented.
(2) Sub-path serialization. Each sub-path is traversed by depth-first search, and the information of the corresponding nodes and directed edges is recorded in order, giving a serialized representation of each sub-path.
(3) Overall query graph serialization. The sub-path sequences are merged in the order main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path, ordinal constraint sub-path, giving the serialized representation of the complete query graph.
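The following sketch illustrates the serialization of steps (1) to (3) on a toy query graph loosely modeled on the FIG. 2 example. The dictionary layout, the token values and the separator are illustrative assumptions, and each sub-path is assumed to be already flattened into node and edge tokens by the depth-first traversal of step (2).

```python
from typing import Dict, List

# Fixed merge order from step (3): main path first, then the four constraint sub-paths.
SUBPATH_ORDER = ["main", "entity", "type", "time", "ordinal"]

def serialize_query_graph(graph: Dict[str, List[str]]) -> str:
    """Concatenate the sub-path token sequences in the fixed order into one sequence."""
    parts = []
    for key in SUBPATH_ORDER:
        tokens = graph.get(key, [])
        if tokens:
            parts.append(" ".join(tokens))
    return " ; ".join(parts)

# Toy example in the spirit of the "first captain after 2012" question (values are illustrative):
toy_graph = {
    "main":    ["[unused1]", "athlete", "basketball player", "<answer:person>"],
    "entity":  ["title", "captain"],
    "type":    ["man"],
    "time":    [">", "2012"],
    "ordinal": ["incumbent time", "earliest"],
}
print(serialize_query_graph(toy_graph))
```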
FIG. 3 shows the structure of the query graph ranking model. The query graph ranking task is to rank the query graphs in the candidate set according to their semantic similarity scores with the question and to select the best query graph. The steps are as follows:
(1) Question and query graph encoding. The question and query graph sequences are encoded with a BERT model as the encoder. First, the entity mention in the question is replaced with the generic mark [unused0], and the question and the query graph sequence are represented as word sequences W_q and W_g. Then the BERT special tokens [CLS] and [SEP] are introduced, W_q and W_g are fed into the BERT model as a "sentence pair", and the output vector t_g corresponding to [CLS] is taken as the overall feature representation of the question and query graph sequence.
(2) Entity encoding. Since entity names are often proper nouns, word-level representations contain many out-of-vocabulary words, so character-level representations are used when encoding entities. First, the entity mention in the question and the entity name in the query graph sequence are represented as character sequences W_eq and W_eg. Then the [CLS] and [SEP] tokens are added, the "sentence pair" W_eq and W_eg is encoded with the BERT model, and the output vector t_e corresponding to [CLS] is taken as the overall semantic representation of the entity mention and the entity name.
(3) Comparison score. First, the output vectors corresponding to [CLS] from the BERT models of steps (1) and (2) are concatenated; then a linear layer reduces the dimension of the concatenated vector; finally, a new linear layer computes the comparison score of the question and the query graph.
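A minimal sketch of the ranking score computation in steps (1) to (3), written against the Hugging Face transformers API. The checkpoint name bert-base-chinese, the hidden size and the way the character-level sequences are built are assumptions rather than details from the patent.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class QueryGraphRanker(nn.Module):
    """Score a (question, serialized query graph) pair with BERT sentence-pair encodings."""
    def __init__(self, model_name: str = "bert-base-chinese", hidden: int = 768):
        super().__init__()
        self.tokenizer = BertTokenizer.from_pretrained(model_name)
        self.bert = BertModel.from_pretrained(model_name)
        self.reduce = nn.Linear(2 * hidden, hidden)   # shrink the concatenated [CLS] vectors
        self.score = nn.Linear(hidden, 1)             # final comparison score

    def _cls(self, text_a: str, text_b: str) -> torch.Tensor:
        # Encode the pair as "[CLS] text_a [SEP] text_b [SEP]" and take the [CLS] vector.
        enc = self.tokenizer(text_a, text_b, return_tensors="pt", truncation=True)
        return self.bert(**enc).last_hidden_state[:, 0]

    def forward(self, question: str, graph_seq: str, mention: str, entity_name: str) -> torch.Tensor:
        t_g = self._cls(question, graph_seq)                        # question / query-graph pair
        # Rough character-level rendering of the entity pair (space-separated characters).
        t_e = self._cls(" ".join(mention), " ".join(entity_name))
        return self.score(self.reduce(torch.cat([t_g, t_e], dim=-1)))
```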
Based on the same inventive concept, the invention also provides a knowledge-graph-oriented query graph generation system, comprising:
a relation detection model construction module for selecting, from the candidate relation set, the main relation path that best matches the question;
a query graph serialization module for serializing the query graph and uniformly encoding the question and the query graph sequence; and
a query graph ranking model construction module for ranking the query graphs in the candidate set according to their semantic similarity scores with the question and selecting the best query graph.
The specific implementation of each module is the same as in the knowledge-graph-oriented query graph generation method and is not repeated here.
The invention provides a new approach to the query graph generation task in knowledge graph question answering and improves overall question-answering performance by completing each subtask of the query graph generation process, including relation detection, query graph serialization and query graph ranking. In addition, compared with the prior art, the proposed model is based on end-to-end matching, introduces no hand-designed features, and is simple to implement.
Claims (10)
1. A knowledge-graph-oriented query graph generation method, characterized by comprising the following steps:
constructing a relation detection model and selecting, from the candidate relation set, the main relation path that best matches the question;
serializing the query graph and uniformly encoding the question and the query graph sequence; and
constructing a query graph ranking model, ranking the query graphs in the candidate set according to their semantic similarity scores with the question, and selecting the best query graph.
2. The knowledge-graph-oriented query graph generation method of claim 1, wherein constructing the relation detection model and selecting, from the candidate relation set, the main relation path that best matches the question is specifically as follows:
(1) question and relation preprocessing: the question is represented at word level; first, using the result of entity linking, the entity mention in the question is replaced with the generic mark <e>, and the question is then segmented into a word sequence W_q; a candidate relation is represented at two granularities, word level and relation level: the relations forming a relation path are split into a word sequence and a relation-name sequence, and the union of the two, W_r, is taken as the relation representation;
(2) question and relation encoding: first, a word embedding layer maps the question sequence W_q and the relation sequence W_r to their corresponding word embedding sequences E_q and E_r, the word embedding layer being initialized with GloVe pre-trained word vectors; then two Bi-LSTMs semantically encode E_q and E_r respectively, yielding the semantic vector representations Q and R of the question and the candidate relation;
(3) bidirectional attention mechanism: the attention weights with which the question and the relation influence each other are first computed for each pair of word vectors q_i ∈ Q and r_j ∈ R using a trainable parameter matrix W_a; then the attention-weighted sums of the question and the relation under the attention mechanism are computed, where n denotes the number of words in the question and m denotes the number of words in the relation sequence, giving the attention representations of the question and the relation;
(4) global comparison score: max pooling is applied to the question and relation representations respectively to obtain the global semantic vectors q_g and r_g, and their cosine similarity is computed as the global comparison score of the question and the relation;
(5) local interaction score: first, the vectors at corresponding positions of Q and the attention representation of the relation are spliced to obtain the interaction vector sequence C; then a new Bi-LSTM extracts local semantic features from C to obtain the local feature representation T; finally, a feed-forward neural network and max pooling reduce the dimension of T, giving the local interaction score of the question and the relation;
(6) semantic similarity: the global comparison score and the local interaction score are added to obtain the total semantic similarity score of the question and the relation.
3. The knowledge-graph-oriented query graph generation method of claim 1, wherein serializing the query graph and uniformly encoding the question and the query graph sequence is specifically as follows:
(1) query graph splitting: the query graph is first split into five parts according to the constraint types defined when the query graph is expanded: the main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path and ordinal constraint sub-path; then the entity name in the main relation path is replaced with the generic mark [unused1], and the literal value of the answer node in the main relation path is replaced with the answer type;
(2) sub-path serialization: each sub-path is traversed by depth-first search, and the information of the corresponding nodes and directed edges is recorded in order, giving a serialized representation of each sub-path;
(3) overall query graph serialization: the sub-path sequences are merged in the order main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path, ordinal constraint sub-path, giving the serialized representation of the complete query graph.
4. The knowledge-graph-oriented query graph generation method of claim 1, wherein constructing the query graph ranking model, ranking the query graphs in the candidate set according to their semantic similarity scores with the question and selecting the best query graph is specifically as follows:
(1) question and query graph encoding: the question and query graph sequences are encoded with a BERT model as the encoder; first, the entity mention in the question is replaced with the generic mark [unused0], and the question and the query graph sequence are represented as word sequences W_q and W_g; then the BERT special tokens [CLS] and [SEP] are introduced, W_q and W_g are fed into the BERT model as a "sentence pair", and the output vector t_g corresponding to [CLS] is taken as the overall feature representation of the question and query graph sequence;
(2) entity encoding: first, the entity mention in the question and the entity name in the query graph sequence are represented as character sequences W_eq and W_eg; then the [CLS] and [SEP] tokens are added, the "sentence pair" W_eq and W_eg is encoded with the BERT model, and the output vector t_e corresponding to [CLS] is taken as the overall semantic representation of the entity mention and the entity name;
(3) comparison score: first, the output vectors corresponding to [CLS] from the BERT models of steps (1) and (2) are concatenated; then a linear layer reduces the dimension of the concatenated vector; finally, a new linear layer computes the comparison score of the question and the query graph.
5. A knowledge-graph-oriented query graph generation system, comprising:
a relation detection model construction module for selecting, from the candidate relation set, the main relation path that best matches the question;
a query graph serialization module for serializing the query graph and uniformly encoding the question and the query graph sequence; and
a query graph ranking model construction module for ranking the query graphs in the candidate set according to their semantic similarity scores with the question and selecting the best query graph.
6. The knowledge-graph-oriented query graph generation system of claim 5, wherein the relation detection model construction module selects, from the candidate relation set, the main relation path that best matches the question, specifically as follows:
(1) question and relation preprocessing: the question is represented at word level; first, using the result of entity linking, the entity mention in the question is replaced with the generic mark <e>, and the question is then segmented into a word sequence W_q; a candidate relation is represented at two granularities, word level and relation level: the relations forming a relation path are split into a word sequence and a relation-name sequence, and the union of the two, W_r, is taken as the relation representation;
(2) question and relation encoding: first, a word embedding layer maps the question sequence W_q and the relation sequence W_r to their corresponding word embedding sequences E_q and E_r, the word embedding layer being initialized with GloVe pre-trained word vectors; then two Bi-LSTMs semantically encode E_q and E_r respectively, yielding the semantic vector representations Q and R of the question and the candidate relation;
(3) bidirectional attention mechanism: the attention weights with which the question and the relation influence each other are first computed for each pair of word vectors q_i ∈ Q and r_j ∈ R using a trainable parameter matrix W_a; then the attention-weighted sums of the question and the relation under the attention mechanism are computed, where n denotes the number of words in the question and m denotes the number of words in the relation sequence, giving the attention representations of the question and the relation;
(4) global comparison score: max pooling is applied to the question and relation representations respectively to obtain the global semantic vectors q_g and r_g, and their cosine similarity is computed as the global comparison score of the question and the relation;
(5) local interaction score: first, the vectors at corresponding positions of Q and the attention representation of the relation are spliced to obtain the interaction vector sequence C; then a new Bi-LSTM extracts local semantic features from C to obtain the local feature representation T; finally, a feed-forward neural network and max pooling reduce the dimension of T, giving the local interaction score of the question and the relation;
(6) semantic similarity: the global comparison score and the local interaction score are added to obtain the total semantic similarity score of the question and the relation.
7. The knowledge-graph-oriented query graph generation system of claim 5, wherein the query graph serialization module serializes the query graph and uniformly encodes the question and the query graph sequence, specifically as follows:
(1) query graph splitting: the query graph is first split into five parts according to the constraint types defined when the query graph is expanded: the main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path and ordinal constraint sub-path; then the entity name in the main relation path is replaced with the generic mark [unused1], and the literal value of the answer node in the main relation path is replaced with the answer type;
(2) sub-path serialization: each sub-path is traversed by depth-first search, and the information of the corresponding nodes and directed edges is recorded in order, giving a serialized representation of each sub-path;
(3) overall query graph serialization: the sub-path sequences are merged in the order main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path, ordinal constraint sub-path, giving the serialized representation of the complete query graph.
8. The knowledge-graph-oriented query graph generation system of claim 5, wherein the query graph ranking model construction module ranks the query graphs in the candidate set according to their semantic similarity scores with the question and selects the best query graph, specifically as follows:
(1) question and query graph encoding: the question and query graph sequences are encoded with a BERT model as the encoder; first, the entity mention in the question is replaced with the generic mark [unused0], and the question and the query graph sequence are represented as word sequences W_q and W_g; then the BERT special tokens [CLS] and [SEP] are introduced, W_q and W_g are fed into the BERT model as a "sentence pair", and the output vector t_g corresponding to [CLS] is taken as the overall feature representation of the question and query graph sequence;
(2) entity encoding: first, the entity mention in the question and the entity name in the query graph sequence are represented as character sequences W_eq and W_eg; then the [CLS] and [SEP] tokens are added, the "sentence pair" W_eq and W_eg is encoded with the BERT model, and the output vector t_e corresponding to [CLS] is taken as the overall semantic representation of the entity mention and the entity name;
(3) comparison score: first, the output vectors corresponding to [CLS] from the BERT models of steps (1) and (2) are concatenated; then a linear layer reduces the dimension of the concatenated vector; finally, a new linear layer computes the comparison score of the question and the query graph.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-4 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310363988.7A CN116383357A (en) | 2023-04-06 | 2023-04-06 | Knowledge graph-oriented query graph generation method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116383357A true CN116383357A (en) | 2023-07-04 |
Family
ID=86964185
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117194633A (en) * | 2023-09-12 | 2023-12-08 | 河海大学 | Dam emergency response knowledge question-answering system based on multi-level multipath and implementation method |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |