CN116383357A - Knowledge graph-oriented query graph generation method and system - Google Patents

Knowledge graph-oriented query graph generation method and system

Info

Publication number
CN116383357A
Authority
CN
China
Prior art keywords
query graph
path
relation
query
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310363988.7A
Other languages
Chinese (zh)
Inventor
徐建
张帆
邱婉春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology filed Critical Nanjing University of Science and Technology
Priority to CN202310363988.7A priority Critical patent/CN116383357A/en
Publication of CN116383357A publication Critical patent/CN116383357A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33: Querying
    • G06F 16/332: Query formulation
    • G06F 16/3329: Natural language query formulation or dialogue systems
    • G06F 16/3322: Query formulation using system suggestions
    • G06F 16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367: Ontology
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a knowledge graph-oriented query graph generation method and system. The method comprises the following steps: constructing a relation detection model and selecting, from the candidate relation set, the main relation path that best matches the question; serializing the query graph and encoding the question and the query graph sequence in a unified manner; and constructing a query graph ranking model, ranking the query graphs in the candidate set according to their semantic similarity scores with the question, and selecting the best query graph. The invention provides a new approach to the query graph generation task in knowledge graph question answering and improves the overall performance of knowledge graph question answering by completing each subtask of the query graph generation process, including relation detection, query graph serialization and query graph ranking. Compared with the prior art, the proposed model is based on end-to-end matching, introduces no manually designed features, and is simple to implement.

Description

Knowledge graph-oriented query graph generation method and system
Technical Field
The invention relates to a query graph generation technology, in particular to a knowledge graph-oriented query graph generation method and system.
Background
At present, against the background of the Internet age, people are accustomed to obtaining information through the network: one only needs to enter keywords, and a search engine returns all kinds of information related to them, which greatly facilitates daily work and life. However, when faced with a natural language question posed by a user, a conventional search engine can only match and combine the keywords in the question in a simple way, so it is difficult for it to accurately understand the complex logical relations in the question. What the search returns is a list of web pages related to the question keywords rather than the final answer to the question, which the user must then screen further, reducing search efficiency.
In recent years, with the development of knowledge graphs, information retrieval, deep learning and other technologies, knowledge graph question answering has become a new technique for question answering tasks: it uses the rich semantic association information in the knowledge graph to analyze the natural language question posed by the user, fully understand the user's intention, retrieve the answer in the knowledge graph, and return it to the user. The query graph generation method simplifies the semantic parsing of the question into a query graph generation process; by mapping the natural language question to a query graph, the logical form of the question can be expressed clearly and intuitively, which makes it easier for a computer to understand and further improves the efficiency and performance of knowledge graph question answering.
In order for the query graph to express the semantic information of the question as accurately as possible, researchers decompose the mapping from question to query graph into different subtasks and complete each subtask based on rule templates or neural networks. Because predefined rule templates require excessive human intervention, neural-network-based approaches are currently the mainstream. Bao et al. proposed a query graph generation method for multi-constraint questions (Bao J, Duan N, Yan Z, et al. Constraint-Based Question Answering with Knowledge Graph [C]// Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. 2016: 2503-2514.), which transforms a multi-constraint question into a multi-constraint query graph, encodes candidate query graphs with two CNN models and, in addition to the question, entity and relation features encoded by the CNNs, manually designs a series of constraint-related features, such as the number of each type of constraint and the sum of the entity linking scores of constant vertices in entity constraints. Luo et al. proposed a semantic matching model for matching natural language questions to query graphs (Luo K, Lin F, Luo X, et al. Knowledge Base Question Answering via Encoding of Complex Query Graphs [C]// Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018: 2185-2194.), which is implemented on an "encode-compare" framework and encodes the natural language question and the query graph separately. First, Bi-GRU models encode the question from global and local perspectives respectively, and the sum of the two is taken as the question encoding; then the query graph is represented by the predicate sequence on its relation path, and the predicate id sequence and predicate name sequence are used to obtain the query graph encoding; finally, the similarity between the question and the query graph is computed with the cosine distance formula.
The method proposed by Bao et al. decomposes the query graph into several semantic components and then encodes these components jointly; it ignores the structural information of the query graph, introduces manually designed features, and increases the complexity of encoding.
The method proposed by Luo et al. is based on an "encode-compare" framework: it directly encodes the question and the predicate sequence of the query graph into vector sequences, compresses each vector sequence into a single vector by an aggregation operation, and finally compares the similarity of the question vector and the query graph vector. This method abstracts the question and the query graph to a high degree, does not consider the interaction between the internal information of the question and that of the query graph, and the vector aggregation easily loses key information needed for matching.
Because of the structural difference between the query graph and the question, these methods encode the query graph independently and cannot realize the interaction between the internal information of the query graph and that of the question.
Disclosure of Invention
The invention aims to provide a knowledge graph-oriented query graph generation method and system which: use a bidirectional attention mechanism to realize information interaction between the question and the relation, solving the problem that the "encode-compare" framework ignores such interaction; use local information extracted by local interaction to supplement the global information obtained by aggregation, solving the problem that aggregation operations easily lose semantic information; and convert the graph structure into a linear structure by query graph serialization so that the query graph and the question can be encoded in a unified manner, solving the problem of the structural difference between the question and the query graph. By solving these key problems, the accuracy of question answering is improved.
The technical scheme for realizing the purpose of the invention is as follows: in a first aspect, the present invention provides a knowledge graph-oriented query graph generating method, including the following steps:
constructing a relation detection model, and selecting from the candidate relation set the main relation path that best matches the question;
serializing the query graph, and encoding the question and the query graph sequence in a unified manner;
and constructing a query graph ranking model, ranking the query graphs in the candidate set according to their semantic similarity scores with the question, and selecting the best query graph.
In a second aspect, the present invention provides a knowledge-graph-oriented query graph generating system, including:
a relation detection model construction module, used for selecting from the candidate relation set the main relation path that best matches the question;
a query graph serialization module, used for serializing the query graph and encoding the question and the query graph sequence in a unified manner;
and a query graph ranking model construction module, used for ranking the query graphs in the candidate set according to their semantic similarity scores with the question and selecting the best query graph.
In a third aspect, the present invention provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the computer program is executed.
In a fourth aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
Compared with the prior art, the invention has the remarkable advantages that:
(1) In the relation detection model, a bidirectional attention mechanism realizes attention interaction between the question and the relation. The mutual influence of the question and the relation is taken into account, so that the model focuses on the parts of the question and the relation that are related to each other, which helps extract the key information for matching them.
(2) In the relation detection model, global features and local features are considered together; the local features extracted by local interaction effectively supplement the semantic information lost in the aggregation operation.
(3) In the query graph ranking task, the query graph is serialized according to the main relation path and the different constraint sub-paths, so that some structural characteristics of the query graph are retained. This overcomes the drawback caused by the structural difference between the query graph and the question and allows the question and the query graph to be encoded in a unified manner.
(4) In the query graph ranking model, BERT encodes the character-level representations of the entity mention and the entity name, which enhances the robustness of the model to out-of-vocabulary words and improves its overall performance.
Drawings
FIG. 1 is a diagram of a relationship detection model.
FIG. 2 is a diagram of a query graph serialization process.
FIG. 3 is a diagram of a query graph ranking model.
Detailed Description
The technical scheme of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art without creative effort, based on the embodiments of the present invention, fall within the scope of protection of the present invention.
On the basis of completed entity linking between the natural language question and the knowledge graph, the invention provides two models, one for the relation detection task and one for the query graph ranking task, so as to improve the accuracy and efficiency of knowledge graph question answering.
FIG. 1 shows the structure of the relation detection model, whose purpose is to select from the candidate relation set the main relation path that best matches the question. It comprises the following steps:
(1) Question and relation preprocessing. The question is represented at the word level: first, using the result of entity linking, the entity mention in the question is replaced with the generic token <e>; the question is then segmented into a word sequence W_q. The candidate relation is represented at two granularities, word level and relation level: the relations that form a relation path are split into a word sequence and a relation name sequence, and the union of the two, W_r, is taken as the relation representation.
(2) Question and relation encoding. First, a word embedding layer maps the question representation sequence W_q and the relation representation sequence W_r to their corresponding word embedding sequences E_q and E_r; the word embedding layer is initialized with pre-trained GloVe word vectors. Then two Bi-LSTMs semantically encode E_q and E_r respectively, yielding the semantic vector representations Q and R of the question and the candidate relation.
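By way of a non-binding illustration, the following PyTorch sketch shows how the GloVe-initialized embedding layer and the two Bi-LSTM encoders of this step could be realized; the class name, vocabulary size and hidden dimensions are assumptions and are not specified in the patent.

```python
import torch
import torch.nn as nn

class SequenceEncoder(nn.Module):
    """Embeds a token id sequence and encodes it with a Bi-LSTM (illustrative sketch)."""
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=150, glove_weights=None):
        super().__init__()
        # The patent initializes the embedding layer from pre-trained GloVe vectors;
        # a random matrix is used here as a placeholder when no weights are supplied.
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        if glove_weights is not None:
            self.embedding.weight.data.copy_(glove_weights)
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> contextual vectors: (batch, seq_len, 2*hidden_dim)
        emb = self.embedding(token_ids)   # E_q or E_r
        encoded, _ = self.bilstm(emb)     # Q or R
        return encoded

# Two separate encoders, one for the question sequence W_q and one for the
# relation sequence W_r, mirroring the "two Bi-LSTMs" of step (2).
question_encoder = SequenceEncoder(vocab_size=30000)
relation_encoder = SequenceEncoder(vocab_size=30000)
```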
(3) Implementing a bidirectional attention mechanism. First, the attention weights with which the question and the relation influence each other are computed from the question word vectors q_i ∈ Q and the relation word vectors r_j ∈ R using a trainable parameter matrix W_a. Then the attention-weighted sums of the question and the relation under this attention mechanism are computed, giving the attention representations of the question and the relation, where n denotes the number of words in the question and m the number of words in the relation sequence.
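The patent does not reproduce the exact formulas here (the corresponding formula images are omitted), so the following sketch assumes a standard bilinear attention score computed with the trainable matrix W_a; it illustrates the described bidirectional attention rather than the patented formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiAttention(nn.Module):
    """Bidirectional attention between question vectors Q and relation vectors R.

    Only the trainable matrix W_a is stated in the text; the bilinear score
    s_ij = q_i W_a r_j^T is an assumption made for illustration.
    """
    def __init__(self, dim):
        super().__init__()
        self.w_a = nn.Parameter(torch.randn(dim, dim) * 0.01)

    def forward(self, Q, R):
        # Q: (batch, n, dim), R: (batch, m, dim)
        scores = torch.einsum('bnd,de,bme->bnm', Q, self.w_a, R)
        attn_q2r = F.softmax(scores, dim=2)                   # question attends to relation
        attn_r2q = F.softmax(scores, dim=1).transpose(1, 2)   # relation attends to question
        Q_att = torch.bmm(attn_q2r, R)   # weighted sums of relation vectors, per question word
        R_att = torch.bmm(attn_r2q, Q)   # weighted sums of question vectors, per relation word
        return Q_att, R_att
```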
(4) A global comparison score is calculated. Max pooling is applied to the attention representations of the question and of the relation respectively, giving the global semantic vectors q_g and r_g; the cosine similarity between the two is computed as the global comparison score of the question and the relation.
(5) A local interaction score is calculated. First, the vectors of Q and of the attention representation of the question at corresponding positions are concatenated to obtain the interaction vector sequence C; then a new Bi-LSTM extracts local semantic features from C, giving the local feature representation T; finally, a feed-forward neural network and max pooling reduce the dimensionality of T, yielding the local interaction score of the question and the relation.
(6) Semantic similarity is calculated. The global comparison score and the local interaction score are added to obtain the total semantic similarity score of the question and the relation.
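Continuing the illustrative sketch, steps (4) to (6) could be combined roughly as follows; the hidden size of the new Bi-LSTM and the exact form of the feed-forward layer are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityScorer(nn.Module):
    """Combines the global comparison score and the local interaction score (illustrative)."""
    def __init__(self, dim, hidden_dim=150):
        super().__init__()
        # New Bi-LSTM over the concatenated interaction sequence C (step 5).
        self.interaction_lstm = nn.LSTM(2 * dim, hidden_dim, batch_first=True, bidirectional=True)
        self.ffn = nn.Linear(2 * hidden_dim, 1)  # feed-forward layer applied before max pooling

    def forward(self, Q, Q_att, R_att):
        # --- global comparison score (step 4): max-pool the attention representations, cosine ---
        q_g = Q_att.max(dim=1).values
        r_g = R_att.max(dim=1).values
        global_score = F.cosine_similarity(q_g, r_g, dim=-1)

        # --- local interaction score (step 5): position-wise concat, Bi-LSTM, FFN, max-pool ---
        C = torch.cat([Q, Q_att], dim=-1)          # (batch, n, 2*dim)
        T, _ = self.interaction_lstm(C)            # (batch, n, 2*hidden_dim)
        local_score = self.ffn(T).squeeze(-1).max(dim=1).values

        # --- total semantic similarity (step 6): sum of the two scores ---
        return global_score + local_score
```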
To overcome the encoding difficulty caused by the structural difference between the query graph and the question, the invention serializes the query graph before encoding and then encodes the question and the query graph sequence in a unified manner. FIG. 2 illustrates an exemplary serialization process for a query graph, comprising the following steps:
(1) Query graph splitting. First, the query graph is split into five parts according to the constraint types defined in the query graph expansion method proposed by Luo et al.: the main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path and ordinal constraint sub-path. Then the entity name in the main relation path is replaced with the generic token [unused1], and the literal value of the answer node in the main relation path is replaced with the answer type.
In question "who is the first team leader of chinese men after 2012? "for example, the splitting result of the corresponding query graph is shown in fig. 2, and because the problem is a multi-constraint complex problem, the query graph further comprises four types of constraints besides the main relationship path. The main relation path { athlete, basketball athlete } indicates the main topic of the question, the main relation path { title, captain } is used for restricting the answering entity, the type restriction sub-path { man } indicates the type of the answer, the time restriction sub-path { >,2012} restricts the time range of the question, and the ordinal restriction sub-path { incumbent time, earliest } corresponds to the constraint of 'first incumbent' in the question. The information of various sub-paths corresponds to the constraint in the problem one by one, and the semantic information of the problem is completely represented.
(2) Sub-path serialization. Each sub-path is traversed by depth-first search, and the information of the corresponding nodes and directed edges is recorded in order, giving a serialized representation of each sub-path.
(3) Serialization of the whole query graph. The sub-path sequences are merged in the order main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path, ordinal constraint sub-path, giving the serialized representation of the complete query graph.
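A minimal sketch of steps (2) and (3) is given below; it models each sub-path as a flat token list, which glosses over the depth-first traversal of nodes and directed edges that a real graph structure would require.

```python
# Hedged sketch: serialize each sub-path, then merge them in the fixed order
# main -> entity -> type -> time -> ordinal.
SUBPATH_ORDER = [
    "main_relation_path",
    "entity_constraint_path",
    "type_constraint_path",
    "time_constraint_path",
    "ordinal_constraint_path",
]

def serialize_query_graph(split_graph: dict) -> str:
    """Concatenates the sub-path token sequences into one query graph sequence."""
    parts = []
    for key in SUBPATH_ORDER:
        tokens = split_graph.get(key, [])
        if tokens:
            parts.append(" ".join(tokens))
    return " ".join(parts)

# Example usage with the illustrative split_query_graph structure sketched above:
# serialized = serialize_query_graph(split_query_graph)
```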
FIG. 3 shows the structure of the query graph ranking model. The task of query graph ranking is to rank the query graphs in the candidate set according to their semantic similarity scores with the question and to select the best query graph. The steps are as follows:
(1) Question and query graph encoding. The BERT model is used as the encoder for the question and the query graph sequence. First, the entity mention in the question is replaced with the generic token [unused0], and the question and the query graph sequence are represented as word sequences W_q and W_g. Then the special BERT tokens [CLS] and [SEP] are introduced, W_q and W_g are fed into the BERT model as a "sentence pair", and the output vector t_g corresponding to [CLS] is taken as the overall feature representation of the question and query graph sequence.
(2) Entity encoding. Since entity names are often proper nouns, a word-level representation would contain many out-of-vocabulary words, so character-level representations are used when encoding entities. First, the entity mention in the question and the entity name in the query graph sequence are represented as character sequences W_eq and W_eg; then the [CLS] and [SEP] tokens are added and the BERT model encodes the "sentence pair" W_eq and W_eg, with the output vector t_e corresponding to [CLS] taken as the overall semantic representation of the entity mention and the entity name.
(3) A comparison score is calculated. First, the output vectors corresponding to [CLS] in the BERT models of steps (1) and (2) are concatenated; then a linear layer reduces the dimensionality of the concatenated vector; finally, another linear layer computes the comparison score between the question and the query graph.
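The following hedged sketch, built on the Hugging Face transformers library, shows one possible realization of the ranking model; the checkpoint name, the projection dimension, and the sharing of a single BERT encoder between the two "sentence pairs" are assumptions not stated in the patent.

```python
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class QueryGraphRanker(nn.Module):
    """Hedged sketch of the BERT-based query graph ranking model (cf. FIG. 3)."""
    def __init__(self, bert_name="bert-base-chinese", proj_dim=256):
        super().__init__()
        # One BERT encoder is reused for both sentence pairs; the patent does not say
        # whether the two encoders share weights, so sharing is an assumption here.
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size
        self.reduce = nn.Linear(2 * hidden, proj_dim)  # dimensionality reduction of the concatenation
        self.score = nn.Linear(proj_dim, 1)            # final comparison score

    def forward(self, pair_qg, pair_entity):
        # pair_qg / pair_entity: tokenizer outputs for the two "sentence pairs".
        t_g = self.bert(**pair_qg).last_hidden_state[:, 0]      # [CLS] for question + query graph
        t_e = self.bert(**pair_entity).last_hidden_state[:, 0]  # [CLS] for mention + entity name
        fused = self.reduce(torch.cat([t_g, t_e], dim=-1))
        return self.score(fused).squeeze(-1)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
# Placeholder strings for illustration; whether the reserved tokens [unused0]/[unused1]
# are present depends on the chosen checkpoint's vocabulary.
pair_qg = tokenizer("[unused0] who is the first captain after 2012",
                    "[unused1] athlete basketball player captain", return_tensors="pt")
pair_entity = tokenizer("men's team", "Chinese national men's basketball team", return_tensors="pt")
model = QueryGraphRanker()
score = model(pair_qg, pair_entity)
```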
Based on the same inventive concept, the invention also provides a knowledge-graph-oriented query graph generation system, which comprises:
a relation detection model construction module, used for selecting from the candidate relation set the main relation path that best matches the question;
a query graph serialization module, used for serializing the query graph and encoding the question and the query graph sequence in a unified manner;
and a query graph ranking model construction module, used for ranking the query graphs in the candidate set according to their semantic similarity scores with the question and selecting the best query graph.
The specific implementation of each module is the same as in the knowledge graph-oriented query graph generation method and is not repeated here.
The invention provides a new approach to the query graph generation task in knowledge graph question answering and improves the overall performance of knowledge graph question answering by completing each subtask of the query graph generation process, including relation detection, query graph serialization and query graph ranking. In addition, compared with the prior art, the proposed model is based on end-to-end matching, introduces no manually designed features, and is simple to implement.

Claims (10)

1. A knowledge graph-oriented query graph generation method, characterized by comprising the following steps:
constructing a relation detection model, and selecting from the candidate relation set the main relation path that best matches the question;
serializing the query graph, and encoding the question and the query graph sequence in a unified manner;
and constructing a query graph ranking model, ranking the query graphs in the candidate set according to their semantic similarity scores with the question, and selecting the best query graph.
2. The knowledge graph-oriented query graph generation method of claim 1, wherein the constructing of the relation detection model and the selecting from the candidate relation set of the main relation path that best matches the question are specifically as follows:
(1) Question and relation preprocessing: the question is represented at the word level; first, using the result of entity linking, the entity mention in the question is replaced with the generic token <e>, and the question is then segmented into a word sequence W_q; the candidate relation is represented at two granularities, word level and relation level, the relations forming a relation path are split into a word sequence and a relation name sequence, and the union of the two, W_r, is taken as the relation representation;
(2) Question and relation encoding: first, a word embedding layer maps the question representation sequence W_q and the relation representation sequence W_r to their corresponding word embedding sequences E_q and E_r, the word embedding layer being initialized with pre-trained GloVe word vectors; then two Bi-LSTMs semantically encode E_q and E_r respectively, yielding the semantic vector representations Q and R of the question and the candidate relation;
(3) Implementing a bidirectional attention mechanism: first computing the attention weights with which the question and the relation influence each other, where q_i ∈ Q, r_j ∈ R and W_a is a trainable parameter matrix; then computing the attention-weighted sums of the question and the relation under this attention mechanism, where n denotes the number of words in the question and m the number of words in the relation sequence, giving the attention representations of the question and the relation;
(4) Calculating a global comparison score: applying max pooling to the attention representations of the question and of the relation respectively to obtain the global semantic vectors q_g and r_g, and computing the cosine similarity between the two to obtain the global comparison score of the question and the relation;
(5) Calculating a local interaction score: first concatenating the vectors of Q and of the attention representation of the question at corresponding positions to obtain the interaction vector sequence C; then extracting local semantic features from C with a new Bi-LSTM to obtain the local feature representation T; finally reducing the dimensionality of T with a feed-forward neural network and max pooling to obtain the local interaction score of the question and the relation;
(6) Calculating semantic similarity: adding the global comparison score and the local interaction score to obtain the total semantic similarity score of the question and the relation.
3. The knowledge graph-oriented query graph generation method of claim 1, wherein the serializing of the query graph and the unified encoding of the question and the query graph sequence are specifically as follows:
(1) Query graph splitting: first splitting the query graph into five parts according to the constraint types defined when the query graph is expanded: the main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path and ordinal constraint sub-path; then replacing the entity name in the main relation path with the generic token [unused1] and replacing the literal value of the answer node in the main relation path with the answer type;
(2) Sub-path serialization: traversing each sub-path by depth-first search and sequentially recording the information of the corresponding nodes and directed edges to obtain a serialized representation of each sub-path;
(3) Serialization of the whole query graph: merging the sub-path sequences in the order main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path, ordinal constraint sub-path to obtain the serialized representation of the complete query graph.
4. The knowledge graph-oriented query graph generation method of claim 1, wherein the constructing of the query graph ranking model, the ranking of the query graphs in the candidate set according to their semantic similarity scores with the question and the selecting of the best query graph are specifically as follows:
(1) Question and query graph encoding: encoding the question and the query graph sequence using the BERT model as encoder; first replacing the entity mention in the question with the generic token [unused0] and representing the question and the query graph sequence as word sequences W_q and W_g; then introducing the special BERT tokens [CLS] and [SEP], feeding W_q and W_g into the BERT model as a "sentence pair", and taking the output vector t_g corresponding to [CLS] as the overall feature representation of the question and query graph sequence;
(2) Entity encoding: first representing the entity mention in the question and the entity name in the query graph sequence as character sequences W_eq and W_eg respectively; then adding the [CLS] and [SEP] tokens and encoding the "sentence pair" W_eq and W_eg with the BERT model, taking the output vector t_e corresponding to [CLS] as the overall semantic representation of the entity mention and the entity name;
(3) Calculating a comparison score: first concatenating the output vectors corresponding to [CLS] in the BERT models of steps (1) and (2); then reducing the dimensionality of the concatenated vector with a linear layer; finally computing the comparison score between the question and the query graph with another linear layer.
5. A knowledge-graph-oriented query graph generation system, comprising:
a relation detection model construction module, used for selecting from the candidate relation set the main relation path that best matches the question;
a query graph serialization module, used for serializing the query graph and encoding the question and the query graph sequence in a unified manner;
and a query graph ranking model construction module, used for ranking the query graphs in the candidate set according to their semantic similarity scores with the question and selecting the best query graph.
6. The knowledge graph-oriented query graph generation system of claim 5, wherein the relation detection model construction module is configured to select from the candidate relation set the main relation path that best matches the question, specifically:
(1) Question and relation preprocessing: the question is represented at the word level; first, using the result of entity linking, the entity mention in the question is replaced with the generic token <e>, and the question is then segmented into a word sequence W_q; the candidate relation is represented at two granularities, word level and relation level, the relations forming a relation path are split into a word sequence and a relation name sequence, and the union of the two, W_r, is taken as the relation representation;
(2) Question and relation encoding: first, a word embedding layer maps the question representation sequence W_q and the relation representation sequence W_r to their corresponding word embedding sequences E_q and E_r, the word embedding layer being initialized with pre-trained GloVe word vectors; then two Bi-LSTMs semantically encode E_q and E_r respectively, yielding the semantic vector representations Q and R of the question and the candidate relation;
(3) Implementing a bidirectional attention mechanism: first computing the attention weights with which the question and the relation influence each other, where q_i ∈ Q, r_j ∈ R and W_a is a trainable parameter matrix; then computing the attention-weighted sums of the question and the relation under this attention mechanism, where n denotes the number of words in the question and m the number of words in the relation sequence, giving the attention representations of the question and the relation;
(4) Calculating a global comparison score: applying max pooling to the attention representations of the question and of the relation respectively to obtain the global semantic vectors q_g and r_g, and computing the cosine similarity between the two to obtain the global comparison score of the question and the relation;
(5) Calculating a local interaction score: first concatenating the vectors of Q and of the attention representation of the question at corresponding positions to obtain the interaction vector sequence C; then extracting local semantic features from C with a new Bi-LSTM to obtain the local feature representation T; finally reducing the dimensionality of T with a feed-forward neural network and max pooling to obtain the local interaction score of the question and the relation;
(6) Calculating semantic similarity: adding the global comparison score and the local interaction score to obtain the total semantic similarity score of the question and the relation.
7. The knowledge graph-oriented query graph generation system of claim 5, wherein the query graph serialization module is configured to serialize the query graph and encode the question and the query graph sequence in a unified manner, specifically:
(1) Query graph splitting: first splitting the query graph into five parts according to the constraint types defined when the query graph is expanded: the main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path and ordinal constraint sub-path; then replacing the entity name in the main relation path with the generic token [unused1] and replacing the literal value of the answer node in the main relation path with the answer type;
(2) Sub-path serialization: traversing each sub-path by depth-first search and sequentially recording the information of the corresponding nodes and directed edges to obtain a serialized representation of each sub-path;
(3) Serialization of the whole query graph: merging the sub-path sequences in the order main relation path, entity constraint sub-path, type constraint sub-path, time constraint sub-path, ordinal constraint sub-path to obtain the serialized representation of the complete query graph.
8. The knowledge graph-oriented query graph generation system of claim 5, wherein the query graph ranking model construction module ranks the query graphs in the candidate set according to their semantic similarity scores with the question and selects the best query graph, specifically:
(1) Question and query graph encoding: encoding the question and the query graph sequence using the BERT model as encoder; first replacing the entity mention in the question with the generic token [unused0] and representing the question and the query graph sequence as word sequences W_q and W_g; then introducing the special BERT tokens [CLS] and [SEP], feeding W_q and W_g into the BERT model as a "sentence pair", and taking the output vector t_g corresponding to [CLS] as the overall feature representation of the question and query graph sequence;
(2) Entity encoding: first representing the entity mention in the question and the entity name in the query graph sequence as character sequences W_eq and W_eg respectively; then adding the [CLS] and [SEP] tokens and encoding the "sentence pair" W_eq and W_eg with the BERT model, taking the output vector t_e corresponding to [CLS] as the overall semantic representation of the entity mention and the entity name;
(3) Calculating a comparison score: first concatenating the output vectors corresponding to [CLS] in the BERT models of steps (1) and (2); then reducing the dimensionality of the concatenated vector with a linear layer; finally computing the comparison score between the question and the query graph with another linear layer.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-4 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1-4.
CN202310363988.7A 2023-04-06 2023-04-06 Knowledge graph-oriented query graph generation method and system Pending CN116383357A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310363988.7A CN116383357A (en) 2023-04-06 2023-04-06 Knowledge graph-oriented query graph generation method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310363988.7A CN116383357A (en) 2023-04-06 2023-04-06 Knowledge graph-oriented query graph generation method and system

Publications (1)

Publication Number Publication Date
CN116383357A true CN116383357A (en) 2023-07-04

Family

ID=86964185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310363988.7A Pending CN116383357A (en) 2023-04-06 2023-04-06 Knowledge graph-oriented query graph generation method and system

Country Status (1)

Country Link
CN (1) CN116383357A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117194633A (en) * 2023-09-12 2023-12-08 河海大学 Dam emergency response knowledge question-answering system based on multi-level multipath and implementation method


Similar Documents

Publication Publication Date Title
CN113128229B (en) Chinese entity relation joint extraction method
Xie et al. Representation learning of knowledge graphs with entity descriptions
CN111611361A (en) Intelligent reading, understanding, question answering system of extraction type machine
CN111930906A (en) Knowledge graph question-answering method and device based on semantic block
CN113486667A (en) Medical entity relationship joint extraction method based on entity type information
CN111897944B (en) Knowledge graph question-answering system based on semantic space sharing
CN110232113B (en) Method and system for improving question and answer accuracy of knowledge base
CN116097250A (en) Layout aware multimodal pre-training for multimodal document understanding
CN112015868A (en) Question-answering method based on knowledge graph completion
CN116127095A (en) Question-answering method combining sequence model and knowledge graph
CN117171333A (en) Electric power file question-answering type intelligent retrieval method and system
Zhang et al. Hierarchical scene parsing by weakly supervised learning with image descriptions
CN112115253A (en) Depth text ordering method based on multi-view attention mechanism
CN113779220A (en) Mongolian multi-hop question-answering method based on three-channel cognitive map and graph attention network
CN112632250A (en) Question and answer method and system under multi-document scene
CN115203421A (en) Method, device and equipment for generating label of long text and storage medium
CN116127090A (en) Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction
CN112883199A (en) Collaborative disambiguation method based on deep semantic neighbor and multi-entity association
CN115759092A (en) Network threat information named entity identification method based on ALBERT
CN118227769B (en) Knowledge graph enhancement-based large language model question-answer generation method
CN114117000A (en) Response method, device, equipment and storage medium
CN117574898A (en) Domain knowledge graph updating method and system based on power grid equipment
CN116383357A (en) Knowledge graph-oriented query graph generation method and system
CN116521887A (en) Knowledge graph complex question-answering system and method based on deep learning
CN116821291A (en) Question-answering method and system based on knowledge graph embedding and language model alternate learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination