CN118170894A - Knowledge graph question-answering method, knowledge graph question-answering device and storage medium - Google Patents

Knowledge graph question-answering method, knowledge graph question-answering device and storage medium

Info

Publication number
CN118170894A
CN118170894A (application CN202410607312.2A)
Authority
CN
China
Prior art keywords
entity
knowledge graph
query
knowledge
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410607312.2A
Other languages
Chinese (zh)
Other versions
CN118170894B (en)
Inventor
鄂海红
罗浩然
孙博宇
宋美娜
汤子辰
彭诗耀
张文泰
麻程昊
朱一凡
王苏宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baixinghua Technology Co ltd
Beijing University of Posts and Telecommunications
Original Assignee
Beijing Baixinghua Technology Co ltd
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baixinghua Technology Co ltd, Beijing University of Posts and Telecommunications
Priority to CN202410607312.2A
Publication of CN118170894A
Application granted
Publication of CN118170894B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F 16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/10 Text processing
    • G06F 40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/02 Knowledge representation; Symbolic representation
    • G06N 5/022 Knowledge engineering; Knowledge acquisition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 Computing arrangements using knowledge-based models
    • G06N 5/04 Inference or reasoning models
    • G06N 5/048 Fuzzy inferencing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Automation & Control Theory (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Fuzzy Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the field of information technology, and in particular to a knowledge graph question-answering method, a knowledge graph question-answering device, knowledge graph question-answering equipment and a computer storage medium. In the knowledge graph question-answering method, a fine-tuned large language model first generates the knowledge graph query statement to be applied to the knowledge graph; large language models have great advantages in processing natural language questions and can generate training data quickly and accurately. The entities and relations extracted from the generated query statement are then searched for in the entity library and relation library of the knowledge graph, and further fuzzy logic operations are performed on the retrieved results according to their similarity. Even if the knowledge graph is incomplete, the fuzzy logic operations can obtain the most likely answer, reducing time complexity and improving accuracy. Finally, an answer statement in natural language form is generated for the user.

Description

Knowledge graph question-answering method, knowledge graph question-answering device and storage medium
Technical Field
The invention relates to the field of information technology, and in particular to a knowledge graph question-answering method, a knowledge graph question-answering device, knowledge graph question-answering equipment and a computer storage medium.
Background
Knowledge graph question answering refers to techniques that parse questions posed in natural language and provide accurate answers using a purpose-built knowledge graph. By mapping the key entities and relations in a question to predefined structured data in the graph, such a system can understand the actual intent of the question and provide informative, accurate answers by querying the relevant data in the graph. The technology is widely applied in intelligent search, virtual assistants and decision support. Many techniques and methods have emerged to implement knowledge graph question answering, including question-answering models based on semantic understanding, on graph matching and on information retrieval. Despite these techniques, several problems still plague current knowledge graph question-answering systems.
(1) The knowledge graph query statement generation effect is poor, and the training set quality is low. Owing to the complexity of natural language and the diversity of the questions people ask, current knowledge graph question-answering systems perform unsatisfactorily on complex natural language questions and are easily affected by ambiguity and polysemy, so the generated training sets are of low quality and cannot fine-tune a large model well.
(2) Classical logic is used for knowledge graph question answering during retrieval, which has high complexity and insufficient accuracy. Although traditional logic-based reasoning methods can handle natural language questions to some extent, when facing a large-scale knowledge graph their high complexity and computational cost prevent accuracy from being effectively guaranteed.
(3) Most commonly used knowledge graphs are incomplete and sparse. The information in a knowledge graph is usually extracted from structured data, but limitations of data sources and incomplete knowledge graph construction leave a large number of gaps and missing facts. In practical applications, this sparsity challenges the validity and usability of the knowledge graph and also causes significant time-complexity problems in conventional graph queries.
In summary, the prior art suffers from low accuracy and high time complexity.
Disclosure of Invention
Therefore, the invention aims to solve the technical problems of low accuracy and high time complexity in the prior art.
In order to solve the technical problems, the invention provides a knowledge graph question-answering method, which comprises the following steps:
acquiring a natural language question posed by a user;
inputting the natural language question and a query instruction into a trained large language model to obtain a query logic expression, and linking the query logic expression with a knowledge base to obtain a knowledge graph query statement;
calculating the similarity between each entity in the knowledge graph query statement entity set and the corresponding knowledge graph entity, and outputting a query statement set in the form of entity-similarity pairs;
taking the query statement set as a fuzzy set of head entities, predicting a fuzzy set of tail entities by using a graph neural network, and carrying out fuzzy entity reasoning based on the fuzzy set of tail entities;
and outputting reasoning steps and a natural language answer by using the large language model, according to the intermediate variables and the reasoning result of each step in the reasoning process.
Preferably, the training process of the large language model includes:
acquiring a natural language question set;
Converting sample data in a natural language problem set into a corresponding query logic expression, and replacing abstract entity ids in the query logic expression with meaningful entity tags;
Taking natural language questions as input, taking a query logic expression in a specified format as output, and combining a query instruction to construct a training data set;
And training the large language model by using the training data set.
Preferably, the calculating the similarity between each entity in the knowledge-graph query statement entity set and the corresponding knowledge-graph entity, and outputting the query statement set in the form of entity-similarity pairs includes:
extracting entities from the knowledge graph query statement to obtain a knowledge graph query statement entity set;
calculating the similarity between each entity in the knowledge graph query statement entity set and the corresponding knowledge graph entity;
screening out the entities with the similarity larger than a preset threshold value, and placing the entities in query sentences in the form of entity-similarity pairs;
And ordering the entities in the entity set by taking the similarity as a standard, and reserving entity-similarity pairs corresponding to the first n entities with the highest similarity, wherein n is a preset constant, so as to obtain a final query statement set.
Preferably, the predicting the fuzzy set of the tail entity by using the graph neural network with the query statement set as the fuzzy set of the head entity includes:
embedding each head entity through the associated relation to obtain a corresponding tail entity.
Preferably, the fuzzy entity reasoning based on the fuzzy set of the tail entity comprises:
defining a credibility range for each tail entity through a similarity function, calculating rule similarity by using a fuzzy inference engine, and performing inference operations by using a fuzzy logic operator to finally obtain a fuzzy result;
and defuzzifying the fuzzy result to obtain a crisp result.
Preferably, the outputting the reasoning step and the natural language answer by using the large language model according to the intermediate variable and the reasoning result of each step in the reasoning process comprises:
and displaying, by using the large language model, the intermediate variables and the first k reasoning results with the highest rule similarity in each step of the reasoning process in a preset form, wherein k is a preset constant.
Preferably, the outputting the reasoning step and the natural language answer by using the large language model according to the intermediate variable and the reasoning result of each step in the reasoning process comprises:
and converting the reasoning result in each step of the reasoning process into an answer segment set expressed in natural language, fusing process context information, and generating natural language text through a large language model.
The invention also provides a knowledge graph question-answering device, which comprises:
the problem acquisition module is used for acquiring natural language problems proposed by users;
The query sentence generation module is used for inputting the natural language problem and the query instruction into a trained large language model to obtain a query logic expression, and linking the query logic expression with a knowledge base to obtain a knowledge graph query sentence;
The entity relationship similarity calculation module is used for calculating the similarity between each entity in the knowledge graph query statement entity set and the corresponding knowledge graph entity and outputting a query statement set in the form of entity-similarity pairs;
the fuzzy logic executing module is used for taking the query statement set as a fuzzy set of a head entity, predicting a fuzzy set of a tail entity by using a graph neural network, and carrying out fuzzy entity reasoning based on the fuzzy set of the tail entity;
and the explanatory answer module is used for outputting an reasoning step and a natural language answer according to the intermediate variable and the reasoning result of each step in the reasoning process by using the large language model.
The invention also provides knowledge graph question answering equipment, which comprises:
A memory for storing a computer program;
and the processor is used for realizing the steps of the knowledge graph question-answering method when executing the computer program.
The invention also provides a computer readable storage medium, wherein the computer readable storage medium stores a computer program, and the computer program realizes the steps of the knowledge graph question-answering method when being executed by a processor.
Compared with the prior art, the technical scheme of the invention has the following advantages:
In the knowledge graph question-answering method, a fine-tuned large language model first generates the knowledge graph query statement to be applied to the knowledge graph; large language models have great advantages in processing natural language questions and can generate training data quickly and accurately. The entities and relations extracted from the generated query statement are then searched for in the entity library and relation library of the knowledge graph, and further fuzzy logic operations are performed on the retrieved results according to their similarity. Even if the knowledge graph is incomplete, the fuzzy logic operations can obtain the most likely answer, reducing time complexity and improving accuracy. Finally, an answer statement in natural language form is generated for the user.
Drawings
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings, in which:
FIG. 1 is a flow chart of an implementation of a knowledge graph question-answering method provided by the invention;
FIG. 2 is a schematic diagram of a process for converting a natural language question into a query logic expression;
FIG. 3 is a schematic diagram of a process for generating a knowledge-graph query statement from natural language questions;
FIG. 4 is a schematic diagram of a process for calculating similarity of entity relationships;
FIG. 5 is a schematic diagram of a fuzzy logic query process;
fig. 6 is a schematic structural diagram of a knowledge graph question-answering device according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a knowledge graph question-answering method, a device, equipment and a computer storage medium, which effectively reduce the time complexity and improve the accuracy.
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating an implementation of a knowledge graph question-answering method provided by the present invention; the specific operation steps are as follows:
S101, acquiring a natural language question posed by a user;
S102, inputting the natural language question and a query instruction into a trained large language model to obtain a query logic expression, and linking the query logic expression with a knowledge base to obtain a knowledge graph query statement;
S103, calculating the similarity between each entity in the knowledge graph query statement entity set and the corresponding knowledge graph entity, and outputting a query statement set in the form of entity-similarity pairs;
S104, taking the query statement set as a fuzzy set of head entities, predicting a fuzzy set of tail entities by using a graph neural network, and carrying out fuzzy entity reasoning based on the fuzzy set of tail entities;
S105, outputting reasoning steps and a natural language answer by using the large language model, according to the intermediate variables and the reasoning result of each step in the reasoning process.
The knowledge graph question-answering method provided by the invention is a retrieve-then-execute knowledge graph question-answering framework built on a fine-tuned open-source large model. First, the selected open-source large model is efficiently fine-tuned with guiding instructions on a knowledge graph question-answering data set (natural language question, knowledge graph query logic expression). The fine-tuned large model converts a new natural language question into the corresponding knowledge graph query logic expression through semantic parsing. The method then retrieves the entities and relations in the knowledge graph query logic expression, converts them into a query statement in a given format, and searches the knowledge graph for answers and similarities. Finally, fuzzy logic reasoning is performed on the obtained answer set and similarities, and an accurate, interpretable answer is generated and fed back to the user in natural language form.
Based on the above embodiment, the training process of the large language model includes:
Acquiring a natural language question set;
Converting sample data in the natural language problem set into corresponding query logic expressions, and replacing abstract entity ids in the query logic expressions with meaningful entity tags so that a large language model can understand entity meanings;
Taking natural language questions as input, taking query logic expressions in a specified format as output, and combining query instructions to construct a training data set;
Training a large language model using the training dataset.
For example, as shown in fig. 2, the natural language question is: "Which drugs have a synergistic effect with drugs that are indicated for hypertension and heart failure but are not contraindicated in renal failure?"; the graph query logic expression is: "(JOIN synergy (AND (AND (JOIN indication hypertension) (JOIN indication heart_failure)) (NOT (JOIN contraindication renal_failure))))"; and the instruction is: "Generate a logical-form query for retrieving information related to the given question." Together, these constitute one piece of training data for instruction fine-tuning of the large language model. Following the logic of FIG. 2, a data set is constructed for the large language model to learn from, completing the fine-tuning of the large model.
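To make this data-construction step concrete, the following minimal Python sketch assembles one such (instruction, question, logical form) record and writes it to a JSONL file that a standard instruction-tuning pipeline could consume; the field names, file name and relation labels are illustrative assumptions, not a format prescribed by the invention.

```python
import json

# One training record pairing a natural language question (input) with its
# knowledge graph query logic expression (output), plus the guiding instruction.
# Entity labels such as "hypertension" stand in for abstract entity ids so the
# model can learn the entities' meanings, as described above.
record = {
    "instruction": "Generate a logical-form query for retrieving information "
                   "related to the given question.",
    "input": "Which drugs have a synergistic effect with drugs that are indicated for "
             "hypertension and heart failure but are not contraindicated in renal failure?",
    "output": "(JOIN synergy (AND (AND (JOIN indication hypertension) "
              "(JOIN indication heart_failure)) (NOT (JOIN contraindication renal_failure))))",
}

# Many such records are collected into a JSONL training set for fine-tuning.
with open("kgqa_train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```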
Based on the above embodiment, step S102 will be described in detail:
After fine-tuning, the large language model has a strong ability to generate logic expressions. In the application of the knowledge graph query statement generation module, the natural language question and the instruction are input to the large model together to obtain a query logic expression, which is then linked with the knowledge base to obtain an executable knowledge graph query statement. As shown in fig. 3, the fine-tuned large model receives the natural language question posed by the user, extracts the entities and relations from the question, generates a logical form, and produces candidate knowledge graph query statements from the information in the knowledge base. The knowledge graph query statement generation module then passes the candidate knowledge graph query statements output by the large model to the entity relationship similarity calculation module.
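A minimal sketch of the linking step follows, under the assumption of a toy lexicon standing in for the knowledge base's entity and relation libraries; the ids, labels and regex-based substitution are illustrative only and not the invention's actual linking procedure.

```python
import re

# Toy lexicons standing in for the knowledge base's entity and relation libraries;
# a real system would retrieve candidate ids from the graph itself.
ENTITY_IDS = {"hypertension": "m.001", "heart_failure": "m.002", "renal_failure": "m.003"}
RELATION_IDS = {"synergy": "r.synergy", "indication": "r.indication",
                "contraindication": "r.contraindication"}

def link_logical_form(expr: str) -> str:
    """Replace surface labels in the LLM-generated logical form with graph ids,
    yielding an executable knowledge graph query expression."""
    lexicon = {**ENTITY_IDS, **RELATION_IDS}
    return re.sub(r"[A-Za-z_]+", lambda m: lexicon.get(m.group(0), m.group(0)), expr)

logical_form = ("(JOIN synergy (AND (JOIN indication hypertension) "
                "(NOT (JOIN contraindication renal_failure))))")
print(link_logical_form(logical_form))
# -> (JOIN r.synergy (AND (JOIN r.indication m.001) (NOT (JOIN r.contraindication m.003))))
```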
Based on the above embodiment, step S103 will be described in detail:
The method used by the invention is based on the idea of contrastive learning: vectors of similar samples lie closer in the embedding space, while vectors of dissimilar sentences lie farther apart. During computation, each sentence is first represented as a vector by a neural network, a distance metric then measures the similarity between vectors, and based on the computed similarity the vector representations are updated with a loss function that pulls similar sentences closer. This is iterated until a satisfactory result is obtained.
Extracting entities from the knowledge graph query statement to obtain a knowledge graph query statement entity set;
calculating the similarity between each entity in the knowledge-graph query statement entity set and the corresponding knowledge-graph entity:
The entity similarity is computed as $S = f(E, G)$, where $S$ is the entity similarity score, $E$ is the entity set extracted from the query statement, $G$ is the knowledge graph, and $f$ is a measure of similarity between vectors, such as cosine similarity or Euclidean distance.
Screening out the entities with the similarity larger than a preset threshold value, and placing the entities in query sentences in the form of entity-similarity pairs;
and ordering the entities in the entity set by taking the similarity as a standard, and reserving entity-similarity pairs corresponding to the first n entities with the highest similarity, wherein n is a preset constant, so as to obtain a final query statement set.
As shown in fig. 4, this embodiment extracts the entity set from the input knowledge graph query statement and calculates the similarities, yielding the final entity set and similarities once the calculation is completed.
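A minimal sketch of this similarity filtering, assuming placeholder vectors in place of the contrastively trained encoder; the cosine metric, threshold and n shown here are illustrative values only.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

# Placeholder embeddings; in practice these come from the contrastively trained encoder.
rng = np.random.default_rng(0)
query_entity = ("hypertension", rng.normal(size=64))       # entity mentioned in the query statement
kg_entities = {name: rng.normal(size=64)
               for name in ["hypertension", "hypotension", "heart failure", "renal failure"]}

threshold, n = 0.0, 2                                       # preset similarity threshold and top-n cut-off
name, vec = query_entity
scored = [(kg_name, cosine(vec, kg_vec)) for kg_name, kg_vec in kg_entities.items()]
scored = [pair for pair in scored if pair[1] > threshold]   # keep entities above the threshold
scored.sort(key=lambda pair: pair[1], reverse=True)         # order by similarity
query_statement_set = scored[:n]                            # retain the top-n entity-similarity pairs
print(query_statement_set)                                  # values here depend on the placeholder vectors
```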
As shown in fig. 5, based on the above embodiment, step S104 will be described in detail:
The step of taking the query statement set as the fuzzy set of head entities and predicting the fuzzy set of tail entities with the graph neural network comprises the following steps:
using a graph neural network and taking the input final entity set and similarities as the fuzzy set of head entities, the fuzzy set of tail entities is predicted given the fuzzy-set head entities and their associated relation; the logical operations are converted into product fuzzy logic operations on fuzzy sets, which satisfy the logic rules and are differentiable;
Each head entity is embedded through the given relation to obtain the corresponding tail entity; the fuzzy logic projection operation is $P_r(q) = W_r q + b_r$, where $P$ is the projection operator, $q$ is the entity embedding, and $W_r$ and $b_r$ are the weight matrix and bias vector of the relation $r$, respectively.
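A minimal PyTorch sketch of this relation-specific projection, assuming the linear form $P_r(q) = W_r q + b_r$ reconstructed above; the sigmoid that squashes the output into [0, 1] so it can be read as membership degrees is an added assumption, not something specified in the text.

```python
import torch
import torch.nn as nn

class FuzzyProjection(nn.Module):
    """Relation-specific projection P_r(q) = W_r q + b_r mapping the embedding of a
    fuzzy head-entity set to the embedding of the predicted fuzzy tail-entity set."""
    def __init__(self, num_relations: int, dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.randn(num_relations, dim, dim) * 0.01)  # W_r per relation
        self.b = nn.Parameter(torch.zeros(num_relations, dim))              # b_r per relation

    def forward(self, q: torch.Tensor, r: int) -> torch.Tensor:
        out = q @ self.W[r].T + self.b[r]
        # Squash to [0, 1] so the components can be read as fuzzy membership degrees
        # (this normalization is an assumption, not specified in the description).
        return torch.sigmoid(out)

proj = FuzzyProjection(num_relations=4, dim=16)
head_fuzzy = torch.rand(16)            # fuzzy set of head entities, weighted by similarity
tail_fuzzy = proj(head_fuzzy, r=1)     # predicted fuzzy set of tail entities
```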
The fuzzy entity reasoning based on the fuzzy set of tail entities comprises:
defining a credibility range for each tail entity through a similarity function, calculating rule similarity with a fuzzy inference engine, and performing inference operations with fuzzy logic operators (such as fuzzy AND and fuzzy OR) to finally obtain a fuzzy result. The conjunction, disjunction and negation operators of the product fuzzy logic (denoted $C$, $D$ and $N$, respectively) are $C(a, b) = a \odot b$, $D(a, b) = a + b - a \odot b$ and $N(a) = \mathbf{1} - a$, where $\odot$ is the element-wise multiplication operator.
The fuzzy result is then defuzzified to obtain a crisp result.
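A minimal sketch of these product fuzzy logic operators together with a simple ranking-based defuzzification; the entity names, membership values and top-k cut-off are illustrative assumptions.

```python
import numpy as np

def fuzzy_and(a, b):   # C(a, b): product t-norm
    return a * b

def fuzzy_or(a, b):    # D(a, b): probabilistic sum
    return a + b - a * b

def fuzzy_not(a):      # N(a): standard negation
    return 1.0 - a

def defuzzify(membership: np.ndarray, entities: list, top_k: int = 3):
    """Turn a fuzzy result into a crisp answer set by ranking membership degrees."""
    order = np.argsort(membership)[::-1][:top_k]
    return [(entities[i], float(membership[i])) for i in order]

entities = ["drug_A", "drug_B", "drug_C", "drug_D"]
indicated = np.array([0.9, 0.7, 0.2, 0.4])        # fuzzy set: "indicated for hypertension"
contraindicated = np.array([0.1, 0.8, 0.3, 0.2])  # fuzzy set: "contraindicated in renal failure"

# Fuzzy reasoning: indicated AND NOT contraindicated, even over an incomplete graph.
result = fuzzy_and(indicated, fuzzy_not(contraindicated))
print(defuzzify(result, entities))                # most plausible answers with their degrees
```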
The method of the present embodiment provides a set of assignments for each intermediate variable, thereby ensuring the interpretability in the process of multiple inferences.
Based on the above embodiment, step S105 will be described in detail:
The knowledge graph question-answering method provided by the invention uses the retrieval result obtained from the final execution to answer the user's question in natural language, while also giving the knowledge graph query logic expression and the steps of the knowledge graph reasoning process, so as to ensure that the answer is interpretable.
The step of outputting the reasoning steps and the natural language answer by using the large language model, according to the intermediate variables and the reasoning result of each step in the reasoning process, comprises the following steps:
using the large language model, displaying the intermediate variables and the first k reasoning results with the highest rule similarity in each step of the reasoning process in a preset form, where k is a preset constant:
based on a prompting algorithm, the entity set obtained in each step of the reasoning process and its similarities are passed as parameters; the top k results with the highest similarity in each step are selected for output according to the similarity rule; a large language model interface is then called and, following the given prompt (for example: "given my query statement and the intermediate and final results of each reasoning step, display the reasoning steps in the knowledge graph with tables and arrows, and finally give the reasoning result"), a clear and fluent answer that follows natural language logic is generated.
The reasoning result of each step of the reasoning process is converted into a set of answer fragments expressed in natural language, process context information is fused in, and natural language text is generated by the large language model. The process can be written in three steps. First, $F = f_{\mathrm{frag}}(A)$, where $A$ is the answer obtained by fuzzy logic reasoning and $f_{\mathrm{frag}}$ converts the reasoned answers into a set of answer fragments expressible in natural language; this function can build grammatically well-formed sentence fragments with structures such as syntax trees, and response templates can be predefined and filled with the relevant data as needed. Second, $C = f_{\mathrm{fuse}}(F, I)$, where $C$ is the result of combining the answer fragments with the process context information and $I$ is the step and context information, including the set of entities, relations and their similarity scores; this function can use coreference resolution to determine the specific object or concept referred to by a pronoun or cue word, and consistency is verified after generation. Finally, the natural language answer is generated through $T = f_{\mathrm{gen}}(C)$; this function generates fluent natural language text through the large model, and text post-processing techniques are used to improve the text quality.
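A minimal sketch of the prompting step, assuming a generic plain-text prompt format and a hypothetical model client; the prompt wording, helper names and the `generate` call are assumptions for illustration, not the invention's actual interface.

```python
def build_answer_prompt(question: str, steps: list, final_answers: list, k: int = 3) -> str:
    """Fuse per-step intermediate results and the final fuzzy-reasoning answers into a
    prompt asking the model for a fluent, interpretable natural language answer."""
    lines = [f"Question: {question}", "Reasoning steps:"]
    for i, step in enumerate(steps, 1):
        top = sorted(step, key=lambda pair: pair[1], reverse=True)[:k]   # top-k results of this step
        lines.append(f"  Step {i}: " + " -> ".join(f"{e} ({s:.2f})" for e, s in top))
    lines.append("Final answers: " + ", ".join(e for e, _ in final_answers))
    lines.append("Present the reasoning path, then state the answer in fluent natural language.")
    return "\n".join(lines)

steps = [[("hypertension_drugs", 0.92), ("antihypertensives", 0.85)],
         [("drug_A", 0.81), ("drug_C", 0.64)]]
prompt = build_answer_prompt("Which drugs ...?", steps, final_answers=[("drug_A", 0.81)])
# answer = llm_client.generate(prompt)   # hypothetical model call; any LLM interface could be used
print(prompt)
```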
The method of this embodiment can present the answer to the question while also showing the user the reasoning process, making the answer easy to understand and accept, demonstrating the standardization and rationality of the knowledge graph question-answering system, and effectively avoiding hallucination by the large model.
In solving the knowledge graph question-answering task, the method comprises three core stages. The initial stage applies semantic parsing to identify and extract the key entities in the question and the relations between them, and integrates this information into a logically well-formed knowledge graph query expression; these expressions are then converted into query statements that can be run directly on the knowledge graph system. Next, within the framework of fuzzy logic reasoning, the system evaluates the similarity of these query statements and executes the query together with the reasoning mechanism, thereby producing an accurate answer. Throughout this process, the invention not only provides question-answer replies but also presents the reasoning path followed for the question to the user in natural language form, making the answer generation process transparent and verifiable. In addition, the method converts the complex natural language questions posed by the user into a structured form that a machine can understand, ensuring that the question-answering system can give answers with high accuracy and interpretability. Through this conversion, the understanding of the deep meaning of natural language questions is improved and the knowledge graph question-answering system's ability to handle complex questions is enhanced, making it a powerful tool for acquiring and using large-scale structured knowledge.
As shown in fig. 6, the embodiment of the invention further provides a knowledge graph question-answering device; the specific apparatus may include:
the problem acquisition module is used for acquiring natural language problems proposed by users;
The query sentence generation module is used for inputting the natural language problem and the query instruction into a trained large language model to obtain a query logic expression, and linking the query logic expression with a knowledge base to obtain a knowledge graph query sentence;
The entity relationship similarity calculation module is used for calculating the similarity between each entity in the knowledge graph query statement entity set and the corresponding knowledge graph entity and outputting a query statement set in the form of entity-similarity pairs;
the fuzzy logic executing module is used for taking the query statement set as a fuzzy set of a head entity, predicting a fuzzy set of a tail entity by using a graph neural network, and carrying out fuzzy entity reasoning based on the fuzzy set of the tail entity;
and the explanatory answer module is used for outputting an reasoning step and a natural language answer according to the intermediate variable and the reasoning result of each step in the reasoning process by using the large language model.
The knowledge graph question-answering device of the present embodiment is used to implement the foregoing knowledge graph question-answering method, so that the specific implementation of the knowledge graph question-answering device may be part of the foregoing knowledge graph question-answering method, for example, the question obtaining module, the query statement generating module, the entity relationship similarity calculating module, the fuzzy logic executing module, and the explanatory answer module are respectively used to implement steps S101, S102, S103, S104, and S105 in the foregoing knowledge graph question-answering method, so that the specific implementation thereof may refer to the description of the corresponding embodiments of each part and will not be repeated herein.
The specific embodiment of the invention also provides knowledge graph question answering equipment, which comprises the following steps: a memory for storing a computer program; and the processor is used for realizing the steps of the knowledge graph question-answering method when executing the computer program.
The specific embodiment of the invention also provides a computer readable storage medium, wherein the computer readable storage medium is stored with a computer program, and the computer program realizes the steps of the knowledge graph question-answering method when being executed by a processor.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present invention will be apparent to those of ordinary skill in the art in light of the foregoing description. It is neither necessary nor possible to exhaustively list all embodiments here; obvious variations or modifications derived therefrom by those skilled in the art remain within the scope of the invention.

Claims (10)

1. A knowledge graph question-answering method, characterized by comprising the following steps:
acquiring a natural language problem proposed by a user;
inputting the natural language problem and the query instruction into a trained large language model to obtain a query logic expression, and linking the query logic expression with a knowledge base to obtain a knowledge graph query statement;
Calculating the similarity between each entity in the knowledge graph query statement entity set and the corresponding knowledge graph entity, and outputting the query statement set in the form of entity-similarity pairs;
using the query statement set as a fuzzy set of a head entity, predicting a fuzzy set of a tail entity by using a graph neural network, and carrying out fuzzy entity reasoning based on the fuzzy set of the tail entity;
And outputting an inference step and a natural language answer according to the intermediate variable and the inference result of each step in the inference process by using the large language model.
2. The knowledge-graph question-answering method according to claim 1, wherein the training process of the large language model includes:
acquiring a natural language question set;
Converting sample data in a natural language problem set into a corresponding query logic expression, and replacing abstract entity ids in the query logic expression with meaningful entity tags;
Taking natural language questions as input, taking a query logic expression in a specified format as output, and combining a query instruction to construct a training data set;
And training the large language model by using the training data set.
3. The knowledge-graph question-answering method according to claim 1, wherein the calculating the similarity between each entity in the knowledge-graph query sentence entity set and the corresponding knowledge-graph entity, and outputting the query sentence set in the form of entity-similarity pairs, comprises:
extracting entities from the knowledge graph query statement to obtain a knowledge graph query statement entity set;
calculating the similarity between each entity in the knowledge graph query statement entity set and the corresponding knowledge graph entity;
screening out the entities with the similarity larger than a preset threshold value, and placing the entities in query sentences in the form of entity-similarity pairs;
And ordering the entities in the entity set by taking the similarity as a standard, and reserving entity-similarity pairs corresponding to the first n entities with the highest similarity, wherein n is a preset constant, so as to obtain a final query statement set.
4. The knowledge-graph question-answering method according to claim 1, wherein the predicting the fuzzy set of the tail entity using the graph neural network with the query statement set as the fuzzy set of the head entity comprises:
embedding each head entity through the associated relation to obtain a corresponding tail entity.
5. The knowledge-graph question-answering method according to claim 1, wherein the fuzzy entity reasoning based on the fuzzy set of the tail entity comprises:
defining a credibility range for each tail entity through a similarity function, calculating rule similarity by using a fuzzy inference engine, and performing inference operations by using a fuzzy logic operator to finally obtain a fuzzy result;
and defuzzifying the fuzzy result to obtain a crisp result.
6. The knowledge-graph question-answering method according to claim 1, wherein the outputting the reasoning step and the natural language answer based on the intermediate variables and the reasoning results of each step in the reasoning process using the large language model comprises:
and displaying the first k reasoning results with highest intermediate variable and rule similarity in each step of the reasoning process in a preset form by using a large language model, wherein k is a preset constant.
7. The knowledge-graph question-answering method according to claim 1, wherein the outputting the reasoning step and the natural language answer based on the intermediate variables and the reasoning results of each step in the reasoning process using the large language model comprises:
and converting the reasoning result in each step of the reasoning process into an answer segment set expressed in natural language, fusing process context information, and generating natural language text through a large language model.
8. A knowledge graph question-answering device, characterized by comprising:
the problem acquisition module is used for acquiring natural language problems proposed by users;
The query sentence generation module is used for inputting the natural language problem and the query instruction into a trained large language model to obtain a query logic expression, and linking the query logic expression with a knowledge base to obtain a knowledge graph query sentence;
The entity relationship similarity calculation module is used for calculating the similarity between each entity in the knowledge graph query statement entity set and the corresponding knowledge graph entity and outputting a query statement set in the form of entity-similarity pairs;
the fuzzy logic executing module is used for taking the query statement set as a fuzzy set of a head entity, predicting a fuzzy set of a tail entity by using a graph neural network, and carrying out fuzzy entity reasoning based on the fuzzy set of the tail entity;
and the explanatory answer module is used for outputting an reasoning step and a natural language answer according to the intermediate variable and the reasoning result of each step in the reasoning process by using the large language model.
9. A knowledge graph question-answering apparatus, characterized by comprising:
A memory for storing a computer program;
A processor for implementing the steps of a knowledge graph question-answering method according to any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, which when executed by a processor, implements the steps of a knowledge-graph question-answering method according to any one of claims 1 to 7.
CN202410607312.2A 2024-05-16 2024-05-16 Knowledge graph question-answering method, knowledge graph question-answering device and storage medium Active CN118170894B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410607312.2A CN118170894B (en) 2024-05-16 2024-05-16 Knowledge graph question-answering method, knowledge graph question-answering device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410607312.2A CN118170894B (en) 2024-05-16 2024-05-16 Knowledge graph question-answering method, knowledge graph question-answering device and storage medium

Publications (2)

Publication Number Publication Date
CN118170894A true CN118170894A (en) 2024-06-11
CN118170894B CN118170894B (en) 2024-07-30

Family

ID=91357247

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410607312.2A Active CN118170894B (en) 2024-05-16 2024-05-16 Knowledge graph question-answering method, knowledge graph question-answering device and storage medium

Country Status (1)

Country Link
CN (1) CN118170894B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118410130A (en) * 2024-06-28 2024-07-30 河北工业大学 Knowledge-graph-based pre-training language model construction method
CN118503400A (en) * 2024-07-19 2024-08-16 山东浪潮科学研究院有限公司 Innovative knowledge graph enhanced multi-module question-answering system and method
CN118535619A (en) * 2024-07-24 2024-08-23 浙江大学 Query method and device based on fuzzy sample input on knowledge graph
CN118656475A (en) * 2024-08-16 2024-09-17 泉州湖南大学工业设计与机器智能创新研究院 Question-answering method based on graph neural network retriever for enhancing large language model generation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117033608A (en) * 2023-09-28 2023-11-10 中国电子科技集团公司第十研究所 Knowledge graph generation type question-answering method and system based on large language model

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117033608A (en) * 2023-09-28 2023-11-10 中国电子科技集团公司第十研究所 Knowledge graph generation type question-answering method and system based on large language model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAORAN LUO et al.: "ChatKBQA: A Generate-then-Retrieve Framework for Knowledge Base Question Answering with Fine-tuned Large Language Models", arXiv:2310.08975v1 [cs.CL], 13 October 2023 (2023-10-13), pages 1-9 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118410130A (en) * 2024-06-28 2024-07-30 河北工业大学 Knowledge-graph-based pre-training language model construction method
CN118503400A (en) * 2024-07-19 2024-08-16 山东浪潮科学研究院有限公司 Innovative knowledge graph enhanced multi-module question-answering system and method
CN118503400B (en) * 2024-07-19 2024-09-20 山东浪潮科学研究院有限公司 Innovative knowledge graph enhanced multi-module question-answering system and method
CN118535619A (en) * 2024-07-24 2024-08-23 浙江大学 Query method and device based on fuzzy sample input on knowledge graph
CN118656475A (en) * 2024-08-16 2024-09-17 泉州湖南大学工业设计与机器智能创新研究院 Question-answering method based on graph neural network retriever for enhancing large language model generation

Also Published As

Publication number Publication date
CN118170894B (en) 2024-07-30

Similar Documents

Publication Publication Date Title
CN118170894B (en) Knowledge graph question-answering method, knowledge graph question-answering device and storage medium
CN102262634B (en) Automatic questioning and answering method and system
WO2020010834A1 (en) Faq question and answer library generalization method, apparatus, and device
CN113505209A (en) Intelligent question-answering system for automobile field
CN111930906A (en) Knowledge graph question-answering method and device based on semantic block
CN111274267A (en) Database query method and device and computer readable storage medium
CN118093834B (en) AIGC large model-based language processing question-answering system and method
CN112115252A (en) Intelligent auxiliary writing processing method and device, electronic equipment and storage medium
CN115858750A (en) Power grid technical standard intelligent question-answering method and system based on natural language processing
CN111159381A (en) Data searching method and device
CN108664464B (en) Method and device for determining semantic relevance
CN117909458A (en) Construction method of mould specialized question-answering system based on LLM model
CN111831624A (en) Data table creating method and device, computer equipment and storage medium
CN117540004B (en) Industrial domain intelligent question-answering method and system based on knowledge graph and user behavior
CN117851445A (en) Large language model Text2SQL chart generation method and device
CN117972049A (en) Medical instrument declaration material generation method and system based on large language model
CN117708282A (en) Knowledge question-answering method and system based on large language model
JP2023147236A (en) Machine learning pipeline augmented with explanation
CN105808522A (en) Method and apparatus for semantic association
CN111949783A (en) Question and answer result generation method and device in knowledge base
CN118070925B (en) Model training method, device, electronic equipment, storage medium and program product
CN118643115B (en) Educational AI content generation method, system and equipment based on knowledge base
CN117992596B (en) Question-answering model optimization method and device
CN118626626B (en) Information processing method, apparatus, device, storage medium, and computer program product
CN117575026B (en) Large model reasoning analysis method, system and product based on external knowledge enhancement

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant