CN117271799A

CN117271799A - Knowledge graph-based multi-round question answering method and system

Info

Publication number: CN117271799A
Application number: CN202311244745.8A
Authority: CN
Inventors: 陈莹; 何健军; 崔莹; 代翔; 雋兆波; 张剑; 李春豹; 霍志浩
Original assignee: CETC 10 Research Institute
Current assignee: CETC 10 Research Institute
Priority date: 2023-09-25
Filing date: 2023-09-25
Publication date: 2023-12-22

Abstract

The invention discloses a knowledge graph-based multi-round question answering method and system. The method comprises: recognizing the entities in the user's input, comparing them with knowledge graph entity nodes by semantic similarity, and returning the most similar node names, so that when the user enters a keyword the question answering system recommends entities from the knowledge graph library and guides the user to select a target of interest; during multi-round interaction between the user and the system, once the user has settled on an entity of interest, recommending questions related to that entity from a historical question-answer library and the knowledge graph library, helping the user further clarify the intent and direction of the inquiry; and, within the multi-round interactive dialogue, applying different question answering strategies to the different types of questions the user raises. The invention improves question answering accuracy.

Description

Knowledge graph-based multi-round question answering method and system

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a knowledge-graph-based multi-round question answering method and system.

Background

As internet data grows more complex, search engines struggle to meet the customized needs of domain professionals, and with the rapid development of artificial intelligence and related technologies, intelligent question answering systems have emerged. A question answering system is a software system that answers questions posed by the user in natural language as accurately as possible. It builds a question-and-answer interface using artificial intelligence, knowledge graphs, knowledge base construction and similar technologies, locates the knowledge relevant to the user's question, and provides personalized information services through interaction with the user. Intelligent question answering is currently a research hotspot in natural language processing and is widely applied, for example in medical systems, e-commerce and smart homes. Compared with a traditional question-answer database, a knowledge graph visualizes the core structure of knowledge and the associations between pieces of knowledge, and presents data relationships and data characteristics more intuitively. More and more fields therefore build domain-specific knowledge graphs as the data source for intelligent question answering applications.

A knowledge graph can be seen as a structured representation of knowledge, composed of triples (subject, predicate, object) that represent entities and the semantic relations between them. Knowledge graphs open new possibilities for intelligent question answering. Knowledge graph question answering works as follows: given a natural language question, the entities and semantic relations in the question are identified and linked to the knowledge graph, and the answer is retrieved from the graph and returned. Current research on knowledge graph question answering falls into two main categories: question answering based on semantic parsing and question answering based on information retrieval, of which the retrieval-based approach is currently the mainstream.
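The triple representation and the retrieval step can be illustrated with a minimal sketch. It assumes the question has already been reduced to a linked entity and a linked relation; the names `TRIPLES`, `build_index` and `answer`, as well as the toy facts, are hypothetical and do not come from the patent.

```python
from collections import defaultdict

# Toy (subject, predicate, object) triples; the facts are placeholders
# invented for illustration only.
TRIPLES = [
    ("sandstorm", "season", "spring"),
    ("sandstorm", "cause", "strong wind over dry, loose soil"),
    ("mooncake", "associated_festival", "Mid-Autumn Festival"),
]

def build_index(triples):
    """Index objects by (subject, predicate) so a linked question is one lookup."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[(subject, predicate)].append(obj)
    return index

def answer(index, entity, relation):
    """Return every object reachable from `entity` via `relation` (empty if none)."""
    return list(index.get((entity, relation), []))

index = build_index(TRIPLES)
print(answer(index, "sandstorm", "season"))
```

A production system would replace the in-memory index with a graph database query, but the lookup key (entity node plus relation edge) stays the same.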

Single-round question answering based on knowledge graphs is relatively mature: the semantics of a natural language question are analyzed, the question is converted into a query statement for the knowledge graph, and the relevant triples are retrieved as the answer. Multi-round question answering based on knowledge graphs, however, has received less research, and the following shortcomings remain:

First, knowledge graph question-answer corpora in professional domains are small, incomplete and of low quality. When the knowledge graph library does not contain the answer to a user's question, the system cannot return a correct answer and does not recommend related content either, which degrades the user experience.

Second, with the knowledge graph library serving as the back-end data source of the question answering system, a user's question may be too general or vague for the system to understand its purpose. The system then cannot return the correct answer in a single round and may return a wrong one, which lowers question answering accuracy.

Disclosure of Invention

In view of this, the invention provides a knowledge graph-based multi-round question answering method and system that guide the user during multi-round interaction, so as to pinpoint the user's needs and return the content the user is interested in.

The invention discloses a knowledge graph-based multi-round question-answering method, which comprises the following steps:

step 1: recognizing the entities in the user's input, comparing them with knowledge graph entity nodes by semantic similarity, and returning the most similar node names; when the user enters a keyword, the question answering system recommends entities from the knowledge graph library and guides the user to select a target of interest;

step 2: during multi-round interaction between the user and the system, once the user has settled on an entity of interest, the question answering system recommends questions related to that entity from a historical question-answer library and the knowledge graph library, helping the user further clarify the intent and direction of the inquiry;

step 3: in the multi-round interactive dialogue, the question answering system applies different question answering strategies to the different types of questions the user raises.

Further, the step 1 includes:

step 11: for a character string S entered by the user, the question answering system either directly calls the knowledge graph question answering algorithm to return an answer, or performs keyword segmentation and entity recognition on S;

step 12: based on the entities returned in step 11, recommending entities of interest to the user from the knowledge graph data; traversing the knowledge graph entity nodes node_i and comparing each with every entity for similarity; using a BERT model to obtain vectorized representations of the graph node node_i and the entity ner_j, j ∈ {1, 2, 3}, the vectorization results being x = {x_1, ..., x_n} and y = {y_1, ..., y_n} respectively; and selecting a similarity calculation method to measure the similarity between the vectors;

step 13: sorting all similarity results from the traversal in step 12, selecting all nodes whose similarity to the entities ner_1, ner_2, ner_3 exceeds a preset threshold, and returning the top K node names as the user's entities of interest; if fewer than K nodes qualify, all of them are returned;

step 14: the top K node names returned in step 13 are entities related to the user's input, and the system recommends them as the user's entities of interest so that the user can settle on the subject of the inquiry; the user may continue the multi-round interaction by asking about a returned entity, or may enter other keywords or a complete question; if other keywords are entered, the question answering system continues to recommend related entities; if a complete question is entered, the system returns the answer to the question directly.

Further, the step 11 includes:

for a character string S entered by the user, judging whether S is a complete question from its length and from whether it contains an entity and an entity relation; if S contains both an entity and an entity relation, S is a complete question and the question answering system directly calls the knowledge graph question answering algorithm to return an answer; if S is not a complete question, performing keyword segmentation and entity recognition on S;

when the user separates different keywords with special symbols, splitting the keywords at those symbols, so that S may be divided into s_1, s_2, s_3; then calling a named entity recognition algorithm to identify the entities ner_1, ner_2, ner_3 contained in the keywords; the special symbols include the space and the enumeration comma (、).
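The separator-based keyword splitting can be sketched as follows. This is only an illustration: the function names are hypothetical, and `recognize_entities` is a dictionary-lookup stand-in for a real named entity recognition model, which the patent does not specify.

```python
import re

# Separators named in the text: whitespace and the enumeration comma "、".
SEPARATORS = re.compile(r"[\s、]+")

def split_keywords(s):
    """Split the raw input S into keyword fragments s_1, s_2, ..."""
    return [part for part in SEPARATORS.split(s.strip()) if part]

def recognize_entities(keywords, known_entities):
    """Stand-in for NER: keep the keywords that appear in a known-entity set."""
    return [k for k in keywords if k in known_entities]

keywords = split_keywords("中秋节、月饼 习俗")
print(keywords)
print(recognize_entities(keywords, {"中秋节", "月饼"}))
```

Filtering out empty fragments keeps repeated or trailing separators from producing empty keywords.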

Further, the step 2 includes:

question recommendation based on the user's entity of interest comprises recommendation from a historical question-answer library, a hot-spot question library and the knowledge graph; exact retrieval on the entity of interest is performed in the historical question-answer library and in the hot-spot question library, and the user's historical questions and the hot-spot questions related to the entity are returned as recommendations; for knowledge graph-based recommendation, the entity of interest is linked to a node in the knowledge graph, the triples related to that node are retrieved, and a model generates several types of questions the user may be interested in as recommendations; the user is thereby guided to refine and clarify the intent of the inquiry step by step.

Further, the step 2 specifically includes:

recommending questions based on the historical question-answer library: the question answering system records the interactive dialogue between every user and the system, including the user ID, question time, question content, answer returned by the system, and the user's feedback corrections;

the system stores the question-answer pairs, or the pairs as corrected by user feedback, in the historical question-answer library; the user's historical question-answer library reflects the user's usual query directions and the content or targets of interest; during multi-round interaction, the user ID and the name of the entity of interest are used as keywords for exact retrieval in the historical question-answer library, the results are ranked by similarity, and the top L retrieved questions are taken as the question recommendation list RQ1; if nothing is retrieved, RQ1 is an empty list.

Further, the step 2 further specifically includes:

recommending based on the hot-spot question library: the question answering system counts, for different time periods, the questions users ask most frequently, treats them as hot-spot questions, and stores them in the hot-spot question library; the hot-spot question library reflects what users have recently been interested in; during the interactive dialogue, once the user has settled on the subject of the inquiry, the subject name is used as the keyword for exact retrieval of related hot-spot questions in the hot-spot question library, the questions are ranked within each time period, and the top-ranked question of each period is selected to form the question recommendation list RQ2; if nothing is retrieved, RQ2 is an empty list.
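One way to picture the hot-spot recommendation is the following sketch, which counts question frequency per time window and keeps the most frequent question of each window. The window lengths in days, the substring match used for retrieval, the removal of duplicates across windows, and all names are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumed window lengths in days; the text only speaks of "different time periods".
WINDOWS = {"week": 7, "month": 30, "quarter": 90}

def hot_questions(log, entity, now, top_n=1):
    """Build RQ2 from a question log.

    log: list of (timestamp, question) pairs.
    Returns the top question(s) per window that mention `entity`,
    with duplicates across windows removed (an assumption).
    """
    rq2 = []
    for days in WINDOWS.values():
        cutoff = now - timedelta(days=days)
        counts = Counter(q for ts, q in log if ts >= cutoff and entity in q)
        for question, _ in counts.most_common(top_n):
            if question not in rq2:
                rq2.append(question)
    return rq2

now = datetime(2023, 9, 25)
log = [
    (now - timedelta(days=1), "sandstorm cause?"),
    (now - timedelta(days=2), "sandstorm cause?"),
    (now - timedelta(days=20), "sandstorm season?"),
    (now - timedelta(days=21), "sandstorm season?"),
    (now - timedelta(days=22), "sandstorm season?"),
    (now - timedelta(days=3), "mooncake origin?"),
]
print(hot_questions(log, "sandstorm", now))
```

If no logged question mentions the entity, the function returns an empty list, matching the empty RQ2 case.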

Further, the step 2 further specifically includes:

generating question recommendations based on the knowledge graph: once the user has settled on an entity of interest, querying the existing knowledge graph library for information on that entity; if there is none, the question recommendation list RQ3 is returned empty; if the knowledge graph library contains the entity, linking the entity of interest to its graph node and querying all triples associated with that node; feeding the triples and the entity of interest into a BERT-based pre-trained model, which outputs a list of questions whose subject is the entity of interest, and returning that list as the recommendation list RQ3.

Further, the step 2 further specifically includes:

during multi-round interaction, the final list of questions recommended to the user for the entity of interest is denoted RQ, where RQ = RQ1 + RQ2 + RQ3, that is, related questions are recommended from the historical question-answer library, the hot-spot question library and the knowledge graph library; the user may select a recommended question to continue the interaction, and the system returns the answer; returning an answer does not end the multi-round interaction, and the user may go on to select other recommended questions or enter other keywords or questions.

Further, the step 3 includes:

saving each round of the user's dialogue and storing the entity of each round in a dialogue entity library, which is updated as the dialogue proceeds; when a question in the interaction contains a pronoun, automatically replacing the pronoun with the entity name that appeared in the previous question, and retrieving the answer for the rewritten question;

when the question answering system identifies a question without a subject during the multi-round dialogue, automatically taking the entity saved in the most recent round in the dialogue entity library as the subject of the question, and returning the answer using the knowledge graph question answering algorithm; the knowledge graph question answering algorithm links the question entity to the corresponding node in the knowledge graph, links the question relation to the corresponding relation edge, generates a query path from the node and the relation edge, and retrieves the answer along that path;

the system automatically saves the dialogue record with the user during the multi-round dialogue; when the identified question contains only a subject, the question answering system takes the most recent question from the historical question library, replaces its entity with the subject newly entered by the user, and calls the knowledge graph question answering algorithm on the updated question to return the answer.

The invention also discloses a knowledge graph-based multi-round question-answering system, which comprises:

an entity recommendation module, configured to recognize the entities in the user's input, compare them with knowledge graph entity nodes by semantic similarity, and return the most similar node names, so that when the user enters a keyword the question answering system recommends entities from the knowledge graph library and guides the user to select a target of interest;

an entity question recommendation module, configured to, during multi-round interaction between the user and the system and once the user has settled on an entity of interest, recommend questions related to that entity from the historical question-answer library and the knowledge graph library, helping the user further clarify the intent and direction of the inquiry;

and an answer retrieval module, configured to apply different question answering strategies to the different types of questions the user raises in the multi-round interactive dialogue.

Due to the adoption of the technical scheme, the invention has the following advantages:

1. Compared with traditional knowledge graph question answering methods, the invention provides a knowledge graph-based multi-round interactive question answering method that recommends entities for the keywords the user enters, guides the user step by step to settle on the subject of the inquiry, and finally returns the content the user is interested in, improving the recall and effectiveness of the question answering system.

2. The invention provides a question recommendation method based on the user's entity of interest: questions of several types related to that entity are retrieved from the historical question-answer library and the hot-spot question library and generated from the knowledge graph library, then recommended to the user and answered, guiding the user through multiple rounds of interaction and improving the user experience.

3. The multi-round interactive question answering method is compatible with single-round use: it supports single-round as well as multi-round question answering, and single-round question answering is unaffected at any stage of the multi-round interaction. The user can enter a complete question at any point of the interaction, and the system returns the answer provided one exists.

Drawings

To illustrate the technical solutions of the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention, and those skilled in the art may obtain other drawings from them.

FIG. 1 is a knowledge-graph-based multi-round interactive question-answering flow chart in an embodiment of the invention;

FIG. 2 is a schematic diagram of a recommendation flow of a user attention entity according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a problem recommendation flow based on a user attention entity according to an embodiment of the present invention.

Detailed Description

The invention is further described below with reference to the drawings and embodiments. The embodiments described are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art fall within the scope of protection of the embodiments of the invention.

Referring to FIG. 1, the invention provides an embodiment of a knowledge graph-based multi-round question answering method comprising three main processes: recommendation of the user's entities of interest, question recommendation based on the entity of interest, and multi-round interactive answer retrieval.

Recommendation of the user's entities of interest: the entities in the user's input are recognized and compared with the knowledge graph entity nodes by semantic similarity, and the most similar node names are returned. When the user enters a keyword, the question answering system can thus recommend entities from the knowledge graph library and guide the user to select a target of interest.

Question recommendation based on the user's entity of interest comprises recommendation from a historical question-answer library, a hot-spot question library and the knowledge graph. Exact retrieval on the entity of interest is performed in the historical question-answer library and in the hot-spot question library, and the user's historical questions and the hot-spot questions related to the entity are returned as recommendations. For knowledge graph-based recommendation, the entity of interest is linked to a node in the knowledge graph, the triples related to that node are retrieved, and a model generates several types of questions the user may be interested in as recommendations. The user is thereby guided to refine and clarify the intent of the inquiry step by step.

Multi-round interactive answer retrieval applies a different question rewriting strategy to each type of follow-up question: the follow-up question is converted into a complete question by pronoun replacement, subject replacement and similar means, and the answer is then retrieved with the knowledge graph question answering algorithm, achieving multi-round interactive question answering with the user.

The following is specifically set forth for each process:

S1: Recommendation of the user's entities of interest.

Referring to FIG. 2, when the user enters an incomplete question or isolated keywords, the system recommends related entities based on those keywords and, through interactive question answering, gradually fills in the missing elements of the question so that the subject and intent of the inquiry become clear. The specific steps are as follows:

S11: For a character string S entered by the user, whether S is a complete question is judged from its length and from whether it contains an entity and an entity relation. If S contains both an entity and an entity relation, it is a complete question and the question answering system directly calls the knowledge graph question answering algorithm to return an answer. If S is not a complete question, keyword segmentation and entity recognition are performed on S. Users usually separate keywords with special symbols such as spaces or enumeration commas (、), for example "Mid-Autumn Festival mooncake", so the keywords are split at these symbols and S may be divided into s_1, s_2, s_3. A named entity recognition algorithm is then called to identify the entities ner_1, ner_2, ner_3 contained in the keywords;

S12: and recommending the entity of interest to the user through the knowledge graph data based on the plurality of entities returned in the step S11. Traversing knowledge-graph entity node _i Respectively carrying out similar comparison with each entity, and utilizing a BERT model to node the map nodes _i And entity ner _j J epsilon (1, 2, 3) vectorization representation, and vectorization results are x= { x respectively ₁ ,...,x _n Sum y= { y ₁ ,...,y _n And selecting a proper similarity calculation method to measure the similarity value among vectors, wherein the optional text vector similarity calculation method comprises the following steps:

cosine similarity:

euclidean distance:

pearson correlation coefficient:

S13: All similarity results from the traversal in S12 are sorted and a similarity threshold is set manually. All nodes whose similarity to the entities ner_1, ner_2, ner_3 exceeds the threshold are selected, and the top 10 node names are returned as the user's entities of interest; if fewer than 10 nodes qualify, all of them are returned.

S14: The top 10 node names returned in S13 are entities related to the user's input, and the system recommends them as the user's entities of interest so that the user can settle on the subject of the inquiry. The user may continue the multi-round interaction by asking about a returned entity, or may enter other keywords or a complete question. If other keywords are entered, the question answering system continues to recommend related entities; if a complete question is entered, the system returns the answer to the question directly.
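A minimal sketch of the ranking in S12 and S13 is given below. It assumes the node and entity vectors have already been produced by an encoder (the BERT step is not shown) and uses cosine similarity. The function names, the default threshold of 0.8, and the choice to score a node by its best match over all input entities are assumptions, not details taken from the patent.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def recommend_entities(node_vectors, entity_vectors, threshold=0.8, k=10):
    """Return up to k node names whose best similarity to any entity exceeds threshold.

    node_vectors: dict mapping node name -> vector.
    entity_vectors: list of vectors, one per recognized entity ner_j.
    """
    scored = []
    for name, node_vec in node_vectors.items():
        best = max(
            (cosine_similarity(node_vec, e) for e in entity_vectors),
            default=0.0,
        )
        if best > threshold:
            scored.append((best, name))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]

nodes = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.9, 0.1]}
print(recommend_entities(nodes, [[1.0, 0.0]]))
```

When fewer than k nodes pass the threshold, the slice simply returns all of them, which mirrors the fallback described in S13.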

S2: Question recommendation based on the user's entity of interest.

During multi-round interaction between the user and the system, once the user has settled on an entity of interest, the question answering system recommends questions related to that entity from the historical question-answer library and the knowledge graph library, helping the user further clarify the intent and direction of the inquiry. Referring to FIG. 3, the specific steps are as follows.

S21: Recommending questions based on the historical question-answer library. The question answering system records the interactive dialogue between every user and the system, including the user ID, question time, question content, answer returned by the system, the user's feedback corrections and other information. The system stores the question-answer pairs, or the pairs as corrected by user feedback, in the historical question-answer library, which supports retrieval by user ID, by keyword, by time and so on. Because the user's historical question-answer library reflects the user's usual query directions and the content or targets of interest, during multi-round interaction the user ID and the name of the entity of interest are used as keywords for exact retrieval in the historical question-answer library; the results are ranked by similarity and the top 2 retrieved questions are taken as the question recommendation list RQ1. If nothing is retrieved, RQ1 is an empty list.

S22: Recommending based on the hot-spot question library. The question answering system counts the questions users ask most frequently over several time periods (the last week, the last month and the last three months), treats them as hot-spot questions and stores them in the hot-spot question library, which reflects what users have recently been interested in. During the interactive dialogue, once the user has settled on the subject of the inquiry, the subject name is used as the keyword for exact retrieval of related hot-spot questions in the hot-spot question library, and the top 1 question for each of the last week, the last month and the last three months is selected to form the question recommendation list RQ2. If nothing is retrieved, RQ2 is an empty list.

S23: Generating question recommendations based on the knowledge graph. Once the user has settled on an entity of interest, the existing knowledge graph library is queried for information on that entity. If there is none, the question recommendation list RQ3 is returned empty. If the knowledge graph library contains the entity, the entity of interest is linked to its graph node and all triples associated with that node are queried. The triples and the entity of interest are fed into a BERT-based pre-trained model, which outputs a list of questions whose subject is the entity of interest; this list is returned as the recommendation list RQ3;

S24: During multi-round interaction, the final list of questions recommended to the user for the entity of interest is denoted RQ, where RQ = RQ1 + RQ2 + RQ3, that is, related questions are recommended from the historical question-answer library, the hot-spot question library and the knowledge graph library. The user may select a recommended question to continue the interaction, and the system returns the answer. Returning an answer does not end the multi-round interaction: the user may go on to select other recommended questions or enter other keywords or questions. The system does not treat returning an answer as the end of the interactive question answering; the user decides when to end the interactive dialogue and may do so at any point.
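The history retrieval of S21 and the merge of S24 might look roughly like the sketch below. The record layout, the substring match, and ordering by recency (used here in place of the similarity ranking the text mentions) are assumptions, as is removing duplicates when concatenating the three lists.

```python
def search_history(history, user_id, entity, top_l=2):
    """Build RQ1: the user's own past questions that mention the entity.

    history: list of dicts with keys 'user_id', 'time' and 'question'.
    Results are ordered most recent first (an assumption).
    """
    hits = [
        record for record in history
        if record["user_id"] == user_id and entity in record["question"]
    ]
    hits.sort(key=lambda record: record["time"], reverse=True)
    return [record["question"] for record in hits[:top_l]]

def merge_recommendations(rq1, rq2, rq3):
    """RQ = RQ1 + RQ2 + RQ3, keeping only the first occurrence of each question."""
    merged, seen = [], set()
    for question in [*rq1, *rq2, *rq3]:
        if question not in seen:
            seen.add(question)
            merged.append(question)
    return merged

history = [
    {"user_id": "u1", "time": 1, "question": "sandstorm cause?"},
    {"user_id": "u1", "time": 3, "question": "sandstorm season?"},
    {"user_id": "u2", "time": 5, "question": "sandstorm damage?"},
    {"user_id": "u1", "time": 4, "question": "mooncake origin?"},
    {"user_id": "u1", "time": 2, "question": "sandstorm region?"},
]
rq1 = search_history(history, "u1", "sandstorm")
print(merge_recommendations(rq1, ["sandstorm season?"], ["sandstorm damage?"]))
```

Each source list may be empty, in which case it contributes nothing to RQ.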

S3: Multi-round interactive answer retrieval.

In a multi-round interactive dialogue, the user usually asks follow-up questions about an earlier question to obtain more information. Follow-up questions are often incomplete: they lack a subject or a relation, or contain pronouns, and the answer can only be retrieved after the question has been completed from the dialogue context. Several common types of follow-up question are therefore summarized below, and the question answering system applies a different strategy to each type.

A pronoun follow-up question is one whose subject is a pronoun such as "it", "he", "she" or "they" and which is otherwise structurally complete, for example "What school did he graduate from?" or "Which manufacturer produced it?". In a multi-round interactive dialogue, a follow-up question containing a pronoun must be combined with the dialogue context: replacing the pronoun with the subject of the user's inquiry yields a complete question. Each round of the user's dialogue is therefore saved and the entity of each round is stored in a dialogue entity library, which is updated as the dialogue proceeds. When a pronoun appears in a question during the interaction, the question answering system automatically replaces it with the entity name that appeared in the previous question and retrieves the answer for the rewritten question. For example, the user asks in the first round: "What is Wang's work unit?" After the system returns the answer, the user asks in the second round: "What school did he graduate from?" The question answering system replaces "he" with "Wang", giving "What school did Wang graduate from?", and calls the knowledge graph question answering algorithm on the complete question to return the answer. The knowledge graph question answering algorithm links the question entity to the corresponding node in the knowledge graph, links the question relation to the corresponding relation edge, generates a query path from the node and the relation edge, and retrieves the answer along that path.
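The pronoun replacement strategy can be sketched as follows. The class and function names are hypothetical, and the English pronoun list is an assumption made for this illustration; the original system targets Chinese input and would use the corresponding Chinese pronouns.

```python
import re

# Assumed English pronoun list for illustration.
PRONOUNS = re.compile(r"\b(he|she|it|they)\b", re.IGNORECASE)

class DialogueEntityStore:
    """Keeps the entity of each dialogue round, most recent last."""

    def __init__(self):
        self._entities = []

    def remember(self, entity):
        self._entities.append(entity)

    def latest(self):
        return self._entities[-1] if self._entities else None

def resolve_pronouns(question, store):
    """Replace pronouns with the entity of the previous round, if there is one."""
    entity = store.latest()
    if entity is None:
        return question
    # A function replacement avoids interpreting backslashes in the entity name.
    return PRONOUNS.sub(lambda _match: entity, question)

store = DialogueEntityStore()
store.remember("Wang")
print(resolve_pronouns("What school did he graduate from?", store))
```

With an empty store the question is returned unchanged, so the rewrite is safe to apply to every incoming question.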

A subject-less follow-up question is one that contains no subject, only a predicate or object, for example "What are the activity characteristics?" or "What is the cause?". As with pronoun follow-up questions, in a given dialogue scene the subject of a subject-less follow-up question is the subject mentioned in the previous round or at the start of the user's dialogue with the system. Therefore, when the question answering system identifies a question without a subject during the multi-round dialogue, it automatically takes the entity saved in the most recent round in the dialogue entity library as the subject of the question and returns the answer using the knowledge graph question answering algorithm. For example, the user first asks: "What causes a sandstorm to form?" The system returns the answer, and the user asks in the second round: "Occurs in which season?" The question answering system supplies the most recent entity in the dialogue entity library, "sandstorm", as the subject, retrieves the answer to the question "In which season does a sandstorm occur?" from the knowledge graph library, and returns it.
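A small sketch of subject completion follows, assuming the question has already been parsed into an optional subject and a relation. `ParsedQuestion`, `fill_subject`, `GRAPH` and the stored answer are hypothetical placeholders rather than parts of the patented system.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParsedQuestion:
    subject: Optional[str]   # linked entity, or None when the question has no subject
    relation: Optional[str]  # linked relation edge, or None

def fill_subject(parsed, last_entity):
    """Use the most recent dialogue entity when the question has no subject."""
    if parsed.subject is None and last_entity is not None:
        return ParsedQuestion(subject=last_entity, relation=parsed.relation)
    return parsed

# Toy graph for illustration only: (entity, relation) -> answer.
GRAPH = {("sandstorm", "season"): "spring"}

def answer(parsed):
    """Look up the answer for a completed question; None if the graph has no match."""
    return GRAPH.get((parsed.subject, parsed.relation))

follow_up = ParsedQuestion(subject=None, relation="season")
print(answer(fill_subject(follow_up, "sandstorm")))
```

A question that already carries a subject passes through `fill_subject` unchanged, so a complete single-round question is not altered by this step.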

A subject-only follow-up question is one that contains only a subject, with no predicate or object, for example "And Zhang San?" or "And 2022?". In a multi-round interactive dialogue, a question consisting only of a subject usually means that the user wants to ask about a different subject the same thing that was asked in the previous round, where the previous question was a complete sentence. The system therefore automatically saves the dialogue record with the user during the multi-round dialogue. When the identified question contains only a subject, the question answering system takes the most recent question from the historical question library, replaces its entity with the subject newly entered by the user, and calls the knowledge graph question answering algorithm on the updated question to return the answer. For example, the user asks in the first round: "How many days is the National Day holiday in 2022?" After the question answering system returns the answer, the user asks in the second round: "And 2023?" Since the follow-up contains only a subject, the question answering system automatically takes the most recent question from the historical question library, "How many days is the National Day holiday in 2022?", replaces its subject "2022" with "2023" so that the question becomes "How many days is the National Day holiday in 2023?", and calls the knowledge graph question answering algorithm to return the answer.
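The subject replacement strategy can be illustrated with plain string substitution. This sketch assumes the subject of the previous question is already known from entity recognition; the function name and the decision to return `None` when the old subject cannot be found are hypothetical choices.

```python
def rewrite_subject_only(previous_question, previous_subject, new_subject):
    """Reuse the previous complete question with the newly entered subject.

    Returns None when the previous subject does not occur in the previous
    question, so the caller can fall back to entity recommendation.
    """
    if previous_subject not in previous_question:
        return None
    return previous_question.replace(previous_subject, new_subject)

updated = rewrite_subject_only(
    "How many days is the National Day holiday in 2022?", "2022", "2023"
)
print(updated)
```

The rewritten question is then passed to the same knowledge graph question answering step as any complete single-round question.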

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the invention and do not limit them. Although the invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that modifications and equivalent substitutions may be made to the specific embodiments without departing from the spirit and scope of the invention, and all such modifications and substitutions are intended to be covered by the claims.

Claims

1. The multi-round question answering method based on the knowledge graph is characterized by comprising the following steps of:

2. The method according to claim 1, wherein the step 1 comprises:

step 13: sorting all similarity results from the traversal in step 12, selecting all nodes whose similarity to the entities ner_1, ner_2, ner_3 exceeds a preset threshold, and returning the top K node names as the user's entities of interest; if fewer than K nodes qualify, all of them are returned;

3. The method according to claim 2, wherein the step 11 comprises:

4. The method according to claim 1, wherein the step 2 comprises:

5. The method according to claim 1, wherein the step 2 specifically comprises:

6. The method according to claim 5, wherein the step 2 further specifically comprises:

7. The method according to claim 6, wherein the step 2 further specifically comprises:

8. The method according to claim 7, wherein the step 2 further specifically includes:

9. The method according to claim 8, wherein the step 3 comprises:

10. A knowledge-graph-based multi-round question-answering system, comprising: