CN116186228A

CN116186228A - Complex knowledge base question-answering method and system based on deep semantic analysis

Info

Publication number: CN116186228A
Application number: CN202310231722.7A
Authority: CN
Inventors: 杜振东; 王清琛
Original assignee: Nanjing Yunwen Network Technology Co ltd
Current assignee: Nanjing Yunwen Network Technology Co ltd
Priority date: 2023-03-10
Filing date: 2023-03-10
Publication date: 2023-05-30

Abstract

The invention provides a complex knowledge base question-answering method and system based on deep semantic analysis. The method comprises: performing entity recognition on a question input by a user to obtain an entity; performing query target and query condition recognition on the question to obtain a query target, query conditions and query condition values; performing multi-hop path recognition on the question to obtain a multi-hop path; discriminating the question type from the obtained entity, query target, query conditions, query condition values and multi-hop path, and generating a graph query statement through a graph query statement module according to the result; and executing the graph query statement on a graph database, parsing the query result to generate an answer, and outputting the answer in JSON format. The method identifies the entity, query target, conditions and condition values in the question through classification models, which avoids the disambiguation and entity linking problems of extraction models, and effectively handles question answering for single-entity multi-attribute, multi-condition constraint, comparative reasoning and multi-hop complex questions.

Description

Complex knowledge base question-answering method and system based on deep semantic analysis

Technical Field

The invention relates to the technical field of natural language processing, in particular to a complex knowledge base question-answering method and system based on deep semantic analysis.

Background

Existing knowledge graph question-answering systems perform semantic parsing on the user's input question (query) to generate a structured query statement, and select entities or attribute values from a given knowledge base as the answer. They work well on simple sentences (single entity, single attribute), but their logical reasoning capability still needs improvement on complex sentences: constraint sentences (condition constraint sentences and time constraint sentences), inference-type questions (comparison sentences, extreme-value sentences and yes/no questions), and questions involving intersection, union and negation.

To improve the semantic parsing support of knowledge graph question-answering systems for complex sentences, the invention provides a complex knowledge base question-answering method based on deep semantic analysis, namely ComplexKBQA (a complex knowledge graph intelligent question-answering method). It parses single-entity multi-attribute questions, condition constraint questions (currently only equality conditions are supported), comparison questions (comparing magnitude or checking whether values are the same), extreme-value questions and multi-hop questions, generates graph query statements, and executes them to return answers.

Disclosure of Invention

The invention aims to provide a complex knowledge base question-answering method (complex KBQA method) based on deep semantic analysis, which parses complex questions, generates graph query statements and executes them to return answers, thereby improving semantic parsing performance on complex sentences.

According to a first aspect of the present invention, a complex knowledge base question-answering method based on deep semantic parsing is provided, including:

step 1, performing entity recognition on a question (query) input by a user to obtain an entity;

step 2, performing query target and query condition recognition on the question to obtain a query target, query conditions and query condition values;

step 3, performing multi-hop path recognition on the question to obtain a multi-hop path;

step 4, discriminating the question type from the obtained entity, query target, query condition values and multi-hop path, and generating a graph query statement through a graph query statement module according to the result;

and step 5, executing the graph query statement on the graph database, parsing the query result to generate an answer, and outputting the answer in JSON format.

Preferably, in the foregoing step 1, performing entity recognition on the question input by the user includes:

extracting a synonym of the entity from the question by using an entity extraction model, and selecting the corresponding entity if the extracted synonym directly hits the synonym dictionary;

if the extracted synonym is not in the synonym dictionary, performing entity matching: the top 30 candidate entity synonyms are screened out through a K-nearest-neighbor algorithm, and matching scores are computed by using a matching model;

if the matching result is larger than the threshold value of 0.5, selecting the entity with the largest number of votes by a voting mechanism;

and if the matching result is smaller than the threshold value of 0.5, selecting the entity with the largest matching probability in the matching result.
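The entity selection logic of step 1 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the name `select_entity`, the layout of the synonym dictionary, and the use of `difflib.SequenceMatcher` as a stand-in for both the K-nearest-neighbor retrieval and the matching model are assumptions made for the example. Only the top-30 cutoff and the 0.5 threshold come from the text.

```python
from collections import Counter
from difflib import SequenceMatcher

TOP_K = 30        # candidate synonyms kept by the nearest-neighbor step (from the text)
THRESHOLD = 0.5   # matching-score threshold (from the text)


def match_score(mention: str, synonym: str) -> float:
    """Stand-in for the matching model: character similarity in [0, 1]."""
    return SequenceMatcher(None, mention, synonym).ratio()


def select_entity(mention: str, synonym_dict: dict[str, str]) -> str:
    """Return the entity for an extracted mention.

    synonym_dict (hypothetical layout) maps every known synonym to its
    canonical entity name.
    """
    if not synonym_dict:
        raise ValueError("synonym dictionary is empty")

    # Case 1: the extracted synonym hits the dictionary directly.
    if mention in synonym_dict:
        return synonym_dict[mention]

    # Case 2: brute-force nearest neighbors, keeping the TOP_K best synonyms.
    scored = sorted(
        ((match_score(mention, syn), syn) for syn in synonym_dict),
        reverse=True,
    )[:TOP_K]

    # Scores above the threshold: every such candidate votes for its entity.
    votes = Counter(synonym_dict[syn] for score, syn in scored if score > THRESHOLD)
    if votes:
        return votes.most_common(1)[0][0]

    # No score above the threshold: take the single best-scoring candidate.
    return synonym_dict[scored[0][1]]
```

Voting lets several moderately good synonyms of the same entity outweigh one lucky match, while the fallback still returns an answer when no candidate is convincing.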

Preferably, in the foregoing step 2, performing query target and query condition recognition on the question includes:

recognizing the question by using a query target and query condition recognition model to obtain the query target, the query conditions and the query condition values.

Preferably, in the foregoing step 3, performing multi-hop path recognition on the question includes:

performing K-nearest-neighbor computation on the question by using a predicate recognition model to obtain candidate path hops, and ranking the candidate paths to obtain the multi-hop path.

Preferably, in step 4, the question type is discriminated from the obtained entity, query target, query condition values and multi-hop path, wherein the question types comprise single-entity multi-attribute questions, condition constraint questions, comparison questions, extreme-value questions and multi-hop questions;

discriminating a single-entity multi-attribute question comprises:

if the result of parsing the question in step 1 and step 2 contains only a single entity, and the query target is judged to be a relation according to the graph schema, the question is judged to be a relation-based entity lookup, and a graph query statement is generated through the relation query graph query statement generation module;

if the result of parsing the question in step 1 and step 2 contains one or more attributes, and the query target is judged to be multi-attribute according to the graph schema, the question is judged to be a multi-attribute query, and a graph query statement is generated through the multi-attribute graph query statement generation module.
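The two branches for single-entity questions can be illustrated with a short Python sketch. The text does not name a graph query language, so Cypher is assumed here purely for illustration; the function names, the `name` property and the schema layout (target name mapped to `"relation"` or `"attribute"`) are likewise hypothetical.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _safe(name: str) -> str:
    """Relation types and property names cannot be query parameters, so validate them."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def relation_query(entity: str, relation: str) -> tuple[str, dict]:
    """Query target is a relation: follow it from the entity to the answer node."""
    query = f"MATCH (e {{name: $name}})-[:{_safe(relation)}]->(t) RETURN t.name AS answer"
    return query, {"name": entity}


def multi_attribute_query(entity: str, attributes: list[str]) -> tuple[str, dict]:
    """Query targets are attributes: return all of them for the entity."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    columns = ", ".join(f"e.{_safe(a)} AS {a}" for a in attributes)
    return f"MATCH (e {{name: $name}}) RETURN {columns}", {"name": entity}


def single_entity_query(entity: str, targets: list[str], schema: dict[str, str]):
    """Dispatch on what the (hypothetical) schema says each query target is."""
    kinds = {schema[t] for t in targets}
    if kinds == {"relation"} and len(targets) == 1:
        return relation_query(entity, targets[0])
    if kinds == {"attribute"}:
        return multi_attribute_query(entity, targets)
    raise ValueError("not a single-entity multi-attribute question")
```

The entity name is passed as a query parameter rather than interpolated into the statement, which keeps user text out of the query string.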

Preferably, discriminating a multi-hop question includes:

if, in the parsing results of step 1 and step 2, the length of the multi-hop path exceeds 2, there is no condition constraint, and the number of relation nodes at the end of the multi-hop path or of attributes of the end node of the multi-hop path is greater than or equal to 1, the question is judged to be a multi-hop question, and a graph query statement is generated through the multi-hop question graph query statement module;

and if, in the parsing results of step 1 and step 2, the length of the multi-hop path does not exceed 2, there is no condition constraint, and the number of relation nodes at the end of the multi-hop path or of attributes of the end node is less than 1, the other question types are checked.

Preferably, discriminating a condition constraint question includes:

if the result parsed in step 1 and step 2 contains an entity, constraint condition values and a multi-hop path, the question is judged to be a multi-condition constraint question, in which the condition values are treated as equality constraints by default and all act on the entity at the end of the path, and a graph query statement is generated through the condition constraint graph query statement generation module;

if the result parsed in step 1 and step 2 does not contain an entity, constraint condition values and a multi-hop path, the comparison question and extreme-value question checks are performed.
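A minimal sketch of the condition constraint case, in which every condition is an equality constraint on the node at the end of the path, might look like this. Cypher is again an assumed target language, and `condition_constraint_query`, the `end` alias and the parameter names `c0`, `c1`, ... are hypothetical choices for the example.

```python
import re

_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _safe(name: str) -> str:
    """Validate names that must be written into the query text itself."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def condition_constraint_query(entity, path, conditions, target):
    """Build a query whose equality constraints all apply to the path's end node.

    path:       list of relation names from the entity to the end node
    conditions: mapping of constraint attribute -> constraint value
    target:     attribute of the end node to return
    """
    if not path:
        raise ValueError("a condition constraint question needs a path")
    *inner, last = path
    pattern = "(e {name: $name})"
    for relation in inner:
        pattern += f"-[:{_safe(relation)}]->()"
    pattern += f"-[:{_safe(last)}]->(end)"

    params = {"name": entity}
    clauses = []
    for index, (attribute, value) in enumerate(conditions.items()):
        clauses.append(f"end.{_safe(attribute)} = $c{index}")
        params[f"c{index}"] = value  # values travel as parameters, not as text

    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    query = f"MATCH {pattern}{where} RETURN end.{_safe(target)} AS answer"
    return query, params
```

Because conditions and values arrive as one mapping, each constraint attribute is paired with exactly one value, mirroring the one-to-one correspondence the method relies on.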

Preferably, discriminating a comparison question includes:

according to a predefined comparison type dictionary, computing the similarity between each key in the dictionary and the question, selecting the type with the highest similarity as the type of the question, and generating a graph query statement through the graph query statement generation module for that type.

Preferably, discriminating an extreme-value question includes:

predefining Min and Max type dictionaries, and performing Min and Max constraint matching on the question based on a multi-layer sliding-window fuzzy matching algorithm;

if the matched constraint applies to a condition, updating the condition and the condition value, and generating a graph query statement through the graph query statement generation module;

and if the matched constraint applies to the query target, performing extreme-value filtering on the query target, and generating a graph query statement through the graph query statement generation module.
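The text does not spell out the multi-layer sliding-window fuzzy matching algorithm. One plausible reading, sketched below, slides windows of several sizes (the "layers") over the question tokens and fuzzily compares each window with the dictionary terms. The dictionary contents, the 0.8 threshold and the name `match_extreme` are assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical Min and Max dictionaries; a real deployment would define its own.
EXTREME_TERMS = {
    "max": ["most expensive", "highest", "largest"],
    "min": ["cheapest", "lowest", "smallest"],
}


def match_extreme(question: str, threshold: float = 0.8):
    """Return (kind, matched_text) for the best fuzzy match, or None."""
    tokens = question.lower().split()
    best_score, best_kind, best_text = 0.0, None, None
    for kind, terms in EXTREME_TERMS.items():
        for term in terms:
            width = len(term.split())
            # One "layer" per window size around the term's own length.
            for size in (width - 1, width, width + 1):
                if size < 1:
                    continue
                for start in range(len(tokens) - size + 1):
                    window = " ".join(tokens[start:start + size])
                    score = SequenceMatcher(None, term, window).ratio()
                    if score > best_score:
                        best_score, best_kind, best_text = score, kind, window
    if best_score >= threshold:
        return best_kind, best_text
    return None
```

The fuzzy comparison tolerates small spelling variations in the question. A downstream generator could turn a `"min"` or `"max"` result into an ordering with a limit of one on the constrained attribute, though that step is not shown here.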

According to a second aspect of the present invention, there is also provided a computer system comprising: one or more processors, and a memory; the memory is configured to store instructions that are operable to cause the one or more processors to perform operations comprising the flow of the aforementioned complex knowledge base question-answering method based on deep semantic parsing.

Compared with the prior art, the complex knowledge base question-answering method based on deep semantic analysis has the following advantages:

The method of the invention identifies the entity, query target, conditions and condition values in the question through classification models, which avoids the disambiguation and entity linking problems of extraction models. In addition, the query target, condition and condition value classification tasks are modeled jointly, which reduces error propagation between tasks, greatly reduces model inference time and improves automatic extraction efficiency, so that single-entity multi-attribute, multi-condition constraint, comparative reasoning and multi-hop complex questions are answered effectively.

Meanwhile, the method treats the two tasks of condition classification and condition value extraction as a span-based multi-class classification problem (a span being a contiguous segment of the question text): the label of the constraint attribute value contained in each span is the name of the corresponding constraint attribute class, which improves the reusability of the model's representation layers and ensures a one-to-one correspondence between constraint conditions and constraint condition values.

It should be understood that all combinations of the foregoing concepts, as well as additional concepts described in more detail below, may be considered a part of the inventive subject matter of the present disclosure as long as such concepts are not mutually inconsistent. In addition, all combinations of claimed subject matter are considered part of the disclosed inventive subject matter.

The foregoing and other aspects, embodiments, and features of the present teachings will be more fully understood from the following description, taken together with the accompanying drawings. Other additional aspects of the invention, such as features and/or advantages of the exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of the embodiments according to the teachings of the invention.

Drawings

The drawings are not intended to be drawn to scale. In the drawings, each identical or nearly identical component that is illustrated in various figures may be represented by a like numeral. For purposes of clarity, not every component may be labeled in every drawing. Embodiments of various aspects of the invention will now be described, by way of example, with reference to the accompanying drawings.

FIG. 1 is a flow diagram of a complex knowledge base question-answering method based on deep semantic parsing provided in one embodiment of the present invention;

FIG. 2 is a schematic diagram of an entity identification process provided in one embodiment of the present invention;

FIG. 3 is a schematic diagram of query target, query condition and query condition value recognition by the model, provided in one embodiment of the invention;

FIG. 4 is a schematic flow diagram of the predicate recognition and path ranking models provided in one embodiment of the invention;

FIG. 5 is a schematic diagram of the graph query statement generation flow for single-entity multi-attribute questions provided in one embodiment of the invention;

FIG. 6 is a schematic diagram of the graph query statement generation flow for condition constraint questions provided in one embodiment of the invention;

FIG. 7 is a schematic diagram of the graph query statement generation flow for comparison questions provided in one embodiment of the invention;

FIG. 8 is a schematic diagram of the graph query statement generation flow for extreme-value questions provided in one embodiment of the invention;

FIG. 9 is a schematic diagram of the graph query statement generation flow for multi-hop questions provided in one embodiment of the invention.

Detailed Description

For a better understanding of the technical content of the present invention, specific examples are set forth below, along with the accompanying drawings.

Aspects of the invention are described in this disclosure with reference to the drawings, in which are shown a number of illustrative embodiments. The embodiments of the present disclosure are not necessarily intended to include all aspects of the invention. It should be understood that the various concepts and embodiments described above, as well as those described in more detail below, may be implemented in any of a number of ways, as the disclosed concepts and embodiments are not limited to any implementation. Additionally, some aspects of the disclosure may be used alone or in any suitable combination with other aspects of the disclosure.

According to an embodiment of the present invention, in combination with the flowchart shown in fig. 1, a complex knowledge base question-answering method based on deep semantic parsing includes:

step 1, performing entity recognition on a question input by a user to obtain an entity;

step 2, performing query target and query condition recognition on the question to obtain a query target, query conditions and query condition values;

step 3, performing multi-hop path recognition on the question to obtain a multi-hop path;

step 4, discriminating the question type from the obtained entity, query target, query condition values and multi-hop path, and generating a graph query statement through a graph query statement module according to the result;

and step 5, executing the graph query statement on the graph database, parsing the query result to generate an answer, and outputting the answer in JSON format.

Therefore, the complex knowledge base question-answering method (complex KBQA method) based on deep semantic analysis provided by the invention identifies the entity, query target, conditions and condition values in the question through classification models, which avoids the disambiguation and entity linking problems of extraction models. Meanwhile, the query target, condition and condition value classification tasks are modeled jointly, which reduces error propagation between tasks, greatly reduces model inference time and improves automatic extraction efficiency, so that single-entity multi-attribute, multi-condition constraint, comparative reasoning and multi-hop complex questions are answered effectively.

Meanwhile, in some embodiments, the two tasks of condition classification and condition value extraction can be treated as a span-based multi-class classification problem (a span being a contiguous segment of the question text): the label of the constraint attribute value contained in each span is the name of the corresponding constraint attribute class, which improves the reusability of the model's representation layers and ensures a one-to-one correspondence between constraint conditions and constraint condition values.

The practice and/or effect of certain examples of the present invention will be described in more detail below in conjunction with the flowcharts shown in fig. 2-9 and some preferred or alternative examples of the present invention.

[ entity identification ]

Referring to FIG. 2, in the foregoing step 1, performing entity recognition on the question input by the user includes:

[ query target, query condition identification ]

Referring to FIG. 3, in the foregoing step 2, performing query target and query condition recognition on the question includes:

recognizing the question by using the query target and query condition recognition model to obtain the query target, the query conditions and the query condition values;

query target recognition identifies the attributes and relations asked about in the question (query) by multi-label classification (namely, query target classification).

Taking a telecom operator scenario as an example, for the question (query) "tell me how to activate the 7-day 5G video membership data package, and how much does it cost?", multi-label query target classification predicts the query targets of the question: [activation method, price].

Further, query condition recognition comprises constraint attribute classification, and query condition value recognition comprises constraint attribute value extraction. Constraints are extracted by span-based classification, and the label of each constraint attribute value is the name of its constraint attribute class, so that query conditions and query condition values correspond one to one.

Taking a telecom operator scenario as an example, for the question (query) "hello, the 2-yuan monthly personalized screen display service I had has expired, and I want to continue subscribing", the query target and query condition recognition model predicts the query target of the question: [activation method], the query conditions: [price, sub-service], and the query condition values: [2, monthly package].

It should be further noted that, in this embodiment, query target classification, constraint attribute classification and constraint attribute value extraction are trained jointly on a BERT model (a language representation model pre-trained on large-scale unlabeled corpora to obtain text representations rich in semantic information).

Preferably, the query target and query condition recognition provided in this embodiment parses all constraint conditions and replaces the intent recognition of conventional methods with multi-label classification of the query target. Conventional intent recognition has the following drawbacks: there are many kinds of intents, accurate intent classification and modeling are difficult, and recognition is performed as 0-1 (binary) classification. The method therefore extracts constraints by span-based classification (a span being a contiguous segment of the question text), improves the classification labels and introduces new label types, which reduces the entity linking and modeling drawbacks.

Specifically, in this embodiment, the label of the constraint attribute value contained in each span is the name of the corresponding constraint attribute class, which improves the reusability of the model's representation layers and ensures a one-to-one correspondence between constraint conditions and constraint condition values.
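The span labeling scheme can be made concrete with a small Python data structure. This sketch shows only how labeled spans yield paired conditions and values; it does not model the BERT classifier itself, and the names `LabeledSpan`, `span_of` and `conditions_and_values` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledSpan:
    start: int   # inclusive character offset into the question
    end: int     # exclusive character offset into the question
    label: str   # name of the constraint attribute class


def span_of(text: str, phrase: str, label: str) -> LabeledSpan:
    """Helper for the example: locate a phrase in the text and label it."""
    start = text.index(phrase)
    return LabeledSpan(start, start + len(phrase), label)


def conditions_and_values(text: str, spans: list[LabeledSpan]) -> list[tuple[str, str]]:
    """Each span's label is the condition and its text is the condition value."""
    ordered = sorted(spans, key=lambda span: span.start)
    return [(span.label, text[span.start:span.end]) for span in ordered]
```

Because the label travels with the span, no separate step is needed to align a condition with its value: for the question "renew my 2 yuan monthly screen display service", a span over "2" labeled `price` and a span over "monthly" labeled `sub_service` directly give the pairs (price, 2) and (sub_service, monthly).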

[ Multi-hop Path identification ]

Referring to FIG. 4, in the foregoing step 3, performing multi-hop path recognition on the question includes:

performing K-nearest-neighbor computation on the question by using a predicate recognition model to obtain candidate path hops, and ranking the candidate paths to obtain the multi-hop path (path).
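As a rough illustration of the two stages, the sketch below retrieves the k predicates nearest to the question and then ranks the schema-valid paths that can be built from them. The bag-of-words cosine similarity stands in for the predicate recognition model, summing hop scores stands in for the path ranking model, and the schema, descriptions and function names are all invented for the example.

```python
import math
from collections import Counter

# Hypothetical schema: predicate -> (source node type, target node type).
SCHEMA = {
    "belongs_to_plan": ("Package", "Plan"),
    "offered_by": ("Plan", "Operator"),
    "has_price": ("Package", "Price"),
}
# Hypothetical keyword descriptions used to compare predicates with a question.
DESCRIPTIONS = {
    "belongs_to_plan": "package belongs plan",
    "offered_by": "plan offered operator",
    "has_price": "price cost",
}


def bag(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def candidate_hops(question: str, k: int = 2) -> dict[str, float]:
    """Keep the k predicates most similar to the question (nearest neighbors)."""
    q = bag(question)
    scores = {pred: cosine(q, bag(desc)) for pred, desc in DESCRIPTIONS.items()}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return {pred: scores[pred] for pred in top}


def rank_paths(start_type: str, hops: dict[str, float]) -> list[list[str]]:
    """Enumerate schema-valid paths over the candidate hops, best score first."""
    paths = []

    def extend(node_type, path, score):
        for pred, hop_score in hops.items():
            source, target = SCHEMA[pred]
            if source == node_type and pred not in path:
                new_path = path + [pred]
                paths.append((score + hop_score, new_path))
                extend(target, new_path, score + hop_score)

    extend(start_type, [], 0.0)
    paths.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in paths]
```

Summing hop scores favors longer paths whenever every hop has some support in the question; a trained ranking model would make that trade-off from data instead.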

[ Generation of a graph query statement ]

In step 4, the question type is discriminated from the obtained entity, query target, query condition values and multi-hop path, wherein the question types comprise single-entity multi-attribute questions, condition constraint questions, comparison questions, extreme-value questions and multi-hop questions.

(1) Referring to FIG. 5, discriminating a single-entity multi-attribute question includes:

if the result parsed in step 1 and step 2 contains only a single entity, and the query target is judged to be a relation according to the graph schema (which is equivalent to the data model of a domain and comprises the meaningful concept types in the domain and the attributes of those types), the question is judged to be a relation-based entity lookup, and a graph query statement is generated through the relation query graph query statement generation module;

(2) Referring to FIG. 6, discriminating a condition constraint question includes:

(3) Referring to FIG. 7, discriminating a comparison question includes:

according to a predefined comparison type dictionary, computing the similarity between each key in the dictionary and the question, selecting the type with the highest similarity as the type of the question, and generating a graph query statement through the graph query statement generation module for that type.

Preferably, three comparison types are supported in this embodiment: greater than, less than and equal to. If the question type is judged to be greater than, a graph query statement is generated through the greater-than graph query statement generation module;

if the question type is judged to be less than, a graph query statement is generated through the less-than graph query statement generation module;

if the question type is judged to be equal to, a graph query statement is generated through the equal-to graph query statement generation module.
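The dictionary lookup for comparison questions can be sketched as follows. The dictionary keys, the token-level Jaccard similarity and the operator mapping are assumptions chosen for illustration; the text specifies only that the key most similar to the question determines the type.

```python
import re

# Hypothetical comparison type dictionary: key phrase -> comparison type.
COMPARISON_TYPES = {
    "more expensive than": "greater_than",
    "cheaper than": "less_than",
    "same price as": "equal",
}
# Operator a per-type query generation module might emit.
OPERATORS = {"greater_than": ">", "less_than": "<", "equal": "="}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def comparison_type(question: str) -> str:
    """Pick the type whose dictionary key is most similar to the question."""
    q = tokens(question)
    best_key = max(COMPARISON_TYPES, key=lambda key: jaccard(tokens(key), q))
    return COMPARISON_TYPES[best_key]


def comparison_clause(question: str, attribute: str) -> str:
    """Comparison between two matched nodes a and b on one attribute."""
    operator = OPERATORS[comparison_type(question)]
    return f"a.{attribute} {operator} b.{attribute}"
```

Note that this function always returns some type, even for a question with no comparison wording, so it should only be called after the earlier checks have routed the question to the comparison branch.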

(4) Referring to FIG. 8, discriminating an extreme-value question includes:

(5) Referring to FIG. 9, discriminating a multi-hop question includes:

[ Generation of answers to questions ]

The graph query statement is executed on the graph database, the query result is parsed to generate an answer, and the answer is output in JSON format.
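A minimal sketch of the final formatting step is given below. The text states only that the answer is output as JSON, so the field names `question`, `answers` and `count` are hypothetical, and the rows are assumed to arrive as a list of dictionaries from whichever graph database client is used.

```python
import json


def format_answer(question: str, rows: list[dict]) -> str:
    """Wrap parsed query results in a JSON document."""
    payload = {"question": question, "answers": rows, "count": len(rows)}
    # ensure_ascii=False keeps non-ASCII text such as Chinese readable.
    return json.dumps(payload, ensure_ascii=False)
```

Returning an explicit count lets a caller distinguish an empty result from a failed query without inspecting the answer list.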

Preferably, the method of the invention identifies the entity, query target, conditions and condition values in the question (query) through recognition models, which effectively avoids the disambiguation and entity linking problems of entity extraction models. Meanwhile, the query target, condition and condition value classification tasks are modeled jointly, which reduces error propagation between tasks, greatly reduces model inference time and improves automatic extraction efficiency, so that single-entity multi-attribute, multi-condition constraint, comparative reasoning and multi-hop complex questions are answered effectively.

While the invention has been described with reference to preferred embodiments, it is not intended to be limiting. Those skilled in the art will appreciate that various modifications and adaptations can be made without departing from the spirit and scope of the present invention. Accordingly, the scope of the invention is defined by the appended claims.

Claims

1. A complex knowledge base question-answering method based on deep semantic analysis is characterized by comprising the following steps:

2. The complex knowledge base question-answering method based on deep semantic parsing according to claim 1, wherein in the aforementioned step 1, performing entity recognition on the question inputted by the user comprises:

3. The complex knowledge base question-answering method based on deep semantic parsing according to claim 1, wherein in the step 2, the query target and the query condition identification of the question include:

4. The complex knowledge base question-answering method based on deep semantic parsing according to claim 1, wherein in the step 3, the multi-hop path recognition of the question includes:

5. The complex knowledge base question-answering method based on deep semantic parsing according to any one of claims 1-4, wherein in the step 4, the question type is discriminated from the obtained entity, query target, query conditions, query condition values and multi-hop path, the question types including single-entity multi-attribute questions, condition constraint questions, comparison questions, extreme-value questions and multi-hop questions;

6. The deep semantic parsing based complex knowledge base question-answering method according to claim 5, wherein discriminating the multi-hop question comprises:

7. The deep semantic parsing based complex knowledge base question-answering method according to claim 5, wherein discriminating the condition constraint question comprises:

8. The deep semantic parsing based complex knowledge base question-answering method according to claim 5, wherein discriminating the comparison question comprises:

9. The deep semantic parsing based complex knowledge base question-answering method according to claim 5, wherein discriminating the extreme-value question comprises:

10. A computer system, comprising:

one or more processors;

a memory storing instructions operable to cause the one or more processors to perform operations comprising the flow of the method of any one of claims 1-9.