CN115238101B

CN115238101B - Multi-engine intelligent question-answering system oriented to multi-type knowledge base

Info

Publication number: CN115238101B
Application number: CN202211165513.9A
Authority: CN
Inventors: 李春豹; 崔莹; 代翔; 戴礼灿; 刘鑫; 雋兆波; 杨露
Original assignee: CETC 10 Research Institute
Current assignee: CETC 10 Research Institute
Priority date: 2022-09-23
Filing date: 2022-09-23
Publication date: 2023-01-03
Anticipated expiration: 2042-09-23
Also published as: CN115238101A

Abstract

The invention discloses a multi-engine intelligent question-answering system oriented to a multi-type knowledge base, belonging to the field of multi-engine intelligent question-answering system construction. The system comprises a complex question understanding module, a human-machine multi-round interaction module, an intelligent question-answering engine module and a multi-source answer fusion module. The invention addresses problems of conventional question-answering systems, including incomplete understanding of user questions, a single type of knowledge source, difficulty fusing multi-source answers, weak multi-round interactive question-answering capability and difficulty verifying answer information, so as to make information acquisition more convenient, reliable, comprehensive, efficient and accurate.

Description

Multi-engine intelligent question-answering system oriented to multi-type knowledge base

Technical Field

The invention relates to the technical field of multi-engine intelligent question-answering system construction, in particular to a multi-engine intelligent question-answering system oriented to a multi-type knowledge base.

Background

With the rapid development of information technology, massive amounts of internet information continuously emerge and data formats and types keep diversifying. This brings richer and more varied information to internet users but also makes it harder for them to obtain the information they need. At present, conventional retrieval still relies on keyword matching. Natural language input may contain misspelled words, ambiguous descriptions, or inaccurate and incomplete wording, and the retrieval result is a list of the best-matching articles rather than the definite answer the user wants. After obtaining the recommended article list, the user still has to screen, read and confirm the answer manually, which is time-consuming and labor-intensive.

To solve these problems and improve the efficiency with which users accurately acquire information, intelligent question-answering systems have been developed. Existing typical intelligent question-answering systems fall into five types. First, e-commerce customer service, mainly intelligent customer service robots, can answer users' common questions about products and services and improve user experience, but cannot answer questions beyond the question bank. Second, emotional companionship systems such as Microsoft XiaoIce use dialogue context modeling (continuous interaction against a specific background rather than a combination of unrelated questions) to chat casually, banter and discuss open topics with users, providing companionship, emotional comfort and psychological counseling, but currently support chat and companionship only in specific domains. Third, virtual assistants, mainly Apple Siri, Duer, Google Now, Chumenwenwen and the like, help users complete tasks such as weather inquiry, taxi hailing, meal ordering, ticket booking, schedule reminders and teaching assistance; they are task-oriented and cannot answer once a request falls outside the task range. Fourth, retrieval-based question answering, mainly based on the Baidu and Google search engines, can answer a wide variety of user questions and gives clear answers, answer sources and retrieval results, but supports only single-round interaction or essentially no (weak) interaction. Fifth, comprehensive question answering, as in the IBM Watson and Wolfram Alpha systems, can answer factual questions and questions requiring calculation and logical reasoning posed in natural language, and helps users make decisions quickly and effectively.

For data in specific scenarios, existing comprehensive intelligent question-answering systems based on natural language text have achieved some results, but a gap remains between them and practical use, mainly in the following five aspects. First, complex questions are not understood clearly. When a user inputs a question, the description may be incomplete or redundant, or contain word or grammar errors. Most existing systems do not handle these cases specifically, which leads to deviations or errors in question understanding, wrong answers and a poorer user experience. Second, existing systems are mostly designed for one particular type of data. For example, most intelligent customer service systems store frequently asked questions as question-answer pairs, match the user's question against the stored questions, and return the answer of the question with the highest similarity score. There are also question-answering systems proposed separately for knowledge graphs, database tables or free documents, each supporting question answering over only one type of knowledge. However, knowledge in real scenarios is often represented in a variety of forms. Third, multi-source answers cannot be fused. When answers are obtained from multiple types of data sources, each source may yield its own answer, and how to fuse these answers remains to be solved. Fourth, only single-round question answering is supported, and multi-round interaction capability is lacking. Most existing systems obtain an answer only for the question currently input by the user; when several questions are asked about the same topic, the current question lacks part of the preceding information, so existing systems cannot obtain the answer correctly. In addition, when the user's question description is incomplete, existing systems have essentially no interactive question-answering capability, which causes deviations or errors in the answer. Finally, the answer source is not traceable and the answer acquisition process is invisible to the user. Existing systems return only the answer to the user's question, and the user cannot verify whether the answer is correct. Therefore, to improve the interpretability of the answer acquisition process, that process can be visualized so that the user can analyze the accuracy and comprehensiveness of the returned answer more intuitively and conveniently.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing a multi-engine intelligent question-answering system oriented to a multi-type knowledge base. It addresses problems of conventional question-answering systems, including incomplete understanding of user questions, a single type of knowledge source, difficulty fusing multi-source answers, weak multi-round interactive question-answering capability and difficulty verifying answer information, so as to make information acquisition more convenient, reliable, comprehensive, efficient and accurate.

The purpose of the invention is realized by the following scheme:

a multi-engine intelligent question-answering system oriented to a multi-type knowledge base comprises a complex question understanding module, a human-machine multi-round interaction module, an intelligent question-answering engine module and a multi-source answer fusion module;

the complex question understanding module is used for identifying and classifying the real intention of the input question, correcting and completing the input question, and outputting the corrected question, the question intention identification and classification result and the question core information;

for the corrected question output by the complex question understanding module, the human-machine multi-round interaction module completes multi-round dialogue between the user and the question-answering system by combining the topic to which the corrected question belongs, the corresponding topic rule template and a human-machine dialogue management strategy based on deep reinforcement learning, and completes answer searching in combination with the multiple question-answering engines;

the intelligent question-answering engine module is used for designing a corresponding question-answering engine for each of a question-answer pair library, a document library, a knowledge graph and a database table, taking the business question-answering knowledge base data, the corrected question, the question intention classification result and the question core information as input, and outputting one or more candidate answers corresponding to the user question, together with their confidences and sources;

and the multi-source answer fusion module is used for assigning a weight to each answer, using a candidate answer credibility evaluation model, over the set of candidate answers, answer confidences and answer sources output by the question-answering engines, so as to obtain the final answer and its source.

Further, the system comprises an answer acquisition process visualization module and an answer evaluation feedback and training optimization module;

the answer acquisition process visualization module is used for displaying user-understandable intermediate results to the user in visual form, and for displaying the record in the document, knowledge graph or database table that corresponds to the answer, so that the user can verify and confirm the given answer;

the answer evaluation feedback and training optimization module is used for letting the user evaluate and score the given answer and correct and feed back wrong answers; it stores the user's feedback directly into the question-answer pair library, and also organizes the historical questions, answers and sources fed back by users into training corpora for scheduled or volume-triggered training optimization of the question-answering model.

Further, the complex question understanding module includes the following sub-modules: a language identification module, a speech recognition module, an error correction module, a reference resolution module, a core information extraction module, a recommendation module and a classification module;

the language identification module is used for identifying the language of a question input by the user by voice;

the speech recognition module is used for transcribing the voice question in the identified language into a Chinese natural language description, obtaining a natural language text question;

the error correction module is used for performing word error correction, factual error correction and grammar error correction on the text obtained by typing or speech transcription;

the reference resolution module is used for performing reference resolution on pronouns, modal particles and omitted entities appearing in the user's question, and outputting the resolved question as the transcribed question of complex question understanding;

the core information extraction module is used for identifying and extracting entity information, relationships among entities, keywords and phrases in the question as the core information of the user question;

the recommendation module is used for calculating the similarity between the user's question and a historical question set or a typical question bank, and recommending similar questions and popular questions in combination with question frequency;

the classification module classifies user question intentions for input questions of different tasks; the categories comprise a fact description category, an attribute query category, a data calculation category and a statistical analysis category, and the intention identification and classification result is used for guiding the intelligent question-answering engine in selecting a question-answering strategy.

Further, combining the topic to which the corrected question belongs, the corresponding topic rule template and the human-machine dialogue management strategy based on deep reinforcement learning specifically includes: determining the topic of the corrected question; if the question and its context do not belong to the same topic, completing topic switching, that is, recording the current question as a new question; otherwise, determining whether the current question needs to be completed through dialogue. If it does, missing-element analysis is performed against the preset slot information, a natural language question is generated and sent to the user, and after the user responds it is determined whether the necessary slots have been filled; if not, the missing-element analysis and questioning step is repeated; once slot filling is complete, the completed new question is output.
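The topic check and slot-filling loop described above can be sketched as follows. This is a minimal illustration only: the topic name `flight_query`, its slot list and the `ask_user` callback are hypothetical, and a fixed rule (ask for the first missing slot) stands in for the deep reinforcement learning dialogue policy, which the text does not specify in detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical preset slot information per topic (not taken from the patent).
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "flight_query": ["origin", "destination", "date"],
}


@dataclass
class DialogueState:
    topic: str
    slots: Dict[str, str] = field(default_factory=dict)


def switch_topic_if_needed(state: Optional[DialogueState], topic: str) -> DialogueState:
    """Start a fresh state when the new question belongs to a different topic."""
    if state is None or state.topic != topic:
        return DialogueState(topic)
    return state


def missing_slots(state: DialogueState) -> List[str]:
    """Missing-element analysis against the preset slots of the topic."""
    return [s for s in REQUIRED_SLOTS.get(state.topic, []) if s not in state.slots]


def complete_question(
    state: DialogueState,
    ask_user: Callable[[str], str],
    max_turns: int = 5,
) -> DialogueState:
    """Ask for missing slots until all are filled or max_turns is reached."""
    for _ in range(max_turns):
        missing = missing_slots(state)
        if not missing:
            break
        slot = missing[0]  # fixed rule; a learned policy could choose differently
        reply = ask_user(f"Please provide the {slot}.")
        if reply:
            state.slots[slot] = reply
    return state
```

In use, `ask_user` would send the generated question to the user and return the reply; the filled slots can then be merged into the completed question.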

Further, in the intelligent question-answering engine module,

the question-answer pair library question-answering engine calculates similarity with the questions in the question-answer pair library using coarse ranking based on ES retrieval and fine ranking based on the core information of the user question, and returns to the user the answer corresponding to the question with the highest similarity;

and/or,

the document-library question-answering engine is designed as a two-stage network model for free documents: the first stage screens relevant documents based on BM25 retrieval, the longest common substring and the question core information; the second stage inputs the user question together with the screened documents into a model built from a pre-trained model, an attention network and a pointer network, and estimates the start and end positions of the answer, which serve as the answer to the question;

and/or,

the knowledge graph question-answering engine obtains answers by querying the graph, using the entity recognition and relation extraction results for the user question together with a Bert-BiLSTM-CRF model and a sequence generation model;

and the database table question-answering engine uses a seq2seq natural-language-to-SQL conversion combined with rule templates to generate SQL query statements, accurately obtaining the answer corresponding to the user question.
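The rule-template half of the database table engine can be illustrated with a small sketch. Everything concrete here is an assumption made for the example: the template names, the `orders` table and its columns are hypothetical, and an in-memory SQLite database stands in for the MySQL or Oracle store named in the text. The seq2seq text-to-SQL model is not shown.

```python
import sqlite3
from typing import Optional, Tuple

# Hypothetical rule templates; identifiers are filled in, values are bound as parameters.
SQL_TEMPLATES = {
    "count": "SELECT COUNT(*) FROM {table} WHERE {column} = ?",
    "sum": "SELECT SUM({target}) FROM {table} WHERE {column} = ?",
}

# Whitelist of identifiers so that only known table/column names reach the SQL text.
ALLOWED_IDENTIFIERS = {"orders", "region", "amount"}


def build_query(kind: str, table: str, column: str, value: str,
                target: Optional[str] = None) -> Tuple[str, tuple]:
    """Fill a rule template and return (sql, parameters)."""
    if kind == "sum" and target is None:
        raise ValueError("the 'sum' template needs a target column")
    for ident in (table, column, target):
        if ident is not None and ident not in ALLOWED_IDENTIFIERS:
            raise ValueError(f"unknown identifier: {ident!r}")
    sql = SQL_TEMPLATES[kind].format(table=table, column=column, target=target)
    return sql, (value,)


def demo() -> float:
    """Run a 'sum' query against a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("north", 10.0), ("north", 5.0), ("south", 7.0)],
    )
    sql, params = build_query("sum", "orders", "region", "north", target="amount")
    total = conn.execute(sql, params).fetchone()[0]
    conn.close()
    return total
```

Binding the value as a parameter and whitelisting identifiers keeps user-supplied text out of the SQL string, which matters when the slots come from free-form questions.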

Furthermore, the candidate answer credibility evaluation model requests the question-answering engines corresponding to the different question intentions simultaneously in a multithreaded manner, and takes as the final candidate answer the first returned answer that meets the answer threshold set for the corresponding engine; or, after all engines have returned answers and their credibility, the core entities in the question and the answers are extracted, joined with separators, input together into a pre-trained model to extract features, and then passed through an MLP network and a normalization layer to give a confidence score for each candidate answer, from which the final candidate answer is obtained;

in addition, the corrected new question and the final candidate answer can be input together into a sequence model to generate an answer described in natural language: the encoder performs embedding encoding of the input text and computes a state value used as the initial state of the decoder; the encoder outputs and hidden-layer states are concatenated and processed with a Luong attention mechanism; the decoder predicts the probability distribution over the output vocabulary at the current step from the encoder state, the context vector and the decoder's previous inputs; and the loss function is the sparse softmax cross-entropy loss between the logits and the labels.
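The first fusion strategy above (request all engines in parallel and accept the first answer that meets its engine's threshold) can be sketched with the standard library. The `Candidate` type, the engine callables and the threshold values are hypothetical; when no answer clears its threshold, the collected candidates are returned so that a separate re-scoring step, such as the MLP-based one described above, could rank them.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Candidate(NamedTuple):
    answer: str
    confidence: float
    source: str


# An engine takes the question text and returns a candidate, or None if it has no answer.
Engine = Callable[[str], Optional[Candidate]]


def first_confident_answer(
    question: str,
    engines: Dict[str, Engine],
    thresholds: Dict[str, float],
) -> Tuple[Optional[Candidate], List[Candidate]]:
    """Return (accepted candidate or None, all candidates seen so far)."""
    collected: List[Candidate] = []
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {pool.submit(fn, question): name for name, fn in engines.items()}
        for future in as_completed(futures):
            name = futures[future]
            candidate = future.result()
            if candidate is None:
                continue
            collected.append(candidate)
            if candidate.confidence >= thresholds[name]:
                # Leaving the with-block still waits for the remaining engines to finish.
                return candidate, collected
    return None, collected
```

A production version would likely cancel or time out the remaining requests instead of waiting for them; that is left out here to keep the sketch short.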

Further, the intelligent question-answering engine module comprises a reading comprehension question-answering engine. In the reading comprehension question-answering engine: first, for the corrected new question, retrieval against the Elastic Search library is completed based on IK word segmentation and the BM25 algorithm, and the top k ranked documents are obtained, where k is an integer; then, re-screening is performed by combining the longest common substring between the question and each document with the proper-noun entities in the core information, yielding the top n ranked documents, where n is an integer; then, the top n documents are segmented into pieces of length 512, and each piece is concatenated with the question and input into a Bert Chinese pre-trained model to complete paragraph screening; then, the question is concatenated with the screened paragraphs and input into the Bert Chinese pre-trained model and an Attention network to extract the joint feature representation of the question and the paragraphs; finally, this representation is input into a Pointer network layer, which estimates the start position and span of the answer in each paragraph.
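The re-screening and segmentation steps of this pipeline can be sketched without any model. Two points are assumptions of the sketch, since the text does not fix them: documents are ranked by the number of core entities they mention and then by longest-common-substring length, and the length 512 is counted in characters rather than tokens.

```python
from typing import List, Sequence, Tuple


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest contiguous substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        cur = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def rescreen(question: str, docs: Sequence[str], entities: Sequence[str], n: int) -> List[str]:
    """Keep the top-n documents by (entity hits, LCS length with the question)."""
    def score(doc: str) -> Tuple[int, int]:
        hits = sum(1 for entity in entities if entity in doc)
        return hits, longest_common_substring(question, doc)

    return sorted(docs, key=score, reverse=True)[:n]


def segment(doc: str, max_len: int = 512) -> List[str]:
    """Split a document into consecutive pieces of at most max_len characters."""
    return [doc[i:i + max_len] for i in range(0, len(doc), max_len)]
```

Each piece returned by `segment` would then be paired with the question and passed to the paragraph-screening model.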

Further, the corrected question is input into the intelligent question-answering engine and, according to the question intention identification and classification result, one or more of the four question-answering engines (question-answer pair library, knowledge graph, database table and reading comprehension) are selected to obtain candidate answers, answer confidences and answer sources.

Further, the business question-answering knowledge base is stored separately by knowledge type, and the knowledge base types are divided into the following four: common/typical questions and answers in the domain are collated into a question-answer pair library stored in an Elastic Search library; for structured data in the domain, static data are organized into a knowledge graph, stored and displayed with Neo4j; dynamic data are stored as data records in a MySQL or Oracle structured database; unstructured documents and materials form a document library stored in Elastic Search.

Further, the correspondence between the question intention identification and classification result and the question-answering mode is as follows: fact description questions use the question-answer pair library, knowledge graph and reading comprehension question-answering engines; attribute query questions use the knowledge graph and reading comprehension question-answering engines; data calculation and statistical analysis questions use the database table question-answering engine.
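This correspondence amounts to a lookup table from intention class to engines. A minimal sketch follows; the string identifiers are hypothetical names chosen for the example, while the mapping itself follows the paragraph above.

```python
from typing import Dict, List

# Intention class -> question-answering engines, as listed in the text.
INTENT_TO_ENGINES: Dict[str, List[str]] = {
    "fact_description": ["qa_pair_library", "knowledge_graph", "reading_comprehension"],
    "attribute_query": ["knowledge_graph", "reading_comprehension"],
    "data_calculation": ["database_table"],
    "statistical_analysis": ["database_table"],
}


def select_engines(intent: str) -> List[str]:
    """Return the engines to query for a classified intention."""
    try:
        return list(INTENT_TO_ENGINES[intent])
    except KeyError:
        raise ValueError(f"unknown intention class: {intent!r}") from None
```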

The beneficial effects of the invention include:

(1) The invention provides convenient, efficient and accurate question answering over multi-source, multi-type user business data. For multi-source, multi-type data in a specific domain, an intelligent question-answering engine is constructed that comprises a question-answer pair library question-answering engine, a document-library question-answering engine, a knowledge graph question-answering engine and a database table question-answering engine, improving the system's compatibility with user data types and its ease of use.

(2) The present invention supports not only answers given from a single data source, but also multi-cue candidate answer fusion. Aiming at the user input problem, the answers and the confidence degrees of the answers are respectively obtained from multiple types of data sources, the final answer described by the natural language is obtained based on the sequence generation model and returned to the user, the answer combines with multiple data source clues, and the comprehensiveness and the confidence degree of the answer are improved.

(3) The invention not only supports single round of question answering, but also supports the man-machine multi-round interactive question answering capability of the same theme. The situation that the user continuously asks questions for the same task or the same theme is comprehensively considered, the daily communication habits of the user are simulated, and the completion of the user questions in the multi-round interactive question answering is realized by combining a man-machine conversation decision control strategy based on deep reinforcement learning and context conversation information.

(4) The invention can accurately capture the user's real intention and automatically select the corresponding question-answering engine. Taking into account missing information in the user's question and errors in word meaning, wording or grammar, identification and classification of the user's real intention is completed by combining multiple kinds of information with the help of external common knowledge, and the intelligent question-answering engine is selected automatically according to the result.

(5) The answer obtaining process supports visual display, and the answer source can be traced. The method displays the intermediate result which can be understood by the user in the answer production process, can help the user understand the answer obtaining process of the intelligent question-answering system, and also displays the answer source and the position of the answer, so that the answer obtaining process can be understood, and the given answer can be traced and confirmed.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a schematic flow chart of a multi-engine intelligent question-answering system oriented to a multi-type knowledge base according to the present invention;

FIG. 2 is a schematic flow diagram of a complex problem understanding module;

FIG. 3 is a schematic flow diagram of a human-machine multi-turn interaction module;

FIG. 4 is a schematic flow diagram of the intelligent question and answer engine module;

FIG. 5 is a schematic flow diagram of a multi-source answer fusion module;

FIG. 6 is a flowchart of the answer evaluation feedback and training optimization module.

Detailed Description

The invention is further described with reference to the following figures and examples. All features disclosed in all embodiments in this specification, or all methods or process steps implicitly disclosed, may be combined and/or expanded, or substituted, in any way, except for mutually exclusive features and/or steps.

The invention provides a comprehensive question-answering system in which intelligent question-answering engines are designed separately for multi-source, multi-type data in a specific domain, so as to obtain candidate answers to the user's question from each, fuse the candidate answers, and return the result to the user as text, speech, visual graphs and the like. In an embodiment, a multi-engine intelligent question-answering system oriented to a multi-type knowledge base is provided; in a specific implementation, as shown in fig. 1, the system includes the following:

For a question described in natural language and input by the user, complex question understanding is performed first, corresponding to the complex question understanding module in the figure. The question is analyzed in depth by combining the text or voice description, external business knowledge, external general knowledge, contextual dialogue information, the topic of the question and the like; the user's question is reasonably completed, its core information is extracted, and its real intention is accurately identified, understood and classified.

Fig. 2 shows the structure of the complex question understanding module in this embodiment. The input of the module is a text or voice question input by the user, and the output is the corrected question obtained by transcription after understanding, the user question intention classification result and the user question core information.

First, language identification is performed on the user's voice input: the language identification model uses a traditional machine learning or deep neural network method to extract audio signal features such as Mel-frequency cepstral coefficients (MFCC), then applies a multi-class classifier to identify the language of the input speech. Second, a speech recognition model transcribes the voice question in the identified language into a Chinese natural language description: preprocessing and feature extraction of the speech signal, acoustic model training, and matching of input speech features against acoustic patterns are completed in sequence, and a language model based on statistical grammar or a regular grammar structure performs syntactic and semantic analysis, completing continuous speech transcription into a natural language text question. Then, word error correction, factual error correction and grammar error correction are performed on the typed or transcribed text. Word error correction can use a Seq2Seq deep convolutional neural network model trained on a large-scale word error correction corpus to correct extra, missing and wrong words in the question; factual error correction uses a fact detection and correction model based on an external common knowledge base to correct errors that contradict common knowledge; grammar error correction uses an error detection model based on semantic collocation to detect and correct semantic collocation errors. Pronouns such as "his/her/its", modal particles, omitted entities and the like appearing in the user's question also need reference resolution, which this embodiment performs with a rule-based method combined with the preceding dialogue information. The question after reference resolution serves as the transcribed question output by complex question understanding. Entities (such as key times, places, people and organizations), relationships among entities, keywords and phrases in the question are identified and extracted as the core information of the input question for use by other modules; this extraction uses a deep learning based method. The similarity between the user's question and a historical question set or typical question bank is calculated with BM25 and a semantic similarity algorithm, and similar and popular questions are recommended in combination with question frequency. For input questions of different tasks, user question intentions are divided into a fact description class, an attribute query class, a data calculation class and a statistical analysis class; intention identification and classification is completed with a TextCNN + Bert pre-training model, and the result guides the intelligent question-answering engine in automatically selecting a question-answering strategy.

For the corrected question output by the complex question understanding module, and for the elements required by typical scenarios in a specific domain, the question is refined through interactive dialogue with the user by combining rule templates with dialogue management based on deep reinforcement learning; this corresponds to the human-machine multi-round interaction module in the figure.

Fig. 3 shows the structure of the human-machine multi-round interaction module in this embodiment. The input of the module is the corrected question, and the output is the completed new question. First, the topic of the input corrected question is determined. If the question and its context do not belong to the same topic, topic switching is completed, that is, the current question is recorded as a new question; otherwise, it is determined whether the current question needs to be refined through dialogue, that is, the dialogue state is determined. If the question needs to be refined through human-machine dialogue, missing-element analysis is performed against the preset slot information, and a natural language question is generated and sent to the user; after the user responds, it is determined whether the necessary slots have been filled, and if not, this step is repeated. Once slot filling is complete, the completed new question is output.

Based on the completed new question obtained by transcription after understanding, the user question intention classification result and the business question-answering knowledge base, an intelligent question-answering engine is constructed, and candidate answers are rapidly obtained from the different types of knowledge bases using the constructed multi-engine intelligent question-answering model; this corresponds to the intelligent question-answering engine module in the figure.

Fig. 4 shows the structure of the intelligent question-answering engine module in this embodiment. The input of the module is the business question-answering knowledge base data, the completed new question, the intention classification result and the question core information; the output is one or more candidate answers corresponding to the user question, with their confidences and sources.

The business question-answering knowledge base is stored separately by knowledge type so that users can query it. The knowledge base types are mainly divided into the following four: common/typical questions and answers in the domain are collated into a question-answer pair library stored in an Elastic Search library; structured data in the domain, in particular static data whose basic parameters, attributes and the like remain fixed over a period of time, are organized into a knowledge graph that is stored and displayed with Neo4j; dynamic data related to daily business activities are stored as data records in a MySQL or Oracle structured database; and unstructured documents, materials and the like form a document library stored in Elastic Search.

For each of the four types of knowledge base, the question-answering engine best suited to that type of knowledge is provided; the specific steps are shown in fig. 3.

The question-answer pair library engine is implemented as follows: first, the BM25 algorithm is used to perform a preliminary screening of the question-answer pairs in the Elasticsearch library, the results are sorted by score, and the Top-k question-answer pairs are taken; then, text similarity is computed based on Simhash, the results are sorted by score, and the Top-n question-answer pairs are taken, completing the re-screening; finally, the core entities in the core information, such as time and country, are compared with the questions in the Top-n question-answer pairs, and question-answer pairs that do not contain the core entities are removed, giving the final candidate answer set.
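
The re-screening and entity filtering steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the BM25 stage is assumed to have already produced `candidates`, and whitespace tokenization, MD5-based token hashing, the 64-bit fingerprint and the names `simhash`, `simhash_similarity` and `rerank` are assumptions made for the example (Chinese text would need a word segmenter instead of `split()`).

```python
import hashlib
from typing import Iterable, List, Sequence, Tuple


def simhash(tokens: Iterable[str], bits: int = 64) -> int:
    """Standard Simhash: sum signed bit votes of token hashes, keep the sign."""
    weights = [0] * bits
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit token hash
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, w in enumerate(weights):
        if w > 0:
            fingerprint |= 1 << i
    return fingerprint


def simhash_similarity(a: int, b: int, bits: int = 64) -> float:
    """1 minus the normalized Hamming distance between two fingerprints."""
    return 1.0 - bin(a ^ b).count("1") / bits


def rerank(
    question_tokens: Sequence[str],
    candidates: List[Tuple[str, str]],  # (stored question, answer) from BM25
    core_entities: Sequence[str],
    top_n: int = 3,
) -> List[Tuple[str, str]]:
    q_fp = simhash(question_tokens)
    # Re-screening: sort by Simhash similarity and keep the Top-n pairs.
    ranked = sorted(
        candidates,
        key=lambda qa: simhash_similarity(q_fp, simhash(qa[0].split())),
        reverse=True,
    )[:top_n]
    # Final filter: drop pairs whose question lacks any core entity.
    return [qa for qa in ranked if all(e in qa[0] for e in core_entities)]
```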

The knowledge graph question-answering engine is implemented as follows: first, according to the entities in the core information, the BM25 algorithm is used to search the knowledge graph entity set stored in the Elasticsearch library to obtain the entity with the highest similarity that reaches a specified threshold; if no entity meeting the requirement is found, similar entities are recommended directly to the user. Then, all relations of the corresponding entity are queried in Neo4j using Cypher statements. Finally, the relevance between the input question and each of the relations is calculated based on natural language inference to obtain a candidate answer set. The natural language inference is computed with an LSTM network, and the explicit matching degree of the question and the answer as a whole is calculated by a logistic regression layer:

[Formula not recoverable from the source text.]

In this formula, two symbols denote the parameters of the logistic regression, and another denotes a shallow semantic vector representation of the input question.

The latent semantic association degree of the question and the answer is calculated with a tensor neural network:

[Formula not recoverable from the source text.]

In this formula, T denotes the tensor transformation, two symbols denote the parameters of the logistic regression, and another denotes a shallow vector representation of the triple answer.

The overall relevance score of the question and the answer is computed from the explicit matching degree and the latent semantic association degree by dynamic weighted summation:

[Formula not recoverable from the source text.]

Here a gate (threshold) weight controls the contribution of each of the two terms to the final relevance, and is calculated as follows:

[Formula not recoverable from the source text.]

That is, the respective validity of the explicit matching degree and of the latent semantic association degree for a given question-answer pair is estimated dynamically from the latent semantic vectors of the question and the answer, rather than fixed weights being used.
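
Since the formulas themselves are not available in this text, the following Python sketch shows only one plausible way such a dynamically weighted score could be computed. Every functional form and name here is an assumption rather than the patent's definition: `explicit_match` as a sigmoid over a linear map of the question vector, `latent_association` as a bilinear tensor layer followed by a sigmoid, `gate` as a sigmoid over the concatenated question and answer vectors, and the final convex combination in `relevance`.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def explicit_match(q: Sequence[float], w: Sequence[float], b: float) -> float:
    """Assumed logistic regression layer over the question vector q."""
    return sigmoid(dot(w, q) + b)


def latent_association(
    q: Sequence[float],
    a: Sequence[float],
    T: List[List[List[float]]],  # assumed tensor: one matrix per slice
    w: Sequence[float],
    b: float,
) -> float:
    """Assumed tensor layer: each slice T_k yields tanh(q^T T_k a)."""
    hidden = [
        math.tanh(sum(q[i] * T_k[i][j] * a[j]
                      for i in range(len(q)) for j in range(len(a))))
        for T_k in T
    ]
    return sigmoid(dot(w, hidden) + b)


def gate(q: Sequence[float], a: Sequence[float],
         v: Sequence[float], c: float) -> float:
    """Assumed gate weight estimated from the question and answer vectors."""
    return sigmoid(dot(v, list(q) + list(a)) + c)


def relevance(explicit: float, latent: float, g: float) -> float:
    """Dynamic weighted sum: g weights the explicit term, 1 - g the latent."""
    return g * explicit + (1.0 - g) * latent
```

With a gate of 1 the score reduces to the explicit matching degree, and with a gate of 0 to the latent semantic association degree, which is the behaviour the text describes for the two extremes.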

The database table question-answering engine is implemented as follows. First, the completed new question is matched against the database table names using a semantic similarity model, and the longest common substring is computed at the same time, yielding the Top-k tables. Then, the corrected question and each table are input into the constructed NL2SQL model to generate an SQL query statement. The NL2SQL model comprises two independent models. Model 1 joins the question and the header names with separator (flag) tokens and feeds them into the BERT-wwm Chinese pre-trained model for encoding; for each of three tasks (the AGG operation, the combination relation and the comparison relation) it attaches a Dense network and a cross-entropy loss layer, and training is completed by minimizing the losses of the three tasks. Model 2 feeds the question and the columns selected by model 1 into the BERT-wwm Chinese pre-trained model and combines them into candidate <column, operation, value> triples; all candidate combinations are then classified by a Dense layer to select the final combination, from which the SQL statement is obtained. Finally, the result of running the SQL query against the database table is the final answer.
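
To illustrate only the last step, the following Python sketch assembles an SQL statement from model outputs of the kind described above and runs it with the standard `sqlite3` module. It is a hedged example, not the patented NL2SQL model: the function `build_sql`, the allowed operator sets, the reading of the "combination relation" as an AND/OR connector, and the use of SQLite instead of MySQL or Oracle are all assumptions.

```python
import sqlite3
from typing import Any, List, Optional, Sequence, Tuple

AGG_OPS = {None, "COUNT", "SUM", "AVG", "MAX", "MIN"}
CMP_OPS = {"=", ">", "<", ">=", "<=", "!="}
CONNECTORS = {"AND", "OR"}


def quote_ident(name: str) -> str:
    """Quote a table or column name so it cannot break the statement."""
    return '"' + name.replace('"', '""') + '"'


def build_sql(
    table: str,
    select_col: str,
    agg: Optional[str],
    conditions: Sequence[Tuple[str, str, Any]],  # <column, operation, value>
    connector: str = "AND",
) -> Tuple[str, List[Any]]:
    if agg not in AGG_OPS or connector not in CONNECTORS:
        raise ValueError("unsupported aggregation or connector")
    col = quote_ident(select_col)
    select = f"{agg}({col})" if agg else col
    sql = f"SELECT {select} FROM {quote_ident(table)}"
    clauses, params = [], []
    for column, op, value in conditions:
        if op not in CMP_OPS:
            raise ValueError(f"unsupported comparison: {op}")
        clauses.append(f"{quote_ident(column)} {op} ?")
        params.append(value)  # values are bound, never concatenated
    if clauses:
        sql += " WHERE " + f" {connector} ".join(clauses)
    return sql, params
```

Binding the predicted values as parameters rather than concatenating them keeps a mistaken or malicious value from altering the generated statement.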

The reading comprehension question-answering engine is implemented as follows: first, for the Elasticsearch library storing massive unstructured documents, retrieval is completed for the completed new question based on IK word segmentation and the BM25 algorithm, and the top-k documents are obtained; second, re-screening is performed in combination with the named entities in the core information, such as country and time, to obtain the top-n documents; third, the top-n documents are split into segments of length 512, each segment is joined with the question, and the result is input into the BERT Chinese pre-trained model to complete paragraph screening; fourth, the question is joined with the screened paragraphs and input into the BERT Chinese pre-trained model and an Attention network to extract the joint feature representation of the question and the paragraphs; finally, a Pointer network layer estimates the start position and the span of the answer in each paragraph.
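
Two of the mechanical steps above, splitting documents into fixed-length segments and choosing an answer span from per-position scores, can be sketched in Python as follows. This is only an illustration under assumptions: lengths are counted in characters, the start and end scores are assumed to come from the Pointer network layer, and the names `split_into_segments`, `best_span` and the `max_span` limit are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_into_segments(text: str, max_len: int = 512) -> List[str]:
    """Cut a document into consecutive segments of at most max_len characters."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


def best_span(
    start_scores: Sequence[float],
    end_scores: Sequence[float],
    max_span: int = 30,  # assumption: cap on the answer length
) -> Tuple[int, int]:
    """Return (start position, span length) maximizing start + end score."""
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        for e in range(s, min(s + max_span, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e - s + 1), score
    return best
```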

The corrected question is input into the intelligent question-answering engine module, which works against the business question-answer knowledge base; in combination with the intention classification result, one or more of the four question-answering engines (question-answer pair library, knowledge graph, database table and reading comprehension) are selected automatically to obtain the candidate answers, their confidences and their sources. The correspondence between intention classification results and question-answering modes is as follows: fact description questions mainly use the question-answer pair library, knowledge graph and reading comprehension engines; attribute query questions mainly use the knowledge graph and reading comprehension engines; data calculation and statistical analysis questions use the database table engine.
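
The correspondence above amounts to a lookup from intention category to engines, which can be written as a small Python table. The identifier strings and the function name `select_engines` are hypothetical labels chosen for this sketch.

```python
from typing import Dict, List

# Intention category -> question-answering engines, as listed in the text.
ENGINES_BY_INTENT: Dict[str, List[str]] = {
    "fact_description": ["qa_pair_library", "knowledge_graph", "reading_comprehension"],
    "attribute_query": ["knowledge_graph", "reading_comprehension"],
    "data_calculation": ["database_table"],
    "statistical_analysis": ["database_table"],
}


def select_engines(intent: str) -> List[str]:
    """Return the engines to request for a classified question intention."""
    try:
        return list(ENGINES_BY_INTENT[intent])
    except KeyError:
        raise ValueError(f"unknown intention category: {intent}") from None
```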

For the candidate answers, confidences and answer source sets output by the intelligent question-answering engine module, the final answer and its source are obtained in combination with the candidate answer credibility evaluation model. This corresponds to the multi-source answer fusion module in the figure.

Fig. 5 shows the structure of the multi-source answer fusion module in this embodiment. The input of the module is the candidate answers with their confidences and answer source sets, and the output is the fused answer with its confidence and answer source.

In a system with high requirements on question-answering efficiency, the candidate answer credibility evaluation model is rule-based, with the following rule: the question-answering engines corresponding to the question's intention are requested simultaneously in a multithreaded manner, and the first answer returned that meets the answer threshold set for its engine is taken as the final answer. In a system with higher requirements on accuracy, the candidate answer credibility evaluation model likewise requests the question-answering engines corresponding to the question's intention simultaneously in a multithreaded manner; after all engines have returned their answers and confidences, core entities such as time, people, places and organizations are extracted from the question and the answers, joined with separators, and fed into a pre-trained model to extract features, which are then passed through a 3-layer MLP network and a normalization layer to give the confidence score of each candidate answer. Finally, the corrected new question and the final candidate answer are input as a sequence into a sequence model to generate an answer described in natural language: the encoder performs embedding encoding of the input text and computes a state value that serves as the initial state of the decoder module; the encoder outputs and hidden states are concatenated and processed with a Luong attention mechanism; the decoder predicts the probability distribution over the output dictionary at the current step by combining the encoder state, the context vector and the decoder's previous inputs; and the loss function is the sparse softmax cross-entropy loss between the logits and the labels.
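
The efficiency-oriented rule can be sketched with Python's standard `concurrent.futures` module as follows. This is a minimal illustration, not the patented implementation: the `Candidate` type, the `first_confident_answer` function, the per-engine `thresholds` mapping and the choice to skip engines that raise an exception are assumptions for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, NamedTuple, Optional


class Candidate(NamedTuple):
    answer: str
    confidence: float
    source: str


def first_confident_answer(
    question: str,
    engines: Dict[str, Callable[[str], Optional[Candidate]]],
    thresholds: Dict[str, float],
) -> Optional[Candidate]:
    """Request all engines at once; accept the first answer over its threshold."""
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        futures = {pool.submit(fn, question): name for name, fn in engines.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                candidate = future.result()
            except Exception:
                continue  # assumption: a failing engine is simply ignored
            # Engines without a configured threshold need confidence 1.0.
            if candidate is not None and candidate.confidence >= thresholds.get(name, 1.0):
                # Leaving the with-block still waits for the other threads.
                return candidate
    return None
```

Because the executor's context manager waits for outstanding calls on exit, a latency-sensitive deployment would likely add per-engine timeouts; that detail is outside what the text specifies.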

The system further includes the answer acquisition process visualization module shown in the figure, which displays the whole process of user question input, question understanding, question recommendation, answer acquisition by the question-answering engines, and answer organization. In this embodiment, the input of the visualization module is all intermediate and output content of the complex question understanding module, the intelligent question-answering engines and the multi-source answer fusion module, and its output is the visual display. For the complex question understanding module, it mainly displays the user's input question, the question recommendation results, the reference resolution result, the question intention classification result, the core words and the key phrases. For the question-answer pair library engine, it shows the Top-5 BM25 retrieval results and the fine-ranked question-answer pairs. For the knowledge graph engine, it shows the entities in the graph that correspond to the question, the Cypher query statement, the triples and the final graph answer. For the database table engine, it shows the table selection result, the SQL query statement, the answer and the table records corresponding to the answer. For the reading comprehension engine, it shows the Top-5 documents after retrieval and screening, the top answer and the corresponding original text.

For the answers and sources given by the intelligent question answering, the system also supports user scoring and feedback on the answers and self-optimization of the model. This corresponds to the answer evaluation feedback and training optimization module in the figure.

Fig. 6 shows the answer evaluation feedback and training optimization module in this embodiment. The input of the module is an answer and its source, and the output is the answer score, the user feedback result and the trained model.

First, if the user scores a given answer and its source and the score is not lower than a given threshold (for example, a threshold of 3 points out of a full score of 5), or the score is lower than the threshold but the user feeds back an answer to the question, the answer and its source are stored in the question-answer pair library for use in the user's subsequent questions, and are also added to the training corpus. Then, when the corpus fed back by users reaches a certain size, or the specified training interval has elapsed, the user is reminded to run training and evaluation manually, or training and evaluation run automatically, according to the user's settings. Finally, if the EM, F1 or ROUGE evaluation score of the current training run is better than that of the online question-answering model version, the new version can be brought online with one click or automatically, to provide better question-answering service. As described above, the present invention can be well implemented.
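
The storage rule and the go-live check described above can be sketched in Python as follows. The function names `answer_to_store` and `is_better`, the dictionary layout of the metrics and the decision to compare a single chosen metric are assumptions for illustration; the text does not say how EM, F1 and ROUGE are combined.

```python
from typing import Dict, Optional


def answer_to_store(
    system_answer: str,
    score: int,
    threshold: int = 3,  # example from the text: 3 points out of 5
    user_feedback_answer: Optional[str] = None,
) -> Optional[str]:
    """Return the answer to save to the QA pair library and corpus, if any."""
    if score >= threshold:
        return system_answer
    if user_feedback_answer:
        return user_feedback_answer  # low score, but the user supplied an answer
    return None  # low score and no feedback: nothing is stored


def is_better(
    new_metrics: Dict[str, float],
    online_metrics: Dict[str, float],
    metric: str = "F1",  # assumption: one chosen metric decides
) -> bool:
    """True if the newly trained model beats the online version on the metric."""
    return new_metrics[metric] > online_metrics[metric]
```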

Example 1

Example 2

On the basis of embodiment 1, the system further comprises an answer acquisition process visualization module and an answer evaluation feedback and training optimization module;

the answer evaluation feedback and training optimization module is used for the user to evaluate and score given answers and to correct and feed back wrong answers; the results fed back by the user are stored directly in the question-answer pair library, and at the same time the historical questions, answers and sources fed back by users are organized into a training corpus, with which the question-answering model is trained and optimized at regular intervals or once a set quantity is reached.

Example 3

On the basis of embodiment 1, the complex problem understanding module comprises sub-modules: the system comprises a language identification module, a voice identification module, an error correction module, a reference resolution module, a core information extraction module, a recommendation module and a classification module;

the language identification module is used for identifying the language of a question that the user inputs by voice;

the voice recognition module is used for transcribing a voice question in a specific language into a Chinese natural language description, so that a natural language text question is obtained from the transcribed voice;

the classification module classifies the user question intentions aiming at the input questions of different tasks, the classification categories comprise a fact description category, an attribute query category, a data calculation category and a statistical analysis category, and the result of the question intention identification classification is used for guiding an intelligent question-answering engine to realize the selection of a question-answering strategy.

Example 4

On the basis of embodiment 1, the combining of the topic to which the corrected question belongs, the corresponding topic rule template and the man-machine dialogue management strategy based on deep reinforcement learning specifically comprises: judging the topic of the corrected question; if it does not belong to the same topic as the context questions, completing topic switching, that is, recording the current question as a new question; otherwise, judging whether the current question needs to be completed through dialogue: if the question needs to be completed through man-machine dialogue, performing missing-element analysis in combination with preset slot information, generating a natural language question and sending it to the user, judging after the user responds whether the necessary slot filling is completed, and, if not, repeating the step of performing missing-element analysis in combination with the preset slot information, generating a natural language question and sending it to the user; if slot filling is completed, outputting the new completed question.

Example 5

On the basis of embodiment 1, the question-answer pair library question-answering engine uses coarse ranking based on ES retrieval and fine ranking based on the core information of the user question to calculate the similarity with the questions in the question-answer pair library, and returns to the user the answer corresponding to the question with the highest similarity;

and/or,

the knowledge graph question-answering engine obtains answers by using the entity recognition and relation extraction results for the user question, and by querying the graph with a BERT-BiLSTM-CRF model and a sequence generation model;

Example 6

On the basis of embodiment 1, the candidate answer credibility evaluation model requests the question-answering engines corresponding to the different question intentions simultaneously in a multithreaded manner, and takes the first answer returned that meets the answer threshold set for the corresponding question-answering engine as the final candidate answer; or, after all engines have returned their answers and confidences, extracts the core entities in the question and the answers, joins them with separators, inputs them together into a pre-trained model to extract features, and then inputs the features into an MLP network and a normalization layer to give the confidence score of every candidate answer and obtain the final candidate answer;

in addition, the corrected new question and the final candidate answer can be input together into the sequence model to generate an answer described in natural language: the encoder performs embedding encoding of the input text and computes a state value that serves as the initial state of the decoder module; the encoder outputs and hidden states are concatenated and processed with a Luong attention mechanism; the decoder predicts the probability distribution over the output dictionary at the current step by combining the encoder state, the context vector and the decoder's previous inputs; and the loss function is the sparse softmax cross-entropy loss between the logits and the labels.

Example 7

On the basis of embodiment 1, the intelligent question-answering engine module comprises a reading comprehension question-answering engine. In the reading comprehension question-answering engine: first, for the Elasticsearch library, retrieval is completed for the corrected new question based on IK word segmentation and the BM25 algorithm, and the top k ranked documents are obtained, where k is an integer; then, re-screening is performed by combining the longest common substring between the question and the documents with the named entities in the core information, to obtain the top n ranked documents, where n is an integer; then, the top n ranked documents are split into segments of length 512, each segment is joined with the question, and the result is input into the BERT Chinese pre-trained model to complete paragraph screening; then, the question is joined with the screened paragraphs and input into the BERT Chinese pre-trained model and an Attention network to extract the joint feature representation of the question and the paragraphs; finally, a Pointer network layer estimates the start position and the span of the answer in each paragraph.

Example 8

On the basis of embodiment 1, the corrected question is input into the intelligent question-answering engine, which works against the business question-answer knowledge base; in combination with the question intention recognition and classification result, one or more of the question-answer pair library engine, the knowledge graph engine, the database table engine and the reading comprehension engine are selected to obtain the candidate answers, their confidences and their sources.

Example 9

On the basis of embodiment 8, the business question-answer knowledge base is stored separately by knowledge type, and the knowledge bases are divided into the following four types: common/typical questions and answers in the domain are collected into a question-answer pair library, which is stored in an Elasticsearch library; for structured data in the domain, static data is organized into a knowledge graph, which is stored and displayed using Neo4j; dynamic data is stored as data records in a MySQL or Oracle structured database; unstructured documents and materials form a document library, which is stored in Elasticsearch.

Example 10

On the basis of embodiment 8, the correspondence between the question intention recognition and classification result and the question-answering mode is as follows: fact description questions use the question-answer pair library, knowledge graph and reading comprehension engines; attribute query questions use the knowledge graph and reading comprehension engines; data calculation and statistical analysis questions use the database table engine.

The units described in the embodiments of the present invention may be implemented in software or in hardware, and the described units may also be disposed in a processor. The names of the units do not in any way limit the units themselves.

According to an aspect of the application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device executes the method provided in the above-mentioned various alternative implementation modes.

As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.

The parts not involved in the present invention are the same as or can be implemented using the prior art.

Other embodiments than the above examples may be devised by those skilled in the art based on the foregoing disclosure, or by adapting and using knowledge or techniques of the relevant art, and features of various embodiments may be interchanged or substituted and such modifications and variations that may be made by those skilled in the art without departing from the spirit and scope of the present invention are intended to be within the scope of the following claims.

Claims

1. A multi-engine intelligent question-answering system for a multi-type knowledge base, characterized by comprising a complex question understanding module, a man-machine multi-round interaction module, an intelligent question-answering engine module and a multi-source answer fusion module;

the complex problem understanding module is used for identifying and classifying the real intentions of the input problems, correcting and complementing the input problems, and outputting corrected problems, problem intention identification and classification results and problem core information;

the complex problem understanding module comprises sub-modules: the system comprises a language identification module, a voice identification module, an error correction module, a reference resolution module, a core information extraction module, a recommendation module and a classification module;

the classification module is used for classifying the user question intentions aiming at the input questions of different tasks, the classified categories comprise a fact description category, an attribute query category, a data calculation category and a statistical analysis category, and the question intention identification and classification result is used for guiding an intelligent question-answering engine to realize the selection of a question-answering strategy;

for the corrected question output by the complex question understanding module, the man-machine multi-round interaction module completes multi-round dialogue between the user and the question-answering system by combining the topic to which the corrected question belongs, the corresponding topic rule template and a man-machine dialogue management strategy based on deep reinforcement learning, and completes answer searching in combination with the multiple question-answering engines; the combining of the topic to which the corrected question belongs, the corresponding topic rule template and the man-machine dialogue management strategy based on deep reinforcement learning specifically comprises: judging the topic of the corrected question; if it does not belong to the same topic as the context questions, completing topic switching, that is, recording the current question as a new question; otherwise, judging whether the current question needs to be completed through dialogue: if the question needs to be completed through man-machine dialogue, performing missing-element analysis in combination with preset slot information, generating a natural language question and sending it to the user, judging after the user responds whether the necessary slot filling is completed, and, if not, repeating the step of performing missing-element analysis in combination with the preset slot information, generating a natural language question and sending it to the user; if slot filling is completed, outputting the new completed question;

in the intelligent question-answering engine module,

the question-answer pair library question-answering engine is used for calculating the similarity with the questions in the question-answer pair library by using coarse ranking based on ES retrieval and fine ranking based on the core information of the user question, and returning to the user the answer corresponding to the question with the highest similarity;

and/or,

the database table question-answering engine uses seq2seq-based conversion of natural language text into SQL, combined with rule templates, to generate an SQL query statement and accurately obtain the answer corresponding to the user question;

the intelligent question-answering engine module comprises a reading comprehension question-answering engine; in the reading comprehension question-answering engine: first, for the Elasticsearch library, retrieval is completed for the corrected new question based on IK word segmentation and the BM25 algorithm, and the top k ranked documents are obtained, where k is an integer; then, re-screening is performed by combining the longest common substring between the question and the documents with the named entities in the core information, to obtain the top n ranked documents, where n is an integer; then, the top n ranked documents are split into segments of length 512, each segment is joined with the question, and the result is input into the BERT Chinese pre-trained model to complete paragraph screening; then, the question is joined with the screened paragraphs and input into the BERT Chinese pre-trained model and an Attention network to extract the joint feature representation of the question and the paragraphs; finally, a Pointer network layer estimates the start position and the span of the answer in each paragraph;

the multi-source answer fusion module is used for distributing each answer weight by using a candidate answer credibility evaluation model aiming at the candidate answers, the answer confidences and the answer source sets output by the question-answering engine to complete the acquisition of the final answers and the sources thereof;

the candidate answer credibility evaluation model simultaneously requests question-answering engines corresponding to different intention questions by adopting a multithreading mode, and firstly returns answers meeting answer threshold values set by the corresponding question-answering engines as final candidate answers; or after all engines obtain answers and the credibility thereof, extracting core entities in the questions and the answers, after the core entities are separated and connected, inputting the core entities into a pre-training model together to extract features, and then inputting the core entities into an MLP network and a normalization layer to give confidence scores of all candidate answers to obtain final candidate answers;

2. The multi-engine intelligent question-answering system for the multi-type knowledge base according to claim 1, which comprises an answer obtaining process visualization module and an answer evaluation feedback and training optimization module;

the answer obtaining process visualization module is used for displaying the intermediate result which can be understood by the user to the user in a visualization mode, and displaying the record in the document, the knowledge graph or the database table corresponding to the answer for the user to verify and confirm the given answer;

3. The multi-engine intelligent question-answering system for the multi-type knowledge base according to claim 1, wherein the corrected question is input into the intelligent question-answering engine, which works against the business question-answer knowledge base, and one or more of the four question-answering engines, namely the question-answer pair library engine, the knowledge graph engine, the database table engine and the reading comprehension engine, are selected according to the question intention recognition and classification result, so as to obtain the candidate answers, their confidences and their sources;

the business question-answer knowledge base is stored separately by knowledge type, and the knowledge bases are divided into the following four types: common/typical questions and answers in the domain are collected into a question-answer pair library, which is stored in an Elasticsearch library; for structured data in the domain, static data is organized into a knowledge graph, which is stored and displayed using Neo4j; dynamic data is stored as data records in a MySQL or Oracle structured database; unstructured documents and materials form a document library, which is stored in Elasticsearch;

the correspondence between the question intention recognition and classification result and the question-answering mode is as follows: fact description questions use the question-answer pair library, knowledge graph and reading comprehension engines; attribute query questions use the knowledge graph and reading comprehension engines; data calculation and statistical analysis questions use the database table engine.