CN112579666B - Intelligent question-answering system and method and related equipment - Google Patents
- Publication number
- CN112579666B
- Authority
- CN
- China
- Prior art keywords
- knowledge
- question
- type
- data
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2468—Fuzzy queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Abstract
The invention discloses an intelligent question-answering system and method and related equipment. The system comprises: a question analysis subsystem, used for generating a question description text and identifying the question type, namely the factual type, the yes/no type or the definition type; a knowledge base generation subsystem, used for organizing factual-type and yes/no-type knowledge data into entity quadruples to form knowledge strips, and for organizing definition-type knowledge data into question-answer pairs combined with text feature vectors to form knowledge strips; and an answer extraction and generation subsystem, used for parsing factual-type and yes/no-type questions to obtain entity quadruples and searching for matching entity quadruples to obtain answers, and for vectorizing definition-type questions and calculating feature vector similarity to obtain answers. By classifying questions and matching them by database query or by feature vector similarity calculation according to their type, the invention can answer questions more accurately; because the knowledge content base uses structured knowledge strips, storage capacity can be reduced and retrieval is more efficient and convenient.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an intelligent question-answering system and method and related equipment.
Background
With the development of information technology, artificial intelligence technology and internet technology, the technology of human interaction with intelligent systems is continuously perfected, and related intelligent applications are rapidly developed.
In recent years, intelligent question-answering systems have become a major research hotspot in the field of artificial intelligence and play an increasingly important role in daily life. The natural language processing tasks involved in an intelligent question-answering system generally need to process the knowledge information contained in a large amount of corpus text. Finding a better way to build an intelligent question-answering system on top of a large amount of knowledge information is therefore valuable for both research and application. However, because natural language is complex, it is very difficult for a computer to understand human language correctly; in particular, in specialized fields with a large number of technical terms, the application effect of intelligent question-answering systems is far from meeting user needs.
The technical problems related to the intelligent question-answering system mainly comprise: text information matching and knowledge base construction.
(1) Text information matching
For a question-answering system, matching to a corresponding answer according to a question is an indispensable link. The text information matching in the question-answering system refers to inputting the predefined questions and the corresponding answers into a database, searching the questions in the database according to the questions proposed by the user, and finally extracting the answers corresponding to the questions with the highest similarity and feeding back the answers to the user.
(2) Knowledge base construction
Behind the question-answering system is often a knowledge database that supports the system with sufficient knowledge as answers to answer the user's questions. The knowledge base can be built according to the data information of different fields, so that the question-answering system with different purposes is supported.
Practice has shown that traditional question-answering systems have the following technical problems:
1. When a traditional question-answering system performs text information matching, it first runs a fuzzy query for the user's question against the knowledge database, selects from the query results the target that best matches the keywords of the user's question description, and finally returns the answer corresponding to that target as feedback to the user. However, this matching method easily misunderstands the user's question and therefore returns a wrong answer. For example, the two questions "what is the procedure for applying for the bonus" and "how is the bonus applied for" should point to the same answer, but identifying them only by fuzzy-query keyword matching can lead to a wrong answer match or to no answer being found.
2. To match the questions users raise as far as possible, the knowledge base associates one answer with questions phrased in several different ways. Building the knowledge base in this way increases the capacity of the database and thereby reduces retrieval efficiency.
Disclosure of Invention
The invention aims to provide an intelligent question-answering system which is used for enabling questions of a user to be better matched with a knowledge content base so as to answer the questions of the user more accurately; and the structure of the knowledge content library is perfected, so that the knowledge content library is more convenient to search, and the search efficiency is higher. The invention also aims to provide a corresponding intelligent question-answering method and related equipment.
In order to achieve the above object, the present invention adopts the following technical scheme.
In a first aspect of the present invention, an intelligent question-answering system is provided, including: the system comprises a question analysis subsystem, a knowledge base generation subsystem and an answer extraction and generation subsystem; wherein,
The question analysis subsystem is used for generating a question description text according to a question input by a user, and performing category analysis with a pre-trained classification model to identify the question type, wherein the question types comprise a factual type, a yes/no type and a definition type;
The knowledge base generation subsystem is used for processing the acquired knowledge data into structured knowledge strips to form a knowledge content base; wherein factual-type and yes/no-type knowledge data are organized into an entity quadruple comprising an event, a time, a place and a person, together with the content description corresponding to the entity quadruple, to form a knowledge strip; and definition-type knowledge data are organized into a pair comprising a question description and an answer description, and the question description is vectorized to obtain a corresponding text feature vector, to form a knowledge strip;
The answer extraction and generation subsystem is used for searching the knowledge content base according to the question description text and the question type obtained by the question analysis subsystem, and matching an adapted answer; wherein, when the question type is the factual type or the yes/no type, the question description text is parsed to obtain an entity quadruple comprising an event, a time, a place and a person, this entity quadruple is searched for and matched against the entity quadruples contained in the knowledge strips of the knowledge content base, and the content description of the matched knowledge strip is taken as the answer to the question input by the user; and when the question type is the definition type, the question description text is vectorized to obtain an input question text vector, similarity is calculated between the input question text vector and the text feature vectors contained in the knowledge strips of the knowledge content base, and the answer description corresponding to the knowledge strip with the highest similarity is taken as the answer to the question input by the user.
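The two kinds of knowledge strip and the two matching branches described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the class and function names are hypothetical, treating an unspecified quadruple field as a wildcard is an assumption, and cosine similarity is one possible choice for the "similarity calculation", which the text does not specify.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Quadruple:
    """Entity quadruple: event, time, place, person (None = not specified)."""
    event: Optional[str] = None
    time: Optional[str] = None
    place: Optional[str] = None
    person: Optional[str] = None


@dataclass
class QuadrupleStrip:
    """Knowledge strip keyed by an entity quadruple (hypothetical name)."""
    key: Quadruple
    content: str


@dataclass
class DefinitionStrip:
    """Knowledge strip holding a question/answer pair and a feature vector."""
    question: str
    answer: str
    vector: Sequence[float]


def quad_matches(query: Quadruple, key: Quadruple) -> bool:
    # Assumption: every field the query specifies must equal the strip's field.
    pairs = zip(
        (query.event, query.time, query.place, query.person),
        (key.event, key.time, key.place, key.person),
    )
    return all(q is None or q == k for q, k in pairs)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer_by_quadruple(query: Quadruple, strips: List[QuadrupleStrip]) -> Optional[str]:
    """Return the content description of the first matching strip, if any."""
    for strip in strips:
        if quad_matches(query, strip.key):
            return strip.content
    return None


def answer_by_similarity(query_vec: Sequence[float], strips: List[DefinitionStrip]) -> Optional[str]:
    """Return the answer description of the most similar strip, if any."""
    if not strips:
        return None
    best = max(strips, key=lambda s: cosine(query_vec, s.vector))
    return best.answer
```

In a real deployment the quadruple branch would more plausibly be a database query over indexed columns rather than a linear scan; the scan is used here only to keep the sketch self-contained.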
In one possible implementation, the question analysis subsystem includes:
The voice recognition module is used for converting a question input by the user by voice into text format to obtain the question description text;
The question type judging module is used for training on labeled data in advance with supervised learning to obtain a classification model, and for performing category analysis on the question description text with the trained classification model to identify the question type.
In one possible implementation, the question type judging module is specifically configured for:
acquiring question data related to questions and answers from a network, and processing the question data into sentence pairs in which each question corresponds to its answer;
splitting the question data into sentences and labeling each sentence, wherein the labels comprise three classes, namely the factual type, the yes/no type and the definition type, so as to obtain a labeled question data set;
dividing the labeled question data set into three subsets: a training set, a verification set and a test set;
inputting the text data of the three subsets into an initial classification model to perform model training, verification and testing, and finally obtaining a classification model that meets the requirements.
In one possible implementation, the classification model includes two layers. The first layer is an input layer, used for feature vectorization of the input text data. The second layer is a classifier, which learns from the feature-vectorized training set and its corresponding labels; the resulting classifier predicts labels for the questions in the verification set, the predicted labels are compared with the original labels to obtain an error value, the training set is learned repeatedly according to the error value until the iteration end condition is met, and the classifier with the minimum error value is taken as the classifier of the final classification model.
In one possible implementation, the knowledge base generation subsystem includes:
a material input module, used for collecting knowledge data in the required field as materials;
a knowledge generation module, used for preprocessing the materials and processing the acquired knowledge data into structured knowledge strips;
a knowledge storage module, used for entering the obtained knowledge strips into a database to form the knowledge content base.
The second aspect of the invention provides an intelligent question-answering method, which comprises the following steps:
a question analysis step: generating a question description text according to a question input by a user, and performing category analysis with a pre-trained classification model to identify the question type, wherein the question types comprise a factual type, a yes/no type and a definition type;
a knowledge base generation step: processing the acquired knowledge data into structured knowledge strips to form a knowledge content base; wherein factual-type and yes/no-type knowledge data are organized into an entity quadruple comprising an event, a time, a place and a person, together with the content description corresponding to the entity quadruple, to form a knowledge strip; and definition-type knowledge data are organized into a pair comprising a question description and an answer description, and the question description is vectorized to obtain a corresponding text feature vector, to form a knowledge strip;
an answer extraction and generation step: searching the knowledge content base according to the question description text and the question type obtained in the question analysis step, and matching an adapted answer; wherein, when the question type is the factual type or the yes/no type, the question description text is parsed to obtain an entity quadruple comprising an event, a time, a place and a person, this entity quadruple is searched for and matched against the entity quadruples contained in the knowledge strips of the knowledge content base, and the content description of the matched knowledge strip is taken as the answer to the question input by the user; and when the question type is the definition type, the question description text is vectorized to obtain an input question text vector, similarity is calculated between the input question text vector and the text feature vectors contained in the knowledge strips of the knowledge content base, and the answer description corresponding to the knowledge strip with the highest similarity is taken as the answer to the question input by the user.
In one possible implementation, the problem analysis step includes:
a voice recognition sub-step: converting a question input by the user by voice into text format to obtain the question description text;
A problem type judging sub-step: training the labeled data by using supervised learning in advance to obtain a classification model; carrying out category analysis on the problem description text by adopting a trained classification model, and identifying the problem type; the classification model is.
In one possible implementation, training on labeled data in advance with supervised learning to obtain the classification model includes:
acquiring question data related to questions and answers from a network, and processing the question data into sentence pairs in which each question corresponds to its answer;
splitting the question data into sentences and labeling each sentence, wherein the labels comprise three classes, namely the factual type, the yes/no type and the definition type, so as to obtain a labeled question data set;
dividing the labeled question data set into three subsets: a training set, a verification set and a test set;
inputting the text data of the three subsets into an initial classification model to perform model training, verification and testing, and finally obtaining a classification model that meets the requirements.
In a third aspect of the present invention, there is provided a computer device comprising a processor and a memory, the memory having stored therein a program comprising computer-executable instructions, the processor executing the computer-executable instructions stored in the memory when the computer device is running to cause the computer device to perform the intelligent question-answering method according to the second aspect.
A fourth aspect of the present invention provides a computer-readable storage medium storing one or more programs, the one or more programs comprising computer-executable instructions, which when executed by a computer device, cause the computer device to perform the intelligent question-answering method as described in the second aspect.
By adopting the technical scheme, the invention has the following technical effects:
Based on the idea of text information matching, the invention studies and constructs an intelligent question-answering system that uses open-domain knowledge information as its knowledge content base, starting from the entities, relation information and other information contained in sentences. The invention classifies questions into three types, namely the factual type, the yes/no type and the definition type, and processes each type differently.
For the factual type and the yes/no type: in knowledge content base construction, the entity information (event, time, place and person) of the question text and answer text in the knowledge data is used and a structured knowledge strip form is adopted, so that the storage capacity of the knowledge base is reduced while the knowledge strips can be conveniently retrieved in a structured way, which speeds up retrieval; in question-answer matching, database query technology is applied to the entities in the question sentence and the relation information among them, so that knowledge strips in the knowledge base are matched better and the user's questions are answered more accurately.
For the definition type: in knowledge content base construction, the knowledge data is organized into structured knowledge strips, each pairing a question description with an answer description, and text feature vectors produced by natural language processing are also used, so that the resulting knowledge strips can be retrieved and matched more conveniently; in question-answer matching, questions and answers are matched by text feature vector similarity, a natural language processing technique, so that the user's questions are answered more accurately and efficiently.
As described above, when matching answers to questions the invention relies not only on database query technology but also on text feature vector similarity matching from natural language processing, so that the user's questions can be answered more accurately; in knowledge content base construction, the entity information of the question text and answer text in the knowledge data is used to organize the knowledge content base into structured knowledge strips, which reduces its storage capacity and makes retrieval more efficient and convenient.
Therefore, the problems that in the prior art, answer matching is easy to be wrong and the matching efficiency is low due to the fact that only fuzzy query is adopted for keyword matching are solved, and the problems that in the prior art, the database capacity is increased and the retrieval efficiency is reduced due to the fact that a knowledge base construction mode is poor are solved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments and the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an intelligent question-answering system according to one embodiment of the present invention;
FIG. 2 is a workflow diagram of a problem analysis subsystem in one embodiment of the invention;
FIG. 3 is a workflow diagram of a knowledge base generation subsystem in one embodiment of the invention;
FIG. 4 is a flowchart of the operation of the answer extraction and generation subsystem in one embodiment of the invention;
FIG. 5 is a flow chart of an intelligent question-answering method according to one embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
The terms first, second, third and the like in the description and in the claims and in the above drawings, are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
Each is explained in detail below by means of specific embodiments.
Referring to fig. 1, an intelligent question-answering system is provided according to an embodiment of the present invention.
The architecture of the intelligent question-answering system is shown in FIG. 1 and mainly comprises three subsystems: a question analysis subsystem 10, a knowledge base generation subsystem 20, and an answer extraction and generation subsystem 30. The question analysis subsystem 10 is used for generating a question description text according to a question raised by a user and performing category analysis to determine the question type, so that the other two subsystems can process the question according to its type and match an adapted answer; the knowledge base generation subsystem 20 is used for preprocessing the acquired knowledge data into structured knowledge strips to form a knowledge content base that the answer extraction and generation subsystem 30 can search to match answers to questions; the answer extraction and generation subsystem 30 is used for searching the knowledge content base according to the question description text and the question type obtained by the question analysis subsystem 10, and matching an adapted answer.
The following describes each subsystem and implementation mode of the intelligent question-answering system in detail.
First, question analysis subsystem
In the present invention, the question analysis subsystem 10 may be used to generate a question description text according to a question entered by a user, and may use a pre-trained classification model to perform category analysis to identify the question type. Three question types are preset: the factual type, the yes/no type and the definition type. Each subsequent subsystem processes a question differently according to its type. The user may enter a question as text or by voice; preferably, voice input is supported.
In some embodiments of the present invention, the question analysis subsystem 10 may further include:
The voice recognition module 11, configured to convert a question input by the user by voice into text format to obtain the question description text;
The question type judging module 12, configured to train on labeled data in advance with supervised learning to obtain a classification model, and to perform category analysis on the question description text with the trained classification model to identify the question type.
Next, the two modules are described separately with reference to the workflow shown in FIG. 2.
A) Speech recognition module
The voice recognition module converts voice input into text format to facilitate subsequent processing. In the intelligent question-answering system the user may input a question by voice; the question is then recognized with speech recognition technology and converted into a text format that is convenient to process, and the resulting text is the question description text. The main workflow of the module comprises:
A.1 The user speaks a question as the input of the system. Since existing speech recognition technology is mature, it can be used to recognize the input voice question and convert it into text form, for example text with punctuation marks, so as to obtain a question description text, denoted problem_text.
A.2 Considering that a user may raise several questions at once, some embodiments may use the punctuation marks present in the speech-to-text output to split the question description text problem_text into sentences, obtaining problem_text = {sentence_1, sentence_2, …, sentence_n}, where sentence_1 is the 1st sentence in the user's speech, …, and sentence_n is the n-th sentence in the user's speech.
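The punctuation-based sentence splitting of step A.2 might look like the following minimal sketch. The function name `split_sentences` and the particular set of sentence-ending marks are assumptions made for illustration; the text only says that punctuation produced by speech-to-text is used.

```python
import re
from typing import List

# Assumed sentence-ending marks (Chinese and ASCII); not specified by the source.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(problem_text: str) -> List[str]:
    """Split a question description text into sentences at ending punctuation."""
    pieces = (m.strip() for m in _SENTENCE.findall(problem_text))
    return [p for p in pieces if p]
```

For example, `split_sentences("Is it red? Why is it red")` yields the two sentences `"Is it red?"` and `"Why is it red"`, so trailing text without a final mark is kept as its own sentence.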
B) Question type discriminating module
The question type judging module classifies the questions obtained by the voice recognition module, identifies their question types, and passes the question description text and question type to the subsequent answer extraction and generation subsystem, so that the corresponding answer retrieval operation can be carried out for each type of question. The core of the question type judging module is a classification model obtained by training on labeled data with supervised learning, which serves to classify the questions. The main workflow of the module comprises:
B.1 Question data related to questions and answers is obtained, for example by crawling from the network; data on question-and-answer platforms such as Baidu Zhidao and Sogou Wenwen can be used. After crawling, the acquired question data is cleaned, for example by removing content left over from crawling that is irrelevant to the question-answering task, such as web page tags and garbled characters. Finally, the data is processed into {question: answer} sentence pairs, i.e., each piece of question data consists of a question and its answer and is saved in the {question: answer} sentence pair format.
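The cleaning step in B.1 can be illustrated with a small standard-library sketch. The helper names `clean_text` and `build_pairs` are hypothetical, and the specific cleaning rules (strip markup tags, decode HTML entities, drop the Unicode replacement character and non-printable characters) are assumptions about what "web page tags and garbled characters" covers.

```python
import html
import re
from typing import Dict, Iterable, Tuple

_TAG = re.compile(r"<[^>]+>")
_WHITESPACE = re.compile(r"\s+")


def clean_text(raw: str) -> str:
    """Remove markup tags and garbled characters from one crawled string."""
    text = _TAG.sub(" ", raw)            # drop leftover web page tags
    text = html.unescape(text)           # decode entities such as &nbsp;
    text = text.replace("\ufffd", "")    # drop the Unicode replacement character
    text = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return _WHITESPACE.sub(" ", text).strip()


def build_pairs(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Turn crawled (question, answer) records into a {question: answer} dict."""
    pairs: Dict[str, str] = {}
    for raw_question, raw_answer in records:
        question, answer = clean_text(raw_question), clean_text(raw_answer)
        if question and answer:
            # Keep the first answer seen for a repeated question.
            pairs.setdefault(question, answer)
    return pairs
```

Keeping only the first answer for a repeated question is a simplification; a crawler for a real platform would likely need a rule for choosing among several answers.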
B.2 The question data obtained in B.1) is split into sentences and each sentence is labeled. The labels fall into three classes according to question type: [factual type; yes/no type; definition type]. The factual type refers to "what" questions, such as: what color is this kind of flower; the yes/no type refers to "is it or is it not" questions, such as: is this kind of flower red; the definition type refers to "why" questions, such as: why is this kind of flower red. Sentences that are not questions in nature, such as "I want to ask …", are not labeled and are removed from the question data. After labeling, a labeled question data set is obtained, denoted tag_problem_list = {problem_1, problem_2, …, problem_n}, where problem_n is the n-th question in the question data.
B.3 Data partitioning: the labeled question data set tag_problem_list obtained in step B.2) is divided into the following three parts:
(1) Training set: mainly used as the training data of the model;
(2) Verification set: mainly used for verifying the performance of the model during training;
(3) Test set: mainly used for testing the effect of the final model.
The division ratio is A:B:C, i.e., the training set, the verification set and the test set take up shares of the original data set in the proportion A to B to C, where A, B and C are engineering experience parameters, for example 8, 1 and 1 respectively (80%, 10% and 10% of the data).
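A minimal sketch of the A:B:C partitioning follows, using the example ratio 8:1:1. The function name `split_dataset`, the shuffling step and the fixed seed are assumptions for illustration; the text does not say whether the data is shuffled before it is divided.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T],
    ratios: Tuple[int, int, int] = (8, 1, 1),
    seed: int = 0,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle items and split them into training, verification and test sets."""
    a, b, c = ratios
    total = a + b + c
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = len(shuffled) * a // total
    n_val = len(shuffled) * b // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder goes to the test set
    return train, val, test
```

With 100 labeled questions and the default ratio, this gives 80 training, 10 verification and 10 test items, and every item lands in exactly one subset.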
B.4 After the data have been preprocessed by the steps above, the labeled question data set can be fed into an initial classification model for training, verification and testing, finally yielding a classification model that meets the requirements. The training flow of the classification model is as follows:
(1) The first layer of the classification model is an input layer, whose main function is to convert input text data into feature vectors. In some embodiments, the Chinese pre-trained BERT model provided by Google may be used as the input layer, i.e., BERT serves as the feature extractor. Feeding the labeled question data set tag_problem_list obtained in step B.3) into the input layer converts each question description text into a sentence vector that a computer can process, yielding a set of question feature vectors vector = {v_1, v_2, …, v_n}, where v_n is the feature vector of the n-th question.
(2) The second layer of the model is a classifier. The training set, with its feature vectors and corresponding labels, is fed into the classifier, which learns from the information carried by each question feature vector in the training set. The classifier then predicts a label for each question in the verification set, and the predicted labels are compared with the original labels of the verification set to obtain an error value. Finally, the classifier learns from the training-set question data again according to the error value.
(3) Repeat the classifier training process of step (2) until the iteration stopping condition is met, and keep the classifier with the smallest error value as the classifier of the final classification model.
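The train, verify and keep-the-best loop of steps (1) to (3) can be sketched in plain Python. Several things here are assumptions rather than the patent's method: `featurize` is a toy hashed word-count stand-in for the BERT input layer, the classifier is a simple multiclass perceptron, the stopping condition is a fixed epoch budget or zero verification error, and the verification error is used only to select the best classifier. The sample questions are made up.

```python
import random

LABELS = ["fact", "yes_no", "definition"]

def featurize(text, dim=64):
    """Toy stand-in for the BERT input layer (assumption): hashed word counts."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[sum(ord(ch) for ch in word) % dim] += 1.0
    return vec

def predict(weights, vec):
    scores = [sum(w * x for w, x in zip(row, vec)) for row in weights]
    return scores.index(max(scores))

def error_rate(weights, data):
    wrong = sum(1 for vec, label in data if predict(weights, vec) != label)
    return wrong / len(data)

def train(train_set, verify_set, max_epochs=20, dim=64, seed=0):
    rng = random.Random(seed)
    weights = [[0.0] * dim for _ in LABELS]
    best_error, best_weights = float("inf"), None
    for _ in range(max_epochs):
        order = train_set[:]
        rng.shuffle(order)
        for vec, label in order:                  # step (2): learn from the training set
            guess = predict(weights, vec)
            if guess != label:                    # perceptron update on mistakes
                for i, x in enumerate(vec):
                    weights[label][i] += x
                    weights[guess][i] -= x
        err = error_rate(weights, verify_set)     # compare predicted and original labels
        if err < best_error:                      # step (3): keep the smallest error
            best_error, best_weights = err, [row[:] for row in weights]
        if best_error == 0.0:
            break
    return best_weights, best_error

def make(examples):
    return [(featurize(text), LABELS.index(label)) for text, label in examples]

train_set = make([
    ("what color is this flower", "fact"),
    ("what is the height of this tree", "fact"),
    ("is this flower red", "yes_no"),
    ("is this tree tall", "yes_no"),
    ("why is this flower red", "definition"),
    ("why is this tree tall", "definition"),
])
verify_set = make([
    ("what color is this tree", "fact"),
    ("is this flower tall", "yes_no"),
    ("why is this tree red", "definition"),
])
best_weights, best_error = train(train_set, verify_set)
```

In a real system the feature vectors would come from the pre-trained model and the classifier would typically be trained with a gradient-based optimizer; the control flow of selecting the classifier with the smallest verification error stays the same.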
The trained classification model can then be evaluated on the held-out test set, and its quality quantified with metrics such as question-type identification accuracy, recall and F1 score.
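As an illustration of these metrics, the sketch below computes accuracy together with per-class precision, recall and F1 from true and predicted labels. The function names and the sample label lists are assumptions; precision is included because F1 is defined from precision and recall.

```python
def accuracy(true_labels, predicted_labels):
    correct = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    return correct / len(true_labels)

def per_class_metrics(true_labels, predicted_labels, classes):
    """Return {class: {"precision", "recall", "f1"}} for each question type."""
    pairs = list(zip(true_labels, predicted_labels))
    metrics = {}
    for cls in classes:
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

# Made-up test-set labels and predictions.
classes = ["fact", "yes_no", "definition"]
true_labels = ["fact", "fact", "yes_no", "yes_no", "definition", "definition"]
predicted = ["fact", "yes_no", "yes_no", "yes_no", "definition", "fact"]

acc = accuracy(true_labels, predicted)
report = per_class_metrics(true_labels, predicted, classes)
```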
(II) knowledge base generation subsystem
The knowledge base generation subsystem 20 may be used to collect knowledge data for certain domains and then preprocess it, i.e., process it into structured knowledge strips, to form a knowledge content base. Its main function is to provide the knowledge content base against which answers to users' questions are matched, so as to complete the question-answering task.
The knowledge content base contains knowledge strips with the structure shown in fig. 3. The knowledge base generation subsystem organizes knowledge data for fact-type and yes/no-type questions into entity quadruples comprising event, time, place and person, together with the content description (or supporting knowledge description) corresponding to each, to form knowledge strips. It organizes knowledge data for definition-type questions into pairs comprising a question description and an answer description, and vectorizes the question description to obtain the corresponding text feature vector, forming knowledge strips that comprise the question description, the answer description and the text feature vector of the question description.
The knowledge base generation subsystem 20 may include the following modules: (1) a material input module 21; (2) a knowledge generation module 22; (3) a knowledge storage module 23. The workflow of the knowledge base generation subsystem is as follows:
A) The material input module is responsible for collecting knowledge data; knowledge data from different domains are collected as material according to different business requirements. For example, data in the plant domain crawled from Baidu Baike (Baidu Encyclopedia) carry the name of a plant together with detailed information such as its description; as another example, question-and-answer data crawled from Baidu Zhidao carry question descriptions and answer descriptions. This knowledge content serves as the "answer source" for the question-answering system of this patent.
B) The knowledge generation module is responsible for processing the collected material: it processes the knowledge data collected in step A) into structured knowledge strips to facilitate retrieval from the knowledge base. Because the intelligent question-answering system targets three types of questions (fact type, yes/no type and definition type), the knowledge base needs to build knowledge strips based on these three types. Knowledge strips are processed in two main forms:
(1) For answering fact-type and yes/no-type questions: the collected encyclopedia-type knowledge data are organized into (event, time, place, person) entity quadruples and the content description corresponding to each, which together serve as a knowledge strip;
(2) For answering definition-type questions: the collected question-and-answer-type knowledge data are organized into (question description, answer description) pairs, and the question description text is vectorized so that each question description has a corresponding text feature vector. These vectors are denoted question_library_vertor = {qlv_1, …, qlv_n}, where qlv_n is the feature vector of the n-th question. The text feature vector is also supplied, as part of the knowledge strip, to the subsequent answer extraction and generation subsystem for question similarity matching.
C) The knowledge storage module writes the knowledge strips obtained in step B) into a database one by one, forming the knowledge content base used by the subsequent answer extraction and generation subsystem.
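One possible way to store the two kinds of knowledge strip is sketched below with Python's built-in `sqlite3` module. The patent does not name a database or a schema, so the table names, column layout (a single content description per quadruple), JSON encoding of the feature vector, and the sample rows are all assumptions for illustration.

```python
import json
import sqlite3

def create_knowledge_base(conn):
    """Create one table per knowledge strip form (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_strips ("
        "event TEXT, time TEXT, place TEXT, person TEXT, content TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS qa_strips ("
        "question TEXT, answer TEXT, vector TEXT)"
    )

def store_fact_strip(conn, quadruple, content):
    """Store an (event, time, place, person) quadruple with its content description."""
    conn.execute("INSERT INTO fact_strips VALUES (?, ?, ?, ?, ?)", (*quadruple, content))

def store_qa_strip(conn, question, answer, vector):
    """Store a question/answer pair with the question's text feature vector as JSON."""
    conn.execute(
        "INSERT INTO qa_strips VALUES (?, ?, ?)",
        (question, answer, json.dumps(vector)),
    )

conn = sqlite3.connect(":memory:")   # in-memory database for the example
create_knowledge_base(conn)
store_fact_strip(conn, ("flowering", "spring", "garden", ""), "Example content description.")
store_qa_strip(conn, "Why is this flower red?", "Example answer description.", [0.1, 0.7, 0.2])
conn.commit()

fact_count = conn.execute("SELECT COUNT(*) FROM fact_strips").fetchone()[0]
stored_vector = json.loads(conn.execute("SELECT vector FROM qa_strips").fetchone()[0])
```

Keeping the vector next to the question and answer means the answer extraction subsystem can load all three from one row when it performs similarity matching.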
(III) answer extraction and generation subsystem
The answer extraction and generation subsystem searches the knowledge content base according to the question description text and the question type obtained by the question analysis subsystem and matches a suitable answer; knowledge content in the knowledge content base is retrieved and matched differently depending on the question type.
The workflow of the answer extraction and generation subsystem 30 is shown in fig. 4.
(1) For fact-type and yes/no-type questions:
When the question type is fact type or yes/no type, the question description text is parsed to obtain an entity quadruple comprising event, time, place and person; this quadruple is searched for and matched against the entity quadruples contained in the knowledge strips of the knowledge content base, and the content description of the matched knowledge strip is returned to the user as the answer to the question the user entered.
(2) For definition-type questions:
When the question type is definition type, the question description text is first vectorized to obtain an input question text vector, which may be named question_input_vertor = {qiv_1, …, qiv_n}, where qiv_n is the feature vector of the n-th question;
Then the similarity between the text feature vectors question_library_vertor contained in the knowledge strips of the knowledge content base and the input question text vector question_input_vertor is calculated; finally, the answer description of the knowledge strip with the highest similarity is returned to the user as the answer to the question the user entered.
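The two retrieval paths can be sketched as follows. The patent does not specify how quadruples are compared or which similarity measure is used, so the field-by-field exact match in `match_quadruple` and the cosine similarity in `match_definition` are assumptions, as are the function names and the toy knowledge strips.

```python
import math

def match_quadruple(query, strips):
    """Return the content of the strip sharing the most non-empty fields with the query.

    query and each strip key are (event, time, place, person) tuples; empty strings
    mean the field is unknown. Returns None when nothing matches.
    """
    best_content, best_hits = None, 0
    for quadruple, content in strips:
        hits = sum(1 for q, k in zip(query, quadruple) if q and q == k)
        if hits > best_hits:
            best_content, best_hits = content, hits
    return best_content

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def match_definition(input_vector, qa_strips):
    """Return the answer of the strip whose question vector is most similar to the input."""
    best = max(qa_strips, key=lambda strip: cosine(input_vector, strip["vector"]))
    return best["answer"]

# Toy knowledge content base.
fact_strips = [
    (("flowering", "spring", "garden", ""), "Content description A."),
    (("harvest", "autumn", "orchard", ""), "Content description B."),
]
qa_strips = [
    {"question": "Q1", "answer": "Answer 1", "vector": [1.0, 0.0]},
    {"question": "Q2", "answer": "Answer 2", "vector": [0.0, 1.0]},
]

fact_answer = match_quadruple(("flowering", "", "garden", ""), fact_strips)
definition_answer = match_definition([0.9, 0.1], qa_strips)
```

A production system would more likely push the quadruple match into a database query and use an approximate nearest-neighbor index for the vectors, but the selection logic is the same.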
Implementation of the intelligent question-answering system
In order to receive users' question requests in real time, in some embodiments the question analysis subsystem, the knowledge base generation subsystem and the answer extraction and generation subsystem may be preloaded into memory and implemented in B/S (Browser/Server) mode, so that whenever a user initiates a processing request, the answer extraction and generation subsystem can process the question promptly and return the result.
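The preload-then-serve idea can be illustrated with Python's standard `http.server` module. Everything specific here is an assumption: the `QASystem` class is a hypothetical stand-in for the three preloaded subsystems (reduced to a dictionary lookup), and the JSON request format, host and port are chosen only for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class QASystem:
    """Hypothetical container for the preloaded subsystems."""

    def __init__(self):
        # Loaded once at startup and reused for every request.
        self.knowledge = {"what color is this flower?": "Example answer."}

    def answer(self, question):
        return self.knowledge.get(question.strip().lower(), "No answer found.")

QA = QASystem()   # preloaded into memory before any request arrives

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expect a JSON body such as {"question": "..."} (assumed format).
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"answer": QA.answer(payload.get("question", ""))}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

def serve(host="127.0.0.1", port=8000):
    """Start the server; blocks until interrupted."""
    HTTPServer((host, port), Handler).serve_forever()
```

Calling `serve()` starts the server; a browser front end would then post the user's question to it and display the returned answer.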
The intelligent question-answering system disclosed in the embodiments of the present invention has been described above. When matching answers to questions, the system relies not only on database queries but also on text feature vector similarity matching from natural language processing, so it can answer users' questions more accurately. When constructing the knowledge content base, the system uses the entity information of the question texts and answer texts in the knowledge data to organize the knowledge content base into structured knowledge strips, which reduces the storage footprint of the knowledge content base while speeding up retrieval and making it more convenient.
Referring to fig. 5, one embodiment of the present invention further provides an intelligent question-answering method, which includes:
Step S1, question analysis: generating a question description text from a question entered by a user, and performing category analysis with a pre-trained classification model to identify the question type, wherein the question types comprise fact type, yes/no type and definition type;
Step S2, knowledge base generation: processing the acquired knowledge data into structured knowledge strips to form a knowledge content base; wherein fact-type and yes/no-type knowledge data are organized into entity quadruples comprising event, time, place and person, together with the content description corresponding to each, to form knowledge strips; and definition-type knowledge data are organized into pairs comprising a question description and an answer description, and the question description is vectorized to obtain the corresponding text feature vector, to form knowledge strips;
Step S3, answer extraction and generation: searching the knowledge content base according to the question description text and the question type obtained by the question analysis subsystem, and matching a suitable answer; wherein, when the question type is fact type or yes/no type, the question description text is parsed to obtain an entity quadruple comprising event, time, place and person, the quadruple is searched for and matched against the entity quadruples contained in the knowledge strips of the knowledge content base, and the content description of the matched knowledge strip is taken as the answer to the question entered by the user; and when the question type is definition type, the question description text is vectorized to obtain an input question text vector, the similarity between the input question text vector and the text feature vectors contained in the knowledge strips of the knowledge content base is calculated, and the answer description of the knowledge strip with the highest similarity is taken as the answer to the question entered by the user.
Further, the question analysis step S1 may include:
A voice recognition sub-step: converting a question entered by the user by voice into text format to obtain the question description text;
A question type judging sub-step: training on labeled data in advance with supervised learning to obtain a classification model; performing category analysis on the question description text with the trained classification model, and identifying the question type.
Further, in the question type judging sub-step, training on labeled data in advance with supervised learning to obtain a classification model may include:
acquiring question data related to question answering from the network, and processing the question data into sentence pairs in which each question corresponds to its answer;
splitting the question data into sentences and labeling each sentence, wherein the labels comprise three types, namely fact type, yes/no type and definition type, so as to obtain a labeled question data set;
dividing the labeled question data set into three subsets: a training set, a verification set and a test set;
inputting the text data of the three subsets into an initial classification model for model training, verification and testing, finally obtaining a classification model that meets the requirements.
Referring to fig. 6, an embodiment of the present invention further includes a computer device 60 including a processor 61 and a memory 62, where the memory 62 stores a program including computer-executable instructions, and when the computer device 60 is running, the processor 61 executes the computer-executable instructions stored in the memory 62 to cause the computer device 60 to perform the intelligent question-answering method as described above.
An embodiment of the present invention also provides a computer-readable storage medium storing one or more programs, the one or more programs comprising computer-executable instructions, which when executed by a computer device, cause the computer device to perform the intelligent question-answering method as described previously.
In summary, the invention discloses an intelligent question-answering system, a corresponding intelligent question-answering method and related equipment, which have the following technical effects:
Based on the idea of text information matching, the invention studies and constructs an intelligent question-answering system that uses open-domain knowledge information as its knowledge content base, starting from the entities, relation information and the like contained in sentences. The invention divides questions into three types (fact type, yes/no type and definition type) and handles each type differently.
For the fact type and the yes/no type: in knowledge content base construction, the entity information (event, time, place and person) of the question texts and answer texts in the knowledge data is used and a structured knowledge strip form is adopted, so that the knowledge strips in the knowledge content base can be retrieved in a structured way, which reduces the storage footprint of the knowledge base and speeds up retrieval; in question-answer matching, based on the information entities of the question sentence and the relation information among those entities, database queries can match the knowledge strips in the knowledge base more closely, so that users' questions can be answered more accurately.
For the definition type: in knowledge content base construction, the data are organized into structured knowledge strips in the form of pairs comprising a question description and an answer description, and text feature vectors produced by natural language processing are also used, so that the resulting knowledge strips can be retrieved and matched more conveniently; in question-answer matching, questions and answers are matched with text feature vector similarity matching from natural language processing, so that users' questions can be answered more accurately and more efficiently.
As described above, when matching answers to questions, the invention relies not only on database queries but also on text feature vector similarity matching from natural language processing, so that users' questions can be answered more accurately; in knowledge content base construction, the entity information of the question texts and answer texts in the knowledge data is used to organize the knowledge content base into structured knowledge strips, which reduces the storage footprint of the knowledge content base while improving retrieval efficiency and making retrieval more convenient.
This solves the prior-art problems that answer matching is error-prone and inefficient because only fuzzy keyword queries are used, and that a poorly constructed knowledge base inflates database size and slows retrieval.
In the foregoing embodiments, each embodiment is described with its own emphasis; for portions not described in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; those of ordinary skill in the art will appreciate that: the technical scheme described in the above embodiments can be modified or some technical features thereof can be replaced equivalently; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An intelligent question-answering system, comprising:
a question analysis subsystem, configured to generate a question description text from a question entered by a user, and to perform category analysis with a pre-trained classification model to identify the question type, wherein the question types comprise fact type, yes/no type and definition type;
a knowledge base generation subsystem, configured to process acquired knowledge data into structured knowledge strips to form a knowledge content base; wherein fact-type and yes/no-type knowledge data are organized into entity quadruples comprising event, time, place and person, together with the content description corresponding to each, to form knowledge strips; and definition-type knowledge data are organized into pairs comprising a question description and an answer description, and the question description is vectorized to obtain the corresponding text feature vector, to form knowledge strips;
an answer extraction and generation subsystem, configured to search the knowledge content base according to the question description text and the question type obtained by the question analysis subsystem and to match a suitable answer; wherein, when the question type is fact type or yes/no type, the question description text is parsed to obtain an entity quadruple comprising event, time, place and person, the quadruple is searched for and matched against the entity quadruples contained in the knowledge strips of the knowledge content base, and the content description of the matched knowledge strip is taken as the answer to the question entered by the user; and when the question type is definition type, the question description text is vectorized to obtain an input question text vector, the similarity between the input question text vector and the text feature vectors contained in the knowledge strips of the knowledge content base is calculated, and the answer description of the knowledge strip with the highest similarity is taken as the answer to the question entered by the user.
2. The intelligent question-answering system according to claim 1, wherein the question analysis subsystem comprises:
a voice recognition module, configured to convert a question entered by the user by voice into text format to obtain the question description text;
a question type judging module, configured to train on labeled data in advance with supervised learning to obtain a classification model, and to perform category analysis on the question description text with the trained classification model to identify the question type.
3. The intelligent question-answering system according to claim 2, wherein the question type judging module is specifically configured to:
acquire question data related to question answering from the network, and process the question data into sentence pairs in which each question corresponds to its answer;
split the question data into sentences and label each sentence, wherein the labels comprise three types, namely fact type, yes/no type and definition type, so as to obtain a labeled question data set;
divide the labeled question data set into three subsets: a training set, a verification set and a test set;
input the text data of the three subsets into an initial classification model for model training, verification and testing, finally obtaining a classification model that meets the requirements.
4. The intelligent question-answering system according to claim 3, wherein
the classification model comprises two layers, wherein the first layer is an input layer configured to convert input text data into feature vectors; the second layer is a classifier configured to learn from the training set, which is input with its vectorized features and corresponding labels, to predict labels for the questions in the verification set with the resulting classifier, to compare the predicted labels with the original labels to obtain an error value, and to learn from the training set repeatedly according to the error value until the iteration stopping condition is met, the classifier with the smallest error value being taken as the classifier of the final classification model.
5. The intelligent question-answering system according to claim 1, wherein the knowledge base generation subsystem comprises:
a material input module, configured to collect knowledge data in the required domain as material;
a knowledge generation module, configured to preprocess the material and process the collected knowledge data into structured knowledge strips;
a knowledge storage module, configured to write the resulting knowledge strips into a database to form the knowledge content base.
6. An intelligent question-answering method, characterized by comprising the following steps:
a question analysis step: generating a question description text from a question entered by a user, and performing category analysis with a pre-trained classification model to identify the question type, wherein the question types comprise fact type, yes/no type and definition type;
a knowledge base generation step: processing the acquired knowledge data into structured knowledge strips to form a knowledge content base; wherein fact-type and yes/no-type knowledge data are organized into entity quadruples comprising event, time, place and person, together with the content description corresponding to each, to form knowledge strips; and definition-type knowledge data are organized into pairs comprising a question description and an answer description, and the question description is vectorized to obtain the corresponding text feature vector, to form knowledge strips;
an answer extraction and generation step: searching the knowledge content base according to the question description text and the question type obtained by the question analysis subsystem, and matching a suitable answer; wherein, when the question type is fact type or yes/no type, the question description text is parsed to obtain an entity quadruple comprising event, time, place and person, the quadruple is searched for and matched against the entity quadruples contained in the knowledge strips of the knowledge content base, and the content description of the matched knowledge strip is taken as the answer to the question entered by the user; and when the question type is definition type, the question description text is vectorized to obtain an input question text vector, the similarity between the input question text vector and the text feature vectors contained in the knowledge strips of the knowledge content base is calculated, and the answer description of the knowledge strip with the highest similarity is taken as the answer to the question entered by the user.
7. The method of claim 6, wherein the question analysis step comprises:
a voice recognition sub-step: converting a question entered by the user by voice into text format to obtain the question description text;
a question type judging sub-step: training on labeled data in advance with supervised learning to obtain a classification model; and performing category analysis on the question description text with the trained classification model to identify the question type.
8. The method of claim 6, wherein training on labeled data in advance with supervised learning to obtain a classification model comprises:
acquiring question data related to question answering from the network, and processing the question data into sentence pairs in which each question corresponds to its answer;
splitting the question data into sentences and labeling each sentence, wherein the labels comprise three types, namely fact type, yes/no type and definition type, so as to obtain a labeled question data set;
dividing the labeled question data set into three subsets: a training set, a verification set and a test set;
inputting the text data of the three subsets into an initial classification model for model training, verification and testing, finally obtaining a classification model that meets the requirements.
9. A computer device comprising a processor and a memory, the memory having stored therein a program comprising computer-executable instructions that when executed by the computer device cause the computer device to perform the intelligent question-answering method of claim 6.
10. A computer readable storage medium storing one or more programs, the one or more programs comprising computer-executable instructions, which when executed by a computer device, cause the computer device to perform the intelligent question-answering method of claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011476684.4A CN112579666B (en) | 2020-12-15 | 2020-12-15 | Intelligent question-answering system and method and related equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011476684.4A CN112579666B (en) | 2020-12-15 | 2020-12-15 | Intelligent question-answering system and method and related equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112579666A CN112579666A (en) | 2021-03-30 |
CN112579666B (en) | 2024-07-30 |
Family
ID=75136179
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011476684.4A Active CN112579666B (en) | 2020-12-15 | 2020-12-15 | Intelligent question-answering system and method and related equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112579666B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113918686A (en) * | 2021-08-30 | 2022-01-11 | 杭州摸象大数据科技有限公司 | Intelligent question-answering model construction method and device, computer equipment and storage medium |
CN113780561B (en) * | 2021-09-07 | 2024-07-30 | 国网北京市电力公司 | Construction method and device of power grid regulation operation knowledge base |
CN116069911A (en) * | 2023-01-04 | 2023-05-05 | 阿里巴巴(中国)有限公司 | Intelligent question-answering method, device, equipment and storage medium |
CN117609440B (en) * | 2023-10-27 | 2024-07-23 | 中国司法大数据研究院有限公司 | Document-level intelligent question-answering implementation method for referee document |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104216913A (en) * | 2013-06-04 | 2014-12-17 | Sap欧洲公司 | Problem answering frame |
CN104598445A (en) * | 2013-11-01 | 2015-05-06 | 腾讯科技(深圳)有限公司 | Automatic question-answering system and method |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009035871A1 (en) * | 2007-09-10 | 2009-03-19 | Powerset, Inc. | Browsing knowledge on the basis of semantic relations |
JP6414956B2 (en) * | 2014-08-21 | 2018-10-31 | 国立研究開発法人情報通信研究機構 | Question generating device and computer program |
CN106844368B (en) * | 2015-12-03 | 2020-06-16 | 华为技术有限公司 | Method for man-machine conversation, neural network system and user equipment |
US20180341871A1 (en) * | 2017-05-25 | 2018-11-29 | Accenture Global Solutions Limited | Utilizing deep learning with an information retrieval mechanism to provide question answering in restricted domains |
CN107766511A (en) * | 2017-10-23 | 2018-03-06 | 深圳市前海众兴电子商务有限公司 | Intelligent answer method, terminal and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112579666A (en) | 2021-03-30 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||