CN114647717A - Intelligent question and answer method and device - Google Patents

Info

Publication number
CN114647717A
Authority
CN
China
Prior art keywords
document
processed
document set
question
documents
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011503097.XA
Other languages
Chinese (zh)
Inventor
白金国
李长亮
李小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Digital Entertainment Co Ltd
Original Assignee
Beijing Kingsoft Digital Entertainment Co Ltd
Application filed by Beijing Kingsoft Digital Entertainment Co Ltd
Priority to CN202011503097.XA
Publication of CN114647717A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an intelligent question-answering method and device. The method comprises: acquiring a question to be processed and a preset document set; determining a first document set in the document set according to the question to be processed and a grammar retrieval strategy; determining a second document set in the document set according to the question to be processed and a semantic retrieval strategy; determining a target document set according to the first document set and the second document set; and determining a target answer corresponding to the question to be processed according to the question to be processed and the target document set. The method effectively filters out irrelevant documents and reduces their influence on the downstream question-answering model. Because the answer is determined from the finally determined target document set, intelligent question-answering performance is improved.

Description

Intelligent question and answer method and device
Technical Field
The present application relates to the field of computer technologies, and in particular, to an intelligent question answering method and apparatus, a computing device, and a computer-readable storage medium.
Background
Natural Language Processing (NLP) is an important direction in computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language. Broadly, its application scenarios involve the intelligent processing of language and text, including reading comprehension, question-answering dialogue, writing, translation and the like.
Open-domain question answering is an important branch of NLP. Current open-domain question-answering methods input a question and related documents into a question-answering model, which obtains the answer to the question from the documents. These methods all follow an "information retrieval, then question-answering model" pipeline and use grammar matching to recall documents related to the question from a massive document collection. Because such retrieval considers only grammar matching, that is, it retrieves only by the keywords of the question, relevant documents are hard to retrieve when the question and the document share few words, which can seriously degrade the performance of the subsequent question-answering model. In addition, the post-processing of documents returned by existing retrieval strategies still leaves room to improve the final question-answering result.
Therefore, a more efficient method is needed to solve the above problems and improve the effect of intelligent question answering.
Disclosure of Invention
In view of this, embodiments of the present application provide an intelligent question answering method and apparatus, a computing device, and a computer readable storage medium, so as to solve technical defects in the prior art.
According to a first aspect of the embodiments of the present application, there is provided an intelligent question answering method, including:
acquiring a problem to be processed and a preset document set;
determining a first document set in the document set according to the to-be-processed problem and a grammar retrieval strategy;
determining a second document set in the document set according to the problem to be processed and the semantic retrieval strategy;
determining a target document set according to the first document set and the second document set;
and determining a target answer corresponding to the to-be-processed question according to the to-be-processed question and the target document set.
According to a second aspect of embodiments of the present application, there is provided an intelligent question answering apparatus, including:
the acquisition module is configured to acquire a problem to be processed and a preset document set;
the grammar retrieval module is configured to determine a first document set in the document sets according to the to-be-processed question and a grammar retrieval strategy;
a semantic retrieval module configured to determine a second document set from the document sets according to the to-be-processed question and a semantic retrieval policy;
a determination module configured to determine a target set of documents from the first set of documents and the second set of documents;
and the question-answering module is configured to determine a target answer corresponding to the to-be-processed question according to the to-be-processed question and the target document set.
According to a third aspect of embodiments of the present application, there is provided a computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, the processor implementing the steps of the intelligent question-answering method when executing the instructions.
According to a fourth aspect of embodiments of the present application, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the intelligent question-answering method.
According to a fifth aspect of the embodiments of the present application, there is provided a chip storing computer instructions, which when executed by the chip, implement the steps of the intelligent question-answering method.
The intelligent question answering method provided by the embodiments of the application comprises: acquiring a question to be processed and a preset document set; determining a first document set in the document set according to the question to be processed and a grammar retrieval strategy; determining a second document set in the document set according to the question to be processed and a semantic retrieval strategy; determining a target document set according to the first document set and the second document set; and determining a target answer corresponding to the question to be processed according to the question to be processed and the target document set. When retrieving documents, the method retrieves along multiple information dimensions: it considers not only the grammatical features of the question to be processed and the documents but also their semantic features. This improves document recall, effectively filters out irrelevant documents, and ensures that the documents participating in question answering are highly relevant to the question to be processed.
Secondly, after a preliminary target document set is determined, documents with low relevance to the question to be processed can be further filtered out according to a similarity matching strategy between the question and the documents. This reduces the number of documents participating in question answering, which in turn reduces their adverse influence on the output of the question-answering model and improves intelligent question-answering performance.
Drawings
FIG. 1 is a block diagram of a computing device provided by an embodiment of the present application;
FIG. 2 is a flow chart of an intelligent question answering method provided by the embodiment of the application;
FIG. 3 is a schematic structural diagram of an intelligent question answering method provided by an embodiment of the present application;
FIG. 4 is a flow chart of an intelligent question answering method according to another embodiment of the present application;
FIG. 5 is a schematic structural diagram of an intelligent question answering device according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many ways other than those described herein, and those skilled in the art can make similar extensions without departing from its spirit and scope; the application is therefore not limited to the specific implementations disclosed below.
The terminology used in the one or more embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the one or more embodiments of the present application. As used in one or more embodiments of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments of the present application to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first aspect may be termed a second aspect, and, similarly, a second aspect may be termed a first aspect, without departing from the scope of one or more embodiments of the present application. The word "if," as used herein, may be interpreted as "responsive to a determination," depending on the context.
First, the terms involved in one or more embodiments of the present application are explained.
Information retrieval: retrieving relevant information based on a given question.
Question-answering model: a model that gives the corresponding answer based on a given question and article.
BM25: an algorithm for evaluating the relevance between search terms and documents, proposed on the basis of a probabilistic retrieval model.
Pre-trained model: a currently mainstream kind of deep learning model that is trained without supervision on large-scale data and then applied to different fields.
Semantic-feature retrieval: vectorizing the documents and calculating the semantic similarity between each document and the question.
Fine-retrieve: calculating the semantic similarity of texts based on a pre-trained model.
In the present application, an intelligent question answering method and apparatus, a computing device and a computer readable storage medium are provided, which are described in detail in the following embodiments one by one.
FIG. 1 shows a block diagram of a computing device 100 according to an embodiment of the application. The components of the computing device 100 include, but are not limited to, memory 110 and processor 120. The processor 120 is coupled to the memory 110 via a bus 130 and a database 150 is used to store data.
Computing device 100 also includes access device 140, access device 140 enabling computing device 100 to communicate via one or more networks 160. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. Access device 140 may include one or more of any type of network interface (e.g., a Network Interface Card (NIC)) whether wired or wireless, such as an IEEE802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.
In one embodiment of the present application, the above-mentioned components of the computing device 100 and other components not shown in fig. 1 may also be connected to each other, for example, by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 1 is for purposes of example only and is not limiting as to the scope of the present application. Other components may be added or replaced as desired by those skilled in the art.
Computing device 100 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 100 may also be a mobile or stationary server.
The processor 120 can execute the steps of the intelligent question answering method shown in FIG. 2. FIG. 2 shows a flowchart of an intelligent question answering method according to an embodiment of the present application, including steps 202 to 210.
Step 202: Acquire a question to be processed and a preset document set.
An open-domain question-answering method needs to search a large number of documents for the answer corresponding to the question to be processed. How to find that answer quickly and efficiently in a large number of documents is a problem that technicians urgently need to solve.
Correspondingly, the question to be processed is the question that needs to be answered, and the large number of documents forms the preset document set. The document set may be one generated by the technicians themselves or one based on Wikipedia, which is commonly used in the industry at present.
In one embodiment provided by the present application, a question to be processed q and a preset document set Ds = (D_1, D_2, ..., D_s) are obtained. The number of documents in the document set is not limited herein.
Step 204: Determine a first document set in the document set according to the question to be processed and the grammar retrieval strategy.
After the question to be processed is obtained, documents matching it are retrieved from the document set according to a preset retrieval strategy, and the answer to the question is then determined from the matched documents.
The grammar retrieval strategy is an algorithm based on the relevance between keywords and documents and is derived from a probabilistic retrieval model; the BM25 algorithm is one example. BM25 evaluates the relevance between search terms and documents: it calculates the relevance between each word in the question and a document and accumulates the scores.
Specifically, determining a first document set in the document set according to the question to be processed and the grammar retrieval strategy includes:
performing word segmentation processing on the problem to be processed to obtain a word unit set corresponding to the problem to be processed;
calculating word unit grammar relevance scores of each word unit in the word unit set and each document in the document set;
determining a question grammar relevance score corresponding to each document and the to-be-processed question according to the word unit grammar relevance score corresponding to each document and each word unit in the word unit set;
and determining a first document set according to the question grammar correlation score corresponding to each document and the question to be processed.
In one embodiment provided by the present application, following the above example, the question to be processed is q and the document set Ds = (D_1, D_2, ..., D_s) contains s documents. To calculate the relevance score between the question q and each document D, word segmentation is performed on q to obtain a word unit set. For each document D, the relevance score between each word unit q_i and D is calculated, and these scores are then combined in a weighted sum to obtain the relevance score between the question to be processed and D. The general form of the BM25 algorithm is shown in Formula 1:
$$\mathrm{Score}(Q, d) = \sum_{i=1}^{n} W_i \cdot R(q_i, d) \qquad (1)$$
where Q represents the question to be processed, q_i represents the i-th word unit obtained by word segmentation of the question, n is the number of such word units, d represents a document, W_i represents the weight of word unit q_i, and R(q_i, d) represents the relevance score between word unit q_i and document d.
After the grammar retrieval strategy has been applied to the question to be processed and each document, a grammar relevance score between the question and each document is obtained, and the first document set is determined from these scores. Specifically, the documents are sorted by their grammar relevance scores with respect to the question to be processed, and a preset number of documents are selected to form the first document set.
That is, the documents are sorted from high to low by relevance score, and the top-ranked documents, up to the preset size of the first document set, are taken as the first document set. For example, if the preset size of the first document set is 20, the documents in the document set are sorted by relevance score and the 20 top-ranked documents are taken as the first document set.
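The scoring and top-ranked selection described above can be sketched as follows. This is only an illustration under stated assumptions: the patent gives the general weighted-sum form but not the concrete W_i and R(q_i, d), so the sketch plugs in the commonly used Okapi BM25 choices (an IDF weight, and a term-frequency saturation term with parameters k1 and b). The function names, parameter defaults, whitespace tokenization and sample documents are all hypothetical.

```python
import math
from collections import Counter

def bm25_scores(question_tokens, docs_tokens, k1=1.5, b=0.75):
    """Return one score per document: sum over word units of W_i * R(q_i, d)."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for q in question_tokens:
            if q not in tf:
                continue
            # W_i: assumed IDF-style weight of the word unit.
            w = math.log(1 + (n_docs - df[q] + 0.5) / (df[q] + 0.5))
            # R(q_i, d): assumed saturating term-frequency relevance.
            r = tf[q] * (k1 + 1) / (tf[q] + k1 * (1 - b + b * len(d) / avg_len))
            score += w * r
        scores.append(score)
    return scores

def top_k(scores, k):
    """Indices of the k highest-scoring documents, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

# Toy data; a real system would use a proper word segmenter.
docs = [
    "the capital of france is paris".split(),
    "berlin is the capital of germany".split(),
    "bananas are rich in potassium".split(),
]
question = "capital of france".split()
scores = bm25_scores(question, docs)
first_set = top_k(scores, k=2)  # indices of the first document set
```

With a preset size of 20, as in the example above, the call would simply be `top_k(scores, k=20)`.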
Step 206: Determine a second document set in the document set according to the question to be processed and the semantic retrieval strategy.
The preset retrieval strategy also includes a semantic retrieval strategy, which calculates the semantic similarity between the question to be processed and each document. The second document set is then determined from the document set according to the semantic similarity.
Specifically, determining a second document set in the document set according to the question to be processed and the semantic retrieval strategy includes:
embedding the problem to be processed to obtain a problem vector to be processed corresponding to the problem to be processed;
embedding each document in the document set to obtain a document vector corresponding to each document;
respectively calculating similarity scores of the problem vector to be processed and the document vector corresponding to each document;
and determining a second document set according to the corresponding similarity score of each document.
In practical application, the question to be processed is embedded to obtain its question vector, and each document in the document set is embedded to obtain its document vector. This yields one question vector and a set of document vectors, and the vector similarity between the question vector and each document vector is then calculated.
After the similarity score between each document and the question to be processed has been calculated, the second document set is determined according to these scores. Specifically, the documents in the document set are sorted by similarity score, and a preset number of documents are selected to form the second document set.
That is, the documents are ranked from high to low by their similarity score with respect to the question to be processed, and the second document set is taken from the document set according to the preset number of documents. For example, if the preset number is 20, the documents in the document set are sorted by similarity score and the 20 top-ranked documents are taken as the second document set.
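A minimal sketch of this embed, score and rank procedure is given below. The patent does not specify the embedding model or the similarity measure, so cosine similarity is an assumption here, and the hand-written three-dimensional vectors in `TOY_VECTORS` merely stand in for the output of a trained encoder. All names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_top_k(question, docs, embed, k):
    """Rank documents by similarity to the question; return the top-k indices."""
    q_vec = embed(question)
    scored = [(cosine(q_vec, embed(d)), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]

# Assumption: made-up vectors in place of a real embedding model.
TOY_VECTORS = {
    "who wrote hamlet": [0.9, 0.1, 0.0],
    "shakespeare is the author of hamlet": [0.8, 0.2, 0.1],
    "hamlet is a tragedy set in denmark": [0.5, 0.5, 0.0],
    "bananas are rich in potassium": [0.0, 0.1, 0.9],
}
docs = [
    "shakespeare is the author of hamlet",
    "hamlet is a tragedy set in denmark",
    "bananas are rich in potassium",
]
second_set = semantic_top_k("who wrote hamlet", docs, lambda t: TOY_VECTORS[t], k=2)
```

Note that the first document shares only the word "hamlet" with the question yet ranks highest, which is the kind of match that keyword-only retrieval can miss.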
Step 208: Determine a target document set according to the first document set and the second document set.
After the first document set and the second document set are obtained, the target document set can be determined from them. There are many ways to do so: for example, taking the union of the two sets, taking their intersection, or letting all documents in both sets together constitute the target document set. This is not limited in this application.
The first document set and the second document set may contain the same documents, that is, a document in the first document set may also appear in the second document set. For example, suppose the first document set is (D_1, D_2, D_3) and the second document set is (D_2, D_3, D_4). If the union of the two sets is taken, the target document set is (D_1, D_2, D_3, D_4); if the intersection is taken, the target document set is (D_2, D_3); if all documents in both sets constitute the target document set, it is (D_1, D_2, D_3, D_2, D_3, D_4).
Preferably, to improve the efficiency of intelligent question answering and to avoid discarding a relevant document by mistake, in an embodiment provided by the present application, determining a target document set according to the first document set and the second document set includes: performing union processing on the first document set and the second document set to determine the target document set.
In practical application, to improve processing efficiency, the target document set is preferably determined by taking the union of the documents in the first document set and the second document set, and deduplication can be performed at the same time. For example, if the first document set is (D_1, D_2, D_3) and the second document set is (D_2, D_3, D_4), the union yields the target document set (D_1, D_2, D_3, D_4).
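The three merging options in the example can be written out directly. The sketch below uses document identifiers as strings; the function names are hypothetical, and preserving first-seen order in the union is a choice made here, not something the patent specifies.

```python
def union_dedup(first, second):
    """Union of both sets, dropping repeats and keeping first-seen order."""
    return list(dict.fromkeys(first + second))

def intersection(first, second):
    """Documents that appear in both sets, in the order of the first set."""
    second_ids = set(second)
    return [d for d in first if d in second_ids]

def concatenation(first, second):
    """All documents of both sets, repeats included."""
    return first + second

first_set = ["D1", "D2", "D3"]
second_set = ["D2", "D3", "D4"]

merged = union_dedup(first_set, second_set)
common = intersection(first_set, second_set)
everything = concatenation(first_set, second_set)
```

The union keeps every document found by either retrieval strategy, which is why it avoids losing a document that only one strategy recalled.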
Optionally, performing union processing on the first document set and the second document set to determine a target document set, including:
merging the first document set and the second document set to obtain an initial target document set;
and determining a target document set in the initial target document set according to the to-be-processed problem and the similarity strategy.
The initial target document set is the set obtained by taking the union of the first document set and the second document set. Because this set may still contain many documents, and determining the final answer from them would take considerable time, a further refinement step can be applied after the union: the target document set is determined within the initial target document set according to the question to be processed and the similarity strategy.
The similarity strategy matches the question to be processed against each document in the initial target document set and determines the target document set from the resulting similarity scores. There are several ways to do this. For example, the similarity between the question and each document can be calculated directly, or the question and each document can be input into a pre-trained similarity matching model, which determines the similarity between them.
In a specific embodiment provided by the present application, determining a target document set in the initial target document set according to the to-be-processed question and the similarity policy includes:
splicing the problem to be processed with each document in the initial target document set to obtain a problem spliced document set;
inputting each problem splicing document in the problem splicing document set into a similarity matching model, and obtaining a similarity matching score between each document output by the similarity matching model and the problem to be processed;
and forming the documents with the similarity matching scores larger than a first preset threshold value into a target document set.
The question to be processed is spliced with each document to obtain the question-spliced document set. For example, if the question to be processed is Q and the initial target document set is (D_1, D_2, D_3, D_4), the question-spliced document set is (Q-D_1, Q-D_2, Q-D_3, Q-D_4). The splicing order can vary: the document may come first and the question second, or the question first and the document second. The splice point can be marked with a special symbol such as "#", "&" or "@"; this is not limited in this application and depends on the practical application.
The similarity matching model is a pre-trained deep neural network model that calculates the similarity of an input question-document pair; it can be obtained by further training on the basis of the BERT model.
The question-document pairs in the question-spliced document set are input in turn into the pre-trained similarity matching model. For each pair, the model embeds the question and the document to obtain a question-document encoding vector, calculates the similarity between the question encoding and the document encoding within that vector, and outputs a similarity matching score for the pair. Target documents are then selected from the initial target document set according to these scores to form the target document set.
In practical application, a first preset threshold may be set and the documents whose similarity matching score is greater than it taken as target documents. Alternatively, the documents in the initial target document set may be sorted by similarity matching score and a preset number of them selected as target documents. The selected target documents then form the target document set.
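The splice, score and threshold flow might look like the sketch below. The trained BERT-based similarity matching model is not reproduced here: `toy_score_pair` is a hypothetical stand-in that scores word overlap, used only so the example runs. The separator "#", the threshold value and all function names are assumptions.

```python
def splice(question, document, sep="#"):
    """Join question and document with a separator symbol (question first)."""
    return f"{question}{sep}{document}"

def filter_by_match_score(question, docs, score_pair, threshold):
    """Keep documents whose spliced pair scores above the first preset threshold."""
    kept = []
    for doc in docs:
        if score_pair(splice(question, doc)) > threshold:
            kept.append(doc)
    return kept

def toy_score_pair(spliced, sep="#"):
    """Stand-in for the similarity matching model: share of question words found in the document."""
    question, document = spliced.split(sep, 1)
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0

initial_target_set = [
    "paris is the capital of france",
    "the capital of germany is berlin",
    "bananas are rich in potassium",
]
target_set = filter_by_match_score(
    "capital of france", initial_target_set, toy_score_pair, threshold=0.7
)
```

Replacing the threshold test with a sort and a slice gives the alternative of keeping a preset number of top-scoring documents.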
In practical application this way of generating the target document set is relatively slow, but the documents in the resulting target document set are highly relevant to the question to be processed. It thus screens the quality of the documents that subsequently participate in intelligent question answering, and it suits application scenarios that place higher demands on the quality of the final answer.
In another specific embodiment provided by the present application, determining a target document set in the initial target document set according to the question to be processed and the similarity strategy includes:
encoding the question to be processed to obtain a question encoding vector;
encoding each document in the initial target document set to obtain a document encoding vector corresponding to each document;
calculating a vector similarity score between the question encoding vector and the document encoding vector of each document;
and forming the documents whose vector similarity scores are greater than a second preset threshold into a target document set.
That is, the question to be processed is encoded to obtain a question encoding vector, each document in the initial target document set is encoded to obtain its document encoding vector, a vector similarity score is calculated between the question encoding vector and each document encoding vector, and target documents are determined in the initial target document set according to the vector similarity scores and combined into the target document set.
Specifically, the question to be processed and the document may each be input into a BERT model. Taking a BERT model with 12 encoding layers as an example, the question and the document are encoded separately in the first 10 encoding layers to obtain the question encoding vector and the document encoding vector; the two vectors are then spliced and input into the last 2 encoding layers, which calculate the similarity score between the question and the document.
In practical applications, a second preset threshold may be set and the documents whose vector similarity scores are greater than it taken as target documents; alternatively, the documents in the initial target document set may be sorted by vector similarity score and a preset number of them selected as target documents. The selected target documents form the target document set.
This method of generating the target document set is faster in practice than the first method, but the documents in the resulting target document set are of lower quality for question answering than those obtained by the first method. It suits application scenarios that require a quick response to the question to be processed.
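A minimal sketch of why encoding the question and the documents separately can be faster: document vectors can be computed once and reused, so only the question is encoded per query. The bag-of-words `encode` function is a hypothetical stand-in for a BERT encoder, and plain cosine similarity replaces the final joint encoding layers described above; the class name, threshold values, and sample documents are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def encode(text: str) -> Counter:
    """Toy encoder: sparse bag-of-words vector (stand-in for a BERT encoder)."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[term] * v[term] for term in u if term in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


class PreEncodedDocuments:
    """Encodes each document once so that only the question is encoded per query."""

    def __init__(self, documents: Dict[str, str]) -> None:
        self.vectors = {doc_id: encode(text) for doc_id, text in documents.items()}

    def filter(self, question: str, threshold: float) -> List[str]:
        """Return IDs of documents whose similarity to the question exceeds threshold."""
        q_vec = encode(question)
        return [doc_id for doc_id, vec in self.vectors.items()
                if cosine(q_vec, vec) > threshold]


index = PreEncodedDocuments({
    "P1": "bm25 ranks documents by term frequency",
    "P2": "the cat sat on the mat",
    "P3": "term frequency and bm25",
})
strict = index.filter("bm25 term frequency", threshold=0.8)
loose = index.filter("bm25 term frequency", threshold=0.5)
```

The trade-off is that the question and document never see each other during encoding, which is one plausible reason for the lower document quality noted for this method.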
Step 210: determining a target answer corresponding to the question to be processed according to the question to be processed and the target document set.
After the above processing, most interfering documents have been filtered out: the target document set contains far fewer documents than the preset document set, and its documents are more relevant to the question to be processed. Determining the target answer from the question to be processed and the target document set is therefore more efficient.
Optionally, determining a target answer corresponding to the question to be processed according to the question to be processed and the target document set includes:
inputting the question to be processed and the documents in the target document set into a question-answering model for processing to obtain the target answer corresponding to the question to be processed output by the question-answering model.
The question-answering model is a pre-trained neural network model, for example a BERT-based, Transformer-based, or CNN-based question-answering model, or the DrQA question-answering model. Its specific form is not limited in this application, provided that it takes the question to be processed and a target document as input and produces a target answer corresponding to the question to be processed.
Specifically, the question to be processed and a target document are input into the question-answering model, which determines the target paragraph in the target document that corresponds to the question. The target paragraph may be determined by calculating the similarity between each paragraph and the question, or by using the TF-IDF weights of the word units of the question as a screening rule. Once the target paragraph is determined, the answer to the question within the current target document is located in the target paragraph as a candidate answer, and the candidate answer is scored. After the question has been processed against every target document, a candidate answer and a candidate answer score are available for each target document, and the candidate answer with the highest score is selected as the target answer to the question to be processed.
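The paragraph selection and final answer selection described above can be sketched as follows. The word-overlap paragraph similarity is a toy assumption standing in for whatever similarity or TF-IDF rule an implementation uses, and the `Candidate` type, function names, and sample scores are hypothetical; a real question-answering model would produce the candidate answers and their scores.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    doc_id: str    # target document the answer was extracted from
    answer: str    # candidate answer text
    score: float   # candidate answer score assigned by the model


def select_target_paragraph(question: str, paragraphs: List[str]) -> str:
    """Pick the paragraph sharing the most words with the question (toy similarity)."""
    if not paragraphs:
        raise ValueError("document has no paragraphs")
    q_words = set(question.lower().split())
    return max(paragraphs, key=lambda p: len(q_words & set(p.lower().split())))


def select_target_answer(candidates: List[Candidate]) -> Candidate:
    """Return the candidate with the highest score across all target documents."""
    if not candidates:
        raise ValueError("no candidate answers")
    return max(candidates, key=lambda c: c.score)


paragraph = select_target_paragraph(
    "what is bm25",
    ["BM25 is a ranking function.", "Cats sleep most of the day."],
)
best = select_target_answer([
    Candidate("P1", "a ranking function", 0.91),
    Candidate("P3", "a model", 0.40),
])
```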
In the intelligent question-answering method of this application, documents are retrieved along multiple information dimensions: both the grammatical features and the semantic features of the question to be processed and the documents are considered to form a preliminarily screened target document set. This improves document recall, effectively filters out irrelevant documents, and ensures that the documents taking part in question answering are highly relevant to the question to be processed.
Secondly, after the document set is obtained by preliminary screening, similarity matching can be performed between the question to be processed and the documents in the initial target document set: the similarity between the question and each document is calculated, and documents with low similarity are filtered out. This reduces the number and raises the quality of the documents taking part in question answering, which in turn reduces the influence of irrelevant documents on the output of the question-answering model and improves the performance of intelligent question answering.
The intelligent question-answering method provided by this application is further explained below with reference to Fig. 3 and Fig. 4. Fig. 3 shows a schematic structural diagram of the method according to an embodiment of this application. As shown in Fig. 3, a first document set is determined from the question to be processed and the preset document set according to a grammar retrieval strategy, and a second document set according to a semantic retrieval strategy. An initial target document set is determined from the first document set and the second document set. The documents in the initial target document set are then finely ranked according to the question to be processed to obtain the target document set, and the target document set and the question to be processed are input into the question-answering model to obtain the final target answer.
Fig. 4 shows an intelligent question-answering method according to an embodiment of the present application, which includes steps 402 to 416.
Step 402: acquiring a question to be processed and a preset document set.
In the embodiment provided by this application, a question Q to be processed and a preset document set {T1, T2, ..., T50} are acquired; the preset document set contains 50 documents.
Step 404: determining a first document set in the document set according to the question to be processed and the grammar retrieval strategy.
In the embodiment provided by this application, the BM25 algorithm is used to calculate, from word frequencies, a grammatical relevance score between the question Q to be processed and each document in the document set. The documents are ranked by grammatical relevance score, and the top 20 are determined as the first document set {T2, T4, ..., T47}.
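A minimal sketch of BM25 scoring followed by top-k selection. The text names BM25 but gives no parameters, so the values k1 = 1.5 and b = 0.75 and the non-negative IDF variant ln(1 + (N − n + 0.5)/(n + 0.5)) used here are assumptions, as are the function names and the tiny pre-tokenized corpus.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_scores(question_terms: List[str],
                docs: Dict[str, List[str]],
                k1: float = 1.5,
                b: float = 0.75) -> Dict[str, float]:
    """Score each tokenized document against the question terms with BM25."""
    if not docs:
        return {}
    n_docs = len(docs)
    # Average document length; fall back to 1.0 if every document is empty.
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    scores: Dict[str, float] = {}
    for doc_id, tokens in docs.items():
        tf = Counter(tokens)
        score = 0.0
        for term in question_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            # Per-word-unit score, summed into the document's question score.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return scores


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the IDs of the k highest-scoring documents."""
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)[:k]


corpus = {
    "T1": ["bm25", "ranks", "documents"],
    "T2": ["cats", "sleep"],
    "T3": ["bm25", "bm25", "term", "frequency"],
}
scores = bm25_scores(["bm25", "frequency"], corpus)
first_document_set = top_k(scores, 2)
```

In the embodiment above the same ranking would be taken over 50 documents with k = 20.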
Step 406: determining a second document set in the document set according to the question to be processed and the semantic retrieval strategy.
In the embodiment provided by this application, the question Q to be processed is vectorized to obtain the question vector MQ, and each document in the document set {T1, T2, ..., T50} is vectorized to obtain the document vectors M1, M2, ..., M50. Cosine similarity is used to calculate the similarity score between the question vector MQ and each document vector; the documents are sorted by similarity score, and the top 20 are selected as the second document set {T3, T5, ..., T44}.
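The cosine-similarity ranking in this step can be sketched on dense vectors as follows. The two-dimensional vectors are made-up values for illustration only; in practice MQ and M1 to M50 would come from an embedding model, which this sketch does not include.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_top_k(question_vec: Sequence[float],
                   doc_vecs: Dict[str, Sequence[float]],
                   k: int) -> List[str]:
    """Rank documents by cosine similarity to the question vector; keep the top k."""
    sims = {doc_id: cosine_similarity(question_vec, vec)
            for doc_id, vec in doc_vecs.items()}
    return sorted(sims, key=lambda doc_id: sims[doc_id], reverse=True)[:k]


m_q = [1.0, 0.0]                       # hypothetical question vector
doc_vectors = {
    "T1": [1.0, 0.0],                  # same direction as the question
    "T2": [0.0, 1.0],                  # orthogonal to the question
    "T3": [1.0, 1.0],                  # 45 degrees from the question
}
second_document_set = semantic_top_k(m_q, doc_vectors, k=2)
```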
Step 408: determining an initial target document set according to the first document set and the second document set.
In the embodiment provided by this application, the union of the first document set and the second document set is taken as the initial target document set. Because 10 documents appear in both the first document set and the second document set, the initial target document set contains 30 documents (20 + 20 − 10); for convenience, the documents in the initial target document set are denoted {P1, P2, ..., P30}.
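The count in this step follows from |A ∪ B| = |A| + |B| − |A ∩ B|. A short sketch of the merge is given below; the document IDs are chosen only so that the two sets share 10 members and are not the membership of the sets in the embodiment, and the function name is an assumption.

```python
from typing import List


def merge_document_sets(first: List[str], second: List[str]) -> List[str]:
    """Union of two ID lists, keeping first-seen order and dropping repeats."""
    return list(dict.fromkeys(first + second))


first_set = [f"T{i}" for i in range(1, 21)]     # 20 documents: T1..T20
second_set = [f"T{i}" for i in range(11, 31)]   # 20 documents: T11..T30
initial_target_set = merge_document_sets(first_set, second_set)
overlap = set(first_set) & set(second_set)      # the 10 repeated documents
```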
Step 410: splicing the question to be processed with each document in the initial target document set to obtain a question-spliced document set.
In the embodiment provided by this application, the question Q to be processed is spliced with each document in the initial target document set {P1, P2, ..., P30} to obtain the question-spliced document set {QP1, QP2, ..., QP30}.
Step 412: inputting each question-spliced document in the question-spliced document set into a similarity matching model, and obtaining the similarity matching score, output by the similarity matching model, between each document and the question to be processed.
In the embodiment provided by this application, each question-spliced document QP1, QP2, ..., QP30 is input into the pre-trained similarity matching model to obtain the similarity matching score between each document and the question Q to be processed.
Step 414: forming the documents whose similarity matching scores are greater than a first preset threshold into a target document set.
In the embodiment provided by this application, the documents in the initial target document set {P1, P2, ..., P30} whose similarity matching scores are greater than the first preset threshold form the target document set, which contains 10 documents: {P1, P3, P7, P10, P15, P16, P19, P24, P27, P29}.
Step 416: inputting the question to be processed and the documents in the target document set into a question-answering model for processing to obtain the target answer corresponding to the question to be processed output by the question-answering model.
In the embodiment provided by this application, the question Q to be processed and the target document set {P1, P3, P7, P10, P15, P16, P19, P24, P27, P29} are input into a pre-trained question-answering model for processing, and the target answer A corresponding to the question to be processed, output by the question-answering model, is obtained.
The intelligent question-answering method provided by the embodiment of this application retrieves documents in two stages. In the first stage, the document set is searched separately with a semantic retrieval strategy and a grammar retrieval strategy; considering both the semantic and the grammatical dimension improves document recall and yields the initial target document set. In the second stage, a question-document similarity matching strategy is used to finely rank the initial target document set and filter out its irrelevant documents, yielding the target document set. The question to be processed is then answered from the documents in the target document set. This addresses the low recall and low accuracy with which intelligent question-answering methods retrieve documents related to the question to be processed, and thereby improves the performance of the intelligent question-answering method.
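The overall two-stage flow can be sketched as a single function that wires the stages together. Every component is passed in as a callable, so the sketch makes no claim about how each stage is implemented; the type aliases, the function name `answer_question`, and the lambda stand-ins in the usage example are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Each retriever and the reranker map (question, documents) to a list of document IDs.
Retriever = Callable[[str, Dict[str, str]], List[str]]
# The question-answering model maps (question, document text) to (answer, score).
QAModel = Callable[[str, str], Tuple[str, float]]


def answer_question(question: str,
                    documents: Dict[str, str],
                    grammar_retrieve: Retriever,
                    semantic_retrieve: Retriever,
                    rerank: Retriever,
                    qa_model: QAModel) -> str:
    """Two-stage retrieval followed by answer selection."""
    # Stage 1: grammar and semantic retrieval, merged by union.
    first = grammar_retrieve(question, documents)
    second = semantic_retrieve(question, documents)
    initial_ids = list(dict.fromkeys(first + second))
    initial = {doc_id: documents[doc_id] for doc_id in initial_ids}
    # Stage 2: fine ranking of the initial target document set.
    target_ids = rerank(question, initial)
    # Question answering: one candidate per target document, best score wins.
    candidates = [qa_model(question, documents[doc_id]) for doc_id in target_ids]
    if not candidates:
        raise ValueError("no target documents survived filtering")
    return max(candidates, key=lambda candidate: candidate[1])[0]


docs = {"T1": "alpha", "T2": "beta", "T3": "gamma"}
answer = answer_question(
    "q", docs,
    grammar_retrieve=lambda q, d: ["T1", "T2"],
    semantic_retrieve=lambda q, d: ["T2", "T3"],
    rerank=lambda q, d: [doc_id for doc_id in d if doc_id != "T1"],
    qa_model=lambda q, text: (text.upper(), float(len(text))),
)
```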
Corresponding to the above method embodiments, the present application also provides an intelligent question answering device embodiment, and fig. 5 shows a schematic structural diagram of the intelligent question answering device according to an embodiment of the present application. As shown in fig. 5, the apparatus includes:
an obtaining module 502 configured to obtain a question to be processed and a preset document set;
a grammar retrieval module 504 configured to determine a first document set in the document set according to the question to be processed and a grammar retrieval strategy;
a semantic retrieval module 506 configured to determine a second document set in the document set according to the question to be processed and a semantic retrieval strategy;
a determining module 508 configured to determine a target document set according to the first document set and the second document set;
a question-answering module 510 configured to determine a target answer corresponding to the question to be processed according to the question to be processed and the target document set.
Optionally, the grammar retrieval module 504 is further configured to:
perform word segmentation on the question to be processed to obtain a word unit set corresponding to the question to be processed;
calculate a word-unit grammar relevance score between each word unit in the word unit set and each document in the document set;
determine a question grammar relevance score between each document and the question to be processed according to the word-unit grammar relevance scores between the word units in the word unit set and that document;
and determine a first document set according to the question grammar relevance score between each document and the question to be processed.
Optionally, the grammar retrieval module 504 is further configured to:
sort the documents according to the question grammar relevance score between each document and the question to be processed, and determine a preset number of documents to form the first document set.
Optionally, the semantic retrieval module 506 is further configured to:
embed the question to be processed to obtain a question vector corresponding to the question to be processed;
embed each document in the document set to obtain a document vector corresponding to each document;
calculate a similarity score between the question vector and the document vector of each document;
and determine a second document set according to the similarity score corresponding to each document.
Optionally, the semantic retrieval module 506 is further configured to:
sort the document set according to the similarity score corresponding to each document, and determine a preset number of documents to form the second document set.
Optionally, the determining module 508 is further configured to:
and performing union processing on the first document set and the second document set to determine a target document set.
Optionally, the determining module 508 is further configured to:
perform union processing on the first document set and the second document set to obtain an initial target document set;
and determine a target document set in the initial target document set according to the question to be processed and a similarity strategy.
Optionally, the determining module 508 is further configured to:
splice the question to be processed with each document in the initial target document set to obtain a question-spliced document set;
input each question-spliced document in the question-spliced document set into a similarity matching model, and obtain the similarity matching score, output by the similarity matching model, between each document and the question to be processed;
and form the documents whose similarity matching scores are greater than a first preset threshold into a target document set.
Optionally, the determining module 508 is further configured to:
encode the question to be processed to obtain a question encoding vector;
encode each document in the initial target document set to obtain a document encoding vector corresponding to each document;
calculate a vector similarity score between the question encoding vector and the document encoding vector of each document;
and form the documents whose vector similarity scores are greater than a second preset threshold into a target document set.
Optionally, the question-answering module 510 is further configured to:
input the question to be processed and the documents in the target document set into a question-answering model for processing to obtain the target answer corresponding to the question to be processed output by the question-answering model.
The intelligent question-answering device provided by the embodiment of this application retrieves documents in two stages. In the first stage, the document set is searched separately with a semantic retrieval strategy and a grammar retrieval strategy; considering both the semantic and the grammatical dimension improves document recall and yields the initial target document set. In the second stage, a question-document similarity matching strategy is used to finely rank the initial target document set and filter out its irrelevant documents, yielding the target document set. The question to be processed is then answered from the documents in the target document set. This addresses the low recall and low accuracy with which documents related to the question to be processed are retrieved in intelligent question answering, and thereby improves the performance of intelligent question answering.
The above is an illustrative scheme of an intelligent question answering device of this embodiment. It should be noted that the technical solution of the intelligent question answering device and the technical solution of the intelligent question answering method belong to the same concept, and details that are not described in detail in the technical solution of the intelligent question answering device can be referred to the description of the technical solution of the intelligent question answering method.
It should be noted that the components in the device claims should be understood as functional blocks which are necessary to implement the steps of the program flow or the steps of the method, and each functional block is not actually defined by functional division or separation. The device claims defined by such a set of functional modules are to be understood as a functional module framework for implementing the solution mainly by means of a computer program as described in the specification, and not as a physical device for implementing the solution mainly by means of hardware.
An embodiment of the present application further provides a computing device, which includes a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the processor executes the instructions to implement the steps of the intelligent question answering method.
The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the intelligent question and answer method described above belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the intelligent question and answer method described above.
An embodiment of the present application further provides a computer readable storage medium, which stores computer instructions, and the instructions, when executed by a processor, implement the steps of the intelligent question answering method as described above.
The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the intelligent question and answer method belong to the same concept, and details that are not described in detail in the technical solution of the storage medium can be referred to the description of the technical solution of the intelligent question and answer method.
The embodiment of the application discloses a chip, which stores computer instructions, and the instructions are executed by a processor to realize the steps of the intelligent question answering method.
The foregoing description of specific embodiments of the present application has been presented. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, USB flash drive, removable hard disk, magnetic disk, optical disk, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signal, telecommunications signal, or software distribution medium. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The preferred embodiments of the present application disclosed above are intended only to aid in the explanation of the application. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical applications, to thereby enable others skilled in the art to best understand and utilize the application. The application is limited only by the claims and their full scope and equivalents.

Claims (13)

1. An intelligent question-answering method, characterized by comprising:
acquiring a question to be processed and a preset document set;
determining a first document set in the document set according to the question to be processed and a grammar retrieval strategy;
determining a second document set in the document set according to the question to be processed and a semantic retrieval strategy;
determining a target document set according to the first document set and the second document set;
and determining a target answer corresponding to the question to be processed according to the question to be processed and the target document set.
2. The intelligent question-answering method according to claim 1, wherein determining a first document set in the document set according to the question to be processed and a grammar retrieval strategy comprises:
performing word segmentation on the question to be processed to obtain a word unit set corresponding to the question to be processed;
calculating a word-unit grammar relevance score between each word unit in the word unit set and each document in the document set;
determining a question grammar relevance score between each document and the question to be processed according to the word-unit grammar relevance scores between the word units in the word unit set and that document;
and determining a first document set according to the question grammar relevance score between each document and the question to be processed.
3. The intelligent question-answering method according to claim 2, wherein determining a first document set according to the question grammar relevance score between each document and the question to be processed comprises:
sorting the documents according to the question grammar relevance score between each document and the question to be processed, and determining a preset number of documents to form the first document set.
4. The intelligent question-answering method according to claim 1, wherein determining a second document set in the document set according to the question to be processed and a semantic retrieval strategy comprises:
embedding the question to be processed to obtain a question vector corresponding to the question to be processed;
embedding each document in the document set to obtain a document vector corresponding to each document;
calculating a similarity score between the question vector and the document vector of each document;
and determining a second document set according to the similarity score corresponding to each document.
5. The intelligent question-answering method according to claim 4, wherein determining a second document set according to the similarity score corresponding to each document comprises:
sorting the document set according to the similarity score corresponding to each document, and determining a preset number of documents to form the second document set.
6. The intelligent question-answering method according to claim 1, wherein determining a target document set according to the first document set and the second document set comprises:
performing union processing on the first document set and the second document set to determine a target document set.
7. The intelligent question-answering method according to claim 6, wherein performing union processing on the first document set and the second document set to determine a target document set comprises:
performing union processing on the first document set and the second document set to obtain an initial target document set;
and determining a target document set in the initial target document set according to the question to be processed and a similarity strategy.
8. The intelligent question-answering method according to claim 7, wherein determining a target document set in the initial target document set according to the question to be processed and a similarity strategy comprises:
splicing the question to be processed with each document in the initial target document set to obtain a question-spliced document set;
inputting each question-spliced document in the question-spliced document set into a similarity matching model, and obtaining the similarity matching score, output by the similarity matching model, between each document and the question to be processed;
and forming the documents whose similarity matching scores are greater than a first preset threshold into a target document set.
9. The intelligent question-answering method according to claim 7, wherein determining a target document set in the initial target document set according to the question to be processed and a similarity strategy comprises:
encoding the question to be processed to obtain a question encoding vector;
encoding each document in the initial target document set to obtain a document encoding vector corresponding to each document;
calculating a vector similarity score between the question encoding vector and the document encoding vector of each document;
and forming the documents whose vector similarity scores are greater than a second preset threshold into a target document set.
10. The intelligent question-answering method according to claim 1, wherein determining a target answer corresponding to the question to be processed according to the question to be processed and the target document set comprises:
inputting the question to be processed and the documents in the target document set into a question-answering model for processing to obtain the target answer corresponding to the question to be processed output by the question-answering model.
11. An intelligent question-answering device, comprising:
an obtaining module configured to obtain a question to be processed and a preset document set;
a grammar retrieval module configured to determine a first document set in the document set according to the question to be processed and a grammar retrieval strategy;
a semantic retrieval module configured to determine a second document set in the document set according to the question to be processed and a semantic retrieval strategy;
a determining module configured to determine a target document set according to the first document set and the second document set;
and a question-answering module configured to determine a target answer corresponding to the question to be processed according to the question to be processed and the target document set.
12. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1-10 when executing the instructions.
13. A computer-readable storage medium storing computer instructions, which when executed by a processor, perform the steps of the method of any one of claims 1 to 10.
CN202011503097.XA 2020-12-17 2020-12-17 Intelligent question and answer method and device Pending CN114647717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011503097.XA CN114647717A (en) 2020-12-17 2020-12-17 Intelligent question and answer method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011503097.XA CN114647717A (en) 2020-12-17 2020-12-17 Intelligent question and answer method and device

Publications (1)

Publication Number Publication Date
CN114647717A true CN114647717A (en) 2022-06-21

Family

ID=81989807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011503097.XA Pending CN114647717A (en) 2020-12-17 2020-12-17 Intelligent question and answer method and device

Country Status (1)

Country Link
CN (1) CN114647717A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116069914A (en) * 2023-02-13 2023-05-05 北京百度网讯科技有限公司 Training data generation method, model training method and device
CN116069914B (en) * 2023-02-13 2024-04-12 北京百度网讯科技有限公司 Training data generation method, model training method and device

Similar Documents

Publication Publication Date Title
CN110348535B (en) Visual question-answering model training method and device
CN110309839B (en) Method and device for image description
CN111414461B (en) Intelligent question-answering method and system fusing knowledge base and user modeling
US20210125516A1 (en) Answer training device, answer training method, answer generation device, answer generation method, and program
CN113220832A (en) Text processing method and device
CN111524593A (en) Medical question-answering method and system based on context language model and knowledge embedding
CN112800203A (en) Question-answer matching method and system fusing text representation and knowledge representation
CN109740158A (en) Text semantic parsing method and device
CN114495129A (en) Character detection model pre-training method and device
CN113988079A (en) Low-data-oriented dynamic enhanced multi-hop text reading recognition processing method
CN114691864A (en) Text classification model training method and device and text classification method and device
CN114462385A (en) Text segmentation method and device
CN112307048A (en) Semantic matching model training method, matching device, equipment and storage medium
CN113159187A (en) Classification model training method and device, and target text determining method and device
CN117494815A (en) File-oriented credible large language model training and reasoning method and device
CN113961686A (en) Question-answer model training method and device, question-answer method and device
CN114003706A (en) Keyword combination generation model training method and device
WO2021176714A1 (en) Learning device, information processing device, learning method, information processing method, and program
CN114647717A (en) Intelligent question and answer method and device
CN116860943A (en) Multi-round dialogue method and system for dialogue style perception and theme guidance
CN114943236A (en) Keyword extraction method and device
CN114417863A (en) Word weight generation model training method and device and word weight generation method and device
KR20190060285A (en) Artificial intelligence based dialog system and response control method thereof
CN113792121A (en) Reading understanding model training method and device and reading understanding method and device
CN114003707A (en) Problem retrieval model training method and device and problem retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination