WO2021068352A1

WO2021068352A1 - Automatic construction method and apparatus for FAQ question-answer pair, and computer device and storage medium

Info

Publication number: WO2021068352A1
Application number: PCT/CN2019/118442
Authority: WO
Inventors: 杨凤鑫; 徐国强
Original assignee: 平安科技（深圳）有限公司
Priority date: 2019-10-12
Filing date: 2019-11-14
Publication date: 2021-04-15
Also published as: CN111046152B; CN111046152A

Abstract

Disclosed are an automatic construction method and apparatus for an FAQ question-answer pair, and a computer device and a storage medium. The method belongs to the technical field of artificial intelligence and natural language processing, and comprises: acquiring a document to be read; parsing the document to be read, and paragraphing a parsed document to obtain a paragraphed document as a target document; screening out, according to a question to be answered and a preset screening model, a paragraph, matching said question, from the target document to serve as a target paragraph; and according to the target paragraph and said question, generating an FAQ question-answer pair on the basis of a preset reading comprehension model.

Description

Method, device, computer equipment, and storage medium for automatically constructing FAQ question and answer pairs

This application claims priority to Chinese patent application No. CN201910969443.4, entitled "Method, Device, Computer Equipment, and Storage Medium for Automatically Constructing FAQ Question and Answer Pairs", filed with the Chinese Patent Office on October 12, 2019, the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the technical fields of artificial intelligence and natural language processing, and in particular to a method, device, computer equipment, and storage medium for automatically constructing FAQ question and answer pairs.

Background art

FAQ is the abbreviation of "Frequently Asked Questions". An FAQ is considered to be a common online customer service method. A good FAQ system should be able to answer at least 80% of users' general and frequently asked questions. This is convenient for users, greatly reduces the pressure on website staff, saves substantial customer service costs, and increases customer satisfaction. Therefore, how to construct an FAQ database effectively is particularly important.

At present, there are mainly three methods for automatic FAQ construction in the question-answering field: (1) Word segmentation is performed on the article to be read and the question to be answered, the resulting word strings are input into an automatic reading comprehension model, and the model outputs the answer corresponding to the question. (2) According to the similarity between the question asked by the user and the existing question records in a Q&A database, the question matching the user's question is found in the existing "question-answer" pair database, and the corresponding answer is returned to the user, completing the FAQ answer. (3) Sentence pattern templates corresponding to standard question sentences are manually entered for an established FAQ; the user's question is matched against a sentence pattern, and the sentence pattern is then matched against the FAQ. Although the above three methods can match successfully to a certain extent and realize the automatic construction of FAQ question and answer pairs, the matching accuracy of the resulting FAQ question and answer pairs is still relatively low.

Summary of the invention

The embodiments of the present application provide a method, device, computer equipment, and storage medium for automatically constructing FAQ question and answer pairs, aiming to solve the problem that existing automatically constructed FAQ question and answer pairs have low matching accuracy.

In the first aspect, an embodiment of the present application provides a method for automatically constructing FAQ question and answer pairs, which includes:

Obtaining a document to be read; parsing the document to be read and segmenting the parsed document to obtain a segmented document as a target document; selecting, according to a question to be answered and a preset screening model, a paragraph that matches the question to be answered from the target document as a target paragraph; and generating, according to the target paragraph and the question to be answered, a FAQ question and answer pair based on a preset reading comprehension model.

In the second aspect, an embodiment of the present application also provides a device for automatically constructing FAQ question and answer pairs, which includes:

An obtaining unit, configured to obtain a document to be read; a parsing and segmentation unit, configured to parse the document to be read and segment the parsed document to obtain a segmented document as a target document; a screening unit, configured to select, according to a question to be answered and a preset screening model, a paragraph that matches the question to be answered from the target document as a target paragraph; and a generating unit, configured to generate, according to the target paragraph and the question to be answered, a FAQ question and answer pair based on a preset reading comprehension model.

In a third aspect, an embodiment of the present application also provides a computer device, which includes a memory and a processor, the memory stores a computer program, and the processor implements the above method when executing the computer program.

In a fourth aspect, the embodiments of the present application also provide a computer-readable storage medium, the storage medium stores a computer program, and the computer program can implement the foregoing method when executed by a processor.

The embodiments of the present application provide a method, device, computer equipment, and storage medium for automatically constructing FAQ question and answer pairs. In the technical solution of the embodiments of this application, the target paragraph that matches the question to be answered is selected first, and the FAQ question and answer pair is then generated based on the target paragraph and the question to be answered. Non-target paragraphs therefore do not need to be processed, which reduces, to a certain extent, the interference that non-target paragraphs introduce when generating FAQ question and answer pairs and makes the generated pairs more accurate.

Description of the drawings

In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present application. Those of ordinary skill in the art can obtain other drawings based on these drawings without creative work.

FIG. 1 is a schematic diagram of a scenario of a method for automatically constructing a FAQ question and answer pair provided by an embodiment of the application;

FIG. 2 is a schematic flowchart of a method for automatically constructing FAQ question and answer pairs provided by an embodiment of the application;

FIG. 3 is a schematic diagram of a sub-flow of a method for automatically constructing a FAQ question and answer pair provided by an embodiment of the application;

FIG. 4 is a schematic diagram of a sub-flow of a method for automatically constructing a FAQ question and answer pair provided by an embodiment of the application;

FIG. 5 is a schematic diagram of a sub-flow of a method for automatically constructing a FAQ question and answer pair provided by an embodiment of the application;

FIG. 6 is a schematic flowchart of a method for automatically constructing FAQ question and answer pairs provided by another embodiment of the application;

FIG. 7 is a schematic block diagram of a device for automatically constructing FAQ question and answer pairs provided by an embodiment of the application;

FIG. 8 is a schematic block diagram of the parsing and segmentation unit of the device for automatically constructing FAQ question and answer pairs provided by an embodiment of the application;

FIG. 9 is a schematic block diagram of the screening unit of the device for automatically constructing FAQ question and answer pairs provided by an embodiment of the application;

FIG. 10 is a schematic block diagram of the generating unit of the device for automatically constructing FAQ question and answer pairs provided by an embodiment of the application;

FIG. 11 is a schematic block diagram of a device for automatically constructing FAQ question and answer pairs provided by another embodiment of the application; and

FIG. 12 is a schematic block diagram of a computer device provided by an embodiment of the application.

Detailed description

The technical solutions in the embodiments of the present application will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are part of the embodiments of the present application, rather than all of them. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

It should be understood that, when used in this specification and the appended claims, the terms "comprising" and "including" indicate the existence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terms used in the specification of this application are only for the purpose of describing specific embodiments and are not intended to limit the application. As used in the specification of this application and the appended claims, unless the context clearly indicates other circumstances, the singular forms "a", "an" and "the" are intended to include plural forms.

It should be further understood that the term "and/or" used in the specification and appended claims of this application refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.

As used in this specification and the appended claims, the term "if" can be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if determined" or "if [described condition or event] is detected" can be interpreted as "once determined", "in response to determining", "once [described condition or event] is detected", or "in response to detecting [described condition or event]", depending on the context.

Please refer to FIG. 1, which is a schematic diagram of an application scenario of the method for automatically constructing FAQ question and answer pairs provided by an embodiment of the present application. The method can be applied to a server; for example, it can be realized by a software program configured on the server. The server communicates with the terminal, retrieves the document to be read that the user uploads through the terminal, and performs a series of processing steps according to the question to be answered and the document to be read to obtain the FAQ question and answer pair, thereby realizing the automatic construction of FAQ question and answer pairs. The terminal can be a desktop computer, a laptop computer, a tablet computer, or the like, and is not specifically limited here. In addition, FIG. 1 shows one terminal and one user. It can be understood that in actual applications there may be multiple terminals and users, and FIG. 1 serves only as a schematic illustration.

Please refer to FIG. 2, which is a schematic flowchart of a method for automatically constructing FAQ question and answer pairs provided by an embodiment of the present application. As shown in Figure 2, the method includes the following steps S100-S130.

S100. Obtain a document to be read.

Specifically, for the server to realize the automatic construction of FAQ question and answer pairs, it first needs to obtain the document to be read, and then perform a series of processing based on the document to be read before generating the FAQ question and answer pair. In the embodiment of the present application, the user can upload the document to be read through the user terminal. Specifically, the user can upload the document to be read through the FAQ webpage terminal of the user terminal to send the document to be read to the server. Wherein, in the embodiment of the present application, the document to be read is a PDF document.

It should be noted that, in other embodiments, the document to be read may also be other types of documents, such as Word documents.

S110: Parse the document to be read and segment the parsed document to obtain the segmented document as a target document.

Specifically, after obtaining the document to be read, the server needs to parse the document to be read to obtain a document in a required format, and then segment the content in the document to finally obtain a document with a preset document structure.

Referring to FIG. 3, in some embodiments, such as the embodiment of the present application, step S110 includes the following steps S111-S112.

S111. Parse the document to be read using a cascaded CRF model to obtain an XML document.

S112: Segment the XML document in a preset segmentation manner to obtain a document with a preset document structure as a target document.

In the embodiment of the present application, the cascaded CRF model is used to parse the document to be read to obtain the XML document. CRF is the abbreviation of Conditional Random Field. The cascaded CRF model is used in this embodiment because it parses the document to be read in a relatively short processing time and with a better processing effect. XML is the abbreviation of Extensible Markup Language. After the document to be read is parsed to obtain the XML document, the XML document needs to be segmented to obtain the segmented document as the target document. Specifically, the XML document can be segmented in a preset segmentation manner, of which several are available. For example, the segmentation manner selected in this embodiment treats each second-level heading as one segment. In other embodiments, other segmentation manners, such as segmenting by first-level heading or by article paragraph, can also be selected according to actual needs.

It should be noted that in other embodiments, other models may also be used to parse the document to be read; for example, a hidden Markov model (HMM) may be used.
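
Segmentation by second-level heading can be illustrated with a minimal sketch. It is not the implementation of this application: the element names `h1`, `h2`, and `p`, the flat document layout, and the function name `segment_by_heading` are all hypothetical, because the XML schema produced by the parser is not specified here.

```python
import xml.etree.ElementTree as ET


def segment_by_heading(xml_text, heading_tag="h2"):
    """Split a flat XML document into segments, one per heading_tag element.

    Assumes (hypothetically) that the parser emits a flat sequence of
    heading and paragraph elements directly under the root.
    """
    root = ET.fromstring(xml_text)
    segments = []
    current = None
    for elem in root:
        text = (elem.text or "").strip()
        if elem.tag == heading_tag:
            # A heading of the chosen level starts a new segment.
            current = {"title": text, "body": []}
            segments.append(current)
        elif current is not None and text:
            # Any other element is attached to the most recent segment.
            current["body"].append(text)
    return segments


# Made-up sample document for illustration only.
sample = """<doc>
  <h1>Accident insurance</h1>
  <h2>Coverage</h2>
  <p>The policy covers accidental injury.</p>
  <h2>Exclusions</h2>
  <p>Injuries from extreme sports are excluded.</p>
  <p>Pre-existing conditions are excluded.</p>
</doc>"""

for seg in segment_by_heading(sample):
    print(seg["title"], "->", " ".join(seg["body"]))
```

Passing a different `heading_tag` (for example the hypothetical `h1`) would correspond to choosing first-level headings as the segmentation manner.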

S120: According to the question to be answered and a preset screening model, a paragraph matching the question to be answered is selected from the target document as a target paragraph.

Specifically, after the server parses the document to be read and segments the parsed document to obtain the target document, it needs to screen the target document to obtain a paragraph matching the question to be answered as the target paragraph. In the embodiment of the present application, the question to be answered is a question stored in a question template in a preset database. Specifically, after the user uploads the document to be read on the FAQ web page of the terminal, the user can select a corresponding question template according to the content or name of the uploaded document. For example, if the document to be read that the user uploads on the FAQ web page concerns life insurance or accident insurance, the user selects the question template related to life insurance or accident insurance, which includes multiple common questions related to life insurance or accident insurance. The server calls the corresponding question template according to the user's selection and, according to the questions in the question template and the preset screening model, selects from the target document the paragraphs that match those questions as target paragraphs.

In some embodiments, such as the embodiment of the present application, as shown in FIG. 4, the step S120 may include the following steps S121-S123.

S121. Encode the target document according to the question to be answered and the preset screening model to obtain a first paragraph text vector.

In the embodiment of the present application, in order to screen the target document for paragraphs that match the question to be answered, the target document first needs to be encoded according to the question to be answered and the screening model to obtain the first paragraph text vector. The preset screening model is, for example, a BERT (Bidirectional Encoder Representations from Transformers) model. BERT is a Transformer-based bidirectional language model. It can extract the syntactic and semantic information of a document, as well as its context information. Specifically, the server generates the first paragraph text vector for the target document according to the question to be answered and the BERT model. The first paragraph text vector is a three-dimensional vector that serves as the text vector representation of the target document. The BERT model is used here because it can extract the syntactic, semantic, and context information of the target document, which improves the accuracy of the extraction.

It should be noted that in other embodiments, other models can also be used to filter the target documents to obtain the target paragraphs according to actual needs, such as the Word2vec (Word to vector) model.

S122. Calculate the probability that each of the first paragraph text vectors matches the question to be answered according to the question to be answered.

S123: Determine the paragraph corresponding to the first paragraph text vector with the highest probability as a paragraph matching the question to be answered, and use it as a target paragraph.

In the embodiment of this application, after the server generates the first paragraph text vectors for the target document according to the question to be answered and the BERT model, it also needs to calculate the probability that each first paragraph text vector matches the question to be answered, sort the calculated probabilities, and take the paragraph corresponding to the first paragraph text vector with the highest probability as the target paragraph. Specifically, the Softmax function is used to calculate the probability that each first paragraph text vector matches the question to be answered; the resulting probabilities are then sorted, and the paragraph corresponding to the first paragraph text vector with the highest probability is taken as the target paragraph.
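
Steps S122-S123 can be sketched as follows, assuming the screening model has already produced one raw matching score per paragraph. The scores, the paragraph labels, and the function names are hypothetical and serve only to illustrate the Softmax-then-sort selection.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_target_paragraph(paragraphs, scores):
    """Return (paragraph, probability) for the highest-probability paragraph.

    `scores` are hypothetical raw matching scores, one per paragraph,
    standing in for the output of the screening model.
    """
    probs = softmax(scores)
    # Sort paragraphs by probability, highest first, as in step S123.
    ranked = sorted(zip(paragraphs, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[0]


paragraphs = ["Coverage", "Exclusions", "Claims procedure"]
scores = [0.2, 2.5, 1.1]  # made-up values for illustration
best, prob = select_target_paragraph(paragraphs, scores)
print(best, round(prob, 3))
```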

S130. According to the target paragraph and the question to be answered, a FAQ question and answer pair is generated based on a preset reading comprehension model.

Specifically, after the server selects the paragraph matching the question to be answered from the target document, it generates a FAQ question and answer pair based on the selected paragraph and the question to be answered. The FAQ question and answer pair can be generated automatically through a preset reading comprehension model. The reading comprehension model is used to predict, according to the target paragraph and the question to be answered, the start and end positions in the target paragraph of the answer corresponding to the question to be answered, thereby determining the answer and generating the FAQ question and answer pair. In the embodiment of the present application, because the paragraph matching the question to be answered is first selected from the target document as the target paragraph, and the server then generates the FAQ question and answer pair from the target paragraph and the question to be answered based on the preset reading comprehension model, non-target paragraphs do not need to be processed. This reduces, to a certain extent, the interference that non-target paragraphs introduce when generating FAQ question and answer pairs, so the accuracy of the generated pairs is relatively high. It also weakens the influence of crossing domains, so that FAQ question and answer pairs with relatively high matching accuracy can be generated for cross-domain questions.

In some embodiments, such as this embodiment, as shown in FIG. 5, the step S130 may include the following steps S131-S134.

S131. Encode the target paragraph and the question to be answered respectively to obtain a second paragraph text vector and a question text vector.

In the embodiment of the present application, after the server selects the paragraph matching the question to be answered from the target document, it encodes the selected target paragraph and the question to be answered respectively. Specifically, a preset model such as a BERT model is first used to encode the target paragraph and the question to be answered, and a preset model such as an Encoder Block is then used to re-encode the encoded target paragraph and question, so as to obtain the second paragraph text vector and the question text vector. The second paragraph text vector and the question text vector are both three-dimensional vectors. The first component of the three-dimensional vector is Batch_Size, a batch processing parameter whose upper limit is the total number of training set samples. In this embodiment, the Batch_Size value is 32, which indicates that the preset model processes the target paragraph and the question to be answered in small batches. In other embodiments, Batch_Size can also be set to other values, as long as the target paragraph and the question to be answered can be encoded to obtain the second paragraph text vector and the question text vector. The second component of the three-dimensional vector is the length of the sentence, and the third component is the dimension corresponding to each word. The Encoder Block includes a convolutional neural network, a self-attention mechanism, and a feed-forward neural network. A convolutional neural network (CNN) is a type of feedforward neural network that includes convolution calculations and has a deep structure, and is one of the representative algorithms of deep learning. The self-attention mechanism (Self-attention) uses the Attention mechanism and can take the representation of contextual information into account.

Specifically, the BERT model is first used to encode the target paragraph and the question to be answered to obtain the first temporary paragraph text vector and the first temporary question text vector. Both are three-dimensional vectors and can be called the first three-dimensional vectors. The convolutional neural network then encodes the first temporary paragraph text vector and the first temporary question text vector to obtain the second temporary paragraph text vector and the second temporary question text vector. These are also three-dimensional vectors and can be called the second three-dimensional vectors; the dimension of each word in the second three-dimensional vectors is reduced compared with that in the first three-dimensional vectors. Next, for each word in the second temporary paragraph text vector and the second temporary question text vector, the self-attention mechanism calculates weight values with respect to every other word, and the weighted sum yields the third temporary paragraph text vector and the third temporary question text vector, which can be called the third three-dimensional vectors. Each component of the third three-dimensional vectors has the same meaning as the corresponding component of the second three-dimensional vectors. The third three-dimensional vectors are simply a further extraction from the second, and the dimension of each word in them is reduced compared with that in the second three-dimensional vectors.

Finally, the third temporary paragraph text vector and the third temporary question text vector are passed through the feed-forward neural network to obtain the final paragraph text vector and question text vector, where the final paragraph text vector is defined as the second paragraph text vector. The dimension of each word in the second paragraph text vector and the question text vector is reduced compared with that in the third temporary paragraph text vector and the third temporary question text vector. Understandably, only one Encoder Block is stacked in this step, but an Encoder Block contains multiple network layers, and the more layers a network has, the more likely the vanishing gradient problem is to occur during backpropagation. To alleviate this problem, in this embodiment residual connections are added when encoding with the convolutional neural network, the self-attention mechanism, and the feed-forward neural network.
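
The order of operations inside one Encoder Block (convolution, self-attention, feed-forward network, each with a residual added) can be sketched in plain Python. This is only an illustration under strong simplifying assumptions: there are no trained weights, the convolution kernel and the ReLU feed-forward stand-in are hypothetical, the batch component is omitted, and the per-word dimension is kept constant so that the residual addition is well defined, so the dimension reduction described above is not modelled. The attention here also includes the current word, as in standard self-attention.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of numbers."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def conv1d(x, kernel=(0.25, 0.5, 0.25)):
    """Per-channel 1D convolution along the sequence axis, zero-padded."""
    n, d = len(x), len(x[0])
    half = len(kernel) // 2
    out = []
    for i in range(n):
        row = [0.0] * d
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < n:
                for c in range(d):
                    row[c] += w * x[j][c]
        out.append(row)
    return out


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out


def feed_forward(x):
    """Position-wise feed-forward stand-in: a ReLU with identity weights."""
    return [[max(0.0, v) for v in row] for row in x]


def residual(sublayer, x):
    """Apply a sublayer and add its input back (the residual)."""
    y = sublayer(x)
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def encoder_block(x):
    """Convolution, then self-attention, then feed-forward, each with a residual."""
    x = residual(conv1d, x)
    x = residual(self_attention, x)
    x = residual(feed_forward, x)
    return x


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 words, 2 dimensions each
encoded = encoder_block(tokens)
print(len(encoded), len(encoded[0]))
```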

It should be noted that in other embodiments, other models can also be used to encode the target paragraph and the question to be answered according to actual needs, as long as the second paragraph text vector and the question text vector can be obtained. For example, an RNN (Recurrent Neural Network) model may replace the self-attention mechanism (Self-attention).

S132. Encode the second paragraph text vector and the question text vector to obtain a new text vector.

In the embodiment of the present application, after the target paragraph and the question to be answered are respectively encoded to obtain the second paragraph text vector and the question text vector, the second paragraph text vector and the question text vector need to be encoded to obtain a new text vector. Specifically, an Attention encoding operation is performed on the second paragraph text vector and the question text vector in the Context-Query Attention layer. The Attention encoding operation includes Context-to-Query and Query-to-Context Attention encoding operations. In the Context-to-Query operation, the Context of length N and the Query of length M form an N*M correlation matrix; Softmax is calculated over each row of the N*M correlation matrix to obtain Attention scores; and the Attention scores are used to form a weighted sum of the original Query text vectors, yielding a text vector containing Attention information. In the Query-to-Context operation, the Query of length M and the Context of length N form an M*N correlation matrix; Softmax is calculated over each row of the M*N correlation matrix to obtain Attention scores; and the Attention scores are used to form a weighted sum of the original Context text vectors, yielding a text vector containing Attention information. The new text vector is obtained by performing the Context-to-Query and Query-to-Context encoding operations on the second paragraph text vector and the question text vector in the Context-Query Attention layer.

It should be noted that the new text vector in the embodiment of this application is also a three-dimensional vector, and each of its components has the same meaning as in the second paragraph text vector and the question text vector. However, because the new text vector captures the interaction between the second paragraph text vector and the question text vector, its third component, that is, the dimension of each word, is increased. For brevity, the details are not repeated here.
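
The two attention operations described above can be written as one helper applied in both directions. The sketch below follows the description literally (correlation matrix, Softmax over each row, weighted sum of the other side's vectors). The dot product used as the correlation measure, the toy vectors, and the function names are assumptions, since the correlation function is not specified here.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of numbers."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(source, target):
    """For each source vector, return a softmax-weighted sum of target vectors.

    Builds one row of the len(source) x len(target) correlation matrix at a
    time (dot product, an assumption), applies softmax to that row, and uses
    the resulting scores as weights over the target vectors.
    """
    dim = len(target[0])
    attended = []
    for s in source:
        weights = softmax([dot(s, t) for t in target])
        attended.append([sum(w * t[j] for w, t in zip(weights, target)) for j in range(dim)])
    return attended


context = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N = 3 context words
query = [[1.0, 0.0], [0.5, 0.5]]                # M = 2 query words

c2q = attend(context, query)  # Context-to-Query: N rows, each a mix of Query vectors
q2c = attend(query, context)  # Query-to-Context: M rows, each a mix of Context vectors
print(len(c2q), len(q2c))
```

How the two results are combined into the new text vector is not shown, because the combination step is not detailed in this description.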

S133: Encode the new text vector according to a preset extraction model to obtain a target text vector.

In the embodiment of the present application, after the second paragraph text vector and the question text vector are encoded to obtain the new text vector, the new text vector needs to be encoded according to the preset extraction model to obtain the target text vector. The preset extraction model is, for example, an Encoder Block. It differs from step S131 in the number of Encoder Blocks used, but each block likewise includes a convolutional neural network, a self-attention mechanism, and a feed-forward neural network, and residual connections are likewise added when these encode the new text vector. In this step, three Encoder Blocks are stacked to encode the new text vector and further extract the target text vector from it. The dimension of each word in the target text vector is reduced compared with that in the new text vector, which makes the matching accuracy of the generated FAQ question and answer pairs higher.

S134. Calculate the target text vector to obtain the start and end positions of the answer to the question to be answered, so as to generate the FAQ question and answer pair.

In the embodiment of the present application, after the Encoder Blocks encode the new text vector to obtain the target text vector, the target text vector needs to be processed to obtain the start and end positions of the answer to the question to be answered, thereby generating a FAQ question and answer pair. Specifically, the text vector output by the first Encoder Block in step S133 and the text vector output by the second Encoder Block are concatenated to represent the start position of the answer, and the text vector output by the first Encoder Block and the text vector output by the third Encoder Block are concatenated to represent the end position of the answer. A softmax operation is then performed on each of these to obtain the probabilities of the start and end positions of the answer, and the start and end positions with the highest probabilities are taken as the start and end positions of the answer to the question to be answered. The FAQ question and answer pair is generated accordingly, realizing the automatic construction of FAQ question and answer pairs.

FIG. 6 is a schematic flowchart of a method for automatically constructing FAQ question and answer pairs provided by another embodiment of the application. As shown in FIG. 6, in this embodiment, the method includes steps S100-S190. That is, in this embodiment, the method further includes steps S140-S190 after step S130 in the foregoing embodiment.

S140. Obtain the FAQ question and answer pair and feed back the obtained FAQ question and answer pair to the user.

In the embodiment of this application, the user uploads the document to be read on the FAQ web page and selects a relevant question template according to the content or the name of the document to be read. The server calls the relevant question template according to the user's selection and, according to the questions in the question template and the Bert screening model, screens out the paragraphs matching the questions in the question template from the target document as the target paragraphs. It then generates FAQ question and answer pairs based on the target paragraphs and the questions to be answered, and feeds the generated FAQ question and answer pairs back to the user. Specifically, the server obtains the generated FAQ question and answer pair and displays it on the FAQ web page, and the user can perform follow-up operations as needed. For example, if the user is satisfied with the generated FAQ question and answer pair, the user can directly export it; if not, the user can modify it on the modification interface of the FAQ web page.

S150: Determine whether a modification instruction sent by the user is received.

In the embodiment of this application, after the server obtains the FAQ question and answer pair and feeds the obtained FAQ question and answer pair back to the user, it will determine whether the modification instruction sent by the user is received.

S160. If the modification instruction sent by the user is received, use the question input by the user in the modification instruction as the question to be answered, and return to step S120 of screening out, according to the question to be answered and the preset screening model, a paragraph matching the question to be answered from the target document as the target paragraph.

In the embodiment of this application, if the server receives the modification instruction sent by the user, it indicates that the user is not satisfied with the FAQ question and answer pair. The user can enter the question he or she wants to ask on the modification page of the FAQ web page, and the server then takes the question input by the user in the modification instruction as the question to be answered and returns to step S120. That is, after the question to be answered is re-determined, the paragraph matching the question to be answered is screened out from the target document as the target paragraph according to the question to be answered and the preset screening model, the subsequent steps are executed in sequence, and finally the obtained FAQ question and answer pair is fed back to the user.

S170: If the modification instruction sent by the user is not received, determine whether the question to be answered is a question in a preset database question template.

S180: If the question to be answered is not a question in the preset database question template, update the question in the preset database question template according to the question to be answered.

S190: If the question to be answered is a question in a preset database question template, do not update the question in the preset database question template.

In the embodiment of this application, if the server does not receive a modification instruction sent by the user, it indicates that the user is satisfied with the generated FAQ question and answer pair, and it is then determined whether the question to be answered is a question in the preset database question template. If the question to be answered is a question in the preset database question template, the question comes from the question template and the user is satisfied, so the generated FAQ question and answer pair can be exported. If the question to be answered is not a question in the preset database question template, the question was entered by the user and was answered well, so it needs to be added to the preset database question template to update and expand the questions in the template. In this way, the next time FAQ question and answer pairs are automatically generated, the questions in the preset database question template will be richer and can better meet the needs of users.
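The control flow of steps S140 to S190 can be sketched as follows. This is only an illustration under assumptions: build_pair is a hypothetical callable standing in for steps S120 and S130, next_modification is a hypothetical callable that returns the question typed by the user in a modification instruction, or None when no modification instruction is received, and the question template is modelled as a plain list of strings.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (question, answer)


def run_feedback_loop(
    question: str,
    build_pair: Callable[[str], Pair],
    next_modification: Callable[[Pair], Optional[str]],
    template_questions: List[str],
) -> Pair:
    """Regenerate the pair until the user stops modifying, then update the template."""
    pair = build_pair(question)              # S120-S130, result fed back in S140
    while True:
        modified = next_modification(pair)   # S150: was a modification received?
        if modified is None:
            break
        question = modified                  # S160: the user's question replaces it
        pair = build_pair(question)          # back to S120
    if question not in template_questions:   # S170
        template_questions.append(question)  # S180: expand the question template
    # S190: otherwise the template is left unchanged
    return pair
```

Because the template is only updated after the loop ends without a further modification, a question is added only once the user has accepted the pair generated for it.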

It should be noted that in the embodiment of this application, the user's modification operations on the modification interface of the FAQ web page are recorded, and the modified results are kept as the historical record of the question to be answered. These historical records can be used as a large body of annotated data to optimize the model used to generate the FAQ question and answer pairs.

FIG. 7 is a schematic block diagram of a device 200 for automatically constructing FAQ question and answer pairs provided by an embodiment of the present application. As shown in FIG. 7, corresponding to the above method for automatically constructing FAQ question and answer pairs, the present application also provides a device 200 for automatically constructing FAQ question and answer pairs. The device 200 includes units for executing the above method for automatically constructing FAQ question and answer pairs, and the device may be configured in a server. Specifically, referring to FIG. 7, the device 200 for automatically constructing FAQ question and answer pairs includes an obtaining unit 201, a parsing and segmenting unit 202, a screening unit 203 and a generating unit 204.

Wherein, the obtaining unit 201 is used to obtain the document to be read; the parsing and segmenting unit 202 is used to parse the document to be read and segment the parsed document to obtain the segmented document as the target document; the screening unit 203 is configured to screen out, according to the question to be answered and a preset screening model, a paragraph matching the question to be answered from the target document as the target paragraph; the generating unit 204 is configured to generate FAQ question and answer pairs based on the preset reading comprehension model according to the target paragraph and the question to be answered.
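The way the four units hand data to one another can be sketched as a simple composition. This is a hedged sketch, not the structure of the device itself: each unit is modelled as a plain callable, and the class name FaqBuilder, its field names, and its build method are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FaqBuilder:
    """Chains stand-ins for units 201 to 204 in the order described above."""

    acquire: Callable[[str], str]                    # obtaining unit 201
    parse_and_segment: Callable[[str], List[str]]    # parsing and segmenting unit 202
    screen: Callable[[str, List[str]], str]          # screening unit 203
    generate: Callable[[str, str], Tuple[str, str]]  # generating unit 204

    def build(self, source: str, question: str) -> Tuple[str, str]:
        document = self.acquire(source)                # document to be read
        paragraphs = self.parse_and_segment(document)  # target document
        target = self.screen(question, paragraphs)     # target paragraph
        return self.generate(target, question)         # FAQ question and answer pair
```

Passing the units in as callables keeps the sketch independent of any particular screening model or reading comprehension model, so each stage can be replaced without touching the others.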

In some embodiments, such as this embodiment, as shown in FIG. 8, the parsing and segmenting unit 202 includes a parsing unit 2021 and a segmentation unit 2022.

Wherein, the parsing unit 2021 is used to parse the document to be read using a cascaded CRF model to obtain an XML document; the segmentation unit 2022 is used to segment the XML document by a preset segmentation method to obtain a document with a preset document structure as the target document.
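One possible reading of the segmentation performed by the segmentation unit 2022 is sketched below. The tag name p and the example document layout are assumptions made purely for illustration, since the XML schema produced by the cascaded CRF model and the preset segmentation method are not specified in this section. The function name segment_xml is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def segment_xml(xml_text: str, paragraph_tag: str = "p") -> List[str]:
    """Collect the text of every paragraph element, in document order."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for node in root.iter(paragraph_tag):
        # Join nested text pieces and collapse runs of whitespace.
        text = " ".join("".join(node.itertext()).split())
        if text:  # skip empty paragraph elements
            paragraphs.append(text)
    return paragraphs
```

Each returned string would then be one candidate paragraph of the target document for the screening unit 203.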

In some embodiments, such as this embodiment, as shown in FIG. 9, the screening unit 203 includes a first encoding unit 2031, a calculation unit 2032, and a determination unit 2033.

Wherein, the first encoding unit 2031 is configured to encode the target document according to the question to be answered and the preset screening model to obtain first paragraph text vectors; the calculation unit 2032 is configured to calculate, according to the question to be answered, the probability that each first paragraph text vector matches the question to be answered; the determination unit 2033 is configured to determine the paragraph corresponding to the first paragraph text vector with the highest probability as the paragraph matching the question to be answered, and to use it as the target paragraph.
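The score-then-select behaviour of the calculation unit 2032 and the determination unit 2033 can be sketched as follows. The scoring function here is a deliberately simple word-overlap ratio used only as a stand-in, since the preset screening model of this application is a trained model that cannot be reproduced in a few lines. The names word_overlap and select_target_paragraph are hypothetical.

```python
from typing import Callable, List, Tuple


def word_overlap(question: str, paragraph: str) -> float:
    """Toy stand-in for the screening model: Jaccard overlap of word sets."""
    q_words = set(question.lower().split())
    p_words = set(paragraph.lower().split())
    union = q_words | p_words
    if not union:
        return 0.0
    return len(q_words & p_words) / len(union)


def select_target_paragraph(
    question: str,
    paragraphs: List[str],
    match_probability: Callable[[str, str], float] = word_overlap,
) -> Tuple[str, float]:
    """Score every paragraph and return the one with the highest score."""
    scores = [match_probability(question, p) for p in paragraphs]
    best = max(range(len(paragraphs)), key=scores.__getitem__)
    return paragraphs[best], scores[best]
```

Any function that returns a matching probability for a question and a paragraph can be passed as match_probability, so the selection logic stays the same when the toy scorer is replaced by a real model.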

In some embodiments, such as this embodiment, as shown in FIG. 10, the generating unit 204 includes a second coding unit 2041, a third coding unit 2042, a fourth coding unit 2043, and a generating subunit 2044.

Wherein, the second coding unit 2041 is configured to encode the target paragraph and the question to be answered respectively to obtain a second paragraph text vector and a question text vector; the third coding unit 2042 is configured to encode the second paragraph text vector and the question text vector to obtain a new text vector; the fourth coding unit 2043 is used to encode the new text vector according to a preset extraction model to obtain a target text vector; the generating subunit 2044 is used to calculate the target text vector to obtain the start and end positions of the answer to the question to be answered, thereby generating the FAQ question and answer pair.

In some embodiments, such as this embodiment, as shown in FIG. 11, the device 200 further includes a feedback unit 205, a first judgment unit 206, a modification unit 207, a second judgment unit 208 and an update unit 209.

Wherein, the feedback unit 205 is used to obtain the FAQ question and answer pair and feed the obtained FAQ question and answer pair back to the user; the first judgment unit 206 is used to judge whether a modification instruction sent by the user is received; the modification unit 207 is used to, if the modification instruction sent by the user is received, use the question input by the user in the modification instruction as the question to be answered; the second judgment unit 208 is configured to determine, if the modification instruction sent by the user is not received, whether the question to be answered is a question in a preset database question template; the update unit 209 is configured to update the questions in the preset database question template according to the question to be answered if the question to be answered is not a question in the preset database question template.

The above device for automatically constructing FAQ question and answer pairs can be implemented in the form of a computer program, and the computer program can be run on the computer device shown in FIG. 12.

Please refer to FIG. 12, which is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 300 is a server. Specifically, the server may be an independent server or a server cluster composed of multiple servers.

Referring to FIG. 12, the computer device 300 includes a processor 302, a memory, and a network interface 305 connected through a system bus 301, where the memory may include a non-volatile storage medium 303 and an internal memory 304.

The non-volatile storage medium 303 can store an operating system 3031 and a computer program 3032. When the computer program 3032 is executed, it can cause the processor 302 to execute a method for automatically constructing FAQ question and answer pairs.

The processor 302 is used to provide calculation and control capabilities to support the operation of the entire computer device 300.

The internal memory 304 provides an environment for the operation of the computer program 3032 in the non-volatile storage medium 303. When the computer program 3032 is executed by the processor 302, it implements the method for automatically constructing FAQ question and answer pairs in the embodiment of the present application.

The network interface 305 is used for network communication with other devices. Those skilled in the art can understand that the structure shown in FIG. 12 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device 300 to which the solution of the present application is applied. The specific computer device 300 may include more or fewer components than shown in the figure, or combine certain components, or have a different component arrangement.

It should be understood that in the embodiment of the present application, the processor 302 may be a central processing unit (Central Processing Unit, CPU), and the processor 302 may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSPs), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, etc. Among them, the general-purpose processor may be a microprocessor or the processor may also be any conventional processor.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the foregoing embodiments can be implemented by computer programs instructing relevant hardware. The computer program can be stored in a storage medium, and the storage medium is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the process steps of the foregoing method embodiment.

Therefore, this application also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program. When the computer program is executed by a processor, the processor is caused to implement the method for automatically constructing FAQ question and answer pairs in the embodiment of the present application.

In some embodiments, such as this embodiment, when the processor executes the computer program to implement the step of parsing the document to be read and segmenting the parsed document to obtain the segmented document as the target document, the following steps are specifically implemented: the document to be read is parsed using a cascaded CRF model to obtain an XML document; the XML document is segmented by a preset segmentation method to obtain a document with a preset document structure as the target document.

The storage medium may be a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.

The steps in the method in the embodiment of the present application can be reordered, merged, or deleted according to actual needs. The units in the device of the embodiment of the present application may be combined, divided, or deleted according to actual needs. In addition, the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which may be a personal computer, a terminal, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present application.

In the above-mentioned embodiments, the description of each embodiment has its own focus. For a part that is not described in detail in an embodiment, reference may be made to related descriptions of other embodiments.

Obviously, those skilled in the art can make various changes and modifications to the application without departing from the spirit and scope of the application. Thus, if these modifications and variations of this application fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include these modifications and variations.

The above are only specific implementations of this application, but the protection scope of this application is not limited thereto. Anyone familiar with the technical field can easily think of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall be covered within the protection scope of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.

Claims

A FAQ question and answer pair automatic construction method, including:

Get the document to be read;

Parsing the document to be read and segmenting the parsed document to obtain the segmented document as a target document;

According to the question to be answered and a preset screening model, a paragraph that matches the question to be answered is selected from the target document as a target paragraph;

According to the target paragraph and the question to be answered, a FAQ question and answer pair is generated based on a preset reading comprehension model.
The method according to claim 1, wherein the parsing the document to be read and segmenting the parsed document to obtain the segmented document as the target document comprises:

Parse the document to be read using a cascading CRF model to obtain an XML document;

The XML document is segmented in a preset segmentation manner to obtain a document with a preset document structure as a target document.
The method according to claim 1, wherein the filtering out a paragraph matching the question to be answered as a target paragraph from the target document based on the question to be answered and a preset screening model comprises:

Encoding the target document according to the question to be answered and the preset screening model to obtain a first paragraph text vector;

Calculating the probability that each of the first paragraph text vectors matches the question to be answered according to the question to be answered;

The paragraph corresponding to the first paragraph text vector with the highest probability is determined as a paragraph matching the question to be answered, and used as a target paragraph.
The method according to claim 1, wherein said generating a FAQ question and answer pair based on a preset reading comprehension model according to the target paragraph and the question to be answered comprises:

Respectively encoding the target paragraph and the question to be answered to obtain a second paragraph text vector and a question text vector;

Encoding the second paragraph text vector and the question text vector to obtain a new text vector;

Encoding the new text vector according to a preset extraction model to obtain a target text vector;

The target text vector is calculated to obtain the starting and ending positions of the answer to the question to be answered, thereby generating the FAQ question and answer pair.
The method according to claim 1, wherein after said generating a FAQ question and answer pair based on a preset reading comprehension model according to the target paragraph and the question to be answered, the method further comprises:

The FAQ question and answer pair is obtained and the obtained FAQ question and answer pair is fed back to the user.
The method according to claim 5, wherein after the obtaining the FAQ question and answer pair and feeding back the obtained FAQ question and answer pair to the user, the method further comprises:

Determine whether the modification instruction sent by the user is received;

If the modification instruction sent by the user is received, use the question input by the user in the modification instruction as the question to be answered;

Return to the step of screening out the paragraph matching the question to be answered from the target document as the target paragraph according to the question to be answered and the preset screening model.
The method according to claim 6, wherein after said determining whether a modification instruction sent by a user is received, the method further comprises:

If the modification instruction sent by the user is not received, determining whether the question to be answered is a question in a preset database question template;

If the question to be answered is not a question in the preset database question template, then the question in the preset database question template is updated according to the question to be answered.
A FAQ question and answer pair automatic construction device, which includes:

The obtaining unit is used to obtain the document to be read;

The parsing and segmentation unit is used to parse the document to be read and segment the parsed document to obtain the segmented document as a target document;

The screening unit is configured to screen out paragraphs matching the question to be answered from the target document as the target paragraph according to the question to be answered and a preset screening model;

The generating unit is configured to generate FAQ question and answer pairs based on a preset reading comprehension model according to the target paragraph and the question to be answered.
The device for automatically constructing FAQ question and answer pairs according to claim 8, wherein the screening unit comprises:

A first encoding unit, configured to encode the target document according to the question to be answered and the preset screening model to obtain a first paragraph text vector;

A calculation unit, configured to calculate the probability that each of the first paragraph text vectors matches the question to be answered according to the question to be answered;

The determining unit is configured to determine the paragraph corresponding to the first paragraph text vector with the highest probability as the paragraph matching the question to be answered, and use it as the target paragraph.
The apparatus for automatically constructing FAQ question and answer pairs according to claim 8, wherein the generating unit comprises:

The second coding unit is configured to respectively code the target paragraph and the question to be answered to obtain a second paragraph text vector and a question text vector;

The third encoding unit is used to encode the second paragraph text vector and the question text vector to obtain a new text vector;

The fourth encoding unit is configured to encode the new text vector according to a preset extraction model to obtain a target text vector;

A generating subunit is used to calculate the target text vector to obtain the start and end positions of the answer to the question to be answered, thereby generating the FAQ question and answer pair.
A computer device includes a memory and a processor connected to the memory; wherein the memory is used to store a computer program; the processor is used to run the computer program stored in the memory to perform the following steps:

Get the document to be read;

Parsing the document to be read and segmenting the parsed document to obtain the segmented document as a target document;

According to the question to be answered and a preset screening model, a paragraph that matches the question to be answered is selected from the target document as a target paragraph;

According to the target paragraph and the question to be answered, a FAQ question and answer pair is generated based on a preset reading comprehension model.
The computer device according to claim 11, wherein the step of parsing the document to be read and segmenting the parsed document to obtain the segmented document as the target document comprises:

Parse the document to be read using a cascading CRF model to obtain an XML document;

The XML document is segmented in a preset segmentation manner to obtain a document with a preset document structure as a target document.
The computer device according to claim 11, wherein the step of selecting a paragraph matching the question to be answered as a target paragraph from the target document based on the question to be answered and a preset screening model comprises:

Encoding the target document according to the question to be answered and the preset screening model to obtain a first paragraph text vector;

Calculating the probability that each of the first paragraph text vectors matches the question to be answered according to the question to be answered;

The paragraph corresponding to the first paragraph text vector with the highest probability is determined as a paragraph matching the question to be answered, and used as a target paragraph.
The computer device according to claim 11, wherein the step of generating a FAQ question and answer pair based on a preset reading comprehension model according to the target paragraph and the question to be answered comprises:

Respectively encoding the target paragraph and the question to be answered to obtain a second paragraph text vector and a question text vector;

Encoding the second paragraph text vector and the question text vector to obtain a new text vector;

Encoding the new text vector according to a preset extraction model to obtain a target text vector;

The target text vector is calculated to obtain the start and end positions of the answer to the question to be answered, thereby generating the FAQ question and answer pair.
The computer device according to claim 11, wherein the step of generating a FAQ question and answer pair based on a preset reading comprehension model according to the target paragraph and the question to be answered comprises:

The FAQ question and answer pair is obtained and the obtained FAQ question and answer pair is fed back to the user.
The computer device according to claim 15, wherein after the step of obtaining the FAQ question and answer pair and feeding back the obtained FAQ question and answer pair to the user, the method further comprises:

Determine whether the modification instruction sent by the user is received;

If the modification instruction sent by the user is received, use the question input by the user in the modification instruction as the question to be answered;

Return to the step of screening out the paragraph matching the question to be answered from the target document as the target paragraph according to the question to be answered and the preset screening model.
The computer device according to claim 16, wherein after the step of determining whether a modification instruction sent by the user is received, the method further comprises:

If the modification instruction sent by the user is not received, determining whether the question to be answered is a question in a preset database question template;

If the question to be answered is not a question in the preset database question template, then the question in the preset database question template is updated according to the question to be answered.
A computer-readable storage medium storing a computer program, and when the computer program is executed by a processor, the processor executes the following steps:

Get the document to be read;

Parsing the document to be read and segmenting the parsed document to obtain the segmented document as a target document;

According to the question to be answered and a preset screening model, a paragraph that matches the question to be answered is selected from the target document as a target paragraph;

According to the target paragraph and the question to be answered, a FAQ question and answer pair is generated based on a preset reading comprehension model.
The computer-readable storage medium according to claim 18, wherein the step of selecting, according to the question to be answered and a preset screening model, a paragraph that matches the question to be answered from the target document as the target paragraph comprises:

Encoding the target document according to the question to be answered and the preset screening model to obtain a first paragraph text vector;

Calculating the probability that each of the first paragraph text vectors matches the question to be answered according to the question to be answered;

The paragraph corresponding to the first paragraph text vector with the highest probability is determined as a paragraph matching the question to be answered, and used as a target paragraph.
The computer-readable storage medium of claim 18, wherein the step of generating FAQ question and answer pairs based on a preset reading comprehension model according to the target paragraph and the question to be answered comprises:

Respectively encoding the target paragraph and the question to be answered to obtain a second paragraph text vector and a question text vector;

Encoding the second paragraph text vector and the question text vector to obtain a new text vector;

Encoding the new text vector according to a preset extraction model to obtain a target text vector;

The target text vector is calculated to obtain the start and end positions of the answer to the question to be answered, thereby generating the FAQ question and answer pair.