CN111046152A - FAQ question-answer pair automatic construction method and device, computer equipment and storage medium - Google Patents

FAQ question-answer pair automatic construction method and device, computer equipment and storage medium

Info

Publication number
CN111046152A
Authority
CN
China
Prior art keywords
question
answered
document
target
paragraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910969443.4A
Other languages
Chinese (zh)
Other versions
CN111046152B (en)
Inventor
杨凤鑫
徐国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910969443.4A
Priority to PCT/CN2019/118442 (WO2021068352A1)
Publication of CN111046152A
Application granted
Publication of CN111046152B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/332 - Query formulation
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/80 - Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F 16/84 - Mapping; Conversion
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/084 - Backpropagation, e.g. using gradient descent
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the invention discloses an automatic FAQ question-answer pair construction method and device, computer equipment, and a storage medium. The method belongs to the technical fields of artificial intelligence and natural language processing and comprises the following steps: acquiring a document to be read; analyzing the document to be read and segmenting the analyzed document to obtain a segmented document as a target document; screening out, from the target document, a paragraph matching a question to be answered as a target paragraph according to the question to be answered and a preset screening model; and generating an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered. In the embodiment of the application, the target paragraph matching the question to be answered is screened out first, and the FAQ question-answer pair is then generated from the target paragraph and the question to be answered, so non-target paragraphs need not be processed; this reduces, to a certain extent, the interference that non-target paragraphs would otherwise introduce into FAQ question-answer pair generation, so the matching accuracy of the generated FAQ question-answer pairs is higher.

Description

FAQ question-answer pair automatic construction method and device, computer equipment and storage medium
Technical Field
The invention relates to the technical field of artificial intelligence and natural language processing, in particular to an automatic FAQ question-answer pair construction method, an automatic FAQ question-answer pair construction device, computer equipment and a storage medium.
Background
FAQ is an abbreviation of the English phrase Frequently Asked Questions, meaning "frequently asked questions" or, more colloquially, "answers to common questions". The FAQ is a common means of online customer service, and a good FAQ system should answer at least 80% of users' general and common questions. It is therefore convenient for users, greatly relieves the pressure on website staff, saves a large amount of customer-service cost, and increases customer satisfaction. For these reasons, how to construct the FAQ database effectively is particularly important.
At present, automatic construction of FAQs in the question-answering field mainly follows three approaches: (1) segmenting the article to be read and the question to be answered into words, inputting the resulting word strings into an automatic reading understanding model, and outputting the answer corresponding to the question; (2) according to the similarity between the question posed by the user and the existing question records in the question-answer library, finding the question in the existing question-answer database that matches the user's question and returning the corresponding answer to the user to complete the FAQ answer; (3) for an established FAQ, building sentence-pattern templates corresponding to the standard questions by manual entry, matching the user's question against the sentence-pattern templates, and then matching to the FAQ through the mapping between the sentence-pattern templates and the FAQ. Although these three methods can achieve matching to a certain extent and realize automatic construction of FAQ question-answer pairs, the matching accuracy of the FAQ question-answer pairs is still low.
Disclosure of Invention
The embodiment of the invention provides an automatic FAQ question-answer pair construction method and device, computer equipment, and a storage medium, aiming to solve the problem in the prior art that the matching accuracy of automatically constructed FAQ question-answer pairs is low.
In a first aspect, an embodiment of the present invention provides an automatic construction method for an FAQ question-answer pair, including:
acquiring a document to be read;
analyzing the document to be read and segmenting the analyzed document to obtain a segmented document as a target document;
screening out paragraphs matched with the questions to be answered from the target document as target paragraphs according to the questions to be answered and a preset screening model;
and generating an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered.
In a second aspect, an embodiment of the present invention further provides an apparatus for automatically constructing an FAQ question-answer pair, where the apparatus includes:
the reading device comprises an acquisition unit, a reading unit and a reading unit, wherein the acquisition unit is used for acquiring a document to be read;
the analysis segmentation unit is used for analyzing the document to be read and segmenting the analyzed document to obtain a segmented document as a target document;
the screening unit is used for screening out paragraphs matched with the questions to be answered from the target document as target paragraphs according to the questions to be answered and a preset screening model;
and the generating unit is used for generating an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the above method when executing the computer program.
In a fourth aspect, the present invention also provides a computer-readable storage medium, which stores a computer program, and the computer program can implement the above method when being executed by a processor.
The embodiment of the invention provides an automatic FAQ question-answer pair construction method and device, computer equipment, and a storage medium. The method comprises the following steps: acquiring a document to be read; analyzing the document to be read and segmenting the analyzed document to obtain a segmented document as a target document; screening out, from the target document, a paragraph matching a question to be answered as a target paragraph according to the question to be answered and a preset screening model; and generating an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered. According to the technical scheme of the embodiment of the invention, the target paragraph matching the question to be answered is screened out first, and the FAQ question-answer pair is then generated from the target paragraph and the question to be answered, so that non-target paragraphs need not be processed; this reduces, to a certain extent, the interference that non-target paragraphs would otherwise introduce into FAQ question-answer pair generation, and the generated FAQ question-answer pairs are matched with higher accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a scene schematic diagram of an automatic FAQ question-answer pair construction method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of an automatic FAQ question-answer pair construction method according to an embodiment of the present invention;
fig. 3 is a sub-flow diagram of an automatic construction method of an FAQ question-answer pair according to an embodiment of the present invention;
fig. 4 is a sub-flow diagram of an automatic construction method of an FAQ question-answer pair according to an embodiment of the present invention;
FIG. 5 is a sub-flow diagram of an automatic FAQ question-answer pair construction method according to an embodiment of the present invention;
fig. 6 is a schematic flow chart of an automatic FAQ question-answer pair construction method according to another embodiment of the present invention;
fig. 7 is a schematic block diagram of an automatic FAQ question-answer pair constructing apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic block diagram of an analysis segmentation unit of an FAQ question-answer pair automatic construction device according to an embodiment of the present invention;
fig. 9 is a schematic block diagram of a screening unit of an FAQ question-answer pair automatic construction apparatus according to an embodiment of the present invention;
fig. 10 is a schematic block diagram of a generation unit of an FAQ question-answer pair automatic construction apparatus according to an embodiment of the present invention;
fig. 11 is a schematic block diagram of an FAQ question-answer pair automatic construction apparatus according to another embodiment of the present invention; and
fig. 12 is a schematic block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [described condition or event] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Referring to fig. 1, fig. 1 is a scene schematic diagram of an automatic FAQ question-answer pair construction method according to an embodiment of the present invention. The FAQ question-answer pair automatic construction method provided by the embodiment of the invention can be applied to a server, and can be realized through a software program configured on the server. The server communicates with the terminal so that the server calls the document to be read uploaded by the user through the terminal, and obtains an FAQ question-answer pair after a series of processing is carried out according to the question to be answered and the document to be read, and automatic construction of the FAQ question-answer pair is achieved. The terminal may be a desktop computer, a laptop computer, a tablet computer, etc., and is not limited herein. In addition, in fig. 1, the number of the terminal and the user is one, and it is understood that in the actual application process, the number of the terminal and the user may be multiple, and fig. 1 only serves for the purpose of schematic illustration.
Referring to fig. 2, fig. 2 is a schematic flow chart of an automatic FAQ question-answer pair construction method according to an embodiment of the present invention. As shown in fig. 2, the method includes the following steps S100-S130.
And S100, acquiring a document to be read.
Specifically, to implement automatic construction of an FAQ question-answer pair, the server needs to acquire a document to be read first, and then generates the FAQ question-answer pair after performing a series of processing based on the document to be read. In the embodiment of the present invention, a user may upload a document to be read through a user terminal, and specifically, the user may upload the document to be read through an FAQ web page of the user terminal to send the document to be read to a server. In the embodiment of the present invention, the document to be read is a PDF document.
It should be noted that in other embodiments, the document to be read may be other types of documents, such as a Word document.
S110, analyzing the document to be read and segmenting the analyzed document to obtain a segmented document as a target document.
Specifically, after acquiring the document to be read, the server needs to analyze the document to be read to obtain the document in the required format, and then segment the content in the document to finally obtain the document with the preset document structure.
Referring to fig. 3, in an embodiment, for example, in the embodiment of the present invention, the step S110 includes the following steps S111-S112.
And S111, analyzing the document to be read by using a stacked CRF model to obtain an XML document.
S112, segmenting the XML document in a preset segmentation mode to obtain a document with a preset document structure as a target document.
In the embodiment of the invention, the document to be read is analyzed with a stacked CRF model to obtain an XML document. The stacked CRF model is adopted in this embodiment because it offers shorter processing time and better results when analyzing the document to be read. XML is an abbreviation of Extensible Markup Language. After the document to be read is parsed into the XML document, the XML document needs to be segmented to obtain the segmented document as the target document. Specifically, the XML document may be segmented in a preset segmentation mode, and various segmentation modes may be employed; in this embodiment, segmentation by second-level title is selected. In other embodiments, other segmentation modes, such as segmentation by first-level title or by article paragraph, can be selected according to actual requirements.
It should be noted that, in other embodiments, other models may be used to parse the document to be read, for example, a Hidden Markov Model (HMM).
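For illustration, a minimal sketch of the segmentation step described above follows. It assumes a hypothetical XML schema in which second-level titles appear as <h2> elements and body text as <p> elements; the actual tag names produced by the CRF-based parser are not specified in this disclosure and would need to be adapted.

```python
# Illustrative only: <h2> and <p> are assumed tag names, not the parser's actual schema.
import xml.etree.ElementTree as ET

def segment_by_second_level_title(xml_text):
    """Split the parsed XML document into segments, one per second-level title (S112)."""
    root = ET.fromstring(xml_text)
    segments, current = [], None
    for node in root.iter():
        if node.tag == "h2":                      # a new second-level title opens a new segment
            if current is not None:
                segments.append(current)
            current = {"title": (node.text or "").strip(), "paragraphs": []}
        elif node.tag == "p" and current is not None:
            current["paragraphs"].append((node.text or "").strip())
    if current is not None:
        segments.append(current)
    return segments                               # the segmented document serves as the target document
```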
And S120, according to the questions to be answered and a preset screening model, screening out paragraphs matched with the questions to be answered from the target document as target paragraphs.
Specifically, after the server analyzes the document to be read and segments the analyzed document to obtain the target document, the server also needs to screen the target document to obtain a paragraph matched with the question to be answered as a target paragraph. In the embodiment of the invention, the questions to be answered are questions stored in a question template in a preset database. Specifically, after the user uploads the document to be read on the FAQ web page of the terminal, the user may select a corresponding question template according to the content or the name of the uploaded document. For example, if the document to be read concerns insurance or accidents, the user selects a question template related to insurance or accidents, which includes a plurality of common questions on those topics. The server calls the corresponding question template according to the user's selection, and screens out, from the target document, a paragraph matched with a question in the question template as a target paragraph according to the question in the question template and a preset screening model.
In some embodiments, such as this embodiment, as shown in FIG. 4, the step S120 may include the following steps S121-S123.
S121, coding the target document according to the question to be answered and the preset screening model to obtain a first paragraph text vector.
In the embodiment of the invention, in order to screen the target document to obtain the paragraph matched with the question to be answered, the target document is encoded according to the question to be answered and the screening model to obtain first paragraph text vectors. The preset screening model is, for example, a BERT (Bidirectional Encoder Representations from Transformers) model. The BERT model is a bidirectional language representation model based on the Transformer; it can extract the syntactic and semantic information of a document and can do so in combination with the document's context information. Specifically, the server generates a first paragraph text vector for the target document according to the question to be answered and the BERT model. The first paragraph text vector is a three-dimensional vector, and this three-dimensional vector is the text-vector representation of the target document. The BERT model is adopted to generate the first paragraph text vector because it can extract the syntactic and semantic information of the target document in combination with the target document's context information, which improves the extraction accuracy.
It should be noted that, in other embodiments, other models may also be used to filter the target document according to actual requirements to obtain a target paragraph, for example, a Word2vec (Word to vector) model.
And S122, calculating the probability of matching each first paragraph text vector with the question to be answered according to the question to be answered.
S123, determining the paragraph corresponding to the first paragraph text vector with the maximum probability as the paragraph matched with the question to be answered, and taking the paragraph as a target paragraph.
In the embodiment of the present invention, after the server generates the first paragraph text vectors for the target document according to the question to be answered and the BERT model, the server also needs to calculate, according to the question to be answered, the probability that each first paragraph text vector matches the question to be answered, and rank the calculated probabilities so as to take the paragraph corresponding to the first paragraph text vector with the largest probability as the target paragraph. Specifically, a Softmax function is used to calculate, according to the question to be answered, the probability that each first paragraph text vector matches the question to be answered; the calculated probabilities are then ranked, and the paragraph corresponding to the first paragraph text vector with the highest probability is taken as the target paragraph.
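A minimal sketch of the screening step (S121-S123) is given below for illustration. It uses the open-source transformers library; the checkpoint name "bert-base-chinese" and the untrained linear scoring head are assumptions standing in for whatever trained screening model is actually deployed.

```python
# Illustrative sketch of S121-S123; the scoring head here is untrained and stands in
# for the preset screening model's trained head.
import torch
from torch import nn
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed checkpoint
bert = BertModel.from_pretrained("bert-base-chinese")
score_head = nn.Linear(bert.config.hidden_size, 1)               # placeholder scoring head

def select_target_paragraph(question, paragraphs):
    scores = []
    with torch.no_grad():
        for para in paragraphs:
            inputs = tokenizer(question, para, return_tensors="pt",
                               truncation=True, max_length=512)
            cls_vec = bert(**inputs).last_hidden_state[:, 0, :]   # first paragraph text vector
            scores.append(score_head(cls_vec).squeeze())          # match score for this paragraph
    probs = torch.softmax(torch.stack(scores), dim=0)             # Softmax over all paragraphs (S122)
    best = int(torch.argmax(probs))                               # highest-probability paragraph (S123)
    return paragraphs[best], float(probs[best])
```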
And S130, generating an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered.
Specifically, after the server screens out the paragraph matched with the question to be answered from the target document, the server generates an FAQ question-answer pair according to the screened paragraph and the question to be answered. Specifically, the FAQ question-answer pair may be automatically generated through a preset reading understanding model. The reading understanding model is used for predicting, according to the target paragraph and the question to be answered, the positions in the target paragraph where the answer corresponding to the question to be answered begins and ends, so that the answer is determined and an FAQ question-answer pair is generated. In the embodiment of the invention, because the paragraph matched with the question to be answered is screened out from the target document as the target paragraph, and the server can generate the FAQ question-answer pair based on the preset generation model according to the target paragraph and the question to be answered, non-target paragraphs need not be processed. This reduces, to a certain extent, the interference that non-target paragraphs would otherwise introduce into FAQ question-answer pair generation, so the accuracy of the generated FAQ question-answer pairs is higher; the influence of cross-domain questions can also be weakened, and FAQ question-answer pairs with higher matching accuracy are generated for cross-domain questions.
In some embodiments, such as this embodiment, as shown in fig. 5, the step S130 may include the following steps S131-S134.
S131, coding the target paragraph and the question to be answered respectively to obtain a second paragraph text vector and a question text vector.
In the embodiment of the invention, after the server screens out the paragraph matched with the question to be answered from the target document, the screened target paragraph and the question to be answered are encoded respectively. Specifically, a preset model such as a BERT model is used to encode the target paragraph and the question to be answered, and then a preset model such as an Encoder Block is used to re-encode the encoded target paragraph and question to be answered, so as to obtain a second paragraph text vector and a question text vector. The second paragraph text vector and the question text vector are three-dimensional vectors. The first component of the three-dimensional vector is Batch_Size, a batch-processing parameter whose upper limit is the total number of training-set samples; in this embodiment the value of Batch_Size is 32, which indicates that the preset model processes the target paragraph and the question to be answered in small batches. In other embodiments, Batch_Size may be set to other values, as long as the target paragraph and the question to be answered are encoded to obtain the second paragraph text vector and the question text vector. The second component of the three-dimensional vector is the length of the sentence, and the third component is the dimension corresponding to each word. The Encoder Block includes a convolutional neural network, a self-attention mechanism, and a forward neural network. The Convolutional Neural Network (CNN) is a feedforward neural network that involves convolution computation and has a deep structure, and is one of the typical algorithms of deep learning. The Self-Attention mechanism makes use of the attention mechanism, which allows context information to be characterized. Specifically, the BERT model is used to encode the target paragraph and the question to be answered to obtain a first temporary paragraph text vector and a first temporary question text vector; both are three-dimensional vectors and may be called first three-dimensional vectors. Then, the convolutional neural network is used to encode the first temporary paragraph text vector and the first temporary question text vector to obtain a second temporary paragraph text vector and a second temporary question text vector, both of which are three-dimensional vectors and may be called second three-dimensional vectors; the dimension of each word in the second three-dimensional vectors is reduced compared with that in the first three-dimensional vectors.
Next, through the self-attention mechanism, a weight is calculated for every other word except the current word in the second temporary paragraph text vector and the second temporary question text vector, and a weighted sum yields a third temporary paragraph text vector and a third temporary question text vector, which may be called third three-dimensional vectors. Each component of the third three-dimensional vectors has the same meaning as the corresponding component of the second three-dimensional vectors; the third three-dimensional vectors are merely a further extraction from the second three-dimensional vectors, and the dimension of each word in them is reduced compared with the second three-dimensional vectors. Finally, the forward neural network continues to extract the third temporary paragraph text vector and the third temporary question text vector to obtain the finally required paragraph text vector and question text vector; the finally required paragraph text vector is defined as the second paragraph text vector, and the dimension of each word in the second paragraph text vector and the finally required question text vector is reduced compared with that in the third temporary paragraph text vector and the third temporary question text vector. Understandably, only one Encoder Block is stacked in this step, and there is a multilayer network inside an Encoder Block; when the number of network layers is large, the gradient-vanishing problem arises during backpropagation. To alleviate this problem, in this embodiment a residual connection is added when encoding with the convolutional neural network, the self-attention mechanism, and the forward neural network.
It should be noted that, in other embodiments, other models may also be used to encode the target paragraph and the question to be answered respectively according to actual requirements, as long as the second paragraph text vector and the question text vector are obtained; for example, an RNN (Recurrent Neural Network) model may be used in place of the Self-Attention mechanism.
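The following is a minimal sketch of one such Encoder Block in PyTorch. The hidden dimension, head count and kernel size are illustrative assumptions (the disclosure does not fix them); what the sketch shows is the structure described above, namely a convolution, a self-attention layer and a forward (feed-forward) network, each wrapped with a residual connection.

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Convolution -> self-attention -> feed-forward, each with a residual connection (S131)."""
    def __init__(self, dim=128, n_heads=8, kernel_size=7):       # hyperparameters are assumptions
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.norm1, self.norm2, self.norm3 = nn.LayerNorm(dim), nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                                        # x: (Batch_Size, seq_len, dim)
        y = self.conv(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        x = x + y                                                # residual around the convolution
        h = self.norm2(x)
        y, _ = self.attn(h, h, h)
        x = x + y                                                # residual around self-attention
        x = x + self.ffn(self.norm3(x))                          # residual around the forward network
        return x
```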
S132, encoding the second paragraph text vector and the question text vector to obtain a new text vector.
In the embodiment of the invention, after the target paragraph and the question to be answered are respectively encoded to obtain the second paragraph text vector and the question text vector, the second paragraph text vector and the question text vector are further encoded to obtain a new text vector. Specifically, an Attention encoding operation is performed on the second paragraph text vector and the question text vector in a Context-Query Attention layer. The Attention encoding operation includes Context-to-Query and Query-to-Context encoding operations. The Context-to-Query Attention encoding operation means that the length N of the Context and the length M of the Query form an N x M correlation matrix; Softmax is then computed over each row of the N x M correlation matrix to obtain attention scores, and finally the attention scores are combined with the text vector of the original Query by weighted summation to obtain a text vector containing the attention information. The Query-to-Context Attention encoding operation means that the length M of the Query and the length N of the Context form an M x N correlation matrix; Softmax is then computed over each row of the M x N correlation matrix to obtain attention scores, and finally the attention scores are combined with the text vector of the original Context by weighted summation to obtain a text vector containing the attention information. The Context-to-Query and Query-to-Context encoding operations are performed on the second paragraph text vector and the question text vector in the Context-Query Attention layer to obtain the new text vector.
It should be noted that, in the embodiment of the present invention, the new text vector is also a three-dimensional vector, and each of its components has the same meaning as in the second paragraph text vector and the question text vector; the encoding merely implements the interaction between the second paragraph text vector and the question text vector, and the third component, i.e., the dimension of each word, is increased. For brevity and convenience of description, this is not described again here.
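A minimal sketch of the Context-Query Attention step (S132) is shown below. The dot-product similarity matrix and the way the Query-to-Context output is pooled back to paragraph length are simplifying assumptions; a deployed model would typically learn a trainable similarity function.

```python
import torch

def context_query_attention(C, Q):
    """C: (Batch_Size, N, d) paragraph vectors; Q: (Batch_Size, M, d) question vectors."""
    S = torch.bmm(C, Q.transpose(1, 2))                      # N x M correlation matrix per batch item
    c2q = torch.softmax(S, dim=2) @ Q                        # Context-to-Query attention, (B, N, d)
    q2c = torch.softmax(S.transpose(1, 2), dim=2) @ C        # Query-to-Context attention, (B, M, d)
    q2c = q2c.mean(dim=1, keepdim=True).expand_as(C)         # pool back to paragraph length (assumption)
    return torch.cat([C, c2q, C * c2q, C * q2c], dim=-1)     # new text vector; per-word dimension grows
```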
S133, encoding the new text vector according to a preset extraction model to obtain a target text vector.
In the embodiment of the present invention, after the second paragraph text vector and the question text vector are encoded to obtain the new text vector, the new text vector needs to be encoded according to a preset extraction model to obtain a target text vector. The preset extraction model is, for example, a stack of Encoder Blocks; the number of Encoder Blocks used here differs from that in step S131, but each Encoder Block still includes a convolutional neural network, a self-attention mechanism, and a forward neural network, and a residual connection is added when the convolutional neural network, the self-attention mechanism, and the forward neural network encode the new text vector. In this step, three Encoder Blocks are stacked to encode the new text vector so as to further extract the target text vector from the new text vector; the dimension of each word in the target text vector is reduced compared with that in the new text vector, so the matching accuracy of the FAQ question-answer pair is higher.
S134, calculating the target text vector to obtain the positions of the beginning and the end of the answer of the question to be answered, and generating the FAQ question-answer pair.
In the embodiment of the invention, after the new text vector is encoded by the Encoder Blocks to obtain the target text vector, the target text vector is calculated to obtain the positions where the answer to the question to be answered begins and ends, so as to generate an FAQ question-answer pair. Specifically, in step S133, the text vector obtained from the first Encoder Block and the text vector obtained from the second Encoder Block are spliced together and used to predict the position where the answer to the question to be answered begins, and the text vector obtained from the first Encoder Block and the text vector obtained from the third Encoder Block are spliced together and used to predict the position where the answer ends. Softmax operations are performed on the candidate start and end positions respectively to obtain the probabilities of the answer beginning and ending at each position, and the positions with the largest probabilities are taken as the positions where the answer to the question to be answered begins and ends, so that an FAQ question-answer pair is generated and the automatic construction of FAQ question-answer pairs is realized.
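A minimal sketch of the answer-span head (S133-S134) follows. The linear projections applied before Softmax are an assumption; the disclosure only states that the outputs of the stacked Encoder Blocks are spliced and passed through Softmax to obtain the start and end probabilities.

```python
import torch
import torch.nn as nn

class SpanHead(nn.Module):
    """Predict the start and end positions of the answer from the outputs m1, m2, m3
    of the three stacked Encoder Blocks (S133-S134)."""
    def __init__(self, dim):
        super().__init__()
        self.start_proj = nn.Linear(2 * dim, 1)               # assumed projection before Softmax
        self.end_proj = nn.Linear(2 * dim, 1)

    def forward(self, m1, m2, m3):                            # each: (Batch_Size, N, dim)
        start_logits = self.start_proj(torch.cat([m1, m2], dim=-1)).squeeze(-1)
        end_logits = self.end_proj(torch.cat([m1, m3], dim=-1)).squeeze(-1)
        start_prob = torch.softmax(start_logits, dim=-1)      # probability of each position as start
        end_prob = torch.softmax(end_logits, dim=-1)          # probability of each position as end
        start = int(torch.argmax(start_prob, dim=-1)[0])
        end = int(torch.argmax(end_prob, dim=-1)[0])
        return start, end                                     # the span between them is the answer text
```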
Fig. 6 is a schematic flow chart of an automatic FAQ question-answer pair construction method according to another embodiment of the present invention, as shown in fig. 6, in this embodiment, the method includes steps S100 to S190. That is, in the present embodiment, the method further includes steps S140-S190 after step S130 of the above embodiment.
And S140, obtaining the FAQ question-answer pair and feeding back the obtained FAQ question-answer pair to a user.
In the embodiment of the invention, the user uploads the content of the document to be read on the FAQ web page and selects a relevant question template according to the content or the name of the document to be read. The server calls the relevant question template according to the user's selection, screens out, from the target document, the paragraphs matched with the questions in the question template as target paragraphs according to the questions in the question template and the BERT screening model, generates FAQ question-answer pairs according to the target paragraphs and the questions to be answered, and then feeds the generated FAQ question-answer pairs back to the user. Specifically, the server acquires the generated FAQ question-answer pairs and displays them on the FAQ web page, and the user can perform subsequent operations as required. For example, if the user is satisfied with a generated FAQ question-answer pair, it can be exported directly; if the user is not satisfied, the FAQ question-answer pair can be modified on the modification interface of the FAQ web page.
S150, judging whether a modification instruction sent by a user is received.
In the embodiment of the invention, after the server acquires the FAQ question-answer pair and feeds back the acquired FAQ question-answer pair to the user, whether a modification instruction sent by the user is received or not is judged.
And S160, if a modification instruction sent by the user is received, taking the question input by the user in the modification instruction as the question to be answered, and returning to step S120 of screening out, from the target document, the paragraph matched with the question to be answered as the target paragraph according to the question to be answered and the preset screening model.
In the embodiment of the present invention, if the server receives the modification instruction sent by the user, it indicates that the user is not satisfied with the FAQ question-answer pair; the user may input the question that the user wants to ask on the modification page of the FAQ web page, and the server then takes the question input by the user in the modification instruction as the question to be answered and returns to execute step S120. That is, after the question to be answered is re-determined, a paragraph matched with the question to be answered is screened out from the target document as the target paragraph according to the question to be answered and the preset screening model; the subsequent steps are then executed in turn, and the finally obtained FAQ question-answer pair is fed back to the user.
S170, if the modification instruction sent by the user is not received, judging whether the question to be answered is a question in a preset database question template.
And S180, if the question to be answered is not the question in the preset database question template, updating the question in the preset database question template according to the question to be answered.
And S190, if the question to be answered is a question in a preset database question template, not updating the question in the preset database question template.
In the embodiment of the invention, if the server does not receive the modification instruction sent by the user, it indicates that the user is satisfied with the generated FAQ question-answer pair, and the server judges whether the question to be answered is a question in the preset database question template. If the question to be answered is a question in the preset database, it indicates that the question comes from the question template and the user is satisfied, and the generated FAQ question-answer pair can be exported. If the question to be answered is not a question in the preset database, it indicates that the question was input by the user and answered with high accuracy; the question input by the user therefore needs to be added to the preset database question template to update and expand the questions in it, so that when the next automatic FAQ question-answer pair generation is performed, the questions in the preset database question template are richer and can better meet user requirements.
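The branching logic of steps S150-S190 can be summarised by the following sketch; the function and field names are illustrative assumptions, not part of the disclosed implementation.

```python
def handle_feedback(faq_pair, question, template_questions, modification=None):
    """Illustrative decision flow for S150-S190."""
    if modification is not None:                       # S160: user sent a modification instruction
        return "regenerate", modification["question"]  # re-run screening (S120) with the new question
    if question not in template_questions:             # S170/S180: user-supplied question, answer accepted
        template_questions.append(question)            # update the question template in the database
        return "template_updated", faq_pair
    return "unchanged", faq_pair                       # S190: template question, no update needed
```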
It should be noted that, in the embodiment of the present invention, the modification operations performed by the user on the modification interface of the FAQ web page are recorded, and the modified results are kept as the history of the questions to be answered; this history can serve as a large amount of labeled data for optimizing the models used for FAQ question answering.
Fig. 7 is a schematic block diagram of an FAQ question-answer pair automatic construction apparatus 200 according to an embodiment of the present invention. As shown in fig. 7, the present invention further provides an automatic FAQ question-answer pair constructing apparatus 200 corresponding to the above automatic FAQ question-answer pair constructing method. The FAQ question-answer pair automatic construction apparatus 200 includes a unit for executing the above-described FAQ question-answer pair automatic construction method, and may be configured in a server. Specifically, referring to fig. 7, the FAQ question-answer pair automatic construction apparatus 200 includes an acquisition unit 201, an analysis segmentation unit 202, a screening unit 203, and a generation unit 204.
The acquiring unit 201 is configured to acquire a document to be read; the parsing and segmenting unit 202 is configured to parse the document to be read and segment the parsed document to obtain a segmented document as a target document; the screening unit 203 is configured to screen out a paragraph matched with the question to be answered from the target document as a target paragraph according to the question to be answered and a preset screening model; the generating unit 204 is configured to generate an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered.
In some embodiments, such as the present embodiment, as shown in fig. 8, the parsing and segmenting unit 202 includes a parsing unit 2021 and a segmenting unit 2022.
The parsing unit 2021 is configured to parse the document to be read by using a stacked CRF model to obtain an XML document; the segmenting unit 2022 is configured to segment the XML document by a preset segmentation manner to obtain a document with a preset document structure as a target document.
In some embodiments, for example, in this embodiment, as shown in fig. 9, the screening unit 203 includes a first encoding unit 2031, a first calculating unit 2032, and a determining unit 2033.
The first encoding unit 2031 is configured to encode the target document according to the question to be answered and the preset screening model to obtain a first paragraph text vector; the first calculating unit 2032 is configured to calculate, according to the question to be answered, a probability that each of the first paragraph text vectors matches the question to be answered; the determining unit 2033 is configured to determine the paragraph corresponding to the first paragraph text vector with the highest probability as the paragraph matching the question to be answered, and serve as the target paragraph.
In some embodiments, for example, in the present embodiment, as shown in fig. 10, the generating unit 204 includes a second encoding unit 2041, a third encoding unit 2042, a fourth encoding unit 2043, and a generating subunit 2044.
The second encoding unit 2041 is configured to encode the target paragraph and the question to be answered respectively to obtain a second paragraph text vector and a question text vector; the third encoding unit 2042 is configured to encode the second paragraph text vector and the question text vector to obtain a new text vector; the fourth encoding unit 2043 is configured to encode the new text vector according to a preset extraction model to obtain a target text vector; the generating subunit 2044 is configured to calculate the target text vector to obtain the positions of the beginning and the end of the answer of the question to be answered, so as to generate the FAQ question-answer pair.
In some embodiments, for example, in this embodiment, as shown in fig. 11, the apparatus 200 further includes a feedback unit 205, a first determining unit 206, a modifying unit 207, a second determining unit 208, and an updating unit 209.
The feedback unit 205 is configured to obtain the FAQ question-answer pair and feed back the obtained FAQ question-answer pair to the user; the first judging unit 206 is configured to judge whether a modification instruction sent by a user is received; the modifying unit 207 is configured to, if the modifying instruction sent by the user is received, take the question input by the user in the modifying instruction as the question to be answered; the second determining unit 208 is configured to determine whether the question to be answered is a question in a preset database question template if the modification instruction sent by the user is not received; the updating unit 209 is configured to update the question in the preset database question template according to the question to be answered if the question to be answered is not a question in the preset database question template.
The above-described FAQ question-answer pair automatic construction apparatus may be implemented in the form of a computer program that can be run on a computer device as shown in fig. 12.
Referring to fig. 12, fig. 12 is a schematic block diagram of a computer device according to an embodiment of the present application. The computer device 300 is a server, and specifically, the server may be an independent server or a server cluster composed of a plurality of servers.
Referring to fig. 12, the computer device 300 includes a processor 302, a memory, and a network interface 305 connected by a system bus 301, wherein the memory may include a non-volatile storage medium 303 and an internal memory 304.
The nonvolatile storage medium 303 may store an operating system 3031 and a computer program 3032. The computer program 3032, when executed, causes the processor 302 to perform a method for automatic construction of FAQ question-answer pairs.
The processor 302 is used to provide computing and control capabilities to support the operation of the overall computer device 300.
The internal memory 304 provides an environment for running the computer program 3032 in the nonvolatile storage medium 303, and the computer program 3032, when executed by the processor 302, causes the processor 302 to execute an automatic FAQ question-answer pair construction method.
The network interface 305 is used for network communication with other devices. Those skilled in the art will appreciate that the configuration shown in fig. 12 is a block diagram of only a portion of the configuration associated with the present application and does not constitute a limitation of the computer apparatus 300 to which the present application is applied, and that a particular computer apparatus 300 may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein the processor 302 is configured to run a computer program 3032 stored in the memory to implement the following steps: acquiring a document to be read; analyzing the document to be read and segmenting the analyzed document to obtain a segmented document as a target document; screening out paragraphs matched with the questions to be answered from the target document as target paragraphs according to the questions to be answered and a preset screening model; and generating an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered.
In some embodiments, for example, in this embodiment, when the processor 302 implements the steps of parsing the document to be read and segmenting the parsed document to obtain a segmented document as a target document, the following steps are specifically implemented: analyzing the document to be read by adopting a stacked CRF model to obtain an XML document; and segmenting the XML document in a preset segmentation mode to obtain a document with a preset document structure as a target document.
In some embodiments, for example, in this embodiment, when the step of screening out a paragraph matching the question to be answered from the target document as a target paragraph according to the question to be answered and a preset screening model is implemented, the processor 302 specifically implements the following steps: coding the target document according to the question to be answered and the preset screening model to obtain a first paragraph text vector; calculating the probability of matching each first paragraph text vector with the question to be answered according to the question to be answered; and determining the paragraph corresponding to the first paragraph text vector with the highest probability as the paragraph matched with the question to be answered, and taking the paragraph as a target paragraph.
In some embodiments, for example, in this embodiment, when the processor 302 implements the step of generating an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered, the following steps are specifically implemented: respectively encoding the target paragraph and the question to be answered to obtain a second paragraph text vector and a question text vector; encoding the second paragraph text vector and the question text vector to obtain a new text vector; encoding the new text vector according to a preset extraction model to obtain a target text vector; and calculating the target text vector to obtain the positions of the beginning and the end of the answer of the question to be answered, thereby generating the FAQ question-answer pair.
In some embodiments, for example, in this embodiment, after implementing the step of generating an FAQ question-answer pair based on a preset generation model according to the target paragraph and the question to be answered, the processor 302 further implements the following steps: obtaining the FAQ question-answer pair and feeding back the obtained FAQ question-answer pair to a user; judging whether a modification instruction sent by the user is received; if the modification instruction sent by the user is received, taking the question input by the user in the modification instruction as the question to be answered, and returning to execute the step of screening out the paragraph matched with the question to be answered from the target document as a target paragraph according to the question to be answered and a preset screening model; if the modification instruction sent by the user is not received, judging whether the question to be answered is a question in a preset database question template; if the question to be answered is not a question in the preset database question template, updating the questions in the preset database question template according to the question to be answered; and if the question to be answered is a question in the preset database question template, not updating the questions in the preset database question template.
It should be understood that, in the embodiment of the present Application, the Processor 302 may be a Central Processing Unit (CPU), and the Processor 302 may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. Wherein a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will be understood by those skilled in the art that all or part of the flow of the method implementing the above embodiments may be implemented by a computer program instructing associated hardware. The computer program may be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the flow steps of the embodiments of the method described above.
Accordingly, the present invention also provides a storage medium. The storage medium may be a computer-readable storage medium. The storage medium stores a computer program. The computer program, when executed by a processor, causes the processor to perform the steps of: acquiring a document to be read; analyzing the document to be read and segmenting the analyzed document to obtain a segmented document as a target document; screening out paragraphs matched with the questions to be answered from the target document as target paragraphs according to the questions to be answered and a preset screening model; and generating an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered.
In some embodiments, for example, in this embodiment, when the processor executes the computer program to implement the steps of parsing the document to be read and segmenting the parsed document to obtain a segmented document as a target document, the following steps are specifically implemented: analyzing the document to be read by adopting a stacked CRF model to obtain an XML document; and segmenting the XML document in a preset segmentation mode to obtain a document with a preset document structure as a target document.
In some embodiments, for example, in this embodiment, when the processor executes the computer program to implement the step of screening out a paragraph matching the question to be answered from the target document as a target paragraph according to the question to be answered and a preset screening model, the following steps are specifically implemented: coding the target document according to the question to be answered and the preset screening model to obtain a first paragraph text vector; calculating the probability of matching each first paragraph text vector with the question to be answered according to the question to be answered; and determining the paragraph corresponding to the first paragraph text vector with the highest probability as the paragraph matched with the question to be answered, and taking the paragraph as a target paragraph.
In some embodiments, for example, in this embodiment, when the processor executes the computer program to implement the step of generating an FAQ question-answer pair according to the target paragraph and the question to be answered based on a preset reading understanding model, the following steps are specifically implemented: respectively encoding the target paragraph and the question to be answered to obtain a second paragraph text vector and a question text vector; encoding the second paragraph text vector and the question text vector to obtain a new text vector; encoding the new text vector according to a preset extraction model to obtain a target text vector; and calculating the target text vector to obtain the positions of the beginning and the end of the answer of the question to be answered, thereby generating the FAQ question-answer pair.
In some embodiments, for example, in this embodiment, after executing the computer program to implement the step of generating an FAQ question-answer pair according to the target paragraph and the question to be answered based on the preset reading understanding model, the processor further specifically implements the following steps: obtaining the FAQ question-answer pair and feeding the obtained FAQ question-answer pair back to a user; judging whether a modification instruction sent by the user is received; if the modification instruction sent by the user is received, taking the question input by the user in the modification instruction as the question to be answered, and returning to execute the step of screening out the paragraph matched with the question to be answered from the target document as a target paragraph according to the question to be answered and a preset screening model; if the modification instruction sent by the user is not received, judging whether the question to be answered is a question in a preset database question template; if the question to be answered is not a question in the preset database question template, updating the questions in the preset database question template according to the question to be answered; and if the question to be answered is a question in the preset database question template, not updating the questions in the preset database question template.
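For illustration only, the feedback loop described above can be sketched as follows; the in-memory set stands in for the preset database question template, and the rescreen callable stands in for re-running the screening and answer-extraction steps with the corrected question. All names are placeholders introduced for the example.

    def handle_feedback(faq_pair, modification, question_templates, rescreen):
        if modification is not None:
            # the user corrected the question: re-run screening with it
            return rescreen(modification)
        question = faq_pair["question"]
        if question not in question_templates:
            # unseen question: add it to the preset question template store
            question_templates.add(question)
        return faq_pair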
The storage medium may be a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a magnetic disk, an optical disk, or any other computer-readable storage medium that can store program code.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functions are implemented as hardware or software depends on the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functions in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and there may be other division manners in actual implementation; multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The steps in the methods of the embodiments of the present invention may be reordered, combined, or deleted according to actual needs. The units in the apparatus of the embodiments of the present invention may be combined, divided, or deleted according to actual needs. In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a storage medium. Based on such understanding, the essence of the technical solution of the present invention, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a terminal, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, while the invention has been described with respect to the above-described embodiments, it will be understood that the invention is not limited thereto but may be embodied with various modifications and changes.
While the invention has been described with reference to specific embodiments, the invention is not limited thereto, and various equivalent modifications and substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An FAQ question-answer pair automatic construction method is characterized by comprising the following steps:
acquiring a document to be read;
analyzing the document to be read and segmenting the analyzed document to obtain a segmented document as a target document;
screening out paragraphs matched with the questions to be answered from the target document as target paragraphs according to the questions to be answered and a preset screening model;
and generating an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered.
2. The method according to claim 1, wherein parsing the document to be read and segmenting the parsed document to obtain a segmented document as a target document comprises:
analyzing the document to be read by adopting a cascading CRF model to obtain an XML document;
and segmenting the XML document in a preset segmentation mode to obtain a document with a preset document structure as a target document.
3. The method according to claim 1, wherein the step of screening out a paragraph matching the question to be answered from the target document as a target paragraph according to the question to be answered and a preset screening model comprises:
coding the target document according to the question to be answered and the preset screening model to obtain a first paragraph text vector;
calculating the probability of matching each first paragraph text vector with the question to be answered according to the question to be answered;
and determining the paragraph corresponding to the first paragraph text vector with the highest probability as the paragraph matched with the question to be answered, and taking the paragraph as a target paragraph.
4. The method according to claim 1, wherein the generating of the FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered comprises:
respectively encoding the target paragraph and the question to be answered to obtain a second paragraph text vector and a question text vector;
encoding the second paragraph text vector and the question text vector to obtain a new text vector;
coding the new text vector according to a preset extraction model to obtain a target text vector;
and calculating the target text vector to obtain the positions of the beginning and the end of the answer to the question to be answered, thereby generating the FAQ question-answer pair.
5. The method according to claim 1, wherein after generating the FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered, the method further comprises:
and obtaining the FAQ question-answer pair and feeding back the obtained FAQ question-answer pair to a user.
6. The method according to claim 5, wherein after obtaining the FAQ question-answer pair and feeding back the obtained FAQ question-answer pair to the user, the method further comprises:
judging whether a modification instruction sent by a user is received;
if the modification instruction sent by the user is received, taking the question input by the user in the modification instruction as the question to be answered;
and returning to execute the step of screening out the paragraphs matched with the questions to be answered from the target document as target paragraphs according to the questions to be answered and a preset screening model.
7. The method of claim 6, wherein after determining whether a modification instruction sent by a user is received, the method further comprises:
if the modification instruction sent by the user is not received, judging whether the question to be answered is a question in a preset database question template;
and if the question to be answered is not the question in the preset database question template, updating the question in the preset database question template according to the question to be answered.
8. An automatic FAQ question-answer pair constructing device, comprising:
the acquisition unit is used for acquiring a document to be read;
the analysis segmentation unit is used for analyzing the document to be read and segmenting the analyzed document to obtain a segmented document as a target document;
the screening unit is used for screening out paragraphs matched with the questions to be answered from the target document as target paragraphs according to the questions to be answered and a preset screening model;
and the generating unit is used for generating an FAQ question-answer pair based on a preset reading understanding model according to the target paragraph and the question to be answered.
9. A computer device, characterized in that the computer device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
CN201910969443.4A 2019-10-12 2019-10-12 Automatic FAQ question-answer pair construction method and device, computer equipment and storage medium Active CN111046152B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910969443.4A CN111046152B (en) 2019-10-12 2019-10-12 Automatic FAQ question-answer pair construction method and device, computer equipment and storage medium
PCT/CN2019/118442 WO2021068352A1 (en) 2019-10-12 2019-11-14 Automatic construction method and apparatus for faq question-answer pair, and computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910969443.4A CN111046152B (en) 2019-10-12 2019-10-12 Automatic FAQ question-answer pair construction method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111046152A true CN111046152A (en) 2020-04-21
CN111046152B CN111046152B (en) 2023-09-29

Family

ID=70231764

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910969443.4A Active CN111046152B (en) 2019-10-12 2019-10-12 Automatic FAQ question-answer pair construction method and device, computer equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111046152B (en)
WO (1) WO2021068352A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114330718B (en) * 2021-12-23 2023-03-24 北京百度网讯科技有限公司 Method and device for extracting causal relationship and electronic equipment
CN117290483A (en) * 2023-10-09 2023-12-26 成都明途科技有限公司 Answer determination method, model training method, device and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10339453B2 (en) * 2013-12-23 2019-07-02 International Business Machines Corporation Automatically generating test/training questions and answers through pattern based analysis and natural language processing techniques on the given corpus for quick domain adaptation
CN110196901B (en) * 2019-06-28 2022-02-11 北京百度网讯科技有限公司 Method and device for constructing dialog system, computer equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101377777A (en) * 2007-09-03 2009-03-04 北京百问百答网络技术有限公司 Automatic inquiring and answering method and system
US20110125734A1 (en) * 2009-11-23 2011-05-26 International Business Machines Corporation Questions and answers generation
US20160292204A1 (en) * 2015-03-30 2016-10-06 Avaya Inc. System and method for compiling and dynamically updating a collection of frequently asked questions
CN107220296A (en) * 2017-04-28 2017-09-29 北京拓尔思信息技术股份有限公司 The generation method of question and answer knowledge base, the training method of neutral net and equipment
WO2019118257A1 (en) * 2017-12-15 2019-06-20 Microsoft Technology Licensing, Llc Assertion-based question answering
CN109918487A (en) * 2019-01-28 2019-06-21 平安科技(深圳)有限公司 Intelligent answer method and system based on network encyclopedia
CN110083692A (en) * 2019-04-22 2019-08-02 齐鲁工业大学 A kind of the text interaction matching process and device of finance knowledge question

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TU HAI; PENG DUNLU; CHEN ZHANG; LIU CONG: "S2SA-BiLSTM: A Deep Learning Model for Intelligent Question Answering Systems for Legal Disputes", 小型微型计算机系统 (Journal of Chinese Computer Systems), no. 05 *

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597821A (en) * 2020-05-13 2020-08-28 北京嘀嘀无限科技发展有限公司 Method and device for determining response probability
CN113779203A (en) * 2020-06-09 2021-12-10 北京金山数字娱乐科技有限公司 Method and device for generating paragraph set and inference method and device
CN111858878B (en) * 2020-06-18 2023-12-22 达观数据有限公司 Method, system and storage medium for automatically extracting answer from natural language text
CN111858878A (en) * 2020-06-18 2020-10-30 达而观信息科技(上海)有限公司 Method, system and storage medium for automatically extracting answer from natural language text
CN111858879A (en) * 2020-06-18 2020-10-30 达而观信息科技(上海)有限公司 Question-answering method and system based on machine reading understanding, storage medium and computer equipment
CN111858879B (en) * 2020-06-18 2024-04-05 达观数据有限公司 Question and answer method and system based on machine reading understanding, storage medium and computer equipment
CN111475636A (en) * 2020-06-24 2020-07-31 北京金山数字娱乐科技有限公司 Information extraction method and device, equipment and storage medium
CN114238787B (en) * 2020-08-31 2024-03-29 腾讯科技(深圳)有限公司 Answer processing method and device
CN114238787A (en) * 2020-08-31 2022-03-25 腾讯科技(深圳)有限公司 Answer processing method and device
WO2022048648A1 (en) * 2020-09-03 2022-03-10 第四范式(北京)技术有限公司 Method and apparatus for achieving automatic model construction, electronic device, and storage medium
WO2022088672A1 (en) * 2020-10-29 2022-05-05 平安科技(深圳)有限公司 Machine reading comprehension method and apparatus based on bert, and device and storage medium
CN112347226B (en) * 2020-11-06 2023-05-26 平安科技(深圳)有限公司 Document knowledge extraction method, device, computer equipment and readable storage medium
CN112347226A (en) * 2020-11-06 2021-02-09 平安科技(深圳)有限公司 Document knowledge extraction method and device, computer equipment and readable storage medium
CN112634889A (en) * 2020-12-15 2021-04-09 平安国际智慧城市科技股份有限公司 Electronic case logging method, device, terminal and medium based on artificial intelligence
CN112634889B (en) * 2020-12-15 2023-08-08 深圳平安智慧医健科技有限公司 Electronic case input method, device, terminal and medium based on artificial intelligence
CN112507096A (en) * 2020-12-16 2021-03-16 平安银行股份有限公司 Document question-answer pair splitting method and device, electronic equipment and storage medium
CN112818093A (en) * 2021-01-18 2021-05-18 平安国际智慧城市科技股份有限公司 Evidence document retrieval method, system and storage medium based on semantic matching
CN112800202A (en) * 2021-02-05 2021-05-14 北京金山数字娱乐科技有限公司 Document processing method and device
CN113010679A (en) * 2021-03-18 2021-06-22 平安科技(深圳)有限公司 Question and answer pair generation method, device and equipment and computer readable storage medium
CN113065332A (en) * 2021-04-22 2021-07-02 深圳壹账通智能科技有限公司 Text processing method, device and equipment based on reading model and storage medium
CN114117022A (en) * 2022-01-26 2022-03-01 杭州远传新业科技有限公司 FAQ similarity problem generation method and system
CN114117022B (en) * 2022-01-26 2022-05-06 杭州远传新业科技有限公司 FAQ similarity problem generation method and system
CN114528821B (en) * 2022-04-25 2022-09-06 中国科学技术大学 Understanding-assisted dialog system manual evaluation method and device and storage medium
CN114528821A (en) * 2022-04-25 2022-05-24 中国科学技术大学 Understanding-assisted dialog system manual evaluation method and device and storage medium
CN117556906A (en) * 2024-01-11 2024-02-13 卓世智星(天津)科技有限公司 Question-answer data set generation method and device, electronic equipment and storage medium
CN117556906B (en) * 2024-01-11 2024-04-05 卓世智星(天津)科技有限公司 Question-answer data set generation method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111046152B (en) 2023-09-29
WO2021068352A1 (en) 2021-04-15

Similar Documents

Publication Publication Date Title
CN111046152A (en) FAQ question-answer pair automatic construction method and device, computer equipment and storage medium
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
CN106874441B (en) Intelligent question-answering method and device
CN111680159B (en) Data processing method and device and electronic equipment
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN111651996B (en) Digest generation method, digest generation device, electronic equipment and storage medium
CN109165291B (en) Text matching method and electronic equipment
CN111626048A (en) Text error correction method, device, equipment and storage medium
CN110991187A (en) Entity linking method, device, electronic equipment and medium
CN110162594B (en) Viewpoint generation method and device for text data and electronic equipment
CN111274785B (en) Text error correction method, device, equipment and medium
CN109190109B (en) Method and device for generating comment abstract by fusing user information
CN111008272A (en) Knowledge graph-based question and answer method and device, computer equipment and storage medium
CN111198939B (en) Statement similarity analysis method and device and computer equipment
CN112307164A (en) Information recommendation method and device, computer equipment and storage medium
CN111897954A (en) User comment aspect mining system, method and storage medium
US20240020458A1 (en) Text formatter
CN111831902A (en) Recommendation reason screening method and device and electronic equipment
CN114077661A (en) Information processing apparatus, information processing method, and computer readable medium
CN111046659A (en) Context information generating method, context information generating device, and computer-readable recording medium
CN113849623B (en) Text visual question-answering method and device
CN113239668B (en) Keyword intelligent extraction method and device, computer equipment and storage medium
CN114610865A (en) Method, device and equipment for recommending recalled text and storage medium
CN113705207A (en) Grammar error recognition method and device
CN115203206A (en) Data content searching method and device, computer equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant