CN111881279A - Transformer model-based question answering method, question answering device and storage device - Google Patents

Transformer model-based question answering method, question answering device and storage device

Info

Publication number
CN111881279A
Authority
CN
China
Prior art keywords
question
sequence
module
decoding
transformer model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010737212.3A
Other languages
Chinese (zh)
Inventor
骆加维
吴信朝
周宸
周宝
陈远旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010737212.3A priority Critical patent/CN111881279A/en
Priority to PCT/CN2020/121199 priority patent/WO2021139297A1/en
Publication of CN111881279A publication Critical patent/CN111881279A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of natural language processing, and in particular discloses a question answering method, a question answering device and a storage device based on a Transformer model. The question answering method comprises the following steps: acquiring a question text input by a user and processing the question text to obtain a question sequence; decoding the question sequence to obtain a plurality of candidate answers related to the question sequence; splicing the question sequence with each candidate answer; and scoring each splicing result and selecting the candidate answer with the highest score as the optimal answer to the question sequence. The method addresses the problems that intent recognition for a question is not accurate enough, replies are not natural enough, and the contextual linkage during a conversation is not tight enough.

Description

Transformer model-based question answering method, question answering device and storage device
Technical Field
The invention relates to the technical field of natural language processing, in particular to a question answering method, a question answering device and a storage device based on a Transformer model.
Background
Traditional online question answering systems are built as pipelines. For single-round or domain-knowledge question answering, a knowledge base and the answers to its questions are preset. When a user asks a question, an intent recognition module identifies the user's actual intent, which narrows the search range within the knowledge base; candidate questions are recalled, a deep learning model performs deep semantic similarity matching, and the text answer with the highest matching degree is returned. Alongside pipeline-based approaches, end-to-end dialogue systems are also growing rapidly. Compared with traditional knowledge-base question answering, however, end-to-end question answering systems have the following disadvantages: 1. intent recognition for the question is not accurate enough; 2. the replies are not natural or user-friendly; 3. the context of the dialogue is not tracked closely enough, so a multi-turn conversation degenerates into a series of single-round questions and answers.
Disclosure of Invention
The invention provides a question answering method, a question answering device and a storage device based on a Transformer model, which address the problems that intent recognition for a question is not accurate enough, replies are not natural enough, and the contextual linkage during a conversation is not tight enough.
In order to solve the above technical problems, the invention adopts the following technical scheme: a question answering method based on a Transformer model is provided, comprising the following steps:
acquiring a question text input by a user, and processing the question text to obtain a question sequence;
decoding the question sequence to obtain a plurality of candidate answers related to the question sequence;
splicing the question sequence with each candidate answer;
and scoring each splicing result, and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
According to an embodiment of the present invention, the network structure of the Transformer model includes a decoding layer and a mutual information layer disposed behind the decoding layer, and the step of decoding the question sequence to obtain a plurality of candidate answers related to the question sequence includes:
inputting the question sequence into the decoding layer and outputting one candidate answer related to the question sequence;
and cyclically splicing the question sequence with the output of the decoding layer and inputting the spliced sequence into the decoding layer again to obtain a plurality of candidate answers.
According to one embodiment of the invention, the decoding layer comprises a self-attention mechanism module, a feed-forward network module and a normalization processing module arranged in sequence; the step of inputting the question sequence into the decoding layer and outputting one candidate answer related to the question sequence comprises:
extracting the features of the question sequence by adopting the self-attention mechanism module;
carrying out nonlinear transformation on the feature extraction result by adopting the feedforward network module;
and carrying out normalization processing on the nonlinear transformation result by adopting the normalization processing module.
According to an embodiment of the present invention, the step of obtaining a question text input by a user, and processing the question text to obtain a question sequence further includes:
acquiring a question text input by a user, wherein the question text comprises a question and a dialogue sentence containing the question;
inserting tags into the question sentences and the dialogue sentences;
performing coding and word-embedding processing on the question after the tags are inserted to obtain a question sequence, wherein the question sequence comprises a sequence code and a position code, the position code being a relative position code.
According to an embodiment of the present invention, the step of inserting tags into the question sentences and the dialogue sentences includes:
and inserting a start tag at the beginning of the question, inserting an end tag at the end of the question, and inserting a separation tag in the dialog sentence.
According to an embodiment of the present invention, the step of scoring each of the concatenation results, and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence includes:
calculating the correlation between the question sentence sequence and the candidate answer in each splicing result based on a joint probability distribution algorithm;
scoring the correlation, wherein the higher the degree of correlation, the higher the corresponding score;
and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
According to an embodiment of the present invention, the question answering method further includes:
constructing the Transformer model, wherein the network structure of the Transformer model comprises a decoding layer and a mutual information layer arranged behind the decoding layer;
and optimizing the Transformer model by adopting a loss function.
According to an embodiment of the present invention, the step of optimizing the Transformer model by using a loss function further includes:
calculating a loss deviation value of the decoding layer and a loss deviation value of the mutual information layer;
selecting the maximum value obtained by superposing the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer as the loss deviation value of the Transformer model;
and updating the parameters of the Transformer model according to the loss deviation value of the Transformer model.
In order to solve the technical problem, the invention adopts another technical scheme that: provided is a question answering device based on a Transformer model, comprising:
an acquisition module, configured to acquire a question text input by a user and process the question text to obtain a question sequence;
a decoding module, coupled to the obtaining module, for decoding the question sequence to obtain a plurality of candidate answers related to the question sequence;
a concatenation module, coupled to the decoding module, for concatenating the sequence of question sentences with each of the candidate answers;
and the scoring module is coupled with the splicing module and used for scoring each splicing result and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
In order to solve the technical problems, the invention adopts another technical scheme that: provided is a storage device which stores a program file capable of realizing the above-described question answering method based on the Transformer model.
The invention has the following beneficial effects: inputting the question sequence into the decoding layer yields a plurality of candidate answers related to the question sequence, which increases the diversity of replies and effectively avoids the mechanical feel of always returning the same answer; splicing the question sequence with each candidate answer, scoring each splicing result and selecting the candidate answer with the highest score as the optimal answer to the question sequence strengthens contextual relevance and allows colloquial replies to be screened effectively.
Drawings
FIG. 1 is a schematic diagram of a partial network structure of a Transformer model according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for question answering based on a Transformer model according to a first embodiment of the present invention;
FIG. 3 is a schematic flow chart of step S202 in FIG. 2;
FIG. 4 is a flow chart of a Transformer model-based question-answering method according to a second embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a Transformer model-based question answering device according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a storage device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first", "second" and "third" in the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. All directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are only used to explain the relative positional relationship between the components, the movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to FIG. 1, the network structure of the Transformer model according to an embodiment of the present invention includes a decoding layer 10 and a mutual information layer 20 disposed behind the decoding layer 10, where the decoding layer 10 includes a self-attention mechanism module 11, a feed-forward network module 12 and a normalization processing module 13 arranged in sequence. FIG. 2 is a schematic flow chart of the first embodiment of the Transformer model-based question answering method of the present invention; it should be noted that the method of the present invention is not limited to the flow sequence shown in FIG. 2, provided that substantially the same result is obtained. As shown in FIG. 2, the method comprises the following steps:
step S201: and acquiring a question text input by a user, and processing the question text to obtain a question sequence.
In step S201, the question text includes a question and the dialogue sentences that contain it. First, tags are inserted into the question and the dialogue: a start tag is inserted at the beginning of the question, an end tag at the end of the question, and a separation tag between dialogue sentences, for example "Beg Query Sep Sen", where Beg marks the start of the question that opens the dialogue, Sep marks the end of the question, and each following dialogue sentence is likewise separated by Sep. The tagged question is then encoded and word-embedded to obtain the question sequence; word embedding in this embodiment uses a general-purpose NLP model. The question sequence of this embodiment comprises a sequence code and a position code, where the position code is a relative position code; using relative position coding effectively improves the relevance of nearby turns of the conversation.
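The tagging and sequence-coding step can be illustrated with a minimal sketch. The whitespace tokenizer, the toy vocabulary and the function name below are assumptions for illustration only; the "Beg" and "Sep" tag names follow the example above.

def build_question_sequence(question, dialogue_sentences, vocab):
    # Insert a start tag before the question, a separator tag after it,
    # and a separator tag after each following dialogue sentence.
    tokens = ["Beg"] + question.split() + ["Sep"]
    for sentence in dialogue_sentences:
        tokens += sentence.split() + ["Sep"]
    # Sequence coding: map each token to an integer id; the word-embedding
    # lookup and the (relative) position coding are applied later in the model.
    return [vocab.setdefault(token, len(vocab)) for token in tokens]

vocab = {}
q_seq = build_question_sequence(
    "how can I reset my password",
    ["click forgot password on the login page"],
    vocab,
)
print(q_seq)  # token ids; repeated tokens such as "password" and "Sep" share ids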
Step S202: the question sequence is decoded to obtain a plurality of candidate answers related to the question sequence.
In step S202, the question sequence input in this embodiment is formed by combining the sequence code and the position code. First, the question sequence is input into the decoding layer, which outputs one candidate answer related to the question sequence; the question sequence is then spliced with the output of the decoding layer and input into the decoding layer again, and the cycle is repeated to obtain a plurality of candidate answers. For example, the question sequence Q1 is first input into the decoding layer and one candidate answer A1 is output; Q1 is then spliced with A1 and input into the decoding layer again to output another candidate answer A2; Q1 is spliced with A2 and input again to output A3; repeating the loop several times yields candidate answers A1, A2, A3 and so on. In this way, inputting the question sequence into the decoding layer yields a plurality of candidate answers related to the question sequence, which increases the diversity of replies and effectively avoids the mechanical feel of always returning the same answer after a user inputs a question.
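The cyclic generation of candidate answers can be sketched as follows, assuming a decode(sequence) callable that wraps the decoding layer and returns one answer sequence; the function and parameter names are illustrative rather than the patented implementation.

def generate_candidates(q_seq, decode, num_candidates=3):
    # First pass: the question sequence alone yields candidate answer A1.
    candidates = [decode(q_seq)]
    while len(candidates) < num_candidates:
        # Splice the question sequence with the latest output of the decoding
        # layer and feed the spliced sequence into the decoding layer again.
        spliced = q_seq + candidates[-1]
        candidates.append(decode(spliced))
    return candidates  # A1, A2, A3, ...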
Referring to fig. 3, step S202 further includes the following steps that are performed in sequence:
step S301: and (4) performing feature extraction on the question sequence by adopting a self-attention mechanism module.
In step S301, the self-attention mechanism module relates different positions of a single sequence to one another in order to compute a representation of the question sequence, which effectively improves the extraction of implicit semantic features from the text. In this embodiment, when a vector (formed by combining the sequence code and the position code) is input to the decoding layer, the self-attention mechanism module multiplies the input vector by the attention weights and adds a bias vector to obtain the key, value and query vectors of the input.
Step S302: and carrying out nonlinear transformation on the feature extraction result by adopting a feedforward network module.
In step S302, the feed-forward network module uses an FFNN (feed-forward neural network), which applies a nonlinear transformation to the feature extraction result and projects it back to the model dimension.
Step S303: and a normalization processing module is adopted to perform normalization processing on the nonlinear transformation result.
In step S303, the normalization processing module performs normalization using a softmax function; it keeps the distribution of each sample consistent between the input and the final output and effectively accelerates convergence.
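A toy numerical sketch of the three modules of the decoding layer (steps S301 to S303) is given below; the dimensions, random weights and the stable softmax helper are illustrative assumptions, not trained parameters.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 32, 5
x = rng.normal(size=(seq_len, d_model))          # question sequence vectors

# Step S301 (self-attention module): multiply the input by attention weights
# and add a bias to obtain the query, key and value vectors, then attend.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
b = np.zeros(d_model)
Q, K, V = x @ Wq + b, x @ Wk + b, x @ Wv + b
attended = softmax(Q @ K.T / np.sqrt(d_model)) @ V

# Step S302 (feed-forward module): nonlinear transformation, projected back
# to the model dimension.
W1, W2 = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model))
ffn_out = np.maximum(0.0, attended @ W1) @ W2    # ReLU, then projection

# Step S303 (normalization module): softmax normalization of the result.
normalized = softmax(ffn_out)
print(normalized.shape)                          # (5, 8)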
In one embodiment, the specific workflow of step S202 proceeds as follows. The structure of the Transformer model includes an encoder and a decoder.
In this embodiment, the input of the Transformer model passes through word embedding and position encoding (PE) before entering the encoder and decoder: the word vector and the position-encoding result are added together, and the sum is fed to the encoder/decoder.
Specifically, the calculation formula of PE is as follows:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))

PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

where pos is the position of the word in the sequence, d_model is the dimension of the model, 2i denotes an even dimension, and 2i+1 denotes an odd dimension.
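The formula above can be evaluated directly; the sketch below assumes an even d_model and illustrative sizes.

import numpy as np

def position_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]            # position of the word
    i = np.arange(0, d_model, 2)[None, :]        # paired even/odd dimensions
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                  # even dimensions: sine
    pe[:, 1::2] = np.cos(angle)                  # odd dimensions: cosine
    return pe

pe = position_encoding(max_len=50, d_model=16)
print(pe.shape)  # (50, 16); added to the word embeddings before the layers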
The encoder has two sub-layers: a Multi-head Attention layer (multi-head attention mechanism) and a Feed-forward Networks layer (fully connected network). The multi-head attention mechanism uses self-attention to learn the relationships inside the source sentence, and the fully connected network applies the same operation to the vector at each position independently, namely two linear transformations with a ReLU activation in between.
The decoder has three sub-layers: a Masked Multi-head Attention layer (masked multi-head attention mechanism), a Multi-head Attention layer (multi-head attention mechanism) and a Feed-forward Networks layer (fully connected network). A multi-head attention mechanism is composed of several self-attention mechanisms running in parallel. The masked multi-head attention mechanism uses self-attention to learn the relationships inside the target sentence; its output, together with the result passed from the encoder, is then fed into the second multi-head attention layer, which is not self-attention but encoder-decoder attention and learns the relationships between the source sentence and the target sentence.
In the multi-head attention mechanism, the similarity between K (key) and Q (query) is first calculated to obtain S (similarity); S is then normalized through a softmax function to obtain the weights a; finally the weighted sum of a and V (value) is calculated to obtain the attention vector. In the self-attention mechanism, K, V and Q are the same. In the encoder-decoder attention of the decoder, Q is the output of the previous decoder step, while K and V are the outputs of the encoder.
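The similarity, softmax and weighted-sum computation can be sketched as follows; head splitting is omitted and the toy dimensions are assumptions.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask=None):
    s = Q @ K.T / np.sqrt(Q.shape[-1])   # S: similarity between Q and K
    if mask is not None:
        s = np.where(mask, s, -1e9)      # masked positions get ~zero weight
    a = softmax(s)                       # a: normalized attention weights
    return a @ V                         # weighted sum with V

x = np.random.default_rng(1).normal(size=(4, 8))
print(attention(x, x, x).shape)          # self-attention: K, V and Q identical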
An Add & Norm layer is also included above each multi-head attention mechanism, wherein Add represents residual connection (ResidualConnection) for preventing network degradation, and Norm represents a normalization layer (Layernormalization) for normalizing activation values of each layer, i.e. converting input into data with a mean value of 0 and a variance of 1, so as to avoid data falling into a saturation region of an activation function. The normalization layer is to calculate the mean and variance for each sample, not a batch of data.
The encoder and decoder of this embodiment are structurally almost identical; the difference is that a mask is added in the decoder. The mask hides certain values so that they play no role in parameter updating; its main purpose is to ensure that the word at position i is predicted using only the first i-1 words, with no future information.
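The Add & Norm step and the decoder mask can be sketched as follows; the epsilon value and the helper names are illustrative assumptions.

import numpy as np

def add_and_norm(x, sublayer_out, eps=1e-6):
    y = x + sublayer_out                      # Add: residual connection
    mean = y.mean(-1, keepdims=True)          # statistics computed per sample,
    var = y.var(-1, keepdims=True)            # not per batch
    return (y - mean) / np.sqrt(var + eps)    # Norm: mean 0, variance 1

def decoder_mask(seq_len):
    # The word at position i is predicted from the preceding words only;
    # positions carrying future information are masked out.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(decoder_mask(4).astype(int))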
Step S203: and splicing the question sequence with each candidate answer.
In step S203, the input question sequence is spliced with each candidate answer output in step S202 to obtain a plurality of splicing results. Each concatenation has the form "Begin Query Sep Ans", where Query denotes the question sequence and Ans denotes a candidate answer. For example, splicing question sequence Q1 with candidate answers A1, A2 and A3 yields "Begin Q1 Sep A1", "Begin Q1 Sep A2" and "Begin Q1 Sep A3" respectively.
Step S204: and scoring each splicing result, and selecting a candidate answer corresponding to the highest score as the optimal answer of the question sequence.
In step S204, the correlation between the question sequence and the candidate answer in each splicing result is calculated and scored using a joint probability distribution algorithm and a reverse-scoring trained model; the higher the correlation, the higher the score. The candidate answer with the highest score is selected as the optimal answer to the question sequence, so that the final output is not only an appropriate answer given the preceding context but also an answer that matches the intent of the whole dialogue.
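The selection logic of step S204 can be sketched as follows, assuming splice and score callables (the latter standing in for the joint-probability scoring); the names are illustrative.

def best_answer(q_seq, candidates, splice, score):
    # Splice the question sequence with every candidate answer, score each
    # splicing result, and return the candidate with the highest score.
    spliced = [splice(q_seq, answer) for answer in candidates]
    scores = [score(result) for result in spliced]
    return candidates[scores.index(max(scores))]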
The Transformer model-based question answering method of the first embodiment of the invention increases the diversity of replies by obtaining a plurality of candidate answers related to the question sequence, which effectively avoids the mechanical feel of always returning the same answer after a user inputs a question. By splicing the question sequence with each candidate answer, scoring each splicing result and selecting the candidate answer with the highest score as the optimal answer of the question sequence, it also strengthens contextual relevance and allows colloquial replies to be screened effectively.
FIG. 4 is a schematic flow chart of the Transformer model-based question answering method according to the second embodiment of the present invention; it should be noted that the method of the present invention is not limited to the flow sequence shown in FIG. 4, provided that substantially the same result is obtained. As shown in FIG. 4, the method includes the following steps:
step S401: and constructing a Transformer model.
In step S401, the network structure of the Transformer model includes a decoding layer and a mutual information layer disposed behind the decoding layer, where the decoding layer includes a self-attention mechanism module, a feed-forward network module and a normalization processing module arranged in sequence.
Step S402: and (5) optimizing the Transformer model by using a loss function.
In step S402, the loss function comprises a loss function of the decoding layer and a loss function of the mutual information layer. First, the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer are calculated; the maximum value obtained by superimposing the two is then selected as the loss deviation value of the Transformer model; finally, the parameters of the Transformer model are updated according to this loss deviation value.
Specifically, the calculation formula of the loss deviation value of the Transformer model is as follows:
Loss = Max(Loss_AR + Loss_MMI), where Loss denotes the loss deviation value of the Transformer model, Loss_AR denotes the loss deviation value of the decoding layer, and Loss_MMI denotes the loss deviation value of the mutual information layer. The loss deviation value of the Transformer model in this embodiment is the maximum value obtained after superimposing the loss deviation value of the decoding layer and that of the mutual information layer; the loss deviation value of the mutual information layer is a variable, and the calculation takes the result with the highest correlation between the currently input question and the preceding dialogue.
Further, the loss deviation value of the decoding layer is calculated according to the following formula:
Loss_AR = Max Σ_{t=1..T} log P(x_t | x_{z&lt;t})

where P denotes probability, x denotes a word, z and t denote positions of words in the question text and take integer values from 1 to T, x_t denotes the word at position t, and x_{z&lt;t} denotes the words before position t.
The loss deviation value of the mutual information layer is calculated according to the following formula: Loss_MMI = Max(P(m|n)), where P denotes a probability, n denotes the vector of the currently input question, m denotes the vector of the dialogue information preceding the current question, and P(m|n) denotes the probability that the current question is correlated with the preceding dialogue.
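A minimal sketch of the loss superposition follows, assuming the decoding-layer loss and the per-candidate mutual-information values have already been computed; the numbers are placeholders.

loss_ar = 2.3                         # loss deviation value of the decoding layer
loss_mmi_values = [0.4, 0.7, 0.5]     # mutual-information-layer values (variable)
loss = max(loss_ar + m for m in loss_mmi_values)  # Loss = Max(Loss_AR + Loss_MMI)
print(loss)  # loss deviation value used to update the Transformer parameters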
Steps S403 to S406 are similar to steps S201 to S204 in FIG. 2 and are not described again here. Steps S401 and S402 of this embodiment may be executed before or after step S403.
The Transformer model-based question answering method of the second embodiment builds on the first embodiment and, by optimizing the Transformer model, makes the output more accurate and reliable.
Fig. 5 is a schematic structural diagram of a Transformer model-based question answering device according to an embodiment of the present invention. As shown in fig. 5, the question answering device 50 includes an acquisition module 51, a decoding module 52, a splicing module 53, and a scoring module 54.
The obtaining module 51 is configured to obtain a question text input by a user, and process the question text to obtain a question sequence.
The decoding module 52 is coupled to the obtaining module 51, and is configured to decode the question sequence to obtain a plurality of candidate answers related to the question sequence.
The splicing module 53 is coupled to the decoding module 52 and is configured to splice the question sequence with each candidate answer.
The scoring module 54 is coupled to the splicing module 53, and configured to score each spliced result, and select a candidate answer corresponding to the highest score as an optimal answer of the question sequence.
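The coupling of the four modules can be sketched structurally as follows; the class and callable names are illustrative and do not describe the claimed apparatus itself.

class QuestionAnsweringDevice:
    def __init__(self, acquire, decode, splice, score):
        # acquisition, decoding, splicing and scoring modules, coupled in order
        self.acquire, self.decode = acquire, decode
        self.splice, self.score = splice, score

    def answer(self, question_text):
        q_seq = self.acquire(question_text)                    # acquisition module
        candidates = self.decode(q_seq)                        # decoding module
        spliced = [self.splice(q_seq, a) for a in candidates]  # splicing module
        scores = [self.score(s) for s in spliced]              # scoring module
        return candidates[scores.index(max(scores))]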
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a storage device according to an embodiment of the invention. The storage device of the embodiment of the present invention stores a program file 61 capable of implementing all the methods described above, wherein the program file 61 may be stored in the storage device in the form of a software product and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage device includes: various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, or terminal devices such as a computer, a server, a mobile phone or a tablet.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A question-answering method based on a Transformer model is characterized by comprising the following steps:
acquiring a question text input by a user, and processing the question text to obtain a question sequence;
decoding the question sequence to obtain a plurality of candidate answers related to the question sequence;
splicing the question sequence with each candidate answer;
and scoring each splicing result, and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
2. The question answering method according to claim 1, wherein the network structure of the Transformer model comprises a decoding layer and a mutual information layer arranged behind the decoding layer, and the step of decoding the question sequence to obtain a plurality of candidate answers related to the question sequence comprises:
inputting the question sequence into the decoding layer and outputting one candidate answer related to the question sequence;
and cyclically splicing the question sequence with the output of the decoding layer and inputting the spliced sequence into the decoding layer again to obtain a plurality of candidate answers.
3. The question-answering method according to claim 2, wherein the decoding layer comprises a self-attention mechanism module, a feedforward network module and a normalization processing module arranged in sequence; the step of inputting the question sequence into the decoding layer and outputting one candidate answer related to the question sequence comprises:
extracting the features of the question sequence by adopting the self-attention mechanism module;
carrying out nonlinear transformation on the feature extraction result by adopting the feedforward network module;
and carrying out normalization processing on the nonlinear transformation result by adopting the normalization processing module.
4. The question-answering method according to claim 1, wherein the step of obtaining a question text input by a user and processing the question text to obtain a question sequence further comprises:
acquiring a question text input by a user, wherein the question text comprises a question and a dialogue sentence containing the question;
inserting tags into the question sentences and the dialogue sentences;
coding and word embedding processing are carried out on the question after the tag is inserted, and a question sequence is obtained, wherein the question sequence comprises: sequence coding and position coding, the position coding being relative position coding.
5. The question-answering method according to claim 4, wherein the step of inserting tags into the question sentences and the dialogue sentences includes:
and inserting a start tag at the beginning of the question, inserting an end tag at the end of the question, and inserting a separation tag in the dialog sentence.
6. The question-answering method according to claim 1, wherein the step of scoring each of the stitched results and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence comprises:
calculating the correlation between the question sentence sequence and the candidate answer in each splicing result based on a joint probability distribution algorithm;
scoring the correlation, wherein the higher the degree of correlation, the higher the corresponding score;
and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
7. The question-answering method according to claim 1, characterized by further comprising:
constructing the Transformer model, wherein the network structure of the Transformer model comprises a decoding layer and a mutual information layer arranged behind the decoding layer;
and optimizing the Transformer model by adopting a loss function.
8. The question-answering method according to claim 7, wherein the step of optimizing the Transformer model using a loss function further comprises:
calculating a loss deviation value of the decoding layer and a loss deviation value of the mutual information layer;
selecting the maximum value obtained by superposing the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer as the loss deviation value of the Transformer model;
and updating the parameters of the Transformer model according to the loss deviation value of the Transformer model.
9. A question answering device based on a Transformer model, the question answering device comprising:
an acquisition module, configured to acquire a question text input by a user and process the question text to obtain a question sequence;
a decoding module, coupled to the obtaining module, for decoding the question sequence to obtain a plurality of candidate answers related to the question sequence;
a concatenation module, coupled to the decoding module, for concatenating the sequence of question sentences with each of the candidate answers;
and the scoring module is coupled with the splicing module and used for scoring each splicing result and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
10. A storage device, characterized by storing a program file capable of implementing the Transformer model-based question-answering method according to any one of claims 1 to 8.
CN202010737212.3A 2020-07-28 2020-07-28 Transformer model-based question answering method, question answering device and storage device Pending CN111881279A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010737212.3A CN111881279A (en) 2020-07-28 2020-07-28 Transformer model-based question answering method, question answering device and storage device
PCT/CN2020/121199 WO2021139297A1 (en) 2020-07-28 2020-10-15 Question-answer method and question-answer apparatus based on transformer model, and storage apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010737212.3A CN111881279A (en) 2020-07-28 2020-07-28 Transformer model-based question answering method, question answering device and storage device

Publications (1)

Publication Number Publication Date
CN111881279A true CN111881279A (en) 2020-11-03

Family

ID=73201394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010737212.3A Pending CN111881279A (en) 2020-07-28 2020-07-28 Transformer model-based question answering method, question answering device and storage device

Country Status (2)

Country Link
CN (1) CN111881279A (en)
WO (1) WO2021139297A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612881A (en) * 2020-12-28 2021-04-06 电子科技大学 Chinese intelligent dialogue method based on Transformer
CN113064972A (en) * 2021-04-12 2021-07-02 平安国际智慧城市科技股份有限公司 Intelligent question and answer method, device, equipment and storage medium
CN113704437A (en) * 2021-09-03 2021-11-26 重庆邮电大学 Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN114328908A (en) * 2021-11-08 2022-04-12 腾讯科技(深圳)有限公司 Question and answer sentence quality inspection method and device and related products
CN116737894A (en) * 2023-06-02 2023-09-12 深圳市客一客信息科技有限公司 Intelligent robot service system based on model training

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704443B (en) * 2021-09-08 2023-10-13 天津大学 Dialog generation method integrating explicit personalized information and implicit personalized information
CN115080715B (en) * 2022-05-30 2023-05-30 重庆理工大学 Span extraction reading understanding method based on residual structure and bidirectional fusion attention
CN116737888B (en) * 2023-01-11 2024-05-17 北京百度网讯科技有限公司 Training method of dialogue generation model and method and device for determining reply text
CN116595339A (en) * 2023-07-19 2023-08-15 东方空间技术(山东)有限公司 Intelligent processing method, device and equipment for space data
CN117992599A (en) * 2024-04-07 2024-05-07 腾讯科技(深圳)有限公司 Question and answer method and device based on large language model and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502627A (en) * 2019-08-28 2019-11-26 上海海事大学 A kind of answer generation method based on multilayer Transformer polymerization encoder
CN110543557A (en) * 2019-09-06 2019-12-06 北京工业大学 construction method of medical intelligent question-answering system based on attention mechanism
CN110543552A (en) * 2019-09-06 2019-12-06 网易(杭州)网络有限公司 Conversation interaction method and device and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019222751A1 (en) * 2018-05-18 2019-11-21 Google Llc Universal transformers
CN110647619B (en) * 2019-08-01 2023-05-05 中山大学 General knowledge question-answering method based on question generation and convolutional neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502627A (en) * 2019-08-28 2019-11-26 上海海事大学 A kind of answer generation method based on multilayer Transformer polymerization encoder
CN110543557A (en) * 2019-09-06 2019-12-06 北京工业大学 construction method of medical intelligent question-answering system based on attention mechanism
CN110543552A (en) * 2019-09-06 2019-12-06 网易(杭州)网络有限公司 Conversation interaction method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEIXIN_43917778: "Detailed Explanation of the Principle of Intelligent Question Answering Based on the Transformer Model (Study Notes)", pages 1 - 10, Retrieved from the Internet <URL:https://blog.csdn.net/weixin_43917778/article/details/1001133677> *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612881A (en) * 2020-12-28 2021-04-06 电子科技大学 Chinese intelligent dialogue method based on Transformer
CN112612881B (en) * 2020-12-28 2022-03-25 电子科技大学 Chinese intelligent dialogue method based on Transformer
CN113064972A (en) * 2021-04-12 2021-07-02 平安国际智慧城市科技股份有限公司 Intelligent question and answer method, device, equipment and storage medium
CN113704437A (en) * 2021-09-03 2021-11-26 重庆邮电大学 Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN113704437B (en) * 2021-09-03 2023-08-11 重庆邮电大学 Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN114328908A (en) * 2021-11-08 2022-04-12 腾讯科技(深圳)有限公司 Question and answer sentence quality inspection method and device and related products
CN116737894A (en) * 2023-06-02 2023-09-12 深圳市客一客信息科技有限公司 Intelligent robot service system based on model training
CN116737894B (en) * 2023-06-02 2024-02-20 深圳市客一客信息科技有限公司 Intelligent robot service system based on model training

Also Published As

Publication number Publication date
WO2021139297A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
CN111881279A (en) Transformer model-based question answering method, question answering device and storage device
CN111309889B (en) Method and device for text processing
CN112214591B (en) Dialog prediction method and device
CN111241237A (en) Intelligent question and answer data processing method and device based on operation and maintenance service
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
WO2020233131A1 (en) Question-and-answer processing method and apparatus, computer device and storage medium
CN116579339B (en) Task execution method and optimization task execution method
CN111680510B (en) Text processing method and device, computer equipment and storage medium
CN111079418A (en) Named body recognition method and device, electronic equipment and storage medium
CN116975288A (en) Text processing method and text processing model training method
CN115115984A (en) Video data processing method, apparatus, program product, computer device, and medium
CN110053055A (en) A kind of robot and its method answered a question, storage medium
CN113051384A (en) User portrait extraction method based on conversation and related device
CN116737883A (en) Man-machine interaction method, device, equipment and storage medium
CN117114063A (en) Method for training a generative large language model and for processing image tasks
CN117056474A (en) Session response method and device, electronic equipment and storage medium
CN113792133B (en) Question judging method and device, electronic equipment and medium
CN110795531A (en) Intention identification method, device and storage medium
CN116108918A (en) Training method and related device for dialogue pre-training model
CN112800191B (en) Question and answer method and device based on picture and computer readable storage medium
CN115858756A (en) Shared emotion man-machine conversation system based on perception emotional tendency
Tanaka et al. End-to-end rich transcription-style automatic speech recognition with semi-supervised learning
CN114117001A (en) Reference resolution method and device and reference resolution model training method and device
CN116915916A (en) Call processing method, device, electronic equipment and medium
CN111680136A (en) Method and device for matching spoken language and semantics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination