CN111881279A - Transformer model-based question answering method, question answering device and storage device - Google Patents

Transformer model-based question answering method, question answering device and storage device

Info

Publication number
CN111881279A
Authority
CN
China
Prior art keywords
question
sequence
module
decoding
transformer model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010737212.3A
Other languages
Chinese (zh)
Inventor
骆加维
吴信朝
周宸
周宝
陈远旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202010737212.3A priority Critical patent/CN111881279A/en
Priority to PCT/CN2020/121199 priority patent/WO2021139297A1/en
Publication of CN111881279A publication Critical patent/CN111881279A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/332: Query formulation
    • G06F16/3329: Natural language query formulation or dialogue systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of natural language processing, and in particular discloses a question answering method, a question answering device and a storage device based on a Transformer model. The question answering method comprises the following steps: acquiring a question text input by a user and processing the question text to obtain a question sequence; decoding the question sequence to obtain a plurality of candidate answers related to the question sequence; splicing the question sequence with each candidate answer; and scoring each splicing result and selecting the candidate answer with the highest score as the optimal answer to the question sequence. The method addresses the problems that intent recognition for a question is not accurate enough, replies are not natural enough, and the contextual linkage during a conversation is not tight enough.

Description

Transformer model-based question answering method, question answering device and storage device
Technical Field
The invention relates to the technical field of natural language processing, in particular to a question answering method, a question answering device and a storage device based on a Transformer model.
Background
Traditional online question answering systems are built as pipelines. For single-round or domain-knowledge question answering, a knowledge base and the answers to its questions are preset. When a user asks a question, an intent recognition module identifies the user's actual intent, which narrows the search range within the knowledge base; candidate questions are recalled, a deep learning model performs deep semantic similarity matching, and the text answer with the highest matching degree is returned. Alongside pipeline-based approaches, end-to-end dialogue systems are also growing rapidly. Compared with traditional knowledge-base question answering, however, end-to-end question answering systems have the following disadvantages: 1. intent recognition for the question is not accurate enough; 2. the replies are not natural or user-friendly; 3. the context of the dialogue is not tracked closely enough, so a multi-turn conversation degenerates into a series of single-round questions and answers.
Disclosure of Invention
The invention provides a question answering method, a question answering device and a storage device based on a Transformer model, which address the problems that intent recognition for a question is not accurate enough, replies are not natural enough, and the contextual linkage during a conversation is not tight enough.
In order to solve the above technical problems, the invention adopts the following technical scheme: a question answering method based on a Transformer model is provided, comprising the following steps:
acquiring a question text input by a user, and processing the question text to obtain a question sequence;
decoding the question sequence to obtain a plurality of candidate answers related to the question sequence;
splicing the question sequence with each candidate answer;
and scoring each splicing result, and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
According to an embodiment of the present invention, the network structure of the Transformer model includes a decoding layer and a mutual information layer disposed behind the decoding layer, and the step of decoding the question sequence to obtain a plurality of candidate answers related to the question sequence includes:
inputting the question sequence into the decoding layer and outputting one candidate answer related to the question sequence;
and cyclically splicing the question sequence with the output of the decoding layer and inputting the spliced sequence into the decoding layer again to obtain a plurality of candidate answers.
According to one embodiment of the invention, the decoding layer comprises a self-attention mechanism module, a feed-forward network module and a normalization processing module arranged in sequence; the step of inputting the question sequence into the decoding layer and outputting one candidate answer related to the question sequence comprises:
extracting the features of the question sequence by adopting the self-attention mechanism module;
carrying out nonlinear transformation on the feature extraction result by adopting the feedforward network module;
and carrying out normalization processing on the nonlinear transformation result by adopting the normalization processing module.
According to an embodiment of the present invention, the step of obtaining a question text input by a user, and processing the question text to obtain a question sequence further includes:
acquiring a question text input by a user, wherein the question text comprises a question and a dialogue sentence containing the question;
inserting tags into the question sentences and the dialogue sentences;
performing coding and word-embedding processing on the question after the tags are inserted to obtain a question sequence, wherein the question sequence comprises a sequence code and a position code, the position code being a relative position code.
According to an embodiment of the present invention, the step of inserting tags into the question sentences and the dialogue sentences includes:
and inserting a start tag at the beginning of the question, inserting an end tag at the end of the question, and inserting a separation tag in the dialog sentence.
According to an embodiment of the present invention, the step of scoring each of the concatenation results, and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence includes:
calculating the correlation between the question sentence sequence and the candidate answer in each splicing result based on a joint probability distribution algorithm;
scoring the correlation, wherein the higher the degree of correlation, the higher the corresponding score;
and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
According to an embodiment of the present invention, the question answering method further includes:
constructing the Transformer model, wherein the network structure of the Transformer model comprises a decoding layer and a mutual information layer arranged behind the decoding layer;
and optimizing the Transformer model by adopting a loss function.
According to an embodiment of the present invention, the step of optimizing the Transformer model by using a loss function further includes:
calculating a loss deviation value of the decoding layer and a loss deviation value of the mutual information layer;
selecting the maximum value obtained by superposing the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer as the loss deviation value of the Transformer model;
and updating the parameters of the Transformer model according to the loss deviation value of the Transformer model.
In order to solve the technical problem, the invention adopts another technical scheme that: provided is a question answering device based on a Transformer model, comprising:
an acquisition module, configured to acquire a question text input by a user and process the question text to obtain a question sequence;
a decoding module, coupled to the obtaining module, for decoding the question sequence to obtain a plurality of candidate answers related to the question sequence;
a concatenation module, coupled to the decoding module, for concatenating the sequence of question sentences with each of the candidate answers;
and the scoring module is coupled with the splicing module and used for scoring each splicing result and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
In order to solve the technical problems, the invention adopts another technical scheme that: provided is a storage device which stores a program file capable of realizing the above-described question answering method based on the Transformer model.
The invention has the following beneficial effects: inputting the question sequence into the decoding layer yields a plurality of candidate answers related to the question sequence, which increases the diversity of replies and effectively avoids the mechanical feel of always returning the same answer; splicing the question sequence with each candidate answer, scoring each splicing result and selecting the candidate answer with the highest score as the optimal answer to the question sequence strengthens contextual relevance and allows colloquial replies to be screened effectively.
Drawings
FIG. 1 is a schematic diagram of a partial network structure of a Transformer model according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a method for question answering based on a Transformer model according to a first embodiment of the present invention;
FIG. 3 is a schematic flow chart of step S202 in FIG. 2;
FIG. 4 is a flow chart of a Transformer model-based question-answering method according to a second embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a Transformer model-based question answering device according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a storage device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first", "second" and "third" in the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first," "second," or "third" may explicitly or implicitly include at least one of the feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise. All directional indicators (such as up, down, left, right, front, and rear … …) in the embodiments of the present invention are only used to explain the relative positional relationship between the components, the movement, and the like in a specific posture (as shown in the drawings), and if the specific posture is changed, the directional indicator is changed accordingly. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Referring to FIG. 1, the network structure of the Transformer model according to an embodiment of the present invention includes a decoding layer 10 and a mutual information layer 20 disposed behind the decoding layer 10, where the decoding layer 10 includes a self-attention mechanism module 11, a feed-forward network module 12 and a normalization processing module 13 arranged in sequence. FIG. 2 is a schematic flow chart of the first embodiment of the Transformer model-based question answering method of the present invention; it should be noted that the method of the present invention is not limited to the flow sequence shown in FIG. 2, provided that substantially the same result is obtained. As shown in FIG. 2, the method comprises the following steps:
step S201: and acquiring a question text input by a user, and processing the question text to obtain a question sequence.
In step S201, the question text includes a question and the dialogue sentences that contain it. First, tags are inserted into the question and the dialogue: a start tag is inserted at the beginning of the question, an end tag at the end of the question, and a separation tag between dialogue sentences, for example "Beg Query Sep Sen", where Beg marks the start of the question that opens the dialogue, Sep marks the end of the question, and each following dialogue sentence is likewise separated by Sep. The tagged question is then encoded and word-embedded to obtain the question sequence; word embedding in this embodiment uses a general-purpose NLP model. The question sequence of this embodiment comprises a sequence code and a position code, where the position code is a relative position code; using relative position coding effectively improves the relevance of nearby turns of the conversation.
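The tagging and sequence-coding step can be illustrated with a minimal sketch. The whitespace tokenizer, the toy vocabulary and the function name below are assumptions for illustration only; the "Beg" and "Sep" tag names follow the example above.

def build_question_sequence(question, dialogue_sentences, vocab):
    # Insert a start tag before the question, a separator tag after it,
    # and a separator tag after each following dialogue sentence.
    tokens = ["Beg"] + question.split() + ["Sep"]
    for sentence in dialogue_sentences:
        tokens += sentence.split() + ["Sep"]
    # Sequence coding: map each token to an integer id; the word-embedding
    # lookup and the (relative) position coding are applied later in the model.
    return [vocab.setdefault(token, len(vocab)) for token in tokens]

vocab = {}
q_seq = build_question_sequence(
    "how can I reset my password",
    ["click forgot password on the login page"],
    vocab,
)
print(q_seq)  # token ids; repeated tokens such as "password" and "Sep" share ids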
Step S202: the question sequence is decoded to obtain a plurality of candidate answers related to the question sequence.
In step S202, the question sequence input in this embodiment is formed by combining the sequence code and the position code. First, the question sequence is input into the decoding layer, which outputs one candidate answer related to the question sequence; the question sequence is then spliced with the output of the decoding layer and input into the decoding layer again, and the cycle is repeated to obtain a plurality of candidate answers. For example, the question sequence Q1 is first input into the decoding layer and one candidate answer A1 is output; Q1 is then spliced with A1 and input into the decoding layer again to output another candidate answer A2; Q1 is spliced with A2 and input again to output A3; repeating the loop several times yields candidate answers A1, A2, A3 and so on. In this way, inputting the question sequence into the decoding layer yields a plurality of candidate answers related to the question sequence, which increases the diversity of replies and effectively avoids the mechanical feel of always returning the same answer after a user inputs a question.
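The cyclic generation of candidate answers can be sketched as follows, assuming a decode(sequence) callable that wraps the decoding layer and returns one answer sequence; the function and parameter names are illustrative rather than the patented implementation.

def generate_candidates(q_seq, decode, num_candidates=3):
    # First pass: the question sequence alone yields candidate answer A1.
    candidates = [decode(q_seq)]
    while len(candidates) < num_candidates:
        # Splice the question sequence with the latest output of the decoding
        # layer and feed the spliced sequence into the decoding layer again.
        spliced = q_seq + candidates[-1]
        candidates.append(decode(spliced))
    return candidates  # A1, A2, A3, ...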
Referring to fig. 3, step S202 further includes the following steps that are performed in sequence:
step S301: and (4) performing feature extraction on the question sequence by adopting a self-attention mechanism module.
In step S301, the self-attention mechanism module relates different positions of a single sequence to one another in order to compute a representation of the question sequence, which effectively improves the extraction of implicit semantic features from the text. In this embodiment, when a vector (formed by combining the sequence code and the position code) is input to the decoding layer, the self-attention mechanism module multiplies the input vector by the attention weights and adds a bias vector to obtain the key, value and query vectors of the input.
Step S302: and carrying out nonlinear transformation on the feature extraction result by adopting a feedforward network module.
In step S302, the feed-forward network module uses an FFNN (feed-forward neural network), which applies a nonlinear transformation to the feature extraction result and projects it back to the model dimension.
Step S303: and a normalization processing module is adopted to perform normalization processing on the nonlinear transformation result.
In step S303, the normalization processing module performs normalization using a softmax function; it keeps the distribution of each sample consistent between the input and the final output and effectively accelerates convergence.
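A toy numerical sketch of the three modules of the decoding layer (steps S301 to S303) is given below; the dimensions, random weights and the stable softmax helper are illustrative assumptions, not trained parameters.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 8, 32, 5
x = rng.normal(size=(seq_len, d_model))          # question sequence vectors

# Step S301 (self-attention module): multiply the input by attention weights
# and add a bias to obtain the query, key and value vectors, then attend.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
b = np.zeros(d_model)
Q, K, V = x @ Wq + b, x @ Wk + b, x @ Wv + b
attended = softmax(Q @ K.T / np.sqrt(d_model)) @ V

# Step S302 (feed-forward module): nonlinear transformation, projected back
# to the model dimension.
W1, W2 = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model))
ffn_out = np.maximum(0.0, attended @ W1) @ W2    # ReLU, then projection

# Step S303 (normalization module): softmax normalization of the result.
normalized = softmax(ffn_out)
print(normalized.shape)                          # (5, 8)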
In one embodiment, the specific workflow of step S202 proceeds as follows. The structure of the Transformer model includes an encoder and a decoder.
In this embodiment, the input of the Transformer model passes through word embedding and position encoding (PE) before entering the encoder and decoder: the word vector and the position-encoding result are added together, and the sum is fed to the encoder/decoder.
Specifically, the calculation formula of PE is as follows:
PE(pos, 2i) = sin(pos / 10000^(2i / d_model))

PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

where pos is the position of the word in the sequence, d_model is the dimension of the model, 2i denotes an even dimension, and 2i+1 denotes an odd dimension.
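The formula above can be evaluated directly; the sketch below assumes an even d_model and illustrative sizes.

import numpy as np

def position_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, None]            # position of the word
    i = np.arange(0, d_model, 2)[None, :]        # paired even/odd dimensions
    angle = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angle)                  # even dimensions: sine
    pe[:, 1::2] = np.cos(angle)                  # odd dimensions: cosine
    return pe

pe = position_encoding(max_len=50, d_model=16)
print(pe.shape)  # (50, 16); added to the word embeddings before the layers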
The encoder has two sub-layers: a Multi-head Attention layer (multi-head attention mechanism) and a Feed-forward Networks layer (fully connected network). The multi-head attention mechanism uses self-attention to learn the relationships inside the source sentence, and the fully connected network applies the same operation to the vector at each position independently, namely two linear transformations with a ReLU activation in between.
The decoder has three sub-layers: a Masked Multi-head Attention layer (masked multi-head attention mechanism), a Multi-head Attention layer (multi-head attention mechanism) and a Feed-forward Networks layer (fully connected network). A multi-head attention mechanism is composed of several self-attention mechanisms running in parallel. The masked multi-head attention mechanism uses self-attention to learn the relationships inside the target sentence; its output, together with the result passed from the encoder, is then fed into the second multi-head attention layer, which is not self-attention but encoder-decoder attention and learns the relationships between the source sentence and the target sentence.
In the multi-head attention mechanism, the similarity between K (key) and Q (query) is first calculated to obtain S (similarity); S is then normalized through a softmax function to obtain the weights a; finally the weighted sum of a and V (value) is calculated to obtain the attention vector. In the self-attention mechanism, K, V and Q are the same. In the encoder-decoder attention of the decoder, Q is the output of the previous decoder step, while K and V are the outputs of the encoder.
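The similarity, softmax and weighted-sum computation can be sketched as follows; head splitting is omitted and the toy dimensions are assumptions.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, mask=None):
    s = Q @ K.T / np.sqrt(Q.shape[-1])   # S: similarity between Q and K
    if mask is not None:
        s = np.where(mask, s, -1e9)      # masked positions get ~zero weight
    a = softmax(s)                       # a: normalized attention weights
    return a @ V                         # weighted sum with V

x = np.random.default_rng(1).normal(size=(4, 8))
print(attention(x, x, x).shape)          # self-attention: K, V and Q identical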
An Add & Norm layer is also included above each multi-head attention mechanism, wherein Add represents residual connection (ResidualConnection) for preventing network degradation, and Norm represents a normalization layer (Layernormalization) for normalizing activation values of each layer, i.e. converting input into data with a mean value of 0 and a variance of 1, so as to avoid data falling into a saturation region of an activation function. The normalization layer is to calculate the mean and variance for each sample, not a batch of data.
The encoder and decoder of this embodiment are structurally almost identical; the difference is that a mask is added in the decoder. The mask hides certain values so that they play no role in parameter updating; its main purpose is to ensure that the word at position i is predicted using only the first i-1 words, with no future information.
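The Add & Norm step and the decoder mask can be sketched as follows; the epsilon value and the helper names are illustrative assumptions.

import numpy as np

def add_and_norm(x, sublayer_out, eps=1e-6):
    y = x + sublayer_out                      # Add: residual connection
    mean = y.mean(-1, keepdims=True)          # statistics computed per sample,
    var = y.var(-1, keepdims=True)            # not per batch
    return (y - mean) / np.sqrt(var + eps)    # Norm: mean 0, variance 1

def decoder_mask(seq_len):
    # The word at position i is predicted from the preceding words only;
    # positions carrying future information are masked out.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

print(decoder_mask(4).astype(int))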
Step S203: and splicing the question sequence with each candidate answer.
In step S203, the input question sequence is spliced with each candidate answer output in step S202 to obtain a plurality of splicing results. Each concatenation has the form "Begin Query Sep Ans", where Query denotes the question sequence and Ans denotes a candidate answer. For example, splicing question sequence Q1 with candidate answers A1, A2 and A3 yields "Begin Q1 Sep A1", "Begin Q1 Sep A2" and "Begin Q1 Sep A3" respectively.
Step S204: and scoring each splicing result, and selecting a candidate answer corresponding to the highest score as the optimal answer of the question sequence.
In step S204, the correlation between the question sequence and the candidate answer in each splicing result is calculated and scored using a joint probability distribution algorithm and a reverse-scoring trained model; the higher the correlation, the higher the score. The candidate answer with the highest score is selected as the optimal answer to the question sequence, so that the final output is not only an appropriate answer given the preceding context but also an answer that matches the intent of the whole dialogue.
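The selection logic of step S204 can be sketched as follows, assuming splice and score callables (the latter standing in for the joint-probability scoring); the names are illustrative.

def best_answer(q_seq, candidates, splice, score):
    # Splice the question sequence with every candidate answer, score each
    # splicing result, and return the candidate with the highest score.
    spliced = [splice(q_seq, answer) for answer in candidates]
    scores = [score(result) for result in spliced]
    return candidates[scores.index(max(scores))]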
The Transformer model-based question answering method of the first embodiment of the invention increases the diversity of replies by obtaining a plurality of candidate answers related to the question sequence, which effectively avoids the mechanical feel of always returning the same answer after a user inputs a question. By splicing the question sequence with each candidate answer, scoring each splicing result and selecting the candidate answer with the highest score as the optimal answer of the question sequence, it also strengthens contextual relevance and allows colloquial replies to be screened effectively.
FIG. 4 is a schematic flow chart of the Transformer model-based question answering method according to the second embodiment of the present invention; it should be noted that the method of the present invention is not limited to the flow sequence shown in FIG. 4, provided that substantially the same result is obtained. As shown in FIG. 4, the method includes the following steps:
step S401: and constructing a Transformer model.
In step S401, the network structure of the Transformer model includes a decoding layer and a mutual information layer disposed behind the decoding layer, where the decoding layer includes a self-attention mechanism module, a feed-forward network module and a normalization processing module arranged in sequence.
Step S402: and (5) optimizing the Transformer model by using a loss function.
In step S402, the loss function comprises a loss function of the decoding layer and a loss function of the mutual information layer. First, the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer are calculated; the maximum value obtained by superimposing the two is then selected as the loss deviation value of the Transformer model; finally, the parameters of the Transformer model are updated according to this loss deviation value.
Specifically, the calculation formula of the loss deviation value of the Transformer model is as follows:
Loss = Max(Loss_AR + Loss_MMI), where Loss denotes the loss deviation value of the Transformer model, Loss_AR denotes the loss deviation value of the decoding layer, and Loss_MMI denotes the loss deviation value of the mutual information layer. The loss deviation value of the Transformer model in this embodiment is the maximum value obtained after superimposing the loss deviation value of the decoding layer and that of the mutual information layer; the loss deviation value of the mutual information layer is a variable, and the calculation takes the result with the highest correlation between the currently input question and the preceding dialogue.
Further, the loss deviation value of the decoding layer is calculated according to the following formula:
Loss_AR = Max Σ_{t=1..T} log P(x_t | x_{z&lt;t})

where P denotes probability, x denotes a word, z and t denote positions of words in the question text and take integer values from 1 to T, x_t denotes the word at position t, and x_{z&lt;t} denotes the words before position t.
The loss deviation value of the mutual information layer is calculated according to the following formula: Loss_MMI = Max(P(m|n)), where P denotes a probability, n denotes the vector of the currently input question, m denotes the vector of the dialogue information preceding the current question, and P(m|n) denotes the probability that the current question is correlated with the preceding dialogue.
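A minimal sketch of the loss superposition follows, assuming the decoding-layer loss and the per-candidate mutual-information values have already been computed; the numbers are placeholders.

loss_ar = 2.3                         # loss deviation value of the decoding layer
loss_mmi_values = [0.4, 0.7, 0.5]     # mutual-information-layer values (variable)
loss = max(loss_ar + m for m in loss_mmi_values)  # Loss = Max(Loss_AR + Loss_MMI)
print(loss)  # loss deviation value used to update the Transformer parameters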
Steps S403 to S406 are similar to steps S201 to S204 in FIG. 2 and are not described again here. Steps S401 and S402 of this embodiment may be executed before or after step S403.
The Transformer model-based question answering method of the second embodiment builds on the first embodiment and, by optimizing the Transformer model, makes the output more accurate and reliable.
Fig. 5 is a schematic structural diagram of a Transformer model-based question answering device according to an embodiment of the present invention. As shown in fig. 5, the question answering device 50 includes an acquisition module 51, a decoding module 52, a splicing module 53, and a scoring module 54.
The obtaining module 51 is configured to obtain a question text input by a user, and process the question text to obtain a question sequence.
The decoding module 52 is coupled to the obtaining module 51, and is configured to decode the question sequence to obtain a plurality of candidate answers related to the question sequence.
The splicing module 53 is coupled to the decoding module 52 and is configured to splice the question sequence with each candidate answer.
The scoring module 54 is coupled to the splicing module 53, and configured to score each spliced result, and select a candidate answer corresponding to the highest score as an optimal answer of the question sequence.
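The coupling of the four modules can be sketched structurally as follows; the class and callable names are illustrative and do not describe the claimed apparatus itself.

class QuestionAnsweringDevice:
    def __init__(self, acquire, decode, splice, score):
        # acquisition, decoding, splicing and scoring modules, coupled in order
        self.acquire, self.decode = acquire, decode
        self.splice, self.score = splice, score

    def answer(self, question_text):
        q_seq = self.acquire(question_text)                    # acquisition module
        candidates = self.decode(q_seq)                        # decoding module
        spliced = [self.splice(q_seq, a) for a in candidates]  # splicing module
        scores = [self.score(s) for s in spliced]              # scoring module
        return candidates[scores.index(max(scores))]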
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a storage device according to an embodiment of the invention. The storage device of the embodiment of the present invention stores a program file 61 capable of implementing all the methods described above, wherein the program file 61 may be stored in the storage device in the form of a software product and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage device includes: various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, or terminal devices such as a computer, a server, a mobile phone or a tablet.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A question-answering method based on a Transformer model is characterized by comprising the following steps:
acquiring a question text input by a user, and processing the question text to obtain a question sequence;
decoding the question sequence to obtain a plurality of candidate answers related to the question sequence;
splicing the question sequence with each candidate answer;
and scoring each splicing result, and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
2. The question answering method according to claim 1, wherein the network structure of the Transformer model comprises a decoding layer and a mutual information layer arranged behind the decoding layer, and the step of decoding the question sequence to obtain a plurality of candidate answers related to the question sequence comprises:
inputting the question sequence into the decoding layer and outputting one candidate answer related to the question sequence;
and cyclically splicing the question sequence with the output of the decoding layer and inputting the spliced sequence into the decoding layer again to obtain a plurality of candidate answers.
3. The question-answering method according to claim 2, wherein the decoding layer comprises a self-attention mechanism module, a feedforward network module and a normalization processing module arranged in sequence; the step of inputting the question sequence into the decoding layer and outputting one candidate answer related to the question sequence comprises:
extracting the features of the question sequence by adopting the self-attention mechanism module;
carrying out nonlinear transformation on the feature extraction result by adopting the feedforward network module;
and carrying out normalization processing on the nonlinear transformation result by adopting the normalization processing module.
4. The question-answering method according to claim 1, wherein the step of obtaining a question text input by a user and processing the question text to obtain a question sequence further comprises:
acquiring a question text input by a user, wherein the question text comprises a question and a dialogue sentence containing the question;
inserting tags into the question sentences and the dialogue sentences;
coding and word embedding processing are carried out on the question after the tag is inserted, and a question sequence is obtained, wherein the question sequence comprises: sequence coding and position coding, the position coding being relative position coding.
5. The question-answering method according to claim 4, wherein the step of inserting tags into the question sentences and the dialogue sentences includes:
and inserting a start tag at the beginning of the question, inserting an end tag at the end of the question, and inserting a separation tag in the dialog sentence.
6. The question-answering method according to claim 1, wherein the step of scoring each of the stitched results and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence comprises:
calculating the correlation between the question sentence sequence and the candidate answer in each splicing result based on a joint probability distribution algorithm;
scoring the correlation, wherein the higher the degree of correlation, the higher the corresponding score;
and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
7. The question-answering method according to claim 1, characterized by further comprising:
constructing the Transformer model, wherein the network structure of the Transformer model comprises a decoding layer and a mutual information layer arranged behind the decoding layer;
and optimizing the Transformer model by adopting a loss function.
8. The question-answering method according to claim 7, wherein the step of optimizing the Transformer model using a loss function further comprises:
calculating a loss deviation value of the decoding layer and a loss deviation value of the mutual information layer;
selecting the maximum value obtained by superposing the loss deviation value of the decoding layer and the loss deviation value of the mutual information layer as the loss deviation value of the Transformer model;
and updating the parameters of the Transformer model according to the loss deviation value of the Transformer model.
9. A question answering device based on a Transformer model, the question answering device comprising:
an acquisition module, configured to acquire a question text input by a user and process the question text to obtain a question sequence;
a decoding module, coupled to the obtaining module, for decoding the question sequence to obtain a plurality of candidate answers related to the question sequence;
a concatenation module, coupled to the decoding module, for concatenating the sequence of question sentences with each of the candidate answers;
and the scoring module is coupled with the splicing module and used for scoring each splicing result and selecting the candidate answer corresponding to the highest score as the optimal answer of the question sequence.
10. A storage device, characterized by storing a program file capable of implementing the Transformer model-based question-answering method according to any one of claims 1 to 8.
CN202010737212.3A 2020-07-28 2020-07-28 Transformer model-based question answering method, question answering device and storage device Pending CN111881279A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010737212.3A CN111881279A (en) 2020-07-28 2020-07-28 Transformer model-based question answering method, question answering device and storage device
PCT/CN2020/121199 WO2021139297A1 (en) 2020-07-28 2020-10-15 Question-answer method and question-answer apparatus based on transformer model, and storage apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010737212.3A CN111881279A (en) 2020-07-28 2020-07-28 Transformer model-based question answering method, question answering device and storage device

Publications (1)

Publication Number Publication Date
CN111881279A true CN111881279A (en) 2020-11-03

Family

ID=73201394

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010737212.3A Pending CN111881279A (en) 2020-07-28 2020-07-28 Transformer model-based question answering method, question answering device and storage device

Country Status (2)

Country Link
CN (1) CN111881279A (en)
WO (1) WO2021139297A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612881A (en) * 2020-12-28 2021-04-06 电子科技大学 Chinese intelligent dialogue method based on Transformer
CN113064972A (en) * 2021-04-12 2021-07-02 平安国际智慧城市科技股份有限公司 Intelligent question and answer method, device, equipment and storage medium
CN113704437A (en) * 2021-09-03 2021-11-26 重庆邮电大学 Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN114328908A (en) * 2021-11-08 2022-04-12 腾讯科技(深圳)有限公司 Question and answer sentence quality inspection method and device and related products
CN116737894A (en) * 2023-06-02 2023-09-12 深圳市客一客信息科技有限公司 Intelligent robot service system based on model training

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704443B (en) * 2021-09-08 2023-10-13 天津大学 Dialog generation method integrating explicit personalized information and implicit personalized information
CN115080715B (en) * 2022-05-30 2023-05-30 重庆理工大学 Span extraction reading understanding method based on residual structure and bidirectional fusion attention
CN116737888B (en) * 2023-01-11 2024-05-17 北京百度网讯科技有限公司 Training method of dialogue generation model and method and device for determining reply text
CN116595339A (en) * 2023-07-19 2023-08-15 东方空间技术(山东)有限公司 Intelligent processing method, device and equipment for space data
CN117992599A (en) * 2024-04-07 2024-05-07 腾讯科技(深圳)有限公司 Question and answer method and device based on large language model and computer equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502627A (en) * 2019-08-28 2019-11-26 上海海事大学 A kind of answer generation method based on multilayer Transformer polymerization encoder
CN110543557A (en) * 2019-09-06 2019-12-06 北京工业大学 construction method of medical intelligent question-answering system based on attention mechanism
CN110543552A (en) * 2019-09-06 2019-12-06 网易(杭州)网络有限公司 Conversation interaction method and device and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019222751A1 (en) * 2018-05-18 2019-11-21 Google Llc Universal transformers
CN110647619B (en) * 2019-08-01 2023-05-05 中山大学 General knowledge question-answering method based on question generation and convolutional neural network

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502627A (en) * 2019-08-28 2019-11-26 上海海事大学 A kind of answer generation method based on multilayer Transformer polymerization encoder
CN110543557A (en) * 2019-09-06 2019-12-06 北京工业大学 construction method of medical intelligent question-answering system based on attention mechanism
CN110543552A (en) * 2019-09-06 2019-12-06 网易(杭州)网络有限公司 Conversation interaction method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEIXIN_43917778: "Detailed Explanation of the Principle of Intelligent Question Answering Based on the Transformer Model (Study Notes)", pages 1 - 10, Retrieved from the Internet <URL:https://blog.csdn.net/weixin_43917778/article/details/1001133677> *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112612881A (en) * 2020-12-28 2021-04-06 电子科技大学 Chinese intelligent dialogue method based on Transformer
CN112612881B (en) * 2020-12-28 2022-03-25 电子科技大学 Chinese intelligent dialogue method based on Transformer
CN113064972A (en) * 2021-04-12 2021-07-02 平安国际智慧城市科技股份有限公司 Intelligent question and answer method, device, equipment and storage medium
CN113704437A (en) * 2021-09-03 2021-11-26 重庆邮电大学 Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN113704437B (en) * 2021-09-03 2023-08-11 重庆邮电大学 Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN114328908A (en) * 2021-11-08 2022-04-12 腾讯科技(深圳)有限公司 Question and answer sentence quality inspection method and device and related products
CN116737894A (en) * 2023-06-02 2023-09-12 深圳市客一客信息科技有限公司 Intelligent robot service system based on model training
CN116737894B (en) * 2023-06-02 2024-02-20 深圳市客一客信息科技有限公司 Intelligent robot service system based on model training

Also Published As

Publication number Publication date
WO2021139297A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
CN111881279A (en) Transformer model-based question answering method, question answering device and storage device
CN111309889B (en) Method and device for text processing
CN112214591B (en) Dialog prediction method and device
CN111241237A (en) Intelligent question and answer data processing method and device based on operation and maintenance service
CN110990555B (en) End-to-end retrieval type dialogue method and system and computer equipment
WO2020233131A1 (en) Question-and-answer processing method and apparatus, computer device and storage medium
CN116579339B (en) Task execution method and optimization task execution method
CN111680510B (en) Text processing method and device, computer equipment and storage medium
CN111079418A (en) Named body recognition method and device, electronic equipment and storage medium
CN116975288A (en) Text processing method and text processing model training method
CN115115984A (en) Video data processing method, apparatus, program product, computer device, and medium
CN110053055A (en) A kind of robot and its method answered a question, storage medium
CN113051384A (en) User portrait extraction method based on conversation and related device
CN116737883A (en) Man-machine interaction method, device, equipment and storage medium
CN117114063A (en) Method for training a generative large language model and for processing image tasks
CN117056474A (en) Session response method and device, electronic equipment and storage medium
CN113792133B (en) Question judging method and device, electronic equipment and medium
CN110795531A (en) Intention identification method, device and storage medium
CN116108918A (en) Training method and related device for dialogue pre-training model
CN112800191B (en) Question and answer method and device based on picture and computer readable storage medium
CN115858756A (en) Shared emotion man-machine conversation system based on perception emotional tendency
Tanaka et al. End-to-end rich transcription-style automatic speech recognition with semi-supervised learning
CN114117001A (en) Reference resolution method and device and reference resolution model training method and device
CN116915916A (en) Call processing method, device, electronic equipment and medium
CN111680136A (en) Method and device for matching spoken language and semantics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination