CN112732879B - Downstream task processing method and model of question-answering task - Google Patents

Downstream task processing method and model of question-answering task

Info

Publication number
CN112732879B
CN112732879B (application CN202011539404.XA)
Authority
CN
China
Prior art keywords
context
question
representation
vector
ckey
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011539404.XA
Other languages
Chinese (zh)
Other versions
CN112732879A (en)
Inventor
王勇
雷冲
陈秋怡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Jiulai Technology Co ltd
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN202011539404.XA priority Critical patent/CN112732879B/en
Publication of CN112732879A publication Critical patent/CN112732879A/en
Application granted granted Critical
Publication of CN112732879B publication Critical patent/CN112732879B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a downstream task processing method and model for a question-answering task. A key-information-aware context representation H_CKey and a key-information-aware question representation H_QKey are obtained and used to generate a question-aware context representation G; an update vector z and a memory weight g are calculated based on G, and G is updated to obtain an output vector G_g; a context granularity vector G_C and a sequence granularity vector G_CLS are generated, and an output vector C_out is produced; softmax is then used to calculate the probability of each word in the context being the start or end position of the answer, and the contiguous subsequence with the highest probability is extracted as the answer. The invention proposes a bidirectional cascaded attention mechanism, builds a mechanism that unifies perusal and skimming as well as a multi-granularity module based on the idea of granular computing, so that the model effectively attends to and filters useful information, better understands the text at multiple granularities, gives more accurate answers, and achieves new gains in performance over the baseline model.

Description

Downstream task processing method and model of question-answering task
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a method and a model for processing a downstream task of a question and answer task.
Background
Machine reading comprehension is a very challenging task in natural language processing, which aims to determine the correct answer to a question from a given context. According to the answer form, common machine reading comprehension tasks are divided into cloze, multiple choice, span extraction and free-form answer. Recently developed pre-trained language models have achieved a series of successes on various natural language understanding tasks by virtue of their powerful text representation capability. Pre-trained language models are used as the encoders of deep learning language models to extract the language association features of the relevant text and are fine-tuned together with a downstream processing structure specific to the task at hand. With the great success of pre-trained language models, people have focused more of their attention on the encoder end of deep learning language models, causing the development of downstream processing techniques customized for specific tasks to reach a bottleneck. Although one can directly benefit from a variety of powerful encoders with similar structures, encoding the general knowledge implicit in large-scale corpora into language models with extremely large numbers of parameters is time- and resource-consuming. Moreover, current language representation encoding technology is developing slowly, which limits further improvement of the performance of pre-trained language models. All of this highlights the importance of developing downstream processing techniques for specific tasks.
In summary, the existing deep learning language models have the following disadvantages: (1) unimportant parts of the text are emphasized while important parts are ignored; (2) there is an over-stability phenomenon, i.e., the model is easily misled by distractor sentences that share many words with the question, matching only on the surface tokens rather than on the semantics.
Therefore, how to make the model focus on the key information in the text and help the model escape the preference for overly attending to local information in the text has become an urgent problem for those skilled in the art.
Disclosure of Invention
Aiming at the above defects in the prior art, the problem to be solved by the invention is: making the model focus on the key information in the text and helping the model escape the preference for overly attending to local information in the text.
In order to solve the above technical problems, the invention adopts the following technical scheme:
A downstream task processing method for a question-answering task comprises the following steps:
S1, inputting the question and the context into a pre-training language module to obtain the language association features of the context;
S2, using a bidirectional attention mechanism, obtaining a key-information-aware context representation H_CKey and a key-information-aware question representation H_QKey based on the language association features of the context;
S3, using bidirectional attention flow, obtaining a question-aware context representation G based on the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey;
S4, using a gate mechanism, calculating an update vector z and a memory weight g based on the question-aware context representation G, and updating G with the update vector z and the memory weight g to obtain an output vector G_g;
S5, using granular computing, generating a context granularity vector G_C and a sequence granularity vector G_CLS based on the language association features of the context, and generating, based on the context granularity vector G_C, the sequence granularity vector G_CLS and the output vector G_g, an output vector C_out that understands the context from multiple angles and captures the global and local relations of the context;
S6, passing the output vector C_out through a linear layer, then using softmax to calculate the probability of each word in the context being the start or end position of the answer, and extracting the contiguous subsequence with the highest probability as the answer.
Preferably, the language association feature of the context is H, H = {h_1, h_2, h_3, ..., h_s}, where h_1 to h_s denote the encoded representation of the sequence formed by concatenating the context and the question, and s denotes the length of the concatenated sequence. Step S2 comprises:
S201, extracting, based on the positions of the question and the context in H, a question part H_Q and a context part H_C, where H_Q = {h_2, h_3, h_4, ..., h_{n+1}}, H_C = {h_{n+3}, h_{n+4}, ..., h_{n+m+2}}, n denotes the length of the question words, and m denotes the length of the context words;
s202, constructing a similarity matrix S,
S = W_S(H_C, H_Q, H_C·H_Q)
where W_S is a trainable matrix;
S203, performing a softmax operation on each row and each column of the similarity matrix S to obtain S_1 and S_2, where S_1 denotes, for each context word, the relevance of all question words to it, and S_2 denotes, for each question word, the relevance of all context words to it; S_1 = softmax(S), S_2 = softmax↓(S);
S204, highlighting the weights of the question keywords and the context keywords;
S205, generating the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey based on the following formulas:
H_CKey = H_C + H_C ⊙ A_C
H_QKey = H_Q + H_Q ⊙ A_Q
where A_C denotes the attention over the key parts of the context associated with the question keywords, and A_Q denotes the attention over the key parts of the question associated with the context keywords;
A_C = S_2 · S_Qkey
A_Q = S_1 · S_Ckey
where S_Qkey denotes the question weights that highlight the keywords and S_Ckey denotes the context weights that highlight the keywords;
S_Qkey = mean(S_1)
S_Ckey = mean(S_2).
preferably, step S3 includes:
s301, obtaining the following formula
S'_1 and S'_2:
S'_1 = softmax(S')
S'_2 = softmax↓(S')
S' = W_S'(H_CKey, H_QKey, H_CKey·H_QKey)
where S'_1 denotes, for each context word, the relevance of all question words to it; S'_2 denotes the relevance of all context words to each question word; S' denotes the relevance between the question words and the context words, recalculated after obtaining the key information (the same operation as in S202); and W_S' is a trainable matrix;
S302, calculating a context representation A based on the question words (this refers to the representations of all the question words; the averaging highlights the weights of the keywords among the question words) and a question-word representation B based on the context words:
A = S'_1 · H_QKey
B = S'_1 · S'_2^T · H_CKey
S303, concatenating H_CKey, A and B to obtain the question-aware context representation G:
G = W_3([H_CKey; H_CKey⊙A; H_CKey⊙B]) + b_3
where W_3 and b_3 are a trainable matrix and bias, respectively.
Preferably, step S3 further comprises:
S304, taking the key-information-aware question representation H_QKey and the question-aware context representation G as H_Q and H_C in step S2, and repeating steps S2 and S3; after cycling a preset number of times, the final question-aware context representation G is obtained.
Preferably, in step S4:
z = tanh(W_z·G + b_z)
g = sigmoid(W_g[G; A] + b_g)
G_g = g⊙z + (1 - g)⊙G
where W_z, W_g, b_z and b_g are trainable matrices and biases.
Preferably, step S5 includes:
S501, removing the [PAD] padding part of H_C and then averaging to obtain the context granularity vector G_C;
S502, extracting the [CLS] identifier in H as the sequence granularity vector G_CLS;
S503, based on the following formula, generating from the language association features of the context the output vector C_out that understands the context from multiple angles and captures the global and local relations of the context:
C_out = W_4·(G_g + G_C + G_CLS) + b_4
where W_4 and b_4 are a trainable matrix and bias, respectively.
A downstream task processing model for a question-answering task, used to implement the above downstream task processing method for a question-answering task, comprises:
a pre-training language module for generating the language association features of the context based on the question and the context;
a skimming module for obtaining, with a bidirectional attention mechanism, the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey based on the language association features of the context;
a perusal module for obtaining, with bidirectional attention flow, the question-aware context representation G based on the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey;
a gate mechanism module for calculating, with the gate mechanism, an update vector z and a memory weight g based on the question-aware context representation G, and updating G with the update vector z and the memory weight g to obtain the output vector G_g;
a granular computing module for generating, with granular computing, the context granularity vector G_C and the sequence granularity vector G_CLS based on the language association features of the context, and generating, based on the context granularity vector G_C, the sequence granularity vector G_CLS and the output vector G_g, the output vector C_out that understands the context from multiple angles and captures the global and local relations of the context;
an answer prediction module for passing the output vector C_out through a linear layer, then using softmax to calculate the probability of each word in the context being the start or end position of the answer, and extracting the contiguous subsequence with the highest probability as the answer.
Preferably, the loss function of the downstream task processing model of the question-answering task in the training process is as follows:
L = -(1/N) Σ_{i=1}^{N} [ log p^1_{y_i^1} + log p^2_{y_i^2} ]
where y_i^1 and y_i^2 denote the start position and the end position of the true answer of the i-th sample, respectively, N is the total number of samples, p^1_{y_i^1} denotes the predicted probability of the start position of the true answer at model inference, and p^2_{y_i^2} denotes the predicted probability of the end position of the true answer.
In summary, compared with the prior art, the invention discloses a downstream task processing method for a question-answering task that adds a downstream processing structure on top of a pre-trained model, comprising a skimming module, a perusal module and a gate mechanism module, and can simulate the human behavior of reading a passage several times and comprehensively filtering information during a reading comprehension task. The skimming module helps the model determine the key-information-aware context representation and the key-information-aware question representation; the perusal module feeds the vectors output by the encoder into a bidirectional attention flow layer to establish a complete association between the question and the context; meanwhile, following the idea of granular computing, a multi-granularity module that computes the context granularity and the sequence granularity is added to the model and forms a parallel structure with the word granularity obtained above, so that the model can simulate the human behavior of understanding a text from words to sentences and from the local parts to the whole.
Drawings
FIG. 1 is a flow chart of the downstream task processing method for a question-answering task disclosed by the invention;
FIG. 2 is a block diagram of the downstream task processing model for a question-answering task disclosed by the invention;
FIG. 3 illustrates the skimming module (bidirectional stacked attention mechanism);
FIG. 4 is a line chart comparing the F1 of RoBERTa and of the present invention;
FIG. 5 is a line chart comparing the EM of RoBERTa and of the present invention;
FIG. 6 is a diagram of the question keywords and their associated key parts of the context;
FIG. 7 is a diagram of the context keywords and their associated key parts of the question words.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the invention discloses a downstream task processing method for a question-answering task, which comprises the following steps:
S1, inputting the question and the context into a pre-training language module to obtain the language association features of the context;
Specifically, the sequence formed by concatenating the question and the context is fed into the pre-training language module, i.e., the encoder, for encoding.
S2, using a bidirectional attention mechanism, obtaining a key-information-aware context representation H_CKey and a key-information-aware question representation H_QKey based on the language association features of the context;
S3, using bidirectional attention flow, obtaining a question-aware context representation G based on the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey;
S4, using a gate mechanism, calculating an update vector z and a memory weight g based on the question-aware context representation G, and updating G with the update vector z and the memory weight g to obtain an output vector G_g;
S5, using granular computing, generating a context granularity vector G_C and a sequence granularity vector G_CLS based on the language association features of the context, and generating, based on the context granularity vector G_C, the sequence granularity vector G_CLS and the output vector G_g, an output vector C_out that understands the context from multiple angles and captures the global and local relations of the context;
S6, passing the output vector C_out through a linear layer, then using softmax to calculate the probability of each word in the context being the start or end position of the answer, and extracting the contiguous subsequence with the highest probability as the answer.
To address the prior-art problem that unimportant parts of the text are treated as important while important parts are ignored, the invention proposes a bidirectional cascaded attention mechanism, so that the model can perceive the keywords of the question and their associated key parts of the context, as well as the keywords of the context and their associated key parts of the question. In addition, the gate mechanism module enables the model to automatically retain the parts of the context related to the question and forget the parts unrelated to the question. Through these two mechanisms, the model is forced to focus on the key information in the text.
To address the over-stability problem in the prior art, the granular computing module proposed by the invention adds context granularity and sequence granularity to the model, so that the model can understand the text from both local and global perspectives, helping it escape the preference for overly attending to local information in the text.
The skimming module is inspired by the idea of the stacked attention mechanism proposed by the invention, as shown in FIG. 3. After the similarity matrix is obtained from the context and question representations, softmax is applied to the matrix horizontally and vertically, the weights that highlight the keywords are computed, and the attention over the key parts of the context associated with the question keywords and the attention over the key parts of the question words associated with the context keywords are calculated. The context representation and the question-word representation are each multiplied by the corresponding attention matrix and then added to the original representation, yielding the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey.
In a specific implementation, the language association feature of the context is H, H = {h_1, h_2, h_3, ..., h_s}, where h_1 to h_s denote the encoded representation of the sequence formed by concatenating the context and the question, and s denotes the length of the concatenated sequence. Step S2 comprises:
S201, extracting, based on the positions of the question and the context in H, a question part H_Q and a context part H_C (since the sequence is formed by concatenating the question and the context, after encoding, the encoded parts of the question and the context must be sliced from the corresponding positions in the sequence for subsequent operations), where H_Q = {h_2, h_3, h_4, ..., h_{n+1}}, H_C = {h_{n+3}, h_{n+4}, ..., h_{n+m+2}}, n denotes the length of the question words, and m denotes the length of the context words;
All questions and contexts are fixed to lengths n and m, respectively: if a sequence is too short it is padded with [PAD], and if it is too long it is truncated. When encoding with BERT, the sequence must be spliced in the form [CLS] + H_Q + [SEP] + H_C + [SEP]. Subsequent operations only require H_Q and H_C, so [CLS] (i.e., h_1) and the two [SEP] tokens (i.e., h_{n+2} and h_{n+m+3}) are discarded. The sequence consists of the question, the context, [CLS] and two [SEP] tokens; the question length is n and the context length is m, so the sequence length is n + m + 3.
S202, constructing a similarity matrix S,
S = W_S(H_C, H_Q, H_C·H_Q)
where W_S is a trainable matrix;
In simplified form, the specific computation is S = W_a*H_C + W_b*H_Q + W_c*H_C*H_Q + bias, where * denotes matrix multiplication. W_S is used here to collectively denote W_a, W_b and W_c. The final shape of the S matrix is [b, m, n], where b denotes the batch size.
S203, performing a softmax operation on each row and each column of the similarity matrix S to obtain S_1 and S_2, where S_1 denotes, for each context word, the relevance of all question words to it, and S_2 denotes, for each question word, the relevance of all context words to it; S_1 = softmax(S), S_2 = softmax↓(S);
S204, highlighting the weights of the question keywords and the context keywords;
Since S_1 gives, for each context word, the relevance of all question words to it, averaging S_1 vertically (over the context dimension) highlights the weights of the keywords among the question words: the more important a question word is, the greater its relevance to each context word and hence the greater the average value, which yields the question keyword weights S_Qkey. Likewise, averaging horizontally highlights the keyword weights S_Ckey in the context.
S205, generating the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey based on the following formulas:
H_CKey = H_C + H_C ⊙ A_C
H_QKey = H_Q + H_Q ⊙ A_Q
where A_C denotes the attention over the key parts of the context associated with the question keywords, and A_Q denotes the attention over the key parts of the question associated with the context keywords;
A_C = S_2 · S_Qkey
A_Q = S_1 · S_Ckey
where S_Qkey denotes the question weights that highlight the keywords and S_Ckey denotes the context weights that highlight the keywords;
S_Qkey = mean(S_1)
S_Ckey = mean(S_2).
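As a non-limiting illustration, the skimming module (steps S201 to S205) might be sketched as follows; PyTorch, the class name SkimmingModule and the trilinear realization of W_S are assumptions, and the transpose used for A_Q reflects one reading of the formula A_Q = S_1·S_Ckey.

```python
# Illustrative sketch of the skimming module (bidirectional stacked attention).
# Inputs: H_C [b, m, d] and H_Q [b, n, d]; w_a, w_b, w_c together play the role of W_S.
import torch
import torch.nn as nn

class SkimmingModule(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w_a = nn.Linear(d, 1, bias=False)
        self.w_b = nn.Linear(d, 1, bias=False)
        self.w_c = nn.Parameter(torch.randn(1, 1, d))

    def forward(self, H_C, H_Q):
        # Similarity matrix S: [b, m, n]
        S = (self.w_a(H_C) + self.w_b(H_Q).transpose(1, 2)
             + torch.matmul(H_C * self.w_c, H_Q.transpose(1, 2)))
        S1 = torch.softmax(S, dim=-1)   # per context word: relevance of the question words
        S2 = torch.softmax(S, dim=1)    # per question word: relevance of the context words
        S_Qkey = S1.mean(dim=1)         # [b, n] question keyword weights
        S_Ckey = S2.mean(dim=2)         # [b, m] context keyword weights
        A_C = torch.matmul(S2, S_Qkey.unsqueeze(-1))                  # [b, m, 1]
        A_Q = torch.matmul(S1.transpose(1, 2), S_Ckey.unsqueeze(-1))  # [b, n, 1]
        H_CKey = H_C + H_C * A_C        # key-information-aware context representation
        H_QKey = H_Q + H_Q * A_Q        # key-information-aware question representation
        return H_CKey, H_QKey, S1, S2
```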
in specific implementation, step S3 includes:
s301, obtaining the following formula
S'_1 and S'_2:
S'_1 = softmax(S')
S'_2 = softmax↓(S')
S' = W_S'(H_CKey, H_QKey, H_CKey·H_QKey)
where S'_1 denotes, for each context word, the relevance of all question words to it; S'_2 denotes the relevance of all context words to each question word; S' denotes the relevance between the question words and the context words, recalculated after obtaining the key information (the same operation as in S202); and W_S' is a trainable matrix;
S302, calculating a context representation A based on the question words and a question-word representation B based on the context words:
A = S'_1 · H_QKey
B = S'_1 · S'_2^T · H_CKey
S303, concatenating H_CKey, A and B to obtain the question-aware context representation G:
G = W_3([H_CKey; H_CKey⊙A; H_CKey⊙B]) + b_3
where W_3 and b_3 are a trainable matrix and bias, respectively.
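As a non-limiting illustration, the perusal module (steps S301 to S303) might be sketched as follows, reusing the SkimmingModule sketch above for the S' computation; the forms of A and B follow the standard bidirectional attention flow and are assumptions, since the corresponding equations appear as images in the original publication.

```python
# Illustrative sketch of the perusal module (bidirectional attention flow).
import torch
import torch.nn as nn

class PerusalModule(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.similarity = SkimmingModule(d)   # provides S'_1, S'_2 from S' = W_S'(...)
        self.w3 = nn.Linear(3 * d, d)         # W_3 and b_3

    def forward(self, H_CKey, H_QKey):
        _, _, S1p, S2p = self.similarity(H_CKey, H_QKey)                  # S'_1, S'_2
        A = torch.matmul(S1p, H_QKey)                                     # [b, m, d]
        B = torch.matmul(torch.matmul(S1p, S2p.transpose(1, 2)), H_CKey)  # [b, m, d]
        G = self.w3(torch.cat([H_CKey, H_CKey * A, H_CKey * B], dim=-1))  # question-aware context
        return G, A
```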
When humans complete a reading comprehension task, they often read the text several times to deepen their understanding. The model simulates this behavior by passing through the skimming module and the perusal module multiple times: it grasps the key information in the question and the passage by skimming, then further grasps the main idea of the text by perusal and filters the important information that matches the question. By reading repeatedly and continuously adjusting the key information it has identified, the model obtains a more comprehensive context representation and finally determines the answer to the question. The invention uses a multi-hop loop mechanism to simulate this human behavior of reading the text repeatedly and to help the model deepen its understanding of the text. The experimental data below also demonstrate that the multi-hop loop mechanism helps improve model performance.
In a specific implementation, step S3 further comprises:
S304, taking the key-information-aware question representation H_QKey and the question-aware context representation G as H_Q and H_C in step S2, and repeating steps S2 and S3; after cycling a preset number of times, the final question-aware context representation G is obtained.
During reading comprehension, various architectures, such as the LSTM, the GRU and reader models with gating, adopt gate-like mechanisms to simulate how humans, after reading several times, filter and memorize important content and neglect unimportant content: the model judges which parts should be remembered or forgotten, generates an update vector, and updates its memory. In the invention, the question-aware context representation G and the key-information-aware question representation H_QKey are fed into the gate mechanism so that the model can judge which parts should be remembered or forgotten; an update vector z is generated from G, and the memory of the model is updated. G and A are merged and passed through a linear layer with a sigmoid, so that when a part of G is more relevant to the question content, the memory weight g approaches 1 and more of the relevant information is retained.
In the specific implementation, in step S4:
z = tanh(W_z·G + b_z)
g = sigmoid(W_g[G; A] + b_g)
G_g = g⊙z + (1 - g)⊙G
where W_z, W_g, b_z and b_g are trainable matrices and biases.
The invention simultaneously adopts the bidirectional stacked attention mechanism and the gate mechanism to force the model to pay more attention to the key information in the text. The hyper-parameters in the experiments can be adjusted according to the performance of the available hardware, and the performance of the model differs under different hyper-parameter settings. The experimental data of the invention are the model performance results obtained under the hyper-parameter settings given in this patent. Under the same hyper-parameter settings, the model of the invention outperforms the other models in the comparison experiments.
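As a non-limiting illustration, the gate mechanism of step S4 above might be sketched as follows; PyTorch is assumed, and the gated combination G_g = g⊙z + (1 - g)⊙G is an assumption consistent with the description, since the original gives the combination formula as an image.

```python
# Illustrative sketch of the gate mechanism module.
import torch
import torch.nn as nn

class GateModule(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w_z = nn.Linear(d, d)        # W_z and b_z
        self.w_g = nn.Linear(2 * d, d)    # W_g and b_g

    def forward(self, G, A):
        z = torch.tanh(self.w_z(G))                              # update vector
        g = torch.sigmoid(self.w_g(torch.cat([G, A], dim=-1)))   # memory weight
        G_g = g * z + (1 - g) * G   # keep question-relevant parts, forget the rest
        return G_g
```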
Granular computing is an effective way of solving structured problems. One recognized characteristic of human intelligence is the ability to observe and analyze the same problem at very different granularities: people can not only solve problems in worlds of different granularities, but also jump quickly from one granularity world to another, and this ability to handle worlds of different granularities is a powerful manifestation of human problem solving. A granular computing model divides the research object into several layers of different granularities; the layers are interrelated and form a unified whole, and different granularities represent information at different angles and ranges. The idea of granular computing helps the model solve problems at multiple granularities and understand the relation between the local parts and the whole of the text. The invention proposes to understand the text, and the relation between its whole and its parts, in terms of word granularity, context granularity and sequence granularity.
In specific implementation, step S5 includes:
S501, removing the [PAD] padding part of H_C and then averaging to obtain the context granularity vector G_C;
S502, extracting the [CLS] identifier in H as the sequence granularity vector G_CLS;
S503, based on the following formula, generating from the language association features of the context the output vector C_out that understands the context from multiple angles and captures the global and local relations of the context:
C_out = W_4·(G_g + G_C + G_CLS) + b_4
where W_4 and b_4 are a trainable matrix and bias, respectively.
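As a non-limiting illustration, the granular computing module (steps S501 to S503) might be sketched as follows; PyTorch is assumed, context_mask marks the non-[PAD] context positions, and using G_g as the word-granularity term of the formula is an assumption.

```python
# Illustrative sketch of the granular computing module.
import torch
import torch.nn as nn

class GranularModule(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w4 = nn.Linear(d, d)   # W_4 and b_4

    def forward(self, H, H_C, context_mask, G_g):
        mask = context_mask.unsqueeze(-1).float()                        # [b, m, 1]
        G_C = (H_C * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)   # context granularity
        G_CLS = H[:, 0]                                                  # sequence granularity ([CLS])
        C_out = self.w4(G_g + G_C.unsqueeze(1) + G_CLS.unsqueeze(1))     # [b, m, d]
        return C_out
```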
In addition, the invention also discloses a downstream task processing model for a question-answering task, which is used to implement the above downstream task processing method and comprises:
a pre-training language module for generating the language association features of the context based on the question and the context;
a skimming module for obtaining, with a bidirectional attention mechanism, the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey based on the language association features of the context;
a perusal module for obtaining, with bidirectional attention flow, the question-aware context representation G based on the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey;
a gate mechanism module for calculating, with the gate mechanism, an update vector z and a memory weight g based on the question-aware context representation G, and updating G with the update vector z and the memory weight g to obtain the output vector G_g;
a granular computing module for generating, with granular computing, the context granularity vector G_C and the sequence granularity vector G_CLS based on the language association features of the context, and generating, based on the context granularity vector G_C, the sequence granularity vector G_CLS and the output vector G_g, the output vector C_out that understands the context from multiple angles and captures the global and local relations of the context;
an answer prediction module for passing the output vector C_out through a linear layer, then using softmax to calculate the probability of each word in the context being the start or end position of the answer, and extracting the contiguous subsequence with the highest probability as the answer.
The invention serves the span extraction type of reading comprehension task, and the main model architecture is shown in FIG. 2. The downstream structure mainly comprises the following four parts: a skimming module, a perusal module, a gate mechanism module and a granular computing module, where the skimming module and the perusal module are contained in a multi-hop mechanism. The skimming module determines the question keywords and their associated key parts of the context, as well as the context keywords and their associated key parts of the question words; the perusal module aligns the question and the context information to establish a complete association; the gate mechanism filters, memorizes and updates the key information; and the granular computing module, in parallel with this structure, enables the model to understand the text from the context granularity and the sequence granularity in a multi-angle manner.
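For orientation only, the following sketch wires the module sketches given above (SkimmingModule, PerusalModule, GateModule, GranularModule) into the downstream structure of FIG. 2; the exact wiring, in particular which tensor plays the role of A in the gate, is partly an assumption.

```python
# Illustrative end-to-end sketch of the downstream structure with a multi-hop loop.
import torch
import torch.nn as nn

class SIReaderDownstream(nn.Module):
    def __init__(self, d, num_hops=3):
        super().__init__()
        self.skim, self.peruse = SkimmingModule(d), PerusalModule(d)
        self.gate, self.granular = GateModule(d), GranularModule(d)
        self.span = nn.Linear(d, 2)      # answer prediction linear layer
        self.num_hops = num_hops

    def forward(self, H, H_Q, H_C, context_mask):
        H_C0 = H_C                       # original context encoding, used for G_C
        G, A = H_C, H_Q
        for _ in range(self.num_hops):   # multi-hop skimming + perusal (step S304)
            H_CKey, H_QKey, _, _ = self.skim(H_C, H_Q)
            G, A = self.peruse(H_CKey, H_QKey)
            H_Q, H_C = H_QKey, G         # feed the refined representations to the next hop
        G_g = self.gate(G, A)            # filter and memorize the key information
        C_out = self.granular(H, H_C0, context_mask, G_g)   # multi-granularity fusion
        logits = self.span(C_out)        # [b, m, 2]
        return logits[..., 0], logits[..., 1]   # start logits, end logits
```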
In specific implementation, the loss function of the downstream task processing model of the question-answering task in the training process is as follows:
L = -(1/N) Σ_{i=1}^{N} [ log p^1_{y_i^1} + log p^2_{y_i^2} ]
where y_i^1 and y_i^2 denote the start position and the end position of the true answer of the i-th sample, respectively, N is the total number of samples, p^1_{y_i^1} denotes the predicted probability of the start position of the true answer at model inference, and p^2_{y_i^2} denotes the predicted probability of the end position of the true answer.
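As a non-limiting illustration, the training loss above and the span extraction of the answer prediction module might be implemented as follows; PyTorch is assumed, and start_logits/end_logits are the outputs of the linear layer in the sketch given earlier.

```python
# Illustrative sketch of the span-extraction loss and answer decoding.
import torch
import torch.nn.functional as F

def span_loss(start_logits, end_logits, y_start, y_end):
    # -(1/N) * sum_i [ log p^1_{y_i^1} + log p^2_{y_i^2} ]
    return F.cross_entropy(start_logits, y_start) + F.cross_entropy(end_logits, y_end)

def best_span(start_logits, end_logits, max_len=30):
    # Pick the contiguous span (s <= e) with the highest joint probability.
    p_start = torch.softmax(start_logits, dim=-1)[0]
    p_end = torch.softmax(end_logits, dim=-1)[0]
    best, span = -1.0, (0, 0)
    for s in range(len(p_start)):
        for e in range(s, min(s + max_len, len(p_end))):
            score = (p_start[s] * p_end[e]).item()
            if score > best:
                best, span = score, (s, e)
    return span
```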
The effect of the technical scheme disclosed by the invention can be verified through the following experiments:
the pre-training language model RoBERTA is used as the Encoder of the model and is used as a baseline model, and the pre-training models BERT and ALBERT with the same super parameters and ALBERT-Large with larger super parameters are used for carrying out comparison experiments. The experiment was carried out using DuReader2.0 under Tensorflow-1.12.0, SQuADv1.1 under Pytrch 1.0.1 and NVIDIA GTX 1080Ti using the hyper-parameters shown in Table 1.
TABLE 1 Hyper-parameters of this experiment
Hyper Parameter | Value
batch size | 4
epoch | 3
max query length (DuReader 2.0) | 16
max query length (SQuAD v1.1) | 24
max sequence length | 512
learning rate | 3×10^-5
doc stride | 384
warmup rate | 0.1
multi-hop | 3
The fuzzy match score (F1) and the exact match score (EM) are used as evaluation metrics in the experiments. EM measures whether the answer predicted by the model matches the true answer exactly. F1 measures the degree of token-level overlap between the predicted answer and the true answer and is computed from the token-level Precision and Recall.
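As a non-limiting illustration, the two metrics might be computed as follows; simple whitespace tokenization is assumed (the official evaluation scripts additionally normalize punctuation and articles).

```python
# Illustrative sketch of the EM and F1 metrics.
from collections import Counter

def exact_match(pred, truth):
    return float(pred.strip() == truth.strip())

def f1_score(pred, truth):
    pred_toks, truth_toks = pred.split(), truth.split()
    common = Counter(pred_toks) & Counter(truth_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)   # token-level precision
    recall = overlap / len(truth_toks)     # token-level recall
    return 2 * precision * recall / (precision + recall)
```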
Table 2 compares the evaluation results of multiple pre-trained models on the DuReader2.0 and SQuAD v1.1 development sets. The experiment further improves F1 (+0.94%; 0.526%) and EM (+0.918%; 0.464%) over the baseline model.
TABLE 2 model results for DuReader2.0 and SQuAD 1.1
Table 3 compares the numbers of model parameters; the EM improvement is significant, indicating that the model can deepen its understanding of the text and helps it predict more accurate answers.
TABLE 3 Comparison of model parameters
Model | Params (M)
BERT | 110
RoBERTa | 110
S&IReader | 119
In the DuReader2.0 experiment, 10890 training steps were run, and a checkpoint of the model was saved and its performance recorded every 2000 steps. The changes of F1 and EM with the number of training steps for this experiment and the baseline model are shown in FIG. 4 and FIG. 5. Because of the increased number of parameters, the performance is slightly lower than that of the baseline model early in training, but after sufficient training it is essentially superior to the baseline model.
The experiments show that the model can understand the text semantics deeply. As shown in Table 4, taking a sample from the DuReader2.0 development set as an example, the baseline model misunderstands the question and the context text in this sample and cannot accurately locate the predicted answer.
TABLE 4 A comparative machine reading comprehension example
Meanwhile, the method of the invention can alleviate, to a certain extent, the previously existing over-stability problem. As shown in Table 5, a sample from the DuReader2.0 development set is selected as an example; the baseline model is found to match only on the surface words (the marked part in Table 5), i.e., it matches the words at the position where the question text appears in the context and obtains an incorrect answer, whereas the method of the invention matches according to the semantics of the question and the context and finds the correct answer.
TABLE 5 An example of over-stability in the development set
In order to analyze the influence of the skimming module, the perusal module, the gate mechanism, the multi-granularity module and the number of multi-hop iterations on model performance, ablation experiments were carried out on DuReader2.0. Table 6 shows the performance of the model under the different ablation settings.
TABLE 6 ablation results
Compared with the 7th configuration in Table 6, the first configuration shows that the bidirectional stacked attention mechanism helps the model attend to the key content and can improve model performance to a certain extent. The second configuration shows that further establishing a more complete association between the question and the context helps improve performance. The third configuration shows that the gate mechanism helps the model filter out unimportant information and thus significantly improves model performance. The fourth configuration shows that processing the text at multiple granularities allows the model to understand the textual information at multiple levels, further improving performance.
Meanwhile, the 5th to 9th configurations show that appropriately increasing the number of multi-hop iterations helps the model understand the text semantics more deeply, alleviating insufficient learning and over-stability and improving the accuracy of the predicted answers. However, increasing the number of multi-hop iterations also increases the number of parameters and the amount of computation of the model, which affects its performance and efficiency. The experiments above show that the best performance in this experiment is achieved with Multihop-3.
To further verify and explain the effectiveness of the skimming module of the invention, a sample from the DuReader2.0 development set is selected; when the sample enters the skimming module of the model, the question keywords and their associated key parts of the context, as well as the context keywords and their associated key parts of the question words, are determined.
In the corresponding heat maps, shown in FIG. 6 and FIG. 7, the horizontal axis and the vertical axis represent the question and the context text, respectively. It can be seen from FIG. 6 and FIG. 7 that the keyword of the question, 'price', is accurately identified. It can also be seen in FIG. 6 that the skimming module identifies key parts such as 'income', 'market', 'region' and 'company profit', which are highly associated with the question keyword 'price' in the context semantics. Thus, as further verified and illustrated by the above sample and heat maps, the skimming module of the invention has the ability to identify keywords and the corresponding associated key parts at the semantic level.
The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several changes and modifications without departing from the technical solution, and such changed and modified solutions should also be considered as falling within the protection scope of the claims of the present application.

Claims (8)

1. A method for processing a downstream task of a question-answering task, characterized by comprising the following steps:
S1, inputting the question and the context into a pre-training language module to obtain the language association features of the context;
S2, using a bidirectional attention mechanism, obtaining a key-information-aware context representation H_CKey and a key-information-aware question representation H_QKey based on the language association features of the context;
S3, using bidirectional attention flow, obtaining a question-aware context representation G based on the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey;
S4, using a gate mechanism, calculating an update vector z and a memory weight g based on the question-aware context representation G, and updating G with the update vector z and the memory weight g to obtain an output vector G_g;
S5, using granular computing, generating a context granularity vector G_C and a sequence granularity vector G_CLS based on the language association features of the context, and generating, based on the context granularity vector G_C, the sequence granularity vector G_CLS and the output vector G_g, an output vector C_out that understands the context from multiple angles and captures the global and local relations of the context;
S6, passing the output vector C_out through a linear layer, then using softmax to calculate the probability of each word in the context being the start or end position of the answer, and extracting the contiguous subsequence with the highest probability as the answer.
2. The method of claim 1, wherein the language association feature of the context is H, H = {h_1, h_2, h_3, ..., h_s}, h_1 to h_s denoting the encoded representation of the sequence formed by concatenating the context and the question, and s denoting the length of the concatenated sequence, and wherein step S2 comprises:
S201, extracting, based on the positions of the question and the context in H, a question part H_Q and a context part H_C, where H_Q = {h_2, h_3, h_4, ..., h_{n+1}}, H_C = {h_{n+3}, h_{n+4}, ..., h_{n+m+2}}, n denotes the length of the question words, and m denotes the length of the context words;
S202, constructing a similarity matrix S:
S = W_S(H_C, H_Q, H_C·H_Q)
where W_S is a trainable matrix;
S203, performing a softmax operation on each row and each column of the similarity matrix S to obtain S_1 and S_2, where S_1 denotes, for each context word, the relevance of all question words to it, and S_2 denotes, for each question word, the relevance of all context words to it; S_1 = softmax(S), S_2 = softmax↓(S);
S204, highlighting the weights of the question keywords and the context keywords;
S205, generating the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey based on the following formulas:
H_CKey = H_C + H_C ⊙ A_C
H_QKey = H_Q + H_Q ⊙ A_Q
where A_C denotes the attention over the key parts of the context associated with the question keywords, and A_Q denotes the attention over the key parts of the question associated with the context keywords;
A_C = S_2 · S_Qkey
A_Q = S_1 · S_Ckey
where S_Qkey denotes the question weights that highlight the keywords and S_Ckey denotes the context weights that highlight the keywords;
S_Qkey = mean(S_1)
S_Ckey = mean(S_2).
3. The method for processing the downstream task of the question-answering task according to claim 2, wherein step S3 comprises:
S301, obtaining S'_1 and S'_2 based on the following formulas:
S'_1 = softmax(S')
S'_2 = softmax↓(S')
S' = W_S'(H_CKey, H_QKey, H_CKey·H_QKey)
where S'_1 denotes, for each context word, the relevance of all question words to it; S'_2 denotes the relevance of all context words to each question word; S' denotes the relevance between the question words and the context words recalculated after obtaining the key information; and W_S' is a trainable matrix;
s302, calculating a context expression A based on the question words and a question word expression B based on the context words;
A = S'_1 · H_QKey
B = S'_1 · S'_2^T · H_CKey
S303, concatenating H_CKey, A and B to obtain the question-aware context representation G:
G = W_3([H_CKey; H_CKey⊙A; H_CKey⊙B]) + b_3
where W_3 and b_3 are a trainable matrix and bias, respectively.
4. The method for processing the downstream task of the question-answering task according to claim 3, wherein step S3 further comprises:
S304, taking the key-information-aware question representation H_QKey and the question-aware context representation G as H_Q and H_C in step S2, and repeating steps S2 and S3; after cycling a preset number of times, the final question-aware context representation G is obtained.
5. The method for processing the downstream task of the question-answering task according to claim 4, wherein in step S4:
z = tanh(W_z·G + b_z)
g = sigmoid(W_g[G; A] + b_g)
G_g = g⊙z + (1 - g)⊙G
where W_z, W_g, b_z and b_g are trainable matrices and biases.
6. The method for processing the downstream task of the question-answering task according to claim 5, wherein the step S5 includes:
S501, removing the [PAD] padding part of H_C and then averaging to obtain the context granularity vector G_C;
S502, extracting the [CLS] identifier in H as the sequence granularity vector G_CLS;
S503, based on the following formula, generating from the language association features of the context the output vector C_out that understands the context from multiple angles and captures the global and local relations of the context:
C_out = W_4·(G_g + G_C + G_CLS) + b_4
where W_4 and b_4 are a trainable matrix and bias, respectively;
wherein the question and the context are required to have lengths n and m, respectively; if the length is insufficient, [PAD] is used for padding, and if the length is exceeded, truncation is performed; when encoding with BERT, the sequence must be spliced in the form [CLS] + H_Q + [SEP] + H_C + [SEP]; the sequence consists of the question, the context, [CLS] and two [SEP] tokens, the question length is n and the context length is m, so the sequence length is n + m + 3.
7. A downstream task processing model for a question-answering task, characterized in that it is used to implement the downstream task processing method for a question-answering task according to any one of claims 1 to 6, and comprises:
a pre-training language module for generating the language association features of the context based on the question and the context;
a skimming module for obtaining, with a bidirectional attention mechanism, the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey based on the language association features of the context;
a perusal module for obtaining, with bidirectional attention flow, the question-aware context representation G based on the key-information-aware context representation H_CKey and the key-information-aware question representation H_QKey;
a gate mechanism module for calculating, with the gate mechanism, an update vector z and a memory weight g based on the question-aware context representation G, and updating G with the update vector z and the memory weight g to obtain the output vector G_g;
a granular computing module for generating, with granular computing, the context granularity vector G_C and the sequence granularity vector G_CLS based on the language association features of the context, and generating, based on the context granularity vector G_C, the sequence granularity vector G_CLS and the output vector G_g, the output vector C_out that understands the context from multiple angles and captures the global and local relations of the context;
an answer prediction module for passing the output vector C_out through a linear layer, then using softmax to calculate the probability of each word in the context being the start or end position of the answer, and extracting the contiguous subsequence with the highest probability as the answer.
8. The question-answering task downstream task processing model according to claim 7, wherein the loss function of the question-answering task downstream task processing model in the training process is as follows:
L = -(1/N) Σ_{i=1}^{N} [ log p^1_{y_i^1} + log p^2_{y_i^2} ]
where y_i^1 and y_i^2 denote the start position and the end position of the true answer of the i-th sample, respectively, N is the total number of samples, p^1_{y_i^1} denotes the predicted probability of the start position of the true answer at model inference, and p^2_{y_i^2} denotes the predicted probability of the end position of the true answer.
CN202011539404.XA 2020-12-23 2020-12-23 Downstream task processing method and model of question-answering task Active CN112732879B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011539404.XA CN112732879B (en) 2020-12-23 2020-12-23 Downstream task processing method and model of question-answering task

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011539404.XA CN112732879B (en) 2020-12-23 2020-12-23 Downstream task processing method and model of question-answering task

Publications (2)

Publication Number Publication Date
CN112732879A CN112732879A (en) 2021-04-30
CN112732879B true CN112732879B (en) 2022-05-10

Family

ID=75604645

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011539404.XA Active CN112732879B (en) 2020-12-23 2020-12-23 Downstream task processing method and model of question-answering task

Country Status (1)

Country Link
CN (1) CN112732879B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115080715B (en) * 2022-05-30 2023-05-30 重庆理工大学 Span extraction reading understanding method based on residual structure and bidirectional fusion attention
CN114780707B (en) * 2022-06-21 2022-11-22 浙江浙里信征信有限公司 Multi-hop question answering method based on multi-hop reasoning joint optimization

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134771A (en) * 2019-04-09 2019-08-16 广东工业大学 A kind of implementation method based on more attention mechanism converged network question answering systems
CN111611361A (en) * 2020-04-01 2020-09-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent reading, understanding, question answering system of extraction type machine

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126596B (en) * 2016-06-20 2019-08-23 中国科学院自动化研究所 A kind of answering method based on stratification memory network
EP3385862A1 (en) * 2017-04-03 2018-10-10 Siemens Aktiengesellschaft A method and apparatus for performing hierarchical entity classification
CN109947912B (en) * 2019-01-25 2020-06-23 四川大学 Model method based on intra-paragraph reasoning and joint question answer matching
CN110442675A (en) * 2019-06-27 2019-11-12 平安科技(深圳)有限公司 Question and answer matching treatment, model training method, device, equipment and storage medium
CN110717431B (en) * 2019-09-27 2023-03-24 华侨大学 Fine-grained visual question and answer method combined with multi-view attention mechanism
CN110929515B (en) * 2019-11-21 2023-04-18 中国民航大学 Reading understanding method and system based on cooperative attention and adaptive adjustment
CN111814982B (en) * 2020-07-15 2021-03-16 四川大学 Multi-hop question-answer oriented dynamic reasoning network system and method
CN112100348A (en) * 2020-09-01 2020-12-18 武汉纺织大学 Knowledge base question-answer relation detection method and system of multi-granularity attention mechanism

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134771A (en) * 2019-04-09 2019-08-16 广东工业大学 A kind of implementation method based on more attention mechanism converged network question answering systems
CN111611361A (en) * 2020-04-01 2020-09-01 西南电子技术研究所(中国电子科技集团公司第十研究所) Intelligent reading, understanding, question answering system of extraction type machine

Also Published As

Publication number Publication date
CN112732879A (en) 2021-04-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230705

Address after: No. 1811, 18th Floor, Building 19, Section 1201, Lushan Avenue, Wan'an Street, Tianfu New District, Chengdu, Sichuan, China (Sichuan) Pilot Free Trade Zone, 610213, China

Patentee after: Sichuan Jiulai Technology Co.,Ltd.

Address before: No. 69 Hongguang Avenue, Lijiatuo, Banan District, Chongqing, 400054

Patentee before: Chongqing University of Technology