CN116108153B - Multi-task combined training machine reading and understanding method based on gating mechanism - Google Patents

Multi-task combined training machine reading and understanding method based on gating mechanism

Info

Publication number
CN116108153B
CN116108153B (application number CN202310112991.1A)
Authority
CN
China
Prior art keywords
article
question
attention
follows
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310112991.1A
Other languages
Chinese (zh)
Other versions
CN116108153A (en)
Inventor
王勇 (Wang Yong)
陈秋怡 (Chen Qiuyi)
张梅 (Zhang Mei)
王永明 (Wang Yongming)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Technology
Original Assignee
Chongqing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Technology filed Critical Chongqing University of Technology
Priority to CN202310112991.1A priority Critical patent/CN116108153B/en
Publication of CN116108153A publication Critical patent/CN116108153A/en
Application granted granted Critical
Publication of CN116108153B publication Critical patent/CN116108153B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of natural language processing, and particularly relates to a multi-task combined training machine reading and understanding method based on a gating mechanism. The method comprises the following modules: an article and question coding module; an interaction module; a multi-level residual structure module; and an answer prediction module. The invention filters the interaction features through a gating mechanism, controlling the inflow of important information and the outflow of useless information so as to govern the information flow and feed it accurately into the output layer for answer prediction; a multi-level residual structure is built by introducing the idea of residual connections, fusing the representations obtained after article-question interaction with the original semantic information, so that the semantic information is richer, the understanding of the article is fuller, and network degradation is avoided; and an edge loss function is added for multi-task joint training, which ensures strong coupling between the classification task and the extraction task and further learns the feature differences between positive and negative examples.

Description

Multi-task combined training machine reading and understanding method based on gating mechanism
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a multi-task combined training machine reading and understanding method based on a gating mechanism.
Background
An important task in natural language processing is the question-answering system, and machine reading comprehension is a popular research topic within it. In this task, articles and questions are given, and a piece of text is extracted from the article as the answer to the question by reading and understanding. In real life, not every question has an answer, so to meet practical needs a machine reading comprehension model must both accurately extract the answer to a question from the article and judge whether an answer exists at all, which is an important challenge in the field of natural language processing. In terms of practical application, reading comprehension has penetrated many aspects of our lives. For example, in a common search engine, when a user inputs a keyword to be queried, related web pages must be found from massive amounts of website information, which takes a lot of time. If the related technology is applied to a search engine, the desired answer can be found more precisely. Other common application scenarios include customer-service dialogue systems such as those of e-commerce platforms, which can return answers to commonly asked questions and thus save enterprises manpower and material resources.
Pre-trained language models such as BERT and ALBERT are research hotspots of natural language processing in recent years and are commonly used in machine reading comprehension models. Many machine reading comprehension models, such as BiDAF, QANet and AoA, also employ an attention mechanism to simulate how humans read with a question in mind. FusionNet proposes an improved reading comprehension network based on word history and full attention, where full attention computes weighting coefficients over all the historical information of a word while reducing the dimensionality of high-dimensional features in the word history, thereby improving efficiency. The ASMI model addresses insufficient robustness, proposes a contextual attention mechanism to predict contextual answers, and also proposes a new negative-sample generation method. These models typically highlight the key information of the article and the question when computing attention, and obtain, by fusion, a semantic vector representation containing the interaction between the question and the article.
Approaches to question classification and answer extraction are divided into end-to-end models and two-stage models. The Retrospective Reader model adopts two stages, combining skimming and intensive reading, and obtains new improvements. The skimming module reads the article and the question to give a preliminary judgment, and the intensive-reading module verifies the answer and gives candidates; the outputs of the two modules are integrated to give the final classification result and the corresponding answer. S&I Reader is an end-to-end reading model that provides an intensive-reading module and a skimming module and simulates people's repeated reading behavior through multiple hops; a multi-granularity module is also added to enrich the important features of the text. The RMR + Answer Verifier model is an end-to-end model that proposes a read-then-verify structure: it not only uses a reader to extract candidate answers and produce no-answer probabilities, but also uses an answer verifier to determine whether the predicted answer is supported by the input segment, and adopts an auxiliary loss for further detection.
However, the above prior art has the following technical problems: (1) the extracted features are redundant: after associating article and question features, the flow of information is not controlled; (2) the semantic information is not comprehensive: the representation contains only the contextual semantic vectors obtained from the pre-trained language model, or only the key-information semantic vectors obtained by techniques such as the attention mechanism, so that little information can be expressed, and at the same time adding network layers may cause network degradation, so the characterization ability of the network is not strong; (3) the classification of questions and the extraction of answers are not strongly coupled, so the differences between answerable and unanswerable questions cannot be learned.
In order to solve the technical problem, the invention provides a multi-task combined training machine reading and understanding method based on a gating mechanism.
Disclosure of Invention
The invention aims to provide a multi-task combined training machine reading and understanding method based on a gating mechanism, which aims to solve the problems in the prior art pointed out in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
the invention provides a multi-task combined training machine reading and understanding method based on a gating mechanism, which comprises the following steps:
the method comprises the steps of performing context coding on input articles and questions through an article and question coding module;
through the interaction module, important characteristics of the context information are highlighted by adopting an attention mechanism and a gating mechanism, and the highlighted key characteristics are updated;
the original semantic information is respectively fused with the representation obtained through the attention mechanism and the representation obtained through the gating mechanism through a multi-level residual error structure module;
the answerability of each question and the answers of answerable questions are predicted by an answer prediction module.
Further, the context encoding of the inputted articles and questions by the article and question encoding module includes:
an article with m words is defined as P = {p_1, p_2, …, p_m}, and a question with n words as Q = {q_1, q_2, …, q_n};
Splicing the problem Q and the article P into a fixed-length sequence: the starting position is marked by [ CLS ] and is used as sentence vector of the whole sequence;
the question Q and the article P are separated by an identifier [ SEP ], and the end of the article P is also identified by [ SEP ];
for the length of the whole sequence, if the sequence exceeds a fixed length, cutting off, and generating a next sequence by adopting a sliding window; if the sequence does not reach the fixed length, the [ PAD ] is used for filling;
the generated sequence is sent as input to the encoder side, and E = {e_1, e_2, …, e_s} is used as the vector sequence with embedded features;
the vector E is sent into a multi-layer Transformer structure, wherein each layer comprises two parts, one part being multi-head attention and the other a feed-forward layer;
the encoder output finally obtained through the multi-layer Transformer is represented as H = {h_1, h_2, …, h_s}.
Further, the attention mechanism of the interaction module adopts a bidirectional attention flow model, and the working principle comprises:
the similarity score between the i-th article word and the j-th question word is calculated using the dot product, expressed as follows:
S_ij = p_i · q_j^T
where p_i represents the i-th article word, q_j represents the j-th question word, T is the transpose symbol, and the scores S_ij form the matrix S ∈ R^{m×n}, i.e., of dimension m×n;
building the attention of the article to the question and the attention of the question to the article to obtain a question-based article representation:
the similarity scores S_ij form a similarity matrix S, and row normalization of S yields the matrix S_1, expressed as follows:
S_1 = softmax(S)
calculating, for each article word, which of the question words is most relevant to it;
the article's attention to the question highlights the features of the question words, as follows:
A_pq = S_1 · Q
where A_pq represents the attention of the article to the question and Q is the question representation;
taking the maximum over each row and then normalizing yields the matrix S_2, expressed as follows:
S_2 = softmax(max(S))
which indicates which article word is most relevant to some word of the question, proving that this word is important for answering the question;
the attention of the question to the article highlights the features of the article words according to the article words associated with the question words, as follows:
A_qp = S_2 · P
where A_qp is the attention of the question to the article and P is the article representation;
the final question-based article representation is obtained by fusion, expressed as follows:
QP = [P; A_pq; P·A_pq; P·A_qp].
further, the operating principle of the gating mechanism of the interaction module includes:
the article words are spliced respectively with the attention of the article to the question and with the fused question-based article representation, and the weight values are obtained through an activation function, expressed as follows:
z = sigmoid(W_z[P; A_pq] + b_z)
r = sigmoid(W_r[P; QP'] + b_r)
where P represents the article words, A_pq represents the attention of the article to the question, and QP' represents the fused question-based article representation fed into the gating mechanism;
z and r are used respectively to determine the weights of the updated parts, and the extracted features are updated as follows:
hz = (1 - z) ⊙ A_pq + z ⊙ P
hr = (1 - r) ⊙ QP' + r ⊙ P
and taking the average value of the two updated vectors to obtain a final gating mechanism vector, wherein the final gating mechanism vector is expressed as follows:
G=mean(hz+hr)
where G represents the vector obtained after the gating mechanism.
Further, the fusing, by the multi-level residual structure module, the original semantic information with the representation obtained by the attention mechanism and the representation obtained by the gating mechanism respectively includes:
the fine-grained vector representation obtained through the attention mechanism and the gating mechanism simulates the effect of human intensive reading; the vector sequence obtained from the encoder side is used as the coarse-grained vector representation and simulates the result of human skimming;
the first-level residual structure of the article P and the attention representation QP is constructed using a skip connection, expressed as follows:
QP'=ReLU(P+QP)
wherein ReLU is an activation function;
the second-level residual structure of the context vector representation H and the updated representation G obtained through the gating mechanism is established using a skip connection, expressed as follows:
I=ReLU(H+G)
where ReLU is the activation function and I ∈ R^{s×h}, i.e., the dimension of I is s×h;
the resulting I is used to determine the probability of each word in the sequence as a start-stop position.
Further, the predicting, by the answer prediction module, of the answerability of the questions and the answers of the answerable questions includes:
an additional edge loss function is proposed to maximize the Euclidean distance between the answer prediction and the no-answer classification, and the final training loss function L contains three losses, expressed as follows:
L = Loss_ext + Loss_class + Loss_joint
in the reading process, the semantic vector representation I, which finally contains both coarse and fine granularity, is obtained and sent to a fully-connected layer to obtain the start- and end-position representations of each word respectively; during training, a cross-entropy loss function is adopted as the training target, expressed as follows:
where the label terms are the true start- and end-position labels of the i-th question respectively, and N is the number of questions;
for the answerability of questions, a classification task is trained using the context-based vector representation generated by the pre-trained language model; since answerability is a two-class decision, a two-class cross-entropy loss function is adopted in the training process, expressed as follows:
edge loss function joint training is adopted so that the sample moves closer to the answerability direction corresponding to its label and away from the opposite direction, further learning the feature difference between them and making the answer extraction task and the question classification task strongly coupled; the obtained start- and end-position representations are normalized to obtain the probabilities of the answer start and end positions, and their product is taken as the probability of the positive sample, expressed as follows:
P_has_ans = softmax(P'_start · P'_end)
after the vector representation generated by the pre-trained language model is classified, the probability that the question has no answer is obtained and used as the probability of the negative sample, expressed as follows:
P_no_ans = softmax(H)
the edge loss function calculates the distance between the label and the positive and negative sample probabilities, and the Euclidean distance between the label and the negative sample probability is calculated as follows:
d(x, y) = ||x - y||_2
the edge loss function maximizes the distance between the unanswered classification and the answered prediction during training as follows:
Compared with the prior art, the invention has the following beneficial effects:
(1) The invention provides a gating mechanism that filters the associated features after interaction and controls the inflow of important information and the outflow of useless information, so as to govern the information flow and feed it accurately into the output layer to predict the answer;
(2) The invention introduces the idea of the residual structure and builds a new multi-level residual structure module, fusing the representations obtained after article-question interaction with the original semantic information, so that the semantic information is more comprehensive, the understanding of the article is fuller, and network degradation is avoided;
(3) The invention proposes a multi-task joint training method by adding an edge loss function, ensuring the coupling of the classification task and the extraction task. A triplet of the label, the positive example and the negative example is constructed, and the distances between the label and the positive example and between the label and the negative example are calculated through a distance function, so that the distance between the label and the corresponding sample becomes smaller while the distance to the opposite sample becomes larger, further learning the feature differences between them.
Drawings
FIG. 1 is a diagram of an overall model framework of the present invention;
FIG. 2 is a triplet edge loss schematic.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
First, the system defines an article with m words as P = {p_1, p_2, …, p_m} and a question with n words as Q = {q_1, q_2, …, q_n}.
The goal is not only to correctly find the answer to question Q in article P, but also to accurately judge whether the question is answerable. For an answerable question, a start position and an end position are returned, representing that the answer is a continuous piece of text A = {p_start, …, p_end}; for an unanswerable question, a null string is assigned to mark that it has no answer, i.e., A = [].
Referring to fig. 1, the system model mainly includes four modules: the system comprises an article and problem encoding module, an interaction module, a multi-level residual error structure module and a multi-task joint training module.
(1) Article and question coding module: performs context coding on the input article and question; (2) interaction module: highlights the important features of the context information using the attention mechanism and the gating mechanism, and updates the highlighted key features; (3) multi-level residual structure module: fuses the original semantic information with the representation obtained through the attention mechanism and the representation obtained through the gating mechanism, respectively; (4) answer prediction module: predicts the answerability of the question and answers the answerable questions.
1. Article and question coding module
This module first concatenates the question Q and the article P into a fixed-length sequence. The starting position is identified by [CLS], which generally serves as the sentence vector of the whole sequence. Q and P are separated by the identifier [SEP], and the end of P is also marked by [SEP]. If the whole sequence is too long it is truncated, and a sliding window is used to generate the next sequence; if the sequence does not reach the fixed length, it is padded with [PAD]. The generated sequence is sent as input to the encoder side, and E = {e_1, e_2, …, e_s} serves as the vector sequence with embedded features.
The vector E is sent into a multi-layer Transformer structure, in which each layer comprises two parts: multi-head attention and a feed-forward layer. The encoder output finally obtained through the multi-layer Transformer is represented as H = {h_1, h_2, …, h_s}.
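As an illustration of the sequence construction just described, the following is a minimal sketch; the max_len and stride values, the literal token strings and the helper name are illustrative assumptions rather than details taken from the patent.

```python
def build_input_sequences(question_tokens, article_tokens, max_len=384, stride=128):
    """Assemble [CLS] question [SEP] article [SEP] sequences of a fixed length.

    Over-long articles are truncated and split with a sliding window; short
    sequences are padded with [PAD]. max_len and stride are illustrative values.
    """
    header = ["[CLS]"] + question_tokens + ["[SEP]"]
    room = max_len - len(header) - 1          # space left for article tokens plus the trailing [SEP]
    if room <= 0:
        raise ValueError("question is too long for the chosen max_len")
    sequences, start = [], 0
    while True:
        chunk = article_tokens[start:start + room]
        seq = header + chunk + ["[SEP]"]
        seq += ["[PAD]"] * (max_len - len(seq))   # pad short sequences to the fixed length
        sequences.append(seq)
        if start + room >= len(article_tokens):   # the window has reached the article end
            break
        start += stride                            # slide the window to generate the next sequence
    return sequences
```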
2. Interactive module
1) Attention mechanism
This module uses the bi-directional attention flow model proposed by Seo et al. [Seo M, Kembhavi A, Farhadi A, et al. Bidirectional attention flow for machine comprehension [C]// Proceedings of the 5th International Conference on Learning Representations, 2017] to highlight the focus of article and question understanding. First, the similarity score between the i-th article word and the j-th question word is calculated using the dot product:
S_ij = p_i · q_j^T    (1)
where p_i represents the i-th article word, q_j represents the j-th question word, T is the transpose symbol, and the scores S_ij form the matrix S ∈ R^{m×n}, i.e., of dimension m×n.
Then, the attention of the article to the question and the attention of the question to the article are constructed to obtain a question-based article representation. Row normalization of the similarity matrix S (the set of all S_ij) yields the matrix S_1, which computes, for each article word, which question word is most relevant to it, as in equation (2). The attention of the article to the question then highlights the features of the question words, as in equation (3). Similarly, taking the maximum over the rows and then normalizing yields the matrix S_2, as in equation (4), indicating which article word is most relevant to some word of the question and therefore critical for answering it. The attention of the question to the article highlights the features of the article words according to the article words related to the question words, as in equation (5). The final question-based article representation is obtained by fusion, as in equation (6).
S_1 = softmax(S)    (2)
A_pq = S_1 · Q    (3)
S_2 = softmax(max(S))    (4)
A_qp = S_2 · P    (5)
QP = [P; A_pq; P·A_pq; P·A_qp]    (6)
where A_pq represents the attention of the article to the question, Q is the question representation, A_qp is the attention of the question to the article, and P is the article representation.
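The attention computation of equations (1)-(6) can be sketched in PyTorch as follows; this is only an illustrative implementation, and tiling the question-to-article attention vector over the m article positions is our assumption, made so that the element-wise products in equation (6) are well defined.

```python
import torch
import torch.nn.functional as F

def bidirectional_attention(P, Q):
    """P: (m, d) article word vectors; Q: (n, d) question word vectors."""
    S = P @ Q.T                                   # (m, n) dot-product similarity, eq. (1)
    S1 = F.softmax(S, dim=1)                      # row-wise normalisation, eq. (2)
    A_pq = S1 @ Q                                 # article-to-question attention, eq. (3), shape (m, d)
    S2 = F.softmax(S.max(dim=1).values, dim=0)    # max over each row, then normalise, eq. (4), shape (m,)
    a_qp = S2 @ P                                 # question-to-article attention, eq. (5), shape (d,)
    A_qp = a_qp.unsqueeze(0).expand_as(P)         # tile over the m article positions (assumption)
    QP = torch.cat([P, A_pq, P * A_pq, P * A_qp], dim=-1)   # fusion, eq. (6), shape (m, 4d)
    return A_pq, QP
```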
2) Gating mechanism
In order to simulate the forgetting and memory updating behaviors of people during reading, a gating mechanism is adopted to update the characteristics after the attention mechanism.
The article words are spliced respectively with the attention of the article to the question and with the fused question-based article representation, and the weights are obtained through an activation function, expressed as follows:
z = sigmoid(W_z[P; A_pq] + b_z)
r = sigmoid(W_r[P; QP'] + b_r)    (7)
where P represents the article words, A_pq represents the attention of the article to the question, and QP' represents the fused question-based article representation fed into the gating mechanism.
z and r are used respectively to determine the weights of the updated parts, and the extracted features are updated as follows:
hz = (1 - z) ⊙ A_pq + z ⊙ P
hr = (1 - r) ⊙ QP' + r ⊙ P    (8)
and taking the average value of the two updated vectors to obtain a final gating mechanism vector, wherein the final gating mechanism vector is expressed as follows:
G=mean(hz+hr) (9)
where G represents the vector obtained after the gating mechanism.
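A minimal sketch of the gating update of equations (7)-(9) follows; holding W_z, b_z, W_r, b_r in nn.Linear layers and assuming that P, A_pq and QP' all have shape (m, d) are illustrative choices rather than details fixed by the patent.

```python
import torch
import torch.nn as nn

class GatedUpdate(nn.Module):
    """Gating update: weights z and r, updates hz and hr, and their average G."""

    def __init__(self, d):
        super().__init__()
        self.Wz = nn.Linear(2 * d, d)   # holds W_z and b_z
        self.Wr = nn.Linear(2 * d, d)   # holds W_r and b_r

    def forward(self, P, A_pq, QP_prime):
        z = torch.sigmoid(self.Wz(torch.cat([P, A_pq], dim=-1)))        # eq. (7)
        r = torch.sigmoid(self.Wr(torch.cat([P, QP_prime], dim=-1)))
        hz = (1 - z) * A_pq + z * P                                      # eq. (8)
        hr = (1 - r) * QP_prime + r * P
        G = (hz + hr) / 2    # average of the two updated vectors, eq. (9)
        return G
```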
3. Multistage residual structure module
When people read, they usually adopt two reading modes: skimming and intensive reading. Therefore, the fine-grained vector representation obtained through the attention mechanism and the gating mechanism is used to simulate the effect of human intensive reading, while the vector sequence obtained from the encoder side is used as the coarse-grained vector representation to simulate the result of human skimming. In order to ensure the integrity of the information, guarantee that it is smoothly transmitted to the next layer, and avoid network degradation, a multi-level residual structure is proposed to connect the attention mechanism and the gating mechanism respectively.
First, the first-level residual structure of the article P and the attention representation QP is constructed using a skip connection, as in equation (10). Then, through the gating mechanism, the second-level residual structure of the context vector representation H and the updated representation G is established using a skip connection, as in equation (11). The resulting I is used to determine the probability of each word in the sequence being the start or end position. This differs from previous methods that obtain the probabilities only from the question-based article representation. The method better integrates the original information, obtains the semantic information of the key parts, and helps locate and accurately extract the answer from both coarse and fine granularities.
QP' = ReLU(P + QP)    (10)
I = ReLU(H + G)    (11)
where ReLU is the activation function and I ∈ R^{s×h}, i.e., the dimension of I is s×h.
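The two residual fusions of equations (10) and (11) can be sketched as follows; the linear projection that brings the 4d-dimensional QP of equation (6) back to d dimensions before the addition with P is our assumption, since the patent does not spell out how the dimensions are matched.

```python
import torch
import torch.nn as nn

class MultiLevelResidual(nn.Module):
    """First-level residual over (P, QP) and second-level residual over (H, G)."""

    def __init__(self, d):
        super().__init__()
        self.proj = nn.Linear(4 * d, d)   # assumed projection so that P + QP is well defined

    def first_level(self, P, QP):
        return torch.relu(P + self.proj(QP))   # QP' = ReLU(P + QP), eq. (10)

    def second_level(self, H, G):
        return torch.relu(H + G)               # I = ReLU(H + G), eq. (11)
```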
4. Answer prediction module
The objective function typically includes an extraction task and a classification task. On this basis, the model proposes an additional edge loss function that maximizes the Euclidean distance between answer predictions and no answer classifications. The final trained loss function contains three losses, as shown in equation (12), each of which is explained in detail below.
L = Loss_ext + Loss_class + Loss_joint    (12)
where Loss_ext is the answer-extraction loss, Loss_class is the answerability-classification loss, and Loss_joint is the edge loss used for joint training.
1) Answer extraction
Through the reading process, the semantic vector representation I, which finally contains both coarse and fine granularity, is obtained and sent to a fully-connected layer to obtain the start- and end-position representations of each word. During training, a cross-entropy loss function is used as the training target, as in equation (13), where the label terms are the true start- and end-position labels of the i-th question, respectively, and N is the number of questions.
2) Problem classification
For the answerability of questions, a classification task is trained using the context-based vector representation generated by the pre-trained language model. Since answerability is a binary decision, the two-class cross-entropy loss function is used during training, as in equation (14), where y'_i is the predicted answerability of the i-th question, y_i is the labeled answerability of the i-th question, and N is the number of questions.
3) Joint training
Referring to fig. 2, in order to keep the answers given by the answer extraction task and the question classification task logically consistent, edge-loss-function joint training is adopted, so that the sample moves closer to the answerability direction corresponding to its label and away from the opposite direction, further learning the feature differences between them and making the answer extraction task and the question classification task strongly coupled. The obtained start- and end-position representations are normalized to obtain the probabilities of the answer start and end positions, and their product is taken as the probability of the positive sample, as in equation (15). Meanwhile, after the vector representation generated by the pre-trained language model is classified, the probability that the question has no answer is obtained and used as the probability of the negative sample, as in equation (16). The edge loss function calculates the distances between the label and the positive- and negative-sample probabilities using the Euclidean distance, as in equation (17), with margin defaulting to 1. During training, the edge loss function maximizes the distance between the no-answer classification and the answer prediction, as in equation (18).
P_has_ans = softmax(P'_start · P'_end)    (15)
P_no_ans = softmax(H)    (16)
d(x, y) = ||x - y||_2    (17)
where x and y do not refer to any particular values; the formula merely defines the Euclidean distance used for these comparisons.
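The three-part loss of equations (12)-(18) can be sketched as follows. Because the equation images for (13), (14) and (18) are not reproduced in this text, the exact forms below, in particular the triplet-style margin term and the convention that unanswerable questions use position 0 and class index 0, are assumptions consistent with the description rather than the patent's exact formulas.

```python
import torch
import torch.nn.functional as F

def joint_loss(start_logits, end_logits, start_pos, end_pos,
               cls_logits, has_answer, margin=1.0):
    """L = Loss_ext + Loss_class + Loss_joint, a sketch of eqs. (12)-(18).

    start_logits, end_logits: (N, s) position scores from the fully-connected layer;
    start_pos, end_pos: (N,) gold positions (assumed to be 0, the [CLS] slot, when unanswerable);
    cls_logits: (N, 2) answerability scores (class 0 assumed to mean "no answer");
    has_answer: (N,) 0/1 answerability labels.
    """
    # Eq. (13): cross entropy over the start and end positions (answer extraction).
    loss_ext = F.cross_entropy(start_logits, start_pos) + F.cross_entropy(end_logits, end_pos)

    # Eq. (14): two-class cross entropy for answerability classification.
    loss_class = F.cross_entropy(cls_logits, has_answer)

    # Eqs. (15)-(16): probability of the has-answer outcome (product of the start and
    # end probabilities) and probability of the no-answer outcome.
    p_start = F.softmax(start_logits, dim=-1).gather(1, start_pos.unsqueeze(1)).squeeze(1)
    p_end = F.softmax(end_logits, dim=-1).gather(1, end_pos.unsqueeze(1)).squeeze(1)
    p_has_ans = p_start * p_end
    p_no_ans = F.softmax(cls_logits, dim=-1)[:, 0]

    # Eqs. (17)-(18): triplet-style margin ("edge") loss. The outcome matching the label
    # plays the positive role, the opposite outcome the negative role, and the 1-D
    # Euclidean distance to the target probability 1 reduces to an absolute value.
    pos = torch.where(has_answer.bool(), p_has_ans, p_no_ans)
    neg = torch.where(has_answer.bool(), p_no_ans, p_has_ans)
    d_pos = torch.abs(1.0 - pos)
    d_neg = torch.abs(1.0 - neg)
    loss_joint = F.relu(d_pos - d_neg + margin).mean()

    return loss_ext + loss_class + loss_joint
```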
To verify the effect of the model of the invention, we performed comparative verification as shown in the following table:
thus, in combination with the above, it can be seen that:
the invention provides a multi-stage residual error structure module which is built by adding an edge loss function to perform multi-task combined training. The most important are the following three points:
(1) The invention provides a gating mechanism that filters the associated features after interaction and controls the inflow of important information and the outflow of useless information, so as to govern the information flow and feed it accurately into the output layer to predict the answer;
(2) The invention introduces the idea of the residual structure and builds a new multi-level residual structure module, fusing the representations obtained after article-question interaction with the original semantic information, so that the semantic information is more comprehensive, the understanding of the article is fuller, and network degradation is avoided;
(3) The invention proposes a multi-task joint training method by adding an edge loss function, ensuring the coupling of the classification task and the extraction task. A triplet of the label, the positive example and the negative example is constructed, and the distances between the label and the positive example and between the label and the negative example are calculated through a distance function, so that the distance between the label and the corresponding sample becomes smaller while the distance to the opposite sample becomes larger, further learning the feature differences between them.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (4)

1. A multi-task joint training machine reading and understanding method based on a gating mechanism, characterized by comprising the following steps:
the method comprises the steps of performing context coding on input articles and questions through an article and question coding module;
through the interaction module, important characteristics of the context information are highlighted by adopting an attention mechanism and a gating mechanism, and the highlighted key characteristics are updated;
the original semantic information is respectively fused with the representation obtained through the attention mechanism and the representation obtained through the gating mechanism through a multi-level residual error structure module;
predicting the answerability of the questions and the answers of the answerable questions through an answer prediction module;
the method for fusing the original semantic information with the representation obtained by the attention mechanism and the representation obtained by the gating mechanism through the multi-level residual error structure module comprises the following steps:
the fine-grained vector representation obtained through the attention mechanism and the gating mechanism simulates the effect of human intensive reading; the vector sequence obtained from the encoder side is used as the coarse-grained vector representation and simulates the result of human skimming;
the first-level residual structure of the article P and the attention representation QP is constructed using a skip connection, expressed as follows:
QP′=ReLU(P+QP)
wherein ReLU is an activation function;
through the gating mechanism, the second-level residual structure of the context vector representation H and the vector G obtained through the gating mechanism is established using a skip connection, expressed as follows:
I=ReLU(H+G)
where ReLU is the activation function and I ∈ R^{s×h}, i.e., the dimension of I is s×h;
the finally obtained I is used for determining the probability of each word in the sequence as a start-stop position;
the method for predicting the answerability of the questions and the answers of the answerable questions comprises the following steps:
an additional edge loss function is proposed to maximize the Euclidean distance between the answer prediction and the no-answer classification, and the final training loss function L contains three losses, expressed as follows:
L = Loss_ext + Loss_class + Loss_joint
in the reading process, the semantic vector representation I, which finally contains both coarse and fine granularity, is obtained and sent to a fully-connected layer to obtain the start- and end-position representations of each word respectively; during training, a cross-entropy loss function is adopted as the training target, expressed as follows:
where the label terms are the true start- and end-position labels of the i-th question respectively, and N is the number of questions;
for the answerability of questions, a classification task is trained using the context-based vector representation generated by the pre-trained language model; since answerability is a two-class decision, a two-class cross-entropy loss function is adopted in the training process, expressed as follows:
edge loss function joint training is adopted so that the sample moves closer to the answerability direction corresponding to its label and away from the opposite direction, further learning the feature difference between them and making the answer extraction task and the question classification task strongly coupled; the obtained start- and end-position representations are normalized to obtain the probabilities of the answer start and end positions, and their product is taken as the probability of the positive sample, expressed as follows:
P_has_ans = softmax(P'_start · P'_end)
after the vector representation generated by the pre-trained language model is classified, the probability that the question has no answer is obtained and used as the probability of the negative sample, expressed as follows:
P_no_ans = softmax(H)
the edge loss function calculates the distance between the label and the positive and negative sample probabilities, and the Euclidean distance between the label and the negative sample probability is calculated as follows:
d(x, y) = ||x - y||_2
the edge loss function maximizes the distance between the unanswered classification and the answered prediction during training as follows:
2. The multi-task joint training machine reading and understanding method based on a gating mechanism according to claim 1, wherein the context encoding of the input articles and questions by the article and question encoding module comprises:
an article with m words is defined as P = {p_1, p_2, …, p_m}, and a question with n words as Q = {q_1, q_2, …, q_n};
Splicing the problem Q and the article P into a fixed-length sequence: the starting position is marked by [ CLS ] and is used as sentence vector of the whole sequence;
the question Q and the article P are separated by an identifier [ SEP ], and the end of the article P is also identified by [ SEP ];
for the length of the whole sequence, if the sequence exceeds a fixed length, cutting off, and generating a next sequence by adopting a sliding window; if the sequence does not reach the fixed length, the [ PAD ] is used for filling;
the generated sequence is sent as input to the encoder side, and E = {e_1, e_2, …, e_s} is used as the vector sequence with embedded features;
the vector E is sent into a multi-layer Transformer structure, wherein each layer comprises two parts, one part being multi-head attention and the other a feed-forward layer;
the encoder output finally obtained through the multi-layer Transformer is represented as H = {h_1, h_2, …, h_s}.
3. The multi-task joint training machine reading and understanding method based on a gating mechanism according to claim 1, wherein the attention mechanism of the interaction module adopts a bidirectional attention flow model, and its working principle comprises:
the similarity score between the i-th article word and the j-th question word is calculated using the dot product, expressed as follows:
S_ij = p_i · q_j^T
where p_i represents the i-th article word, q_j represents the j-th question word, T is the transpose symbol, and the scores S_ij form the matrix S ∈ R^{m×n}, i.e., of dimension m×n;
building the attention of the article to the question and the attention of the question to the article to obtain a question-based article representation:
the similarity scores S_ij form a similarity matrix S, and row normalization of S yields the matrix S_1, expressed as follows:
S_1 = softmax(S)
calculating, for each article word, which of the question words is most relevant to it;
the article's attention to the question highlights the features of the question words, as follows:
A_pq = S_1 · Q
where A_pq represents the attention of the article to the question and Q is the question representation;
taking the maximum over each row and then normalizing yields the matrix S_2, expressed as follows:
S_2 = softmax(max(S))
which indicates which article word is most relevant to some word of the question, proving that this article word is important for answering the question;
the attention of the question to the article highlights the features of the article words according to the article words associated with the question words, as follows:
A_qp = S_2 · P
where A_qp is the attention of the question to the article and P is the article representation;
the final question-based article representation is obtained by fusion, expressed as follows:
QP = [P; A_pq; P·A_pq; P·A_qp].
4. The multi-task joint training machine reading and understanding method based on a gating mechanism according to claim 1, wherein the working principle of the gating mechanism of the interaction module comprises the following steps:
the article words are spliced respectively with the attention of the article to the question and with the fused question-based article representation, and the weight values are obtained through an activation function, expressed as follows:
z = sigmoid(W_z[P; A_pq] + b_z)
r = sigmoid(W_r[P; QP′] + b_r)
where P represents the article words, A_pq represents the attention of the article to the question, and QP′ represents the fused question-based article representation fed into the gating mechanism;
z and r are used respectively to determine the weights of the updated parts, and the extracted features are updated as follows:
hz = (1 - z) ⊙ A_pq + z ⊙ P
hr = (1 - r) ⊙ QP′ + r ⊙ P
and taking the average value of the two updated vectors to obtain a final gating mechanism vector, wherein the final gating mechanism vector is expressed as follows:
G=mean(hz+hr)
where G represents the vector obtained after the gating mechanism.
CN202310112991.1A 2023-02-14 2023-02-14 Multi-task combined training machine reading and understanding method based on gating mechanism Active CN116108153B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310112991.1A CN116108153B (en) 2023-02-14 2023-02-14 Multi-task combined training machine reading and understanding method based on gating mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310112991.1A CN116108153B (en) 2023-02-14 2023-02-14 Multi-task combined training machine reading and understanding method based on gating mechanism

Publications (2)

Publication Number Publication Date
CN116108153A CN116108153A (en) 2023-05-12
CN116108153B true CN116108153B (en) 2024-01-23

Family

ID=86253963

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310112991.1A Active CN116108153B (en) 2023-02-14 2023-02-14 Multi-task combined training machine reading and understanding method based on gating mechanism

Country Status (1)

Country Link
CN (1) CN116108153B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807512B (en) * 2020-06-12 2024-01-23 株式会社理光 Training method and device for machine reading understanding model and readable storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112269868A (en) * 2020-12-21 2021-01-26 中南大学 Use method of machine reading understanding model based on multi-task joint training
CN114398976A (en) * 2022-01-13 2022-04-26 福州大学 Machine reading understanding method based on BERT and gate control type attention enhancement network
CN114861627A (en) * 2022-04-08 2022-08-05 清华大学深圳国际研究生院 Method and model for automatically generating interference item of choice question based on deep learning
CN115080715A (en) * 2022-05-30 2022-09-20 重庆理工大学 Span extraction reading understanding method based on residual error structure and bidirectional fusion attention

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Yiming Cui et al. Interactive Gated Decoder for Machine Reading Comprehension. 2022. Full text. *
Zhilin Yang et al. Words or Characters? Fine-grained Gating for Reading Comprehension. https://arxiv.org/pdf/1611.01724.pdf, 2017. Full text. *

Also Published As

Publication number Publication date
CN116108153A (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN111554268B (en) Language identification method based on language model, text classification method and device
CN110609891A (en) Visual dialog generation method based on context awareness graph neural network
CN112270193A (en) Chinese named entity identification method based on BERT-FLAT
CN112347268A (en) Text-enhanced knowledge graph joint representation learning method and device
CN111475655B (en) Power distribution network knowledge graph-based power scheduling text entity linking method
CN116450796B (en) Intelligent question-answering model construction method and device
Zhang et al. Deep relation embedding for cross-modal retrieval
WO2023024412A1 (en) Visual question answering method and apparatus based on deep learning model, and medium and device
CN111753207B (en) Collaborative filtering method for neural map based on comments
CN109933792A (en) Viewpoint type problem based on multi-layer biaxially oriented LSTM and verifying model reads understanding method
CN115080715B (en) Span extraction reading understanding method based on residual structure and bidirectional fusion attention
CN114037945A (en) Cross-modal retrieval method based on multi-granularity feature interaction
CN114385802A (en) Common-emotion conversation generation method integrating theme prediction and emotion inference
CN112232086A (en) Semantic recognition method and device, computer equipment and storage medium
CN113705191A (en) Method, device and equipment for generating sample statement and storage medium
CN113901188A (en) Retrieval type personalized dialogue method and system
CN114492460A (en) Event causal relationship extraction method based on derivative prompt learning
CN114360584A (en) Phoneme-level-based speech emotion layered recognition method and system
CN115905187B (en) Intelligent proposition system oriented to cloud computing engineering technician authentication
CN116108153B (en) Multi-task combined training machine reading and understanding method based on gating mechanism
CN112651225A (en) Multi-item selection machine reading understanding method based on multi-stage maximum attention
CN111414466A (en) Multi-round dialogue modeling method based on depth model fusion
CN116362242A (en) Small sample slot value extraction method, device, equipment and storage medium
CN115130461A (en) Text matching method and device, electronic equipment and storage medium
CN115238077A (en) Text analysis method, device and equipment based on artificial intelligence and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant