CN112579739A - Reading comprehension method based on ELMo embedding and gated self-attention mechanism - Google Patents

Reading comprehension method based on ELMo embedding and gated self-attention mechanism

Info

Publication number
CN112579739A
CN112579739A
Authority
CN
China
Prior art keywords
representation
word
attention
layer
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011542671.2A
Other languages
Chinese (zh)
Inventor
Ren Fuji (任福继)
Zhang Weiwei (张伟伟)
Bao Yanwei (鲍艳伟)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202011542671.2A priority Critical patent/CN112579739A/en
Publication of CN112579739A publication Critical patent/CN112579739A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/253 Grammatical analysis; Style critique
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a reading comprehension method based on ELMo embedding and a gated self-attention mechanism. The method introduces pre-trained ELMo embeddings to obtain context-aware word representations and adds a self-attention layer with a gating function to capture associations across long contexts. In addition, the answer layer reuses the feature representations of all preceding layers and predicts the final answer positions with a bilinear function, which further improves overall system performance. In experiments on the SQuAD dataset the model clearly outperforms a number of baseline models, improving on the original baseline by about 5 percent and approaching the average level of human performance, which fully demonstrates the effectiveness of the method.

Description

Reading comprehension method based on ELMo embedding and gated self-attention mechanism
Technical Field
The invention relates to the technical field of computers, and in particular to a reading comprehension method based on ELMo embedding and a gated self-attention mechanism.
Background
Machine reading comprehension has long been an important component of artificial intelligence and a research hotspot in natural language processing. A great deal of human knowledge is transmitted in the form of unstructured natural language text, so enabling machines to read and understand such text is of great significance and has direct application value for search engines, intelligent customer service and the like. One reason machine reading comprehension has received widespread attention in recent years is the development and application of attention mechanisms, which allow a model to focus on the parts of the context most relevant to a given question. The Stanford SQuAD dataset requires answering questions about a given article, where the answer can be any span in the context. Answering such questions requires encoding complex interactions between the question and the context and then extracting a segment of the original text as the answer based on the fused interaction information; concretely, the model outputs the predicted start index and end index of the answer in the article.
With the continuous development of neural networks in recent years, LSTM has been widely applied to machine reading comprehension and, combined with attention mechanisms, has achieved good performance. However, classical baseline models still leave room for improvement in accuracy: they do not address the long-range dependency problem, i.e. they cannot capture the associated information of a long context well, and they ignore the ambiguity of words in different contexts.
Disclosure of Invention
The invention aims to remedy the defects of the prior art by providing a reading comprehension method based on ELMo embedding and a gated self-attention mechanism. It introduces ELMo word embeddings to obtain more accurate context-dependent word representations and adds a self-attention layer with a gating function to alleviate the problem of questions that require further reasoning over long contexts. In addition, the answer layer adopts a feature-reuse method and computes the final index positions with a bilinear function, further improving system performance. Experiments on the SQuAD dataset show that the model substantially outperforms most baseline models and approaches the average level of human performance, fully demonstrating its effectiveness.
The invention is realized by the following technical scheme:
A reading comprehension method based on ELMo embedding and a gated self-attention mechanism, comprising the following steps:
S1, tokenizing and preprocessing the article and the question respectively, and building a GloVe word vocabulary and a character list from the words appearing in the tokenized article and question;
S2, feeding each word through a pre-trained ELMo encoder to obtain an ELMo embedding that contains context information;
S3, mapping each word to its corresponding word vector in the GloVe vocabulary to obtain the word-level representation of the word;
S4, looking up each letter of a word in the character list, feeding the character vectors into a convolutional neural network and max-pooling the convolutional output to obtain a fixed-length character embedding for each word (a minimal sketch of this step follows the list);
S5, concatenating the representations obtained in steps S2, S3 and S4 and passing the resulting vectors through a Highway network to obtain preliminary vector representations of the article and the question;
S6, fusing context information into the question and article vector representations from step S5 with a parameter-sharing bidirectional BiLSTM, so that the representation of each word is adjusted according to its context;
S7, matching the article and the question with a bidirectional attention layer applied to the representations from step S6, to obtain article word representations in which article and question are mutually aware;
S8, further fusing and reasoning over the representations obtained in step S7 with a two-layer bidirectional LSTM modeling layer, to obtain the modeling representations of the article and the question respectively;
S9, performing long-context association matching on the text representation from step S8 with a gated self-attention layer, to obtain a self-attention representation of each word;
and S10, combining, in the output layer, the representations obtained in steps S7, S8 and S9 and using a bilinear function to infer the start index and end index of the final answer, the answer being the phrase between the two indexes.
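As referenced in step S4, the following is a minimal PyTorch sketch of the character-level CNN embedding. It is an illustration only: the character vocabulary size, character embedding dimension and filter count are assumptions, not values fixed by the invention.

import torch
import torch.nn as nn

class CharCNNEmbedding(nn.Module):
    """Embed each character, run a 1-D convolution over the character sequence of
    every word, then max-pool to a fixed-length vector per word (step S4).
    All sizes below are illustrative assumptions."""
    def __init__(self, num_chars=100, char_dim=16, num_filters=100, kernel_width=5):
        super().__init__()
        self.char_embed = nn.Embedding(num_chars, char_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_dim, num_filters, kernel_size=kernel_width,
                              padding=kernel_width // 2)

    def forward(self, char_ids):
        # char_ids: (batch, num_words, max_word_len) indices into the character list
        b, w, c = char_ids.shape
        x = self.char_embed(char_ids.view(b * w, c))   # (b*w, chars, char_dim)
        x = self.conv(x.transpose(1, 2))               # (b*w, num_filters, chars)
        x, _ = torch.relu(x).max(dim=-1)               # max over characters
        return x.view(b, w, -1)                        # (batch, num_words, num_filters)

if __name__ == "__main__":
    ids = torch.randint(1, 100, (2, 7, 12))            # 2 passages, 7 words, 12 chars each
    print(CharCNNEmbedding()(ids).shape)               # torch.Size([2, 7, 100])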
The step S1 is described in detail as follows:
First, a word list and a character list are built from the words appearing in the tokenized articles and questions; the subsequent steps use these two lists to look up the index of each word and character and then retrieve the corresponding embedded representations. Second, each question-answer pair of the data set is taken as a sample, and the samples are split into batches of a specified batch size as model input.
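A minimal sketch of the vocabulary construction and batching described above; the field names ("context", "question"), the special tokens and the batch size are illustrative assumptions.

from collections import Counter

def build_vocabs(samples, min_count=1):
    """samples: list of dicts with tokenized 'context' and 'question' word lists (step S1).
    Returns word-to-index and character-to-index maps; index 0 is reserved for padding."""
    word_counts, char_counts = Counter(), Counter()
    for s in samples:
        for w in s["context"] + s["question"]:
            word_counts[w] += 1
            char_counts.update(w)
    word2id = {w: i + 2 for i, (w, c) in enumerate(word_counts.items()) if c >= min_count}
    word2id.update({"<pad>": 0, "<unk>": 1})
    char2id = {ch: i + 2 for i, ch in enumerate(char_counts)}
    char2id.update({"<pad>": 0, "<unk>": 1})
    return word2id, char2id

def batches(samples, batch_size=8):
    """Group question-answer samples into fixed-size batches as model input."""
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]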
The step S2 is described in detail as follows:
ELMo embeddings come from a pre-trained two-layer bidirectional LSTM that is trained as a bidirectional language model on a large corpus and is easy to integrate into existing models. ELMo uses a multi-layer LSTM: the higher LSTM layers capture contextual semantic information, while the lower layers capture syntactic information. The final ELMo representation is a linear combination of the LSTM states of all layers. The resulting ELMo embedding, character embedding and GloVe word embedding are concatenated as model input and fine-tuned with the model to improve performance, which means the ELMo combination is updated during training. ELMo lets the vector representation of a word take both context and syntax into account, addressing the case of polysemous words.
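The "linear combination of the LSTM states of each layer" can be implemented as a learned scalar mix over the biLM layers. The sketch below assumes the pre-trained ELMo encoder already supplies the stacked per-layer states; the layer count and dimension are placeholders.

import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """ELMo-style combination: a softmax-normalized weight per biLM layer plus a
    global scale gamma, applied to the stacked layer states (step S2)."""
    def __init__(self, num_layers=3):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layer_states):
        # layer_states: (num_layers, batch, seq_len, dim) from the pre-trained encoder
        s = torch.softmax(self.weights, dim=0)
        return self.gamma * (s.view(-1, 1, 1, 1) * layer_states).sum(dim=0)

if __name__ == "__main__":
    states = torch.randn(3, 2, 20, 1024)   # token layer + two LSTM layers of a hypothetical biLM
    print(ScalarMix()(states).shape)       # torch.Size([2, 20, 1024])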
The detailed process of step S5 is as follows:
The ELMo embedding, the word-level representation and the character embedding are concatenated and fed into a two-layer highway network to obtain a d-dimensional vector for each word, where the highway network is:
y = F(x, W_H) · G(x, W_G) + x · (1 − G(x, W_G))
where F is a feed-forward neural network and G is a gate applied to the input;
thereby obtaining a context vector matrix X ∈ R^(d×T) and a question vector matrix Q ∈ R^(d×J), where T is the number of article words, J is the number of question words and d is the number of one-dimensional convolution filters. The matrices X and Q are then each fed into a bidirectional LSTM with d-dimensional outputs that summarizes the article and the question from both directions, giving two matrices H ∈ R^(2d×T) and U ∈ R^(2d×J).
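A minimal PyTorch sketch of the two-layer highway network of step S5 followed by the shared bidirectional LSTM encoder of step S6; the dimensions and the ReLU transform inside the highway layer are illustrative assumptions.

import torch
import torch.nn as nn

class Highway(nn.Module):
    """Two highway layers: y = F(x)*g + x*(1-g), with g = sigmoid(W_G x) (step S5)."""
    def __init__(self, dim, num_layers=2):
        super().__init__()
        self.transforms = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        self.gates = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])

    def forward(self, x):
        for transform, gate in zip(self.transforms, self.gates):
            g = torch.sigmoid(gate(x))
            x = g * torch.relu(transform(x)) + (1.0 - g) * x
        return x

class ContextEncoder(nn.Module):
    """Shared BiLSTM that turns d-dim word vectors into 2d-dim contextual vectors
    for both the article and the question (step S6)."""
    def __init__(self, dim=100):
        super().__init__()
        self.highway = Highway(dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)

    def forward(self, x):                   # x: (batch, seq_len, d)
        out, _ = self.lstm(self.highway(x))
        return out                          # (batch, seq_len, 2d)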
The bidirectional attention matching mechanism of step S7 is as follows:
This layer matches the article and question vectors in both directions using an attention mechanism and, from the inputs H and U, generates a context representation matrix G that fuses the question information into every article word. A shared attention score matrix S ∈ R^(T×J) is first computed:
S_tj = α(H_t, U_j)
where the attention score function α between an article word column H_t of matrix H and a question word column U_j of matrix U is:
α(h, u) = w_S^T [h; u; h ∘ u]
with w_S a trainable weight vector, [;] vector concatenation and ∘ elementwise multiplication.
From the resulting score matrix, the attention in the two directions is obtained.
First, the attention in the article-to-question direction is computed as:
a_t = softmax(S_t:) ∈ R^J
where a_t is the relevance vector of the t-th context word with respect to the question words; the question representation attended to by this word is the weighted sum of all question word representations,
Ũ_t = Σ_j a_tj · U_j.
The attention in the question-to-article direction is computed as:
b = softmax(max_j S_tj) ∈ R^T
which gives a weighted-sum vector representation of the article words most relevant to the question,
h̃ = Σ_t b_t · H_t;
this vector is then tiled T times along the columns to obtain H̃ ∈ R^(2d×T).
Finally, the context representation fused with the question information is obtained by:
G_t = β(H_t, Ũ_t, H̃_t)
β(h, ũ, h̃) = [h; ũ; h ∘ ũ; h ∘ h̃] ∈ R^(8d)
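A minimal sketch of the bidirectional attention layer. It follows the standard BiDAF formulation that the description mirrors (trainable similarity over [h; u; h∘u], article-to-question softmax, question-to-article max-softmax, and the fused output G); treat it as an illustration rather than the exact parametrization of the invention.

import torch
import torch.nn as nn

class BiAttention(nn.Module):
    """Bidirectional attention (step S7): similarity S between every article word H[:,t]
    and question word U[:,j], attention in both directions, and the fused
    representation G = [H; U~; H*U~; H*H~]."""
    def __init__(self, dim):                    # dim = 2d
        super().__init__()
        self.w = nn.Linear(3 * dim, 1, bias=False)

    def forward(self, H, U):                    # H: (B, T, dim), U: (B, J, dim)
        B, T, _ = H.shape
        J = U.size(1)
        Hx = H.unsqueeze(2).expand(B, T, J, -1)
        Ux = U.unsqueeze(1).expand(B, T, J, -1)
        S = self.w(torch.cat([Hx, Ux, Hx * Ux], dim=-1)).squeeze(-1)      # (B, T, J)
        U_tilde = torch.softmax(S, dim=2) @ U                             # article-to-question
        b = torch.softmax(S.max(dim=2).values, dim=1)                     # question-to-article
        h_tilde = (b.unsqueeze(1) @ H).expand(B, T, -1)                   # tiled T times
        return torch.cat([H, U_tilde, H * U_tilde, H * h_tilde], dim=-1)  # (B, T, 4*dim)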
The gated self-attention of step S9 is described in detail as follows:
This layer is introduced because some questions involve a longer context and require more complex reasoning. To alleviate this, the contextual representation of each word produced by the modeling layer is matched directly against all other contextual word representations: from the text matrix representation obtained by the S8 modeling layer, an attention score s_tj between each word representation M_t and every other word representation M_j is first computed by multiplying the representations with a parameter matrix and applying an activation function; the scores are then normalized and the final weighted-sum representation of each word is calculated:
α_tj = exp(s_tj) / Σ_k exp(s_tk)
P_t = Σ_j α_tj · M_j
In addition, a gate function is used to reduce attention to less relevant information, giving the final representation P*:
g = sigmoid(W_g [P; M])
P* = g ⊙ [P; M]
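A minimal sketch of the gated self-attention layer. The softmax normalization, the weighted sum P and the gate g = sigmoid(W_g[P; M]) follow the description above; the additive tanh score used here is an assumption, since the exact score function is not reproduced in the text.

import torch
import torch.nn as nn

class GatedSelfAttention(nn.Module):
    """Gated self-attention (step S9): score each article word M_t against every other
    word M_j, softmax-normalize, take the weighted sum P_t, then gate [P; M] with
    g = sigmoid(W_g [P; M]) and output P* = g * [P; M]."""
    def __init__(self, dim, hidden=100):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden, bias=False)   # additive score form: an assumption
        self.w2 = nn.Linear(dim, hidden, bias=False)
        self.v = nn.Linear(hidden, 1, bias=False)
        self.gate = nn.Linear(2 * dim, 2 * dim, bias=False)

    def forward(self, M):                               # M: (B, T, dim)
        scores = self.v(torch.tanh(self.w1(M).unsqueeze(2) + self.w2(M).unsqueeze(1))).squeeze(-1)
        alpha = torch.softmax(scores, dim=-1)           # (B, T, T)
        P = alpha @ M                                   # weighted sum over all other words
        PM = torch.cat([P, M], dim=-1)                  # [P; M]
        g = torch.sigmoid(self.gate(PM))
        return g * PM                                   # P*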
The answer layer of step S10 is described in detail as follows:
A feature-reuse method is used: the bidirectional attention layer representation G, the modeling layer representation M and the self-attention layer representation P* are used together to obtain the probability of each word being the start or end position of the answer, and the probability distribution of the answer start position s is computed with a bilinear function over these representations.
Word representations are then weighted and summed according to the start-position probabilities, the information is fused through a BiLSTM to obtain a representation that contains start-position information, and finally the probability distribution of the end position is obtained from that representation with the same bilinear formula as in the previous step.
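A minimal sketch of the answer layer with feature reuse. The bilinear scoring against a pooled question vector q and the way the start-weighted summary is fed into the BiLSTM are assumptions chosen to match the description; only the overall structure (bilinear start score, start-weighted summary, BiLSTM fusion, bilinear end score) is taken from the text.

import torch
import torch.nn as nn

class AnswerLayer(nn.Module):
    """Answer pointer (step S10) with feature reuse over [G; M; P*]."""
    def __init__(self, feat_dim, q_dim, hidden=100):
        super().__init__()
        self.start_bilinear = nn.Linear(q_dim, feat_dim, bias=False)
        self.end_bilinear = nn.Linear(q_dim, 2 * hidden, bias=False)
        self.fuse = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, feats, q):
        # feats = [G; M; P*] concatenated: (B, T, feat_dim); q: pooled question vector (B, q_dim)
        p_start = torch.softmax((feats @ self.start_bilinear(q).unsqueeze(-1)).squeeze(-1), dim=-1)
        summary = p_start.unsqueeze(1) @ feats            # start-weighted sum, (B, 1, feat_dim)
        fused, _ = self.fuse(feats + summary)             # broadcast the summary into each position
        p_end = torch.softmax((fused @ self.end_bilinear(q).unsqueeze(-1)).squeeze(-1), dim=-1)
        return p_start, p_end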
The invention has the following advantages: by introducing a self-attention layer with a gating function, it can further match and fuse information across long text and filter out unimportant information to a certain extent, thereby alleviating the problem of long contexts and improving model accuracy;
by incorporating ELMo embeddings, a more accurate word embedding representation can be obtained from an encoder pre-trained on a large corpus, so that the word embeddings contain more context information; this effectively addresses cases such as polysemous words that require context, and improves model performance.
Drawings
FIG. 1 is a basic flow diagram of the present invention.
FIG. 2 is a diagram of a neural network model according to the present invention.
Detailed Description
As shown in FIG. 1, a reading comprehension method based on ELMo embedding and a gated self-attention mechanism includes the following steps:
S1, tokenizing and preprocessing the article and the question respectively, and building a GloVe word vocabulary and a character list from the words appearing in the tokenized article and question;
S2, feeding each word through a pre-trained ELMo encoder to obtain an ELMo embedding that contains context information;
S3, mapping each word to its corresponding word vector in the GloVe vocabulary to obtain the word-level representation of the word;
S4, looking up each letter of a word in the character list, feeding the character vectors into a convolutional neural network and max-pooling the convolutional output to obtain a fixed-length character embedding for each word;
S5, concatenating the representations obtained in steps S2, S3 and S4 and passing the resulting vectors through a Highway network to obtain preliminary vector representations of the article and the question;
S6, fusing context information into the question and article vector representations from step S5 with a parameter-sharing bidirectional BiLSTM, so that the representation of each word is adjusted according to its context;
S7, matching the article and the question with a bidirectional attention layer applied to the representations from step S6, to obtain article word representations in which article and question are mutually aware;
S8, further fusing and reasoning over the representations obtained in step S7 with a two-layer bidirectional LSTM modeling layer, to obtain the modeling representations of the article and the question respectively;
S9, performing long-context association matching on the text representation from step S8 with a gated self-attention layer, to obtain a self-attention representation of each word;
S10, combining, in the output layer, the representations obtained in steps S7, S8 and S9 and using a bilinear function to infer the start index and end index of the final answer, the answer being the phrase between the two indexes.
The step S1 is described in detail as follows:
First, a word list and a character list are built from the words appearing in the tokenized articles and questions; the subsequent steps use these two lists to look up the index of each word and character and then retrieve the corresponding embedded representations. Second, each question-answer pair of the data set is taken as a sample, and the samples are split into batches of a specified batch size as model input.
The step S2 is described in detail as follows:
ELMo embeddings come from a pre-trained two-layer bidirectional LSTM that is trained as a bidirectional language model on a large corpus and is easy to integrate into existing models. ELMo uses a multi-layer LSTM: the higher LSTM layers capture contextual semantic information, while the lower layers capture syntactic information. The final ELMo representation is a linear combination of the LSTM states of all layers. The resulting ELMo embedding, character embedding and GloVe word embedding are concatenated as model input and fine-tuned with the model to improve performance, which means the ELMo combination is updated during training. ELMo lets the vector representation of a word take both context and syntax into account, addressing the case of polysemous words.
The detailed process of step S5 is as follows:
The ELMo embedding, the word-level representation and the character embedding are concatenated and fed into a two-layer highway network to obtain a d-dimensional vector for each word, where the highway network is:
y = F(x, W_H) · G(x, W_G) + x · (1 − G(x, W_G))
where F is a feed-forward neural network and G is a gate applied to the input;
thereby obtaining a context vector matrix X ∈ R^(d×T) and a question vector matrix Q ∈ R^(d×J), where T is the number of article words, J is the number of question words and d is the number of one-dimensional convolution filters. The matrices X and Q are then each fed into a bidirectional LSTM with d-dimensional outputs that summarizes the article and the question from both directions, giving two matrices H ∈ R^(2d×T) and U ∈ R^(2d×J).
The bidirectional attention matching mechanism of step S7 is as follows:
This layer matches the article and question vectors in both directions using an attention mechanism and, from the inputs H and U, generates a context representation matrix G that fuses the question information into every article word. A shared attention score matrix S ∈ R^(T×J) is first computed:
S_tj = α(H_t, U_j)
where the attention score function α between an article word column H_t of matrix H and a question word column U_j of matrix U is:
α(h, u) = w_S^T [h; u; h ∘ u]
with w_S a trainable weight vector, [;] vector concatenation and ∘ elementwise multiplication.
From the resulting score matrix, the attention in the two directions is obtained.
First, the attention in the article-to-question direction is computed as:
a_t = softmax(S_t:) ∈ R^J
where a_t is the relevance vector of the t-th context word with respect to the question words; the question representation attended to by this word is the weighted sum of all question word representations,
Ũ_t = Σ_j a_tj · U_j.
The attention in the question-to-article direction is computed as:
b = softmax(max_j S_tj) ∈ R^T
which gives a weighted-sum vector representation of the article words most relevant to the question,
h̃ = Σ_t b_t · H_t;
this vector is then tiled T times along the columns to obtain H̃ ∈ R^(2d×T).
Finally, the context representation fused with the question information is obtained by:
G_t = β(H_t, Ũ_t, H̃_t)
β(h, ũ, h̃) = [h; ũ; h ∘ ũ; h ∘ h̃] ∈ R^(8d)
The gated self-attention of step S9 is described in detail as follows:
This layer is introduced because some questions involve a longer context and require more complex reasoning. To alleviate this, the contextual representation of each word produced by the modeling layer is matched directly against all other contextual word representations: from the text matrix representation obtained by the S8 modeling layer, an attention score s_tj between each word representation M_t and every other word representation M_j is first computed by multiplying the representations with a parameter matrix and applying an activation function; the scores are then normalized and the final weighted-sum representation of each word is calculated:
α_tj = exp(s_tj) / Σ_k exp(s_tk)
P_t = Σ_j α_tj · M_j
In addition, a gate function is used to reduce attention to less relevant information, giving the final representation P*:
g = sigmoid(W_g [P; M])
P* = g ⊙ [P; M]
The answer layer of step S10 is described in detail as follows:
A feature-reuse method is used: the bidirectional attention layer representation G, the modeling layer representation M and the self-attention layer representation P* are used together to obtain the probability of each word being the start or end position of the answer, and the probability distribution of the answer start position s is computed with a bilinear function over these representations.
Word representations are then weighted and summed according to the start-position probabilities, the information is fused through a BiLSTM to obtain a representation that contains start-position information, and finally the probability distribution of the end position is obtained from that representation with the same bilinear formula as in the previous step.
The specific implementation process of the invention is as follows:
1. An appropriate data set is selected.
This section uses the Stanford Question Answering Dataset (SQuAD), which was created manually through crowdsourcing. SQuAD is a span-prediction reading comprehension dataset: given an article and a question, the machine must find the corresponding answer span in the article and predict its start and end positions; the length of the span is generally not limited. The dataset was constructed from 536 articles randomly selected from English Wikipedia and contains 107,785 questions with answers. Typically, articles range from 50 to 250 words and questions contain about 10 words. It is one of the largest MRC datasets to date.
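For reference, SQuAD v1.1 is distributed as a JSON file nested as article → paragraphs → question-answer pairs; a minimal loader is sketched below (the file name is a placeholder).

import json

def load_squad(path="train-v1.1.json"):
    """Flatten the SQuAD JSON into (context, question, answer_text, answer_start) samples."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)["data"]
    samples = []
    for article in data:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for ans in qa["answers"]:
                    samples.append({
                        "context": context,
                        "question": qa["question"],
                        "answer_text": ans["text"],
                        "answer_start": ans["answer_start"],  # character offset of the span
                    })
    return samples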
2. Model performance evaluation metrics are selected.
Two metrics are used to evaluate the model: F1 and the Exact Match (EM) score. Both are obtained by comparing the model's predicted answer with the candidate answers using the official script, i.e. the predicted answer is compared with each of the three candidate answers and the highest score is kept. EM is the proportion of predictions that match a candidate answer exactly, while the F1 score is defined as the average token overlap between the predicted answer and the candidate answer.
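A simplified version of the two metrics is sketched below; the official script additionally normalizes punctuation and articles, which is omitted here.

from collections import Counter

def exact_match(prediction, gold):
    return float(prediction.strip().lower() == gold.strip().lower())

def f1_score(prediction, gold):
    """Token-overlap F1 between predicted and gold answer strings."""
    pred_tokens, gold_tokens = prediction.lower().split(), gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def best_over_candidates(prediction, candidates, metric):
    """SQuAD compares the prediction with every candidate answer and keeps the best score."""
    return max(metric(prediction, g) for g in candidates)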
3. The model is constructed according to prior schemes and experience.
As shown in FIG. 2, the core of the invention comprises the following hierarchical structure: (1) an ELMo embedding layer, which uses a pre-trained ELMo language model to obtain an embedded representation of each word that contains context information; (2) a self-attention layer with a gating function, which matches each article word representation against all other words of the article and filters unimportant information through the gate; and (3) a bilinear-function answer layer based on feature reuse, which predicts the start and end positions of the answer.
Items (1), (2) and (3) are further described below:
(1) The ELMo embedding is obtained from a pre-trained two-layer bidirectional LSTM trained as a bidirectional language model on a large corpus, and it can easily be integrated into existing models. ELMo uses a multi-layer LSTM: the higher LSTM layers capture contextual semantic information, while the lower layers capture syntactic information. The final ELMo representation is a linear combination of the LSTM states of all layers. The resulting ELMo embedding, character embedding and GloVe word embedding are concatenated as model input.
(2) The specific steps are as follows: first, from the modeled text matrix representation, the attention score s_tj between each word representation M_t and every other word representation M_j is computed by multiplying the representations with a parameter matrix and applying an activation function; the scores are then normalized with the Softmax function, and the final weighted-sum representation of each word is obtained:
α_tj = exp(s_tj) / Σ_k exp(s_tk)
P_t = Σ_j α_tj · M_j
In addition, a gate function is used to reduce attention to less relevant information, giving the final representation P*: P and M are concatenated and multiplied by a parameter matrix to obtain the gate values:
g = sigmoid(W_g [P; M])
P* = g ⊙ [P; M]
(3) The detailed steps of the answer layer are as follows:
This layer uses feature reuse: the bidirectional attention layer representation G, the modeling layer representation M and the self-attention layer representation P* are used together to obtain the probability of each word being the start or end position of the answer, and the probability distribution of the answer start position s is computed with a bilinear function over these representations.
Because the start position is strongly correlated with the end position, the start-position probabilities are used to compute a weighted sum of the word representations, and a new representation fused with start-position information is obtained through a BiLSTM; the end position e is then inferred from this representation with the same bilinear scoring.
The loss function finally adopted is the negative log-likelihood of the gold start and end positions:
L = −(1/N) Σ_i [ log p^s(y_i^s) + log p^e(y_i^e) ]
where y_i^s and y_i^e are the true start and end indices of the i-th sample and N is the number of samples.
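The loss corresponds to two negative log-likelihood terms; a sketch assuming the model returns start and end probability distributions:

import torch

def span_nll_loss(p_start, p_end, y_start, y_end, eps=1e-12):
    """Negative log-likelihood of the gold start/end indices, averaged over the batch.
    p_start, p_end: (batch, T) probability distributions; y_start, y_end: (batch,) gold indices."""
    ll_start = torch.log(p_start.gather(1, y_start.unsqueeze(1)).squeeze(1) + eps)
    ll_end = torch.log(p_end.gather(1, y_end.unsqueeze(1)).squeeze(1) + eps)
    return -(ll_start + ll_end).mean()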
4. The experimental environment is selected and parameters are set.
The experiments run on a GeForce GTX Titan GPU with 12 GB of memory, under Ubuntu 18.04 with Python 3.5, TensorFlow-GPU 1.1, CUDA 8.0 and cuDNN 5. The experimental settings are as follows: the character embedding layer uses 100 filters of width 5; word embeddings use pre-trained 300-dimensional GloVe vectors (840B token version); dropout with a drop rate of 0.2 is applied to all CNN layers, LSTM layers and feed-forward layers; the hidden state size d is 100 and the model has about 4 million parameters. Model parameters are optimized with the Adamax optimizer with a batch size of 8; training for 12 epochs on a graphics card with 12 GB of memory takes about 2 days. The ELMo vectors produced by the language model trained on the Benchmark corpus are set trainable, and the other parameters use default values.
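The stated settings translate into a configuration along the following lines; the numeric values are copied from the text, and the stand-in model is a placeholder.

import torch
import torch.nn as nn

config = {
    "char_filters": 100, "char_kernel_width": 5,  # character CNN: 100 filters of width 5
    "word_dim": 300,                              # pre-trained 300-d GloVe (840B) vectors
    "hidden_size": 100,                           # d
    "dropout": 0.2,                               # CNN, LSTM and feed-forward layers
    "batch_size": 8,
    "epochs": 12,
}

model = nn.Linear(10, 10)                            # stand-in for the full network
optimizer = torch.optim.Adamax(model.parameters())   # Adamax optimizer, as stated in the text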
Finally, it should be emphasized that the above embodiments merely illustrate specific procedures of the invention and are not to be taken as limiting. Although the flow is illustrated in detail by way of example, those skilled in the art will understand that modifications and substitutions can be made without departing from the technical core of the invention, and other embodiments obtained from the invention without inventive effort shall fall within its scope of protection.

Claims (6)

1. A reading comprehension method based on ELMo embedding and a gated self-attention mechanism, characterized in that the method comprises the following steps:
S1: tokenizing and preprocessing the article and the question respectively, and building a GloVe word vocabulary and a character list from the words appearing in the tokenized article and question;
S2: feeding each word through a pre-trained ELMo encoder to obtain an ELMo embedding that contains context information;
S3: mapping each word to its corresponding word vector in the GloVe vocabulary to obtain the word-level representation of the word;
S4: looking up each letter of a word in the character list, feeding the character vectors into a convolutional neural network and max-pooling the convolutional output to obtain a fixed-length character embedding for each word;
S5: concatenating the representations obtained in steps S2, S3 and S4 and passing the resulting vectors through a Highway network to obtain preliminary vector representations of the article and the question;
S6: fusing context information into the question and article vector representations from step S5 with a parameter-sharing bidirectional BiLSTM, so that the representation of each word is adjusted according to its context;
S7: matching the article and the question with a bidirectional attention layer applied to the representations from step S6, to obtain article word representations in which article and question are mutually aware;
S8: further fusing and reasoning over the representations obtained in step S7 with a two-layer bidirectional LSTM modeling layer, to obtain the modeling representations of the article and the question respectively;
S9: performing long-context association matching on the text representation from step S8 with a gated self-attention layer, to obtain a self-attention representation of each word;
S10: combining, in the output layer, the representations obtained in steps S7, S8 and S9 and using a bilinear function to infer the start index and end index of the final answer, the answer being the phrase between the two indexes.
2. The reading comprehension method based on ELMo embedding and a gated self-attention mechanism according to claim 1, characterized in that the ELMo embedding in step S2 is specifically as follows:
the ELMo embedding is obtained from a pre-trained two-layer bidirectional LSTM that is trained as a bidirectional language model on a large corpus and integrated into the model; the ELMo encoder uses a multi-layer LSTM, the higher-layer LSTM states extract contextual semantic information and the lower-layer LSTM states extract syntactic information, and the final ELMo embedding representation is a linear combination of the LSTM states of each layer.
3. The reading comprehension method based on ELMo embedding and a gated self-attention mechanism according to claim 1, characterized in that the specific process of step S5 is as follows:
the ELMo embedding, the word-level representation and the character embedding are concatenated and fed into a two-layer highway network to obtain a d-dimensional vector for each word, where the highway network is:
y = F(x, W_H) · G(x, W_G) + x · (1 − G(x, W_G))
where F is a feed-forward neural network and G is a gate applied to the input;
thereby obtaining a context vector matrix X ∈ R^(d×T) and a question vector matrix Q ∈ R^(d×J), where T is the number of article words, J is the number of question words and d is the number of one-dimensional convolution filters; the matrices X and Q are then each fed into a bidirectional LSTM with d-dimensional outputs that summarizes the article and the question from both directions, giving two matrices H ∈ R^(2d×T) and U ∈ R^(2d×J).
4. The reading comprehension method based on ELMo embedding and a gated self-attention mechanism according to claim 3, characterized in that the matching of the article and the question using the bidirectional attention layer in step S7 is specifically as follows:
the article and question vectors are matched in both directions using an attention mechanism, and a context representation matrix G that fuses the question information is generated for every article word from the input matrices H and U; a shared attention score matrix S ∈ R^(T×J) is obtained by the following formula:
S_tj = α(H_t, U_j)
where the attention score function α between an article word column H_t of matrix H and a question word column U_j of matrix U is:
α(h, u) = w_S^T [h; u; h ∘ u]
with w_S a trainable weight vector, [;] vector concatenation and ∘ elementwise multiplication;
from the resulting score matrix, the attention in the two directions is obtained:
first, the attention in the article-to-question direction is computed as
a_t = softmax(S_t:) ∈ R^J
where a_t is the relevance vector of the t-th context word with respect to the question words; the question representation attended to by this word is the weighted sum of all question word representations,
Ũ_t = Σ_j a_tj · U_j;
the attention in the question-to-article direction is computed as
b = softmax(max_j S_tj) ∈ R^T,
giving a weighted-sum vector representation h̃ = Σ_t b_t · H_t of the article words most relevant to the question; this vector is then tiled T times along the columns to obtain H̃ ∈ R^(2d×T);
finally, the context representation fused with the question information is obtained by:
G_t = β(H_t, Ũ_t, H̃_t)
β(h, ũ, h̃) = [h; ũ; h ∘ ũ; h ∘ h̃] ∈ R^(8d).
5. The reading comprehension method based on ELMo embedding and a gated self-attention mechanism according to claim 4, characterized in that the gated self-attention in step S9 is specifically as follows:
first, from the text matrix representation produced by the modeling of step S8, the attention score s_tj between each word representation M_t and every other word representation M_j is computed; the scores are normalized, and the final weighted-sum representation of each word is calculated:
α_tj = exp(s_tj) / Σ_k exp(s_tk)
P_t = Σ_j α_tj · M_j
a gate function is then used to obtain the final representation P*:
g = sigmoid(W_g [P; M])
P* = g ⊙ [P; M].
6. The reading comprehension method based on ELMo embedding and a gated self-attention mechanism according to claim 5, characterized in that the specific process of step S10 is as follows:
a feature-reuse method is used: the bidirectional attention layer representation G, the modeling layer representation M and the self-attention layer representation P* are used together to obtain the probability of each word being the start or end position of the answer, and the probability distribution of the answer start position s is computed with a bilinear function over these representations; word representations are then weighted and summed according to the start-position probabilities, the information is fused through a BiLSTM to obtain a representation containing start-position information, and finally the probability distribution of the end position is obtained from that representation with the same bilinear formula as in the previous step.
CN202011542671.2A 2020-12-23 2020-12-23 Reading comprehension method based on ELMo embedding and gated self-attention mechanism Pending CN112579739A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011542671.2A CN112579739A (en) 2020-12-23 2020-12-23 Reading comprehension method based on ELMo embedding and gated self-attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011542671.2A CN112579739A (en) 2020-12-23 2020-12-23 Reading comprehension method based on ELMo embedding and gated self-attention mechanism

Publications (1)

Publication Number Publication Date
CN112579739A true CN112579739A (en) 2021-03-30

Family

ID=75139229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011542671.2A Pending CN112579739A (en) 2020-12-23 2020-12-23 Reading comprehension method based on ELMo embedding and gated self-attention mechanism

Country Status (1)

Country Link
CN (1) CN112579739A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240098A (en) * 2021-06-16 2021-08-10 湖北工业大学 Fault prediction method and device based on hybrid gated neural network and storage medium
CN114218365A (en) * 2021-11-26 2022-03-22 华南理工大学 Machine reading understanding method, system, computer and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929030A (en) * 2019-11-07 2020-03-27 电子科技大学 Text abstract and emotion classification combined training method
US20200175015A1 (en) * 2018-11-29 2020-06-04 Koninklijke Philips N.V. Crf-based span prediction for fine machine learning comprehension

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200175015A1 (en) * 2018-11-29 2020-06-04 Koninklijke Philips N.V. Crf-based span prediction for fine machine learning comprehension
CN110929030A (en) * 2019-11-07 2020-03-27 电子科技大学 Text abstract and emotion classification combined training method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEIWEI ZHANG et al.: "ELMo+Gated Self-attention Network Based on BiDAF for Machine Reading Comprehension", 2020 IEEE 11th International Conference on Software Engineering and Service Science (ICSESS) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113240098A (en) * 2021-06-16 2021-08-10 湖北工业大学 Fault prediction method and device based on hybrid gated neural network and storage medium
CN114218365A (en) * 2021-11-26 2022-03-22 华南理工大学 Machine reading understanding method, system, computer and storage medium
CN114218365B (en) * 2021-11-26 2024-04-05 华南理工大学 Machine reading and understanding method, system, computer and storage medium

Similar Documents

Publication Publication Date Title
CN112487182B (en) Training method of text processing model, text processing method and device
CN108733792B (en) Entity relation extraction method
CN113239181B (en) Scientific and technological literature citation recommendation method based on deep learning
CN107798140B (en) Dialog system construction method, semantic controlled response method and device
CN111881291A (en) Text emotion classification method and system
CN111930942B (en) Text classification method, language model training method, device and equipment
CN110647612A (en) Visual conversation generation method based on double-visual attention network
CN108628935A (en) A kind of answering method based on end-to-end memory network
CN113297364B (en) Natural language understanding method and device in dialogue-oriented system
CN113435203A (en) Multi-modal named entity recognition method and device and electronic equipment
CN111191002A (en) Neural code searching method and device based on hierarchical embedding
CN108536735B (en) Multi-mode vocabulary representation method and system based on multi-channel self-encoder
CN112232053A (en) Text similarity calculation system, method and storage medium based on multi-keyword pair matching
Chen et al. Deep neural networks for multi-class sentiment classification
CN110597968A (en) Reply selection method and device
CN111666752A (en) Circuit teaching material entity relation extraction method based on keyword attention mechanism
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN112182373A (en) Context expression learning-based personalized search method
CN112579739A (en) Reading comprehension method based on ELMo embedding and gated self-attention mechanism
CN110889505A (en) Cross-media comprehensive reasoning method and system for matching image-text sequences
CN115171870A (en) Diagnosis guiding and prompting method and system based on m-BERT pre-training model
CN111428518A (en) Low-frequency word translation method and device
Dandwate et al. Comparative study of Transformer and LSTM Network with attention mechanism on Image Captioning
CN116414988A (en) Graph convolution aspect emotion classification method and system based on dependency relation enhancement
Sun et al. Rumour detection technology based on the BiGRU_capsule network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210330