CN112380326B - Question answer extraction method based on multilayer perception and electronic device - Google Patents

Question answer extraction method based on multilayer perception and electronic device

Info

Publication number
CN112380326B
CN112380326B (application CN202011079727.5A)
Authority
CN
China
Prior art keywords
representation
answer
question
document
inference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011079727.5A
Other languages
Chinese (zh)
Other versions
CN112380326A (en
Inventor
林政
付鹏
刘欢
王伟平
孟丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202011079727.5A priority Critical patent/CN112380326B/en
Publication of CN112380326A publication Critical patent/CN112380326A/en
Application granted granted Critical
Publication of CN112380326B publication Critical patent/CN112380326B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/049Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Abstract

The invention provides a question answer extraction method based on multilayer perception, which comprises the following steps: splicing a question with a number of target documents, inputting the result into a pre-trained language model to obtain a representation Q of the question and a context representation P of the target documents, and letting the representation Q and the context representation P interact to obtain a document-related question representation u and a document representation h fused with question information; performing multilayer perceptron classification on the question representation u to obtain the inference type of the question, generating a sub-question ct from the representation Q, and obtaining the answer attention distribution of the question over the target documents according to the inference type, the question representation u, the document representation h and the sub-question ct, where t is the number of times sub-questions are generated; and obtaining an answer prediction result of the question according to the answer attention distribution. The invention answers questions by splitting them into sub-questions, introduces an inference-type classifier to control the splitting, answers the questions with shared sub-task modules, and improves the effectiveness of inferential reading comprehension.

Description

Question answer extraction method based on multilayer perception and electronic device
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a question answer extraction method based on multilayer perception and an electronic device.
Background
Inferential reading comprehension is the task of finding, from a number of documents related to a given user question, the answer to the question together with the relevant evidence sentences. It requires a model to reason over the semantics of the text in combination with the question and to locate the relevant evidence sentences and the final answer. Inferential reading models can be divided into three broad categories of methods. The first is the memory-network approach, which simulates the reasoning process by continuously and iteratively updating a reasoning state; the second is based on graph neural networks, where reasoning is carried out through updates of the graph neural network; there are also other deep-learning-based methods. The framework of a graph-neural-network-based inferential reading comprehension model can be divided into three parts: 1) a semantic coding stage; 2) a reasoning modeling stage; 3) an evidence and answer prediction stage. The semantic coding stage encodes the question and the documents into word vectors carrying contextual semantic information, the reasoning modeling stage models the reasoning process with graph neural network techniques, and the answer prediction stage predicts the relevant evidence sentences and answer spans from the obtained word representations. For data with many candidate paragraphs, paragraph selection is also needed: the paragraph selection stage selects relevant paragraphs from the candidates as input to the subsequent semantic coding.
A typical memory-network-based method is the Dynamic Coattention Network (Caiming Xiong, Victor Zhong, Richard Socher: Dynamic Coattention Networks For Question Answering. ICLR 2017). The method divides the model into an encoding part and a decoding part. On the one hand, a coattention mechanism is used in the encoding stage to encode the question and the document and obtain a question-related representation; on the other hand, the decoding stage iterates over the answer prediction result: in each round the answer is predicted from the current state value, the current round's state value is then updated according to the answer prediction result, the iteration is repeated, and the result of the last round is taken as the final answer.
A typical graph-neural-network-based method is the DFGN model (Lin Qiu, Yunxuan Xiao, Yanru Qu, Hao Zhou, Lei Li, Weinan Zhang, Yong Yu: Dynamically Fused Graph Network for Multi-hop Reasoning. ACL 2019: 6140-). The DFGN model first uses BERT to independently classify documents and select paragraphs; the semantic coding stage uses BERT to obtain contextual word representations of the documents and the question; the reasoning modeling stage is implemented with a GAT graph neural network and uses a BiLSTM to model the bidirectional fusion process between the graph and the word representations, fusing the node information obtained from graph reasoning back into the word representations; by continuously iterating the graph reasoning process, bidirectional fusion of graph information and text information is completed and extractive answers are predicted. In addition, DFGN also models the role of the question in graph construction: BiAttention is used to update the question representation, a dynamic graph is built according to the matching degree between the question representation and the node representations, and the question representation is continuously updated during the iteration.
Among other non-graph-neural-network methods, Jianxing Yu, Zheng-Jun Zha, Jian Yin et al. designed an inference neuron (Inferential Machine Comprehension: Answering Questions by Recursively Deducing the Evidence Chain from Text. ACL 2019: 2241-) that simulates the reasoning process by recursively linking inference neurons. An inference neuron comprises a memory vector, a read unit, a write unit and a controller unit: the controller unit generates a series of attention-based operations from the question, the read unit reads the relevant content according to the controller's operation instructions, and the write unit produces a new result from the controller operations and the read unit's results and updates the memory vector; the inference neurons are linked together recursively, with the output of one step serving as the input of the next. In addition, because the inference depth differs across samples, the termination of the reasoning process is decided dynamically, and the whole network is trained with reinforcement learning.
The model proposed by Sewon Min, Victor Zhong, Luke Zettlemoyer, Hannaneh Hajishirzi et al. decomposes the question into several simple sub-questions (Multi-hop Reading Comprehension through Question Decomposition and Rescoring. ACL 2019: 6097-). To obtain labelled data for sub-question decomposition cheaply, the sub-questions are restricted to spans of the original question, which turns sub-question generation into a span prediction problem; with only 400 manually labelled examples, training this component can be as effective as full human annotation. A method of rescoring the sub-questions and answers to select the best answer is also proposed. The Self-Assembling Modular Networks proposed by Yichen Jiang, Mohit Bansal et al. (Self-Assembling Modular Networks for Interpretable Multi-Hop Reasoning. EMNLP/IJCNLP 2019: 4473-4483) use a neural network to simulate a stack and build a self-assembling modular neural network that can fully automatically decompose and recombine sub-questions.
However, most current models do not handle the different categories of inference separately, and the reasoning processes they model are mostly complex.
The main feature of the present method is that, according to the characteristics of the different reasoning question types in the dataset, different sub-questions are generated and the question is split into different sub-tasks, which are completed in a hierarchical, progressive manner to predict the answer.
Disclosure of Invention
In order to solve the above problems, the invention provides a question answer extraction method based on multilayer perception and an electronic device, which control different sub-modules to combine in a hierarchical structure through a simple question classification mechanism; the approach is simpler to implement and is convenient to combine with other components in a modular fashion.
In order to achieve the purpose, the invention adopts the following technical scheme:
a question answer extraction method based on multilayer perception comprises the following steps:
1) splicing a question with a number of target documents, inputting the result into a pre-trained language model to obtain a representation Q of the question and a context representation P of the target documents, and letting the representation Q and the context representation P interact to obtain a document-related question representation u and a document representation h fused with question information;
2) performing multilayer perceptron classification on the question representation u to obtain the inference type of the question, generating a sub-question ct from the representation Q, and obtaining the answer attention distribution of the question over the target documents according to the inference type, the question representation u, the document representation h and the sub-question ct, where t is the number of times sub-questions are generated;
3) and obtaining an answer prediction result of the question according to the answer attention distribution.
Further, the target document is obtained by the following steps:
1) inputting a plurality of original documents into a paragraph selection model consisting of a BERT model and a layer of linear classifiers;
2) selecting the paragraphs related to the question in each original document according to a threshold to obtain a number of target documents.
Further, the pre-trained language model includes a BERT model.
Further, the method of interacting the representation Q with the context representation P comprises: using a bidirectional attention mechanism; the step of generating the sub-questions comprises:
1) inputting the representation Q into a BiLSTM network to obtain a question vector qv;
2) obtaining the sub-question ct from the question vector qv, the sub-question c(t-1) and the question representation u.
Further, the inference types include: bridging-entity questions or comparison questions.
Further, if the inference type is the bridging entity class, obtaining the answer attention distribution by the following steps:
1) calling a Find function according to the question representation u, the document representation h and the sub-question to generate an intermediate bridging entity att1;
2) calling a Transfer function according to the intermediate bridging entity att1, the question representation u, the document representation h and the sub-question ct to obtain the answer attention distribution.
Further, if the inference type is a comparison type question, obtaining an answer attention distribution by the following steps:
1) calling the Find function twice according to the question representation u, the document representation h and the sub-question ct to generate intermediate bridging entities att1 and att2;
2) calling a Compare function to compare the intermediate bridging entities att1 and att2 to obtain the answer attention distribution.
Further, the method for obtaining the answer prediction result of the question comprises: inputting the context representation C(t) into a plurality of LSTM layers which are stacked layer by layer and do not share parameters; the answer prediction result comprises: one or more of a related evidence sentence, an answer start position, an answer end position, and an answer type.
A storage medium having stored therein a computer program, wherein the computer program is arranged to perform the above-mentioned method when executed.
An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method as described above.
Compared with the prior art, the invention has the following positive effects:
1) A simple form of sub-question splitting is provided to answer questions in a hierarchical, progressive manner; the question splitting requires no supervision, and the effectiveness of inferential reading comprehension is improved.
2) An inference-type classifier is introduced to control the splitting, and shared sub-task modules are used to answer the questions, further improving the effectiveness of inferential reading comprehension.
Drawings
FIG. 1 is a schematic flow chart of the present invention.
FIG. 2 is a schematic diagram of the decomposition framework for bridging-entity questions according to the present invention.
FIG. 3 is a schematic diagram of the decomposition framework for comparison questions according to the present invention.
Detailed Description
In order to make the aforementioned and other features and advantages of the invention more comprehensible, embodiments accompanied with figures are described in detail below.
The invention classifies the inference categories in the HotpotQA dataset. There are two main classes: the bridging-entity class and the comparison class. If the question is of the bridging-entity type, the model treats answer prediction as two hierarchically stacked sub-tasks: the first layer searches for an intermediate bridging entity with a search module, and the second layer locks onto the final answer with a transfer module, using the bridging-entity representation output by the first layer together with the question and the context. If the question is of the comparison type, the model likewise treats answer prediction as two hierarchically stacked sub-tasks: the first layer finds the two related entities with two search modules, and the second layer compares the entity representations output by the first layer with a comparison module to predict the final answer. The sub-tasks are realized through three functions: the Find function performs the search, the Transfer function locks onto the answer through the bridging entity, and the Compare function compares two entity representations to obtain the answer. The question decomposition frameworks are illustrated in the figures.
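A minimal control-flow sketch of this routing is given below; the classifier, sub-question generator and the three sub-task functions are assumed to be implemented elsewhere, and their names and signatures here are illustrative only:

```python
# Illustrative routing only: classifier, sub_question_gen, find_fn, transfer_fn
# and compare_fn are hypothetical callables standing in for the modules above.
def predict_answer_attention(u, h, classifier, sub_question_gen,
                             find_fn, transfer_fn, compare_fn):
    """Route a question to the bridging-entity or comparison decomposition."""
    inference_type = classifier(u).argmax(dim=-1)   # 0: bridging entity, 1: comparison
    c1 = sub_question_gen(step=1)                   # first sub-question representation
    att1 = find_fn(u, h, c1)                        # layer 1: locate the first entity
    c2 = sub_question_gen(step=2)                   # second sub-question representation
    if inference_type == 0:
        # bridging-entity question: Find -> Transfer
        return transfer_fn(u, h, c2, att1)
    # comparison question: Find + Find -> Compare
    att2 = find_fn(u, h, c2)
    return compare_fn(h, c2, att1, att2)
```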
As shown in fig. 1, the framework adopted by the present invention is divided into three parts: 1) a paragraph selection module; 2) a semantic coding module; 3) a hierarchical answer prediction module. The paragraph selection module screens the documents to filter out irrelevant ones and avoid an overly long input. The semantic coding module encodes the question and the documents into vector representations with contextual semantic information. The hierarchical answer prediction module handles questions of different reasoning types separately and predicts the final evidence sentences and answer. The core of the invention is the hierarchical answer prediction module, which can be divided into a classification controller, a sub-question generator and three sub-task executors.
Process 1: the paragraph selection module.
The paragraph selection module fine-tunes a BERT model (Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT 2019: 4171-4186) with one linear classification layer to obtain a paragraph selection model, which independently judges whether each paragraph is related to the question; a threshold of 0.3 is set to select the more relevant paragraphs. This selection is made under the condition of ensuring recall, and the total length of the recalled relevant documents essentially fits within the maximum input length of 512 for the next stage.
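A minimal sketch of this paragraph selection step is shown below, assuming a fine-tuned BERT sequence classifier via the Hugging Face transformers library; the checkpoint name and tokenization settings are illustrative, and only the 0.3 threshold comes from the text above:

```python
# Sketch: score each paragraph against the question and keep those above a threshold.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
model.eval()

def select_paragraphs(question: str, paragraphs: list[str], threshold: float = 0.3):
    """Keep paragraphs whose relevance probability to the question exceeds the threshold."""
    selected = []
    for para in paragraphs:
        inputs = tokenizer(question, para, truncation=True, max_length=512,
                           return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        prob_relevant = torch.softmax(logits, dim=-1)[0, 1].item()
        if prob_relevant > threshold:
            selected.append(para)
    return selected
```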
Process 2: the semantic coding module.
The semantic coding layer encodes the question and the context documents into vector representations with contextual semantic information. The question and all of its relevant documents are spliced together to form the input to the coding module, which uses a pre-trained BERT model. After encoding, the representations Q ∈ R^(L×d1) and P ∈ R^(N×d1) are obtained, where R denotes the set of real numbers, L and N are the lengths of the question and the context respectively, and d1 is the dimension of the BERT hidden layer.
Then, a bidirectional attention mechanism (Min Joon Seo, Aniruddha Kembhavi, Ali Farhadi, Hannaneh Hajishirzi: Bidirectional Attention Flow for Machine Comprehension. ICLR 2017) is used to interactively model the question and the context. The model learns the document-related question representation u ∈ R^(L×d2) and the question-related document representation h ∈ R^(N×d2), where d2 is the dimension of the output word representations.
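A minimal sketch of the encoding step is given below, assuming the question and its selected documents are joined into one BERT input and split back into Q and P by segment id; the checkpoint name is illustrative, and the bidirectional-attention interaction that then produces u and h is sketched later together with the Find function:

```python
# Sketch: joint BERT encoding of question + documents, split into Q and P.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")

def encode(question: str, documents: list[str]):
    """Return the question representation Q (L x d1) and context representation P (N x d1)."""
    context = " ".join(documents)
    enc = tokenizer(question, context, truncation=True, max_length=512,
                    return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]   # (seq_len, d1)
    token_types = enc["token_type_ids"][0]
    Q = hidden[token_types == 0]                    # question tokens (incl. special tokens)
    P = hidden[token_types == 1]                    # document tokens
    return Q, P
```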
Process 3: the hierarchical answer prediction module.
The input to the question inference-type discriminator is the question representation u; the question word representations obtained in the coding stage are passed through a multilayer perceptron for binary classification to obtain the inference type of the question.
Further, if the inference type is the bridging-entity class, as shown in fig. 2, the model first calls the Find function to generate an intermediate bridging entity att1, and then calls the Transfer function to obtain the attention distribution of the relevant answer from the bridging entity. If the inference type is a comparison question, as shown in fig. 3, the model calls the Find function twice to obtain two related entities att1 and att2, and then calls the Compare function to obtain the attention distribution of the final answer by comparing the two related entities.
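A minimal sketch of the inference-type discriminator follows; mean pooling over the question word representations and the hidden size are assumptions not specified above:

```python
# Sketch: MLP binary classifier over the pooled question representation u.
import torch
import torch.nn as nn

class InferenceTypeClassifier(nn.Module):
    def __init__(self, d2: int, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(d2, hidden), nn.ReLU(), nn.Linear(hidden, 2))

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        """u: (L, d2) question representation -> logits over {bridging entity, comparison}."""
        pooled = u.mean(dim=0)            # pool the question word representations (assumption)
        return self.mlp(pooled)
```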
The Find function first injects the sub-question ct into the question-related document representation h to obtain h' = h ⊙ ct; a question-related contextual representation is then obtained through a bidirectional attention mechanism. The specific process is as follows:
Mj,s = W1·uj + W2·h's + W3·(uj ⊙ h's)
pcq j,s = softmaxj(Mj,s)
cqs = Σj pcq j,s · uj
ms = maxj Mj,s
pqc s = softmaxs(ms)
qc = Σs pqc s · h's
h̃s = [h's ; cqs ; h's ⊙ cqs ; h's ⊙ qc]
where M is the similarity matrix in the bidirectional attention mechanism, the W are trainable parameters, cq and qc are respectively the question-related context attention and the question attention computed in the bidirectional attention mechanism, p denotes the computed attention weights, s indexes the context words, j indexes the question words (J being the length of the question sequence's hidden representation), m is the maximum of M taken over one dimension, and uj is the j-th word representation in the question representation.
Finally, a linear transformation compresses the result back to the original dimension of the input question as the final output, which is taken as the attention distribution over the related entity and denoted att1.
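A minimal sketch of the Find sub-task following this bidirectional-attention form is given below; the exact output concatenation and the final linear scoring head are assumptions:

```python
# Sketch of the Find sub-task: inject the sub-question, run bidirectional
# attention, and read off an attention distribution over context words.
import torch
import torch.nn as nn

class Find(nn.Module):
    def __init__(self, d2: int):
        super().__init__()
        self.w1 = nn.Linear(d2, 1, bias=False)
        self.w2 = nn.Linear(d2, 1, bias=False)
        self.w3 = nn.Linear(d2, 1, bias=False)
        self.out = nn.Linear(4 * d2, 1)   # compress the fused representation to a score

    def forward(self, u, h, c_t):
        """u: (L, d2) question, h: (N, d2) context, c_t: (d2,) sub-question."""
        h_prime = h * c_t                                  # h' = h ⊙ ct
        # similarity matrix M[j, s] = W1·u_j + W2·h'_s + W3·(u_j ⊙ h'_s)
        M = (self.w1(u) + self.w2(h_prime).transpose(0, 1)
             + self.w3(u.unsqueeze(1) * h_prime.unsqueeze(0)).squeeze(-1))
        cq = torch.softmax(M, dim=0).transpose(0, 1) @ u            # question-aware context (N, d2)
        qc = torch.softmax(M.max(dim=0).values, dim=0) @ h_prime    # question-weighted context summary (d2,)
        fused = torch.cat([h_prime, cq, h_prime * cq,
                           h_prime * qc.expand_as(h_prime)], dim=-1)  # (N, 4*d2)
        att = torch.softmax(self.out(fused).squeeze(-1), dim=0)       # attention over context words
        return att
```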
The Transfer function first computes a bridging-entity representation b through an attention calculation, then injects the bridging-entity representation into the context representation to obtain hb, and then reuses the Findtrans function inside Transfer to locate the position of the final answer; this function is designed identically to the Find function, so the relevant question-related contextual representation is obtained in the same way. The corresponding sub-question ct here likewise has to be produced by the sub-question generator. The specific process is as follows:
b = Σs att1,s · hs
hb = h ⊙ b
attans = Findtrans(u, hb, ct)
where att1 is the attention distribution of the bridging entity and hs is the s-th word in the context representation h.
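A minimal sketch of the Transfer sub-task follows, reusing the Find module sketched above as Findtrans; treating att1 as a weight vector over context words is an assumption consistent with the formulas:

```python
# Sketch of the Transfer sub-task: pool a bridging-entity representation from att1,
# inject it into the context, and reuse a Find-style module to locate the answer.
import torch
import torch.nn as nn

class Transfer(nn.Module):
    def __init__(self, d2: int):
        super().__init__()
        self.find_trans = Find(d2)   # same design as the Find module sketched above

    def forward(self, u, h, c_t, att1):
        """att1: (N,) attention over context words marking the bridging entity."""
        b = att1 @ h                 # b = Σ_s att1_s · h_s
        h_b = h * b                  # hb = h ⊙ b
        return self.find_trans(u, h_b, c_t)   # answer attention distribution
```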
The inputs to the Compare function are the attentions att1 and att2 related to the two entities, i.e. the attention distributions of the related entities generated by the Find function from the two sub-questions. The representation information hs1 and hs2 of the two entities is therefore obtained first; then the two representations and the sub-question are spliced together to obtain the information oin required for the comparison, and this information is fed into a multilayer perceptron for comparison. The overall idea is to aggregate the two attention distributions and compare them according to the sub-question representation to obtain the answer. From the above, a final attention distribution is obtained and used to predict the start and end positions of the answer span. The specific formulas are as follows:
hs1 = Σs att1,s · hs
hs2 = Σs att2,s · hs
oin = [ct ; hs1 ; hs2 ; ct·(hs1 − hs2)]
hc = W1·(ReLU(W2·oin))
where W1 and W2 are trainable model parameters.
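A minimal sketch of the Compare sub-task follows; the final scoring head that turns hc into an attention distribution over context words is an assumption, since only the comparison MLP is specified above:

```python
# Sketch of the Compare sub-task: pool the two entity representations, combine
# them with the sub-question, and compare through an MLP.
import torch
import torch.nn as nn

class Compare(nn.Module):
    def __init__(self, d2: int, hidden: int = 256):
        super().__init__()
        self.w2 = nn.Linear(4 * d2, hidden)
        self.w1 = nn.Linear(hidden, d2)
        self.score = nn.Linear(d2, 1)     # assumed head producing per-word scores

    def forward(self, h, c_t, att1, att2):
        """h: (N, d2) context, c_t: (d2,) sub-question, att1/att2: (N,) entity attentions."""
        hs1 = att1 @ h                                    # representation of the first entity
        hs2 = att2 @ h                                    # representation of the second entity
        o_in = torch.cat([c_t, hs1, hs2, c_t * (hs1 - hs2)], dim=-1)
        h_c = self.w1(torch.relu(self.w2(o_in)))          # hc = W1 · ReLU(W2 · oin)
        att = torch.softmax(self.score(h * h_c).squeeze(-1), dim=0)  # final answer attention
        return att
```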
Meanwhile, each time the model calls the Find, Transfer or Compare function, the sub-question solved by the current function is computed by the sub-question generator. The computation of the sub-question representation is as follows:
qt = W1,t·qv + b1,t
cqt = W2·[c(t−1) ; qt] + b2
et,j = W4·(cqt·uj) + b4
cvt = Softmax(et)
ct = Σj cvt,j · uj
where qv denotes the question vector produced by a BiLSTM; as described in the second process, qv = BiLSTM(Q), with the final hidden states of the two directions concatenated end to end as the value of qv. ct denotes the currently computed sub-question representation, and W and b are trainable parameters. In the computation, the question representation and the previous sub-question representation are fused to obtain cqt, and the representation of the current sub-question is then computed through an attention mechanism.
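A minimal sketch of the sub-question generator follows; interpreting cqt·uj as an element-wise product before the scalar projection W4, and keeping a separate linear map W1,t per step, are assumptions consistent with the formulas above:

```python
# Sketch of the sub-question generator (assumes d2 is even for the BiLSTM split).
import torch
import torch.nn as nn

class SubQuestionGenerator(nn.Module):
    def __init__(self, d2: int, max_steps: int = 2):
        super().__init__()
        self.bilstm = nn.LSTM(d2, d2 // 2, bidirectional=True, batch_first=True)
        self.w1 = nn.ModuleList([nn.Linear(d2, d2) for _ in range(max_steps)])  # W1,t, b1,t
        self.w2 = nn.Linear(2 * d2, d2)                                          # W2, b2
        self.w4 = nn.Linear(d2, 1)                                               # W4, b4

    def forward(self, Q, u, c_prev, t: int):
        """Q: (L, d2) question encoding, u: (L, d2) question representation,
        c_prev: (d2,) previous sub-question, t: current step (0-based)."""
        _, (h_n, _) = self.bilstm(Q.unsqueeze(0))          # final hidden states, both directions
        qv = torch.cat([h_n[0, 0], h_n[1, 0]], dim=-1)     # question vector qv (d2,)
        q_t = self.w1[t](qv)                               # qt = W1,t·qv + b1,t
        cq_t = self.w2(torch.cat([c_prev, q_t], dim=-1))   # cqt = W2·[c(t-1); qt] + b2
        e_t = self.w4(cq_t * u).squeeze(-1)                # et,j = W4·(cqt ⊙ uj) + b4
        cv_t = torch.softmax(e_t, dim=0)                   # cvt = softmax(et)
        c_t = cv_t @ u                                     # ct = Σ_j cvt,j · uj
        return c_t
```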
The probability distributions obtained above, representing the computed evidence and answer, are then used as the input of the prediction layer. When the question is a bridging-entity question, the input of the prediction layer is the output of the Transfer function; when the question is a comparison question, the input of the prediction layer is the output of the Compare function. The output of the prediction layer has four dimensions: the relevant evidence sentences, the start position of the answer, the end position of the answer, and the answer type. To handle the dependencies between these outputs, the prediction layer uses a vertical structural design in which four LSTM layers that do not share parameters are stacked layer by layer. The context representation from the last round of the reasoning module is the input of the first LSTM layer, each LSTM layer outputs a probability distribution, and cross entropy is then computed from these probability distributions. The specific stacking of the LSTMs is as follows:
Osup = F0(C(t))
Ostart = F1([C(t), Osup])
Oend = F2([C(t), Osup, Ostart])
Otype = F3([C(t), Osup, Ostart])
further, F0,F1,F2,F3Respectively four multi-layer sensors, OsupIs used to predict the evidence-representing probability distribution, OstartAnd OendProbability distributions, O, for predicting the start and end positions of the answer, respectivelytypeIs the probability distribution used to predict the answer type.
Finally, the four cross-entropy loss functions are jointly optimized:
L = Lstart + Lend + λs·Lsup + λt·Ltype
where Lstart, Lend, Lsup and Ltype are the losses obtained by computing the cross entropy between Ostart, Oend, Osup, Otype and the true labels respectively, and λs and λt are the hyper-parameters weighting the evidence prediction loss and the answer type loss, respectively.
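A minimal sketch of the cascaded prediction layer and the joint loss follows; each head is modeled here as an LSTM followed by a linear scorer, and the hidden size, the way previous outputs are concatenated, and the pooling for the answer-type head are assumptions:

```python
# Sketch of the cascaded prediction heads (evidence -> start -> end / type) and joint loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PredictionLayer(nn.Module):
    def __init__(self, d2: int, hidden: int = 128, num_answer_types: int = 3):
        super().__init__()
        def head(in_dim, out_dim):
            return nn.ModuleDict({"lstm": nn.LSTM(in_dim, hidden, batch_first=True),
                                  "proj": nn.Linear(hidden, out_dim)})
        self.f0 = head(d2, 1)                       # evidence scores
        self.f1 = head(d2 + 1, 1)                   # answer start scores
        self.f2 = head(d2 + 2, 1)                   # answer end scores
        self.f3 = head(d2 + 2, num_answer_types)    # answer type scores

    @staticmethod
    def _run(head, x):
        out, _ = head["lstm"](x.unsqueeze(0))
        return head["proj"](out.squeeze(0))

    def forward(self, C_t):
        """C_t: (N, d2) context representation from the last reasoning step."""
        o_sup = self._run(self.f0, C_t)
        o_start = self._run(self.f1, torch.cat([C_t, o_sup], dim=-1))
        o_end = self._run(self.f2, torch.cat([C_t, o_sup, o_start], dim=-1))
        o_type = self._run(self.f3, torch.cat([C_t, o_sup, o_start], dim=-1))
        return o_sup, o_start, o_end, o_type

def joint_loss(o_sup, o_start, o_end, o_type, y_sup, y_start, y_end, y_type,
               lambda_s=1.0, lambda_t=1.0):
    """L = Lstart + Lend + λs·Lsup + λt·Ltype (all cross-entropy style losses)."""
    l_start = F.cross_entropy(o_start.squeeze(-1).unsqueeze(0), y_start)
    l_end = F.cross_entropy(o_end.squeeze(-1).unsqueeze(0), y_end)
    l_sup = F.binary_cross_entropy_with_logits(o_sup.squeeze(-1), y_sup)
    l_type = F.cross_entropy(o_type.mean(dim=0, keepdim=True), y_type)
    return l_start + l_end + lambda_s * l_sup + lambda_t * l_type
```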
In terms of experimental results, the present invention was evaluated on the HotpotQA inferential reading comprehension dataset (Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning: HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering. EMNLP 2018: 2369-). There are 90247 training samples and 7405 validation samples. The statistics of the bridging-class and comparison-class questions in the dataset are shown in the table:

Data set          Bridging class   Comparison class   Total
Training set      17456            72991              90247
Validation set    1487             5918               7405

Table 1: Statistics of bridging-class and comparison-class questions in HotpotQA
The evaluation metrics of the present invention are the EM value and the F1 value. The EM value is the proportion of predicted answers that exactly match the true answers, and the F1 value comprehensively measures the precision and recall of the predicted result against the true result.
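For reference, a simplified sketch of the two metrics is given below (the official HotpotQA evaluation additionally normalizes punctuation and articles; that normalization is omitted here):

```python
# Sketch of the EM and token-level F1 metrics over answer strings.
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    return float(prediction.strip().lower() == gold.strip().lower())

def f1_score(prediction: str, gold: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```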
The performance of the question classifier of the present invention is as follows:

                      Correct samples   Error samples   Accuracy
Question classifier   7375              30              99.59%

Table 2: Performance evaluation of the question classifier
The invention was compared with the mainstream methods; in the comparison, the last row is the model proposed by the invention. It can be seen that the proposed model exceeds the performance of many mainstream models, which demonstrates the effectiveness of the proposed method.

(Table: comparison of the proposed model with mainstream models on HotpotQA.)
The method of the present invention has been described in detail above by way of formulas and embodiments, but the specific implementation of the present invention is not limited thereto. Various obvious changes and modifications can be made by those skilled in the art without departing from the spirit and principles of the method of the invention. The protection scope of the present invention shall be subject to the claims.

Claims (7)

1. A question answer extraction method based on multilayer perception, comprising the following steps:
1) splicing a question with a number of target documents, inputting the result into a pre-trained language model to obtain a representation Q of the question and a context representation P of the target documents, and letting the representation Q and the context representation P interact to obtain a document-related question representation u and a document representation h fused with question information;
2) performing multilayer perception classification on the question representation u to obtain the inference type of the question, the inference types comprising: bridging-entity questions or comparison questions;
3) if the inference type is the bridging-entity class, calling a Find function according to the question representation u, the document representation h and the sub-question to generate an intermediate bridging entity att1, and calling a Transfer function according to the intermediate bridging entity att1, the question representation u, the document representation h and the sub-question ct to obtain the answer attention distribution;
if the inference type is a comparison question, calling the Find function twice according to the question representation u, the document representation h and the sub-question ct to generate intermediate bridging entities att1 and att2, and calling a Compare function to compare the intermediate bridging entities att1 and att2 to obtain the answer attention distribution;
4) obtaining an answer prediction result of the question according to the answer attention distribution.
2. The method of claim 1, wherein the target document is obtained by:
1) inputting a plurality of original documents into a paragraph selection model consisting of a BERT model and a layer of linear classifiers;
2) selecting the paragraphs related to the question in each original document according to a threshold to obtain a number of target documents.
3. A method as recited in claim 1, the pre-training language model comprising a BERT model.
4. The method of claim 1, wherein the method of interacting the representation Q with the context representation P comprises: using a bidirectional attention mechanism; the step of generating the sub-questions comprises:
1) inputting the representation Q into a BiLSTM network to obtain a question vector qv;
2) obtaining the sub-question ct from the question vector qv, the sub-question c(t-1) and the question representation u.
5. The method of claim 1, wherein obtaining the answer prediction result of the question comprises: inputting the context representation C(t) into a plurality of LSTM layers which are stacked layer by layer and do not share parameters; the answer prediction result comprises: one or more of a related evidence sentence, an answer start position, an answer end position, and an answer type.
6. A storage medium having a computer program stored thereon, wherein the computer program is arranged to, when run, perform the method of any of claims 1-5.
7. An electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the method according to any of claims 1-5.
CN202011079727.5A 2020-10-10 2020-10-10 Question answer extraction method based on multilayer perception and electronic device Active CN112380326B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011079727.5A CN112380326B (en) 2020-10-10 2020-10-10 Question answer extraction method based on multilayer perception and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011079727.5A CN112380326B (en) 2020-10-10 2020-10-10 Question answer extraction method based on multilayer perception and electronic device

Publications (2)

Publication Number Publication Date
CN112380326A CN112380326A (en) 2021-02-19
CN112380326B true CN112380326B (en) 2022-07-08

Family

ID=74581232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011079727.5A Active CN112380326B (en) 2020-10-10 2020-10-10 Question answer extraction method based on multilayer perception and electronic device

Country Status (1)

Country Link
CN (1) CN112380326B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420111B (en) * 2021-06-17 2023-08-11 中国科学院声学研究所 Intelligent question answering method and device for multi-hop reasoning problem

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674279A (en) * 2019-10-15 2020-01-10 腾讯科技(深圳)有限公司 Question-answer processing method, device, equipment and storage medium based on artificial intelligence
CN111027327A (en) * 2019-10-29 2020-04-17 平安科技(深圳)有限公司 Machine reading understanding method, device, storage medium and device
CN111339281A (en) * 2020-03-24 2020-06-26 苏州大学 Answer selection method for reading comprehension choice questions with multi-view fusion
CN111460092A (en) * 2020-03-11 2020-07-28 中国电子科技集团公司第二十八研究所 Multi-document-based automatic complex problem solving method
CN111598118A (en) * 2019-12-10 2020-08-28 中山大学 Visual question-answering task implementation method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180082184A1 (en) * 2016-09-19 2018-03-22 TCL Research America Inc. Context-aware chatbot system and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674279A (en) * 2019-10-15 2020-01-10 腾讯科技(深圳)有限公司 Question-answer processing method, device, equipment and storage medium based on artificial intelligence
CN111027327A (en) * 2019-10-29 2020-04-17 平安科技(深圳)有限公司 Machine reading understanding method, device, storage medium and device
CN111598118A (en) * 2019-12-10 2020-08-28 中山大学 Visual question-answering task implementation method and system
CN111460092A (en) * 2020-03-11 2020-07-28 中国电子科技集团公司第二十八研究所 Multi-document-based automatic complex problem solving method
CN111339281A (en) * 2020-03-24 2020-06-26 苏州大学 Answer selection method for reading comprehension choice questions with multi-view fusion

Also Published As

Publication number Publication date
CN112380326A (en) 2021-02-19

Similar Documents

Publication Publication Date Title
US11593631B2 (en) Explainable transducer transformers
Lin et al. Deep learning for missing value imputation of continuous data and the effect of data discretization
Craven et al. Using neural networks for data mining
US11334791B2 (en) Learning to search deep network architectures
CN112380835B (en) Question answer extraction method integrating entity and sentence reasoning information and electronic device
Wang et al. Tensor networks meet neural networks: A survey and future perspectives
CN113412492A (en) Quantum algorithm for supervised training of quantum Boltzmann machine
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN112380326B (en) Question answer extraction method based on multilayer perception and electronic device
Eyraud et al. TAYSIR Competition: Transformer+\textscrnn: Algorithms to Yield Simple and Interpretable Representations
Lu Learning Guarantees for Graph Convolutional Networks in the Stochastic Block Model
Chien et al. Bayesian multi-temporal-difference learning
Li et al. A hint from arithmetic: On systematic generalization of perception, syntax, and semantics
Anireh et al. HTM-MAT: An online prediction software toolbox based on cortical machine learning algorithm
Abuelenin et al. Optimizing deep learning based on deep auto encoder and genetic algorithm
Tran Unsupervised neural-symbolic integration
CN114065769B (en) Method, device, equipment and medium for training emotion reason pair extraction model
Cameron Information compression of molecular representations using neural network auto-encoders
Gangal et al. Neural Computing
Matovič et al. Establishing Pattern Sequences Using Artificial Neural Networks with an Application to Organizational Patterns
Karthika Renuka et al. Visual Question Answering System Using Co-attention Model
Ha et al. Evolving multi-view autoencoders for text classification
Busireddy A Framework for Question Answering System Using Dynamic Co-attention Networks
Jiang Transfer Learning of Image Classification with Deep Learning Architectures
Panayiotou Molecular Graph Learning in the Optimal Transport Geometry

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant