WO2023225858A1 - Commonsense-reasoning-based reading comprehension exam question generation system and method

Commonsense-reasoning-based reading comprehension exam question generation system and method

Info

Publication number
WO2023225858A1
Authority
WO
WIPO (PCT)
Prior art keywords
graph
model
vector
text
entity
Prior art date
Application number
PCT/CN2022/094741
Other languages
English (en)
French (fr)
Inventor
余建兴
林妙培
王世祺
印鉴
Original Assignee
中山大学
Priority date
Filing date
Publication date
Application filed by 中山大学
Priority to PCT/CN2022/094741
Publication of WO2023225858A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology

Definitions

  • The present invention relates to the field of artificial intelligence and, more specifically, to a reading comprehension exam question generation system based on commonsense reasoning.
  • Question generation examines a machine's ability to understand the semantics of a given text from another perspective and can therefore support many useful applications.
  • Question generation can be used as a data augmentation strategy to reduce the cost of manually annotating question-answering corpora.
  • Interactive questioning can also open up new topics for cold starts in dialogue systems, and question-style feedback can promote information acquisition in search engines.
  • Question generation is a cognitively demanding process that requires different levels of understanding. Simple questions often touch only on the shallow meaning of the text and can be handled well by context-based word matching. In practical applications, complex comprehensive exam questions have greater application value.
  • Exam questions need to be solvable, the solutions need to be consistent with the given answers, and the solving process should not be simple literal matching but should involve commonsense reasoning.
  • Traditional methods lack modeling of this key reasoning process and the commonsense it entails, and research on how to use such knowledge to guide the questioning direction is weak. This leads to a logical gap: the machine does not know what to ask or how to ask, and in the end it can only output superficial, simple questions.
  • Exam questions should also satisfy many types of language requirements. For example, the generated results need correct grammar and accurate semantics, and the questions must be valid and solvable. Otherwise, generating questions full of typos, lacking logic, or even incoherent or unsolvable will lead to a poor user experience.
  • These language requirements are expressed in discrete form, and conventional methods find it difficult to integrate them into neural networks with continuous representations.
  • The invention provides a reading comprehension exam question generation system based on commonsense reasoning, which can generate exam questions with correct grammar and consistent content.
  • Another object of the present invention is to provide a method for generating reading comprehension exam questions based on commonsense reasoning.
  • A reading comprehension exam question generation system based on commonsense reasoning, comprising:
  • an inference clue graph extraction module, used to start from the given answer and derive the inference clue graph from the text context: identify all entities and relations of the text input to the system and build an entity graph using contextual dependencies; select the question content from the entity graph, and retrieve related entities from an external commonsense knowledge base to expand the entity graph into an inference clue graph;
  • a graph-guided question generation model module, used to combine all the text and the entity graph, as well as the multi-hop and commonsense knowledge in the entity graph, to generate high-quality exam questions;
  • a posterior constraint learning module of linguistic knowledge, used to train the graph-guided question generation model and learn the optimal model parameters.
  • A reading comprehension exam question generation method based on commonsense reasoning, comprising the following steps:
  • S1: the inference clue graph extraction module starts from the given answer and derives the inference clue graph from the text context: it identifies all entities and relations of the text input to the system and builds an entity graph using contextual dependencies; it selects the question content from the entity graph, and retrieves related entities from an external commonsense knowledge base to expand the entity graph into an inference clue graph;
  • S2: the graph-guided question generation model module combines all the text and the entity graph, as well as the multi-hop and commonsense knowledge in the entity graph, to generate high-quality exam questions;
  • S3: the posterior constraint learning module of linguistic knowledge trains the graph-guided question generation model to learn the optimal model parameters.
  • In step S1, the inference clue graph extraction module represents each input sentence as a parse tree.
  • Each parse tree node contains several entities and edges, where the edges represent contextual relations; punctuation marks and stop words in each node are filtered out, equivalent nodes and coreference nodes in the parse tree are aggregated, and inter-tree connecting edges are added between similar nodes in adjacent sentences to obtain an entity graph with potential clues.
  • In step S1, the process of selecting the question content from the entity graph is:
  • sentences containing answer keywords are identified through exact word matching, or related sentences are identified with a classifier based on the Rouge-L similarity metric, where the keywords are the words remaining after stop words are filtered from the answer.
  • In step S1, related entities are retrieved from an external commonsense knowledge base to expand the entity graph by using the entities of the input text as query conditions and retrieving related entities from an external open-source commonsense knowledge base through word matching.
  • A graph-enhanced encoder is designed to combine all the text and the entity graph and to fuse their heterogeneous features.
  • The graph-enhanced encoder consists of six cascaded layers, each of which includes:
  • a text self-attention sublayer, responsible for encoding the text content; it applies a nonlinear transformation to the given text input representation vectors to obtain a new vector as the input of the next sublayer;
  • a graph attention sublayer: considering that a graph node contains multiple words, each node is characterized by a weighted sum of the embeddings of its words, where w_{i,j} is the distributed embedding of the j-th word of the i-th node, m and n are the start and end positions of the text fragment in the node, and β denotes the attention distribution over the node's words, expressing their importance; β is defined as softmax(ReLU(W_R[G; w_j])), where g_i is the i-th column of the matrix G and W_R is a trainable parameter; subsequently, the contextual representation of a node is enriched by a weighted aggregation of the relevant semantics of its neighboring nodes, where the weights are dynamically determined by the attention mechanism; to obtain this structural context information, the present invention obtains the edge correlation score by computing the dot product between adjacent nodes i and j, where τ_ij denotes the relation between the nodes, learned from the corresponding relation type, together with trainable parameters; the attention of each node is then computed by normalizing the correlation scores over its connected edges;
  • a feedforward sublayer, which fuses the text vector z_i and the graph vector; since simple concatenation would introduce a lot of noise, a gating mechanism is used to obtain the salient features and reduce noise, as in formula (3), where ⊙ denotes element-wise multiplication, f is a fusion vector, and η is a learnable gate used to selectively control features from the different views;
  • the output encoding h_i obtained through this transformation is regarded as the input of the next layer; the final representation is obtained after multiple layers of such operations.
  • In step S2, the output matrix h of the last layer of the graph-enhanced encoder is regarded as a representation that fuses all the input content, and another Transformer is used to perform autoregressive decoding to generate the exam questions; the questions are generated word by word according to a probability distribution, and, similar to the graph-enhanced encoder, the decoder consists of multiple cascaded layers.
  • At step t, the output of the l-th layer considers two parts: self-attention over the decoding results of the previous steps, and attention over the encoded content h.
  • These two parts are fused through a multi-head attention layer; after nonlinear transformations across multiple layers, the output vector is obtained, and normalizing this vector over the predefined vocabulary V with the Softmax logistic regression function gives the output probability of each word,
  • where W_o and b_o are trainable parameters.
  • A copy mechanism is used to generate question words by transcribing new words from the input text according to a copy distribution, where α is the attention over the input text; a balance factor is defined,
  • and the words of the question are generated one by one by sampling from the mixed distribution, where k is the balance factor, f(·) is a feedforward neural network with a Sigmoid activation function, and y_{t-1} is the distributed embedding vector of the word generated at step t−1.
  • To avoid the semantic-drift problem, i.e., questions inconsistent with the answer, the answer encoding is used to initialize the decoder;
  • a special token <eos> is also introduced to indicate when the generation process terminates.
  • In step S3, supervised learning is used to train the graph-guided question generation model, i.e., the optimal model parameters are learned by maximizing the log-likelihood, as in formula (5), where the training set contains N samples and K denotes the number of words in the question.
  • The present invention introduces a series of linguistic knowledge as regularization constraints to shape the output probability distribution of the results. The regularization is realized through the KL divergence between a desired distribution d(·) that satisfies the constraint expectations and the model's output distribution p_θ(·); a hybrid objective fuses the supervised loss and the posterior loss into formula (6), where the constraint set consists of constraint feature functions φ(·) bounded by b, a, c and y denote the answer, the text passage and the question, and λ weighs the confidence of each constraint.
  • In step S3, three constraint functions are designed to improve the quality of model generation: commonsense answerability, content-association consistency, and grammatical accuracy of expression.
  • The commonsense answerability constraint function is constructed as follows: the passage sentences most relevant to the question are extracted by matching similarity, the spaCy toolkit is used to extract entities such as verbs and noun phrases from the answer and the extracted sentences, and they are grouped into pairs with the entities of the question; by introducing a learnable parameter ω_1, this entity constraint knowledge is expressed as a weighted sum over entity pairs, where the indicator of the k-th entity pair is positive if the pair is semantically similar and negative otherwise, and the weight of each pair is obtained from an attention network whose parameters are ω_1; with this representation, f_1(·) is positive when there are no similar entities between the answer and the sentences related to the question, and negative otherwise.
  • The content-association consistency constraint adopts a data-driven classifier: the function F outputs a positive value when the entities extracted from the passage and from the question are semantically similar, and a negative value otherwise; by referring to the performance leaderboard of the evaluation dataset,
  • the Unicorn model, currently the best-performing question-answering model, is selected to predict answers; by penalizing samples with inconsistent answers, the model is encouraged to generate results with consistent answers, using a judgment function that verifies the consistency between the predicted result and the annotated answer.
  • The grammatical accuracy constraint function measures the fluency of the generated result by computing the perplexity of a language model,
  • where P_LM is based on the pretrained RoBERTa language model and K is the number of question words; for the same answer, the generated result is usually similar to the annotated result in semantic structure and grammatical pattern,
  • so the Word Mover's Distance (WMD) is used to measure the similarity between the two texts.
  • In step S3, the KL-divergence objective in formula (6) is treated as a knowledge distillation problem, i.e., transferring knowledge from the constraint-aware teacher model d to the student question generation model p_θ; accordingly, this objective is solved with the Expectation-Maximization (EM) algorithm. In the t-th expectation step, the probability distribution d of the teacher model is computed; then, in the maximization step, the distribution of the student model with parameters θ is updated by formula (8) to approximate the teacher distribution d, where a trade-off factor balances the two terms, o is the annotated question distribution, and E is the accumulated expected error.
  • This distillation objective strikes a balance between imitating the soft predictions of d and predicting the real annotated results.
  • The present invention finds that the contextual associations between sentences help to connect the reasoning clues scattered across them. With the help of the textual context, a chain of reasoning clues is derived backwards from the answer, and various kinds of latent commonsense knowledge are integrated into this chain; the clue chain is then used to guide the generation of questions that can be reasoned about. In addition, the present invention integrates various kinds of grammatical and semantic knowledge into the model as posterior constraints, thereby generating questions with correct grammar and consistent content; by capturing the prior reasoning process over latent commonsense entities and relations, logically reasonable questions are produced; these constraints serve as posterior knowledge, and regularization flexibly integrates various kinds of linguistic knowledge into the generative model, making the results more fluent, more consistent, and more answerable.
  • Figure 1 shows an example of an exam question that requires commonsense reasoning;
  • Figure 2 is a block diagram of the system of the present invention;
  • Figure 3 is a flow chart of the method of the present invention.

Abstract

The present invention provides a commonsense-reasoning-based reading comprehension exam question generation system and method. The invention first works backwards from the answer to derive an inference clue graph together with the related entities and commonsense relations it involves; this clue graph with contextual structure is then used as prior knowledge to guide question generation, thereby improving the logical soundness of the results. In addition, the invention introduces various kinds of linguistic knowledge as posterior constraints to regularize the generator, improving the quality of the questions in terms of commonsense answerability, content relevance and grammatical validity; through joint learning with both prior and posterior regularization, more fluent and more reasoning-amenable results are generated.

Description

Commonsense-reasoning-based reading comprehension exam question generation system and method

Technical Field

The present invention relates to the field of artificial intelligence, and more specifically, to a commonsense-reasoning-based reading comprehension exam question generation system.
Background Art

Writing exam questions is very labor- and resource-intensive, especially for exams with high public attention such as the college entrance examination, where question setters are held to strict integrity requirements and there is zero tolerance for violations such as question leaks. Manual question setting is also highly subjective. This demand has driven the rapid development of automatic question generation by machines. Such a task not only greatly reduces labor costs but also offers better confidentiality and objectivity, which benefits the fairness of examinations. Machine question generation has gradually become a research hotspot in artificial intelligence and natural language processing. It requires generating coherent, answer-related questions from a given text. As the dual task of question answering, question generation examines a machine's ability to understand the semantics of a given text from another perspective and can therefore support many useful applications. For example, question generation can serve as a data augmentation strategy to reduce the cost of manually annotating question-answering corpora; interactive questioning can open up new topics for cold starts in dialogue systems; and question-style feedback can promote information acquisition in search engines. Question generation is a cognitively demanding process that requires different levels of understanding. Simple questions usually involve only the shallow meaning of the text and can be handled well by context-based word matching. In practical applications, complex comprehensive exam questions have greater value. For example, in education, to promote well-rounded education the Ministry of Education requires that simple literal-matching questions account for no more than 5% of primary-school exam questions and encourages more comprehensive questions involving logical thinking, especially commonsense reasoning questions. Compared with simple literal-matching questions, such logic- and commonsense-related questions can better assess learning outcomes and stimulate students' ability to learn independently. However, automatically generating such commonsense reasoning questions is not easy: the machine needs to reason deeply over multiple entity clues scattered throughout the text, and even needs to understand external world knowledge that the machine lacks and the commonsense conventions of everyday life. As shown in Figure 1, the question asks about a place related to the fountain in the park. Unlike a simple literal-matching question, there is no literal similarity linking the question and the answer. However, by associating multiple evidence clues in the given text (i.e., fountain, Statue of Liberty) with external commonsense relations (Statue of Liberty, located in, New York City), (New York City, part of, United States), (United States, capital, Washington), (White House, located in, Washington), the question and the answer can be connected logically. Such multi-hop reasoning chains are crucial for both the questioning direction and the answering process.

Traditional generation methods mainly rely on manually crafted rules or templates to convert the input text into questions. These rules and templates are easily over-engineered, resulting in poor generality and scalability. Therefore, current mainstream methods adopt data-driven neural models, which treat generation as a translation-like sequence mapping problem: by learning sequence mapping patterns from large amounts of training data, the input text is mapped, or translated, into a question. However, this approach is suitable for generating simple literal-matching questions and struggles to generate commonsense reasoning questions that require comprehensive understanding. Commonsense reasoning questions are not produced by converting or summarizing the given content into a semantically equivalent form; they require a generation process subject to various grammatical and semantic constraints. Besides being fluent, the generated questions also need to be answerable and amenable to reasoning: the question must be solvable, the solution must be consistent with the given answer, and the solving process should not be simple literal matching but should involve commonsense reasoning. Traditional methods lack modeling of this key reasoning process and the commonsense it entails, and research on how to use such knowledge to guide the questioning direction is weak. This leads to a logical gap: the machine does not know what to ask or how to ask, and in the end can only output superficial, simple questions. In addition, exam questions should satisfy many types of language requirements; for example, the generated results need correct grammar and accurate semantics, and the questions must be valid and solvable. Otherwise, generating questions full of typos, lacking logic, or even incoherent or unsolvable will lead to a poor user experience. These language requirements are expressed in discrete form, and conventional methods find it difficult to integrate them into neural networks with continuous representations.
Summary of the Invention

The present invention provides a commonsense-reasoning-based reading comprehension exam question generation system that can generate questions with correct grammar and consistent content.

Another object of the present invention is to provide a commonsense-reasoning-based reading comprehension exam question generation method.

To achieve the above technical effects, the technical solution of the present invention is as follows:

A commonsense-reasoning-based reading comprehension exam question generation system, comprising:

an inference clue graph extraction module, configured to start from the given answer and derive the inference clue graph from the text context: identify all entities and relations of the text input to the system and build an entity graph using contextual dependencies; select the question content from the entity graph, and retrieve related entities from an external commonsense knowledge base to expand the entity graph into an inference clue graph;

a graph-guided question generation model module, configured to combine all the text and the entity graph, as well as the multi-hop and commonsense knowledge in the entity graph, to generate high-quality exam questions;

a posterior constraint learning module of linguistic knowledge, configured to train the graph-guided question generation model and learn the optimal model parameters.

A commonsense-reasoning-based reading comprehension exam question generation method, comprising the following steps:

S1: the inference clue graph extraction module starts from the given answer and derives the inference clue graph from the text context: it identifies all entities and relations of the text input to the system and builds an entity graph using contextual dependencies; it selects the question content from the entity graph, and retrieves related entities from an external commonsense knowledge base to expand the entity graph into an inference clue graph;

S2: the graph-guided question generation model module combines all the text and the entity graph, as well as the multi-hop and commonsense knowledge in the entity graph, to generate high-quality exam questions;

S3: the posterior constraint learning module of linguistic knowledge trains the graph-guided question generation model to learn the optimal model parameters.
Further, in step S1, the inference clue graph extraction module represents each input sentence as a parse tree. Each parse tree node contains several entities and edges, where the edges represent contextual relations; punctuation marks and stop words in each node are filtered out, and equivalent nodes and coreference nodes in the parse tree are aggregated; inter-tree connecting edges are added between similar nodes in adjacent sentences to obtain an entity graph with potential clues.
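By way of illustration, a minimal Python sketch of this entity-graph construction is given below; it assumes spaCy for parsing and networkx for the graph, and approximates "similar nodes" by lemma matching, none of which is prescribed by the description above.

```python
# Minimal sketch of the step-S1 entity-graph construction, assuming spaCy for
# parsing and networkx for the graph; node merging and cross-sentence linking
# are simplified to lemma matching (illustrative, not the disclosed procedure).
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")  # assumed small English pipeline

def build_entity_graph(text: str) -> nx.Graph:
    doc = nlp(text)
    g = nx.Graph()
    prev_sentence_nodes = []
    for sent in doc.sents:
        # Keep content tokens only: drop punctuation and stop words.
        nodes = [t for t in sent if not (t.is_punct or t.is_stop)]
        for t in nodes:
            g.add_node((t.lemma_, t.i), text=t.text)
        # Intra-sentence edges follow dependency (contextual) relations.
        for t in nodes:
            if t.head is not t and not (t.head.is_punct or t.head.is_stop):
                g.add_edge((t.lemma_, t.i), (t.head.lemma_, t.head.i), rel=t.dep_)
        # Inter-tree edges connect similar nodes in adjacent sentences
        # (here "similar" is approximated by an identical lemma).
        for t in nodes:
            for p in prev_sentence_nodes:
                if t.lemma_ == p.lemma_:
                    g.add_edge((t.lemma_, t.i), (p.lemma_, p.i), rel="inter")
        prev_sentence_nodes = nodes
    return g
```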
Further, in step S1, the process of selecting the question content from the entity graph is:

two kinds of answer-related sentences are marked, and the remaining sentences are deleted from the entity graph:

sentences containing answer keywords are identified through exact word matching, or related sentences are identified with a classifier based on the Rouge-L similarity metric, where the keywords are the words remaining after stop words are filtered from the answer.
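The following is a small sketch of these two selection criteria; the ROUGE-L score is computed from the longest common subsequence, and a fixed threshold stands in for the learned classifier mentioned above (an assumption made for illustration).

```python
# Illustrative sketch of the two sentence-selection criteria in step S1:
# exact keyword matching plus a ROUGE-L (LCS-based) score; a fixed threshold
# stands in for the classifier mentioned in the description (an assumption).
def lcs_len(a, b):
    # Longest common subsequence length via dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    lcs = lcs_len(c, r)
    prec, rec = lcs / len(c), lcs / len(r)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

def select_sentences(sentences, answer, stop_words, threshold=0.3):
    keywords = {w for w in answer.lower().split() if w not in stop_words}
    kept = []
    for s in sentences:
        words = set(s.lower().split())
        if keywords & words or rouge_l(s.lower(), answer.lower()) >= threshold:
            kept.append(s)
    return kept
```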
Further, in step S1, related entities are retrieved from an external commonsense knowledge base to expand the entity graph as follows: the entities of the input text are used as query conditions, and related entities are retrieved from an external open-source commonsense knowledge base through word matching.
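A toy sketch of this word-match retrieval is shown below; the in-memory triple list (taken from the Figure 1 example) stands in for an external open-source knowledge base, which is an assumption about the data source.

```python
# Toy sketch of word-match retrieval over commonsense triples; the in-memory
# list stands in for an external open-source knowledge base, which is an
# assumption about the data source.
TRIPLES = [
    ("statue of liberty", "located in", "new york city"),
    ("new york city", "part of", "united states"),
    ("united states", "capital", "washington"),
    ("white house", "located in", "washington"),
]

def retrieve_related(entities, triples=TRIPLES):
    """Return triples whose head or tail matches any query entity by word overlap."""
    hits = []
    for query in entities:
        q_words = set(query.lower().split())
        for head, rel, tail in triples:
            if q_words & set(head.split()) or q_words & set(tail.split()):
                hits.append((head, rel, tail))
    return hits

# Example: expanding the graph from the entities found in the passage.
print(retrieve_related(["Statue of Liberty", "fountain"]))
```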
Further, in step S2, a graph-enhanced encoder is designed to combine all the text and the entity graph and to fuse the heterogeneous features of text and graph. The graph-enhanced encoder consists of six cascaded layers, each of which includes:

Text self-attention sublayer: responsible for encoding the text content; it applies a nonlinear transformation to the given text input representation vectors to obtain a new vector as the input of the next sublayer. Given the text input X = (x_1, …, x_T), where x is the distributed embedding of each word, each sublayer obtains three vectors by linear transformation, namely the key vector k_i = x_i·W_K, the value vector v_i = x_i·W_V and the query vector q_i = x_i·W_Q, where W_K, W_V and W_Q are learnable matrix parameters; the interaction score between the query and key vectors is then computed by a dot product, r_ij = q_i·k_j^T. The scores are normalized with the Softmax logistic regression function, and the attention coefficient α_ij is computed by formula (1), where d_x is the dimension of the key vector; the context-aware output z_i is obtained by a weighted sum of the attention coefficients and the value vectors. The first sublayer is initialized with the representation vectors of the input text, where the vector of each word is retrieved from a pretrained vector library and the word vectors are assembled to represent the text; the outputs of the last layer are collected to represent the input text passage and the answer, respectively:

α_ij = softmax(r_ij / √d_x),  z_i = Σ_j α_ij·v_j    (1)
Graph attention sublayer: considering that a graph node contains multiple words, each node is characterized as a weighted sum g_i = Σ_{j=m..n} β_j·w_{i,j}, where w_{i,j} is the distributed embedding of the j-th word of the i-th node, m and n are the start and end positions of the text fragment in the node, and β denotes the attention distribution over the node's words, expressing their importance; β is defined as softmax(ReLU(W_R[G; w_j])), where g_i is the i-th column of the matrix G and W_R is a trainable parameter. Subsequently, the contextual representation of a node is enriched by a weighted aggregation of the relevant semantics of its neighboring nodes, where the weights are dynamically determined by the attention mechanism. To obtain this structural context information, the present invention obtains the edge correlation score s_ij by computing the dot product between adjacent nodes i and j, where τ_ij denotes the relation between the nodes, learned from the corresponding relation type, together with trainable parameters. By normalizing the correlation scores of all edges connected to a node, the attention of each node is computed with reference to formula (2), where N(i) denotes the neighboring nodes of node i and d_x is the dimension of the key vector; a weighted sum over these attention weights then yields the graph-structure-aware output, with the remaining matrices being learnable parameters:

α̂_ij = exp(s_ij / √d_x) / Σ_{k∈N(i)} exp(s_ik / √d_x)    (2)
Feedforward sublayer: fuses the text vector z_i and the graph vector. Since simple concatenation would introduce a lot of noise, a gating mechanism is used to obtain the salient features and reduce the noise, as in formula (3), where ⊙ denotes element-wise multiplication, f is a fusion vector, and η is a learnable gate used to selectively control features from the different views:

z_i = η ⊙ f + (1 − η) ⊙ z_i    (3)

In addition, to strengthen the generalization ability of the model, a nonlinear transformation is introduced, implemented by a two-layer multilayer perceptron (MLP) with a ReLU activation, h_i = max(0, z_i·W_1 + b_1)·W_2 + b_2, where W_1, W_2, b_1 and b_2 are trainable parameters. The output encoding h_i obtained by this transformation is taken as the input of the next layer; the final representation is obtained after multiple layers of such operations.
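The block below is a compact numpy sketch of one graph-enhanced encoder layer as just described: scaled dot-product self-attention over word embeddings, a simplified neighbor-attention step over node vectors, and the gated fusion of formula (3) followed by the two-layer ReLU MLP. The projections inside the graph attention and the parameterization of the fusion vector f and the gate η are assumptions made for brevity, so this illustrates the layer's structure rather than the exact disclosed parameterization.

```python
# Compact numpy sketch of one graph-enhanced encoder layer: text self-attention
# (formula (1)), a simplified neighbor-attention step over graph nodes
# (formula (2)), and gated fusion plus a two-layer ReLU MLP (formula (3)).
# The parameterization of the fusion vector f and the gate eta is an assumption.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_self_attention(X, W_K, W_V, W_Q):
    K, V, Q = X @ W_K, X @ W_V, X @ W_Q
    alpha = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # formula (1)
    return alpha @ V                                  # context-aware outputs z_i

def graph_attention(G, adj):
    scores = (G @ G.T) / np.sqrt(G.shape[-1])         # dot-product edge scores
    scores = np.where(adj > 0, scores, -1e9)          # restrict to connected neighbors
    return softmax(scores) @ G                        # formula (2), simplified

def gated_fusion_ffn(z, g, W_f, W_eta, W1, b1, W2, b2):
    cat = np.concatenate([z, g], axis=-1)
    f = np.tanh(cat @ W_f)                            # fusion vector (assumed form)
    eta = 1 / (1 + np.exp(-(cat @ W_eta)))            # learnable gate
    z_new = eta * f + (1 - eta) * z                   # formula (3)
    return np.maximum(0, z_new @ W1 + b1) @ W2 + b2   # two-layer ReLU MLP -> h_i

rng = np.random.default_rng(0)
T, N, d = 6, 4, 16                                    # 6 tokens, 4 graph nodes, dim 16
X, G = rng.normal(size=(T, d)), rng.normal(size=(N, d))
adj = np.eye(N) + np.diag(np.ones(N - 1), 1) + np.diag(np.ones(N - 1), -1)
Z = text_self_attention(X, *(rng.normal(size=(d, d)) for _ in range(3)))
Gh = graph_attention(G, adj)
# Align nodes to tokens (here: a toy token-to-node map) before fusing per position.
node_of_token = rng.integers(0, N, size=T)
H = gated_fusion_ffn(Z, Gh[node_of_token],
                     rng.normal(size=(2 * d, d)), rng.normal(size=(2 * d, d)),
                     rng.normal(size=(d, d)), np.zeros(d),
                     rng.normal(size=(d, d)), np.zeros(d))
print(H.shape)  # (6, 16)
```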
Further, in step S2, the output matrix h of the last layer of the graph-enhanced encoder is regarded as a representation that fuses all the input content, and another Transformer is used to perform autoregressive decoding to generate the exam questions. The question is generated word by word according to a probability distribution over the next word. Similar to the graph-enhanced encoder, the decoder consists of multiple cascaded layers; at step t, the output of the l-th layer considers two parts, namely self-attention over the decoding results of the previous steps and attention over the encoded content h, and these two parts are fused through a multi-head attention layer. After nonlinear transformations across multiple layers, the output vector is obtained; normalizing this vector over the predefined vocabulary V with the Softmax logistic regression function gives the output probability of each word, p_vocab = softmax(o_t·W_o + b_o), where o_t denotes the decoder output vector and W_o and b_o are trainable parameters.

To handle the generation of new words outside the vocabulary, a copy mechanism is used to generate question words by transcribing them from the input text according to a copy distribution p_copy, where α is the attention over the input text. A balance factor k is defined via a feedforward neural network f(·) with a Sigmoid activation function, where y_{t−1} denotes the distributed embedding vector of the word generated at step t−1; the words of the question are then generated one by one by sampling from the mixed distribution k·p_vocab + (1 − k)·p_copy. To avoid the semantic-drift problem, i.e., questions inconsistent with the answer, the answer encoding is used to initialize the decoder; a special token <eos> is also introduced to indicate when the generation process terminates.
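A small sketch of the copy-mechanism mixture at one decoding step follows; the variable names (o_t, p_vocab, p_copy) and the toy vocabulary are illustrative rather than taken from the disclosure.

```python
# Sketch of the copy-mechanism mixture used at each decoding step: a balance
# factor k interpolates the vocabulary softmax with a copy distribution built
# from the attention over the input text. Variable names are illustrative.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def next_word_distribution(o_t, W_o, b_o, vocab, src_tokens, src_attention, k):
    """Mix generation and copying: p = k * p_vocab + (1 - k) * p_copy."""
    p_vocab = dict(zip(vocab, softmax(o_t @ W_o + b_o)))
    p = {w: k * pv for w, pv in p_vocab.items()}
    for tok, a in zip(src_tokens, src_attention):      # copy mass follows source attention
        p[tok] = p.get(tok, 0.0) + (1 - k) * a
    return p

rng = np.random.default_rng(3)
vocab = ["what", "where", "is", "the", "<eos>"]
d = 8
o_t = rng.normal(size=d)
dist = next_word_distribution(o_t, rng.normal(size=(d, len(vocab))), np.zeros(len(vocab)),
                              vocab, ["fountain", "park"], np.array([0.7, 0.3]), k=0.8)
print(max(dist, key=dist.get))
```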
Further, in step S3, supervised learning is used to train the graph-guided question generation model, i.e., the optimal model parameters are learned by maximizing the log-likelihood, as in formula (5), where D denotes a training set containing N samples and K denotes the number of words in the question; this supervised, teacher-guided learning pushes the results generated by the model toward the manual annotations in the samples:

L(θ) = Σ_{n=1..N} Σ_{k=1..K} log p_θ(y_k^(n) | y_<k^(n), a^(n), c^(n))    (5)

However, this alone cannot guarantee that the generated results are questions that can be answered by commonsense reasoning. To address this problem, the present invention introduces a series of linguistic knowledge as regularization constraints to shape the output probability distribution of the results. The regularization is realized through the KL divergence between a desired distribution d(·) that satisfies the constraints and the model's output distribution p_θ(·). A hybrid objective fuses the supervised loss and the posterior loss into formula (6), where the constraint set consists of constraints whose feature functions φ(·) are bounded by b; a, c and y denote the answer, the text passage and the question, respectively; and λ is a parameter weighing the confidence of each constraint.

Since the above optimization objective is convex, it has a closed-form solution, as in formula (7), where Z is a normalization factor, δ is a regularization factor, and f_j(a,c,y) = b_j − φ_j(a,c,y) denotes the constraint function; that is, when (a,c,y) satisfies the constraint, f_j(·) > 0. The optimal distribution d*(·) is not only close to the distribution p_θ(·) learned from the annotated training data but also satisfies most of the constraints; this posterior regularization flexibly injects discrete constraint knowledge into a continuous model:

d*(y|a,c) = (1/Z) · p_θ(y|a,c) · exp{δ · Σ_j λ_j f_j(a,c,y)}    (7)
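A short numerical sketch of the posterior-regularized distribution in formula (7) is given below; the candidate questions, constraint scores and weights are toy values chosen only to show how the reweighting and the KL term behave.

```python
# Numerical sketch of formula (7): the constrained distribution d* reweights
# the model distribution p_theta by exp(delta * sum_j lambda_j * f_j).
# Candidate questions and constraint scores are toy values for illustration.
import numpy as np

def constrained_distribution(p_theta, constraint_scores, lambdas, delta=1.0):
    """d*(y) is proportional to p_theta(y) * exp(delta * sum_j lambda_j * f_j(y))."""
    boost = np.exp(delta * (constraint_scores @ lambdas))
    d = p_theta * boost
    return d / d.sum()

p_theta = np.array([0.5, 0.3, 0.2])             # model probabilities of 3 candidate questions
f = np.array([[ 1.0,  0.5],                     # f_j > 0: candidate satisfies constraint j
              [-1.0,  0.5],
              [-1.0, -1.0]])
lambdas = np.array([0.8, 0.4])
d_star = constrained_distribution(p_theta, f, lambdas)
kl = np.sum(d_star * np.log(d_star / p_theta))  # KL(d* || p_theta) used in the hybrid objective
print(d_star.round(3), round(float(kl), 4))
```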
Further, in step S3, three constraint functions, commonsense answerability, content-association consistency and grammatical accuracy of expression, are designed to improve the quality of the model's generation:

The commonsense answerability constraint function is constructed as follows: the passage sentences most relevant to the question are extracted by matching similarity, the spaCy toolkit is used to extract entities such as verbs and noun phrases from the answer and the extracted sentences, and they are grouped into pairs with the entities of the question. By introducing a learnable parameter ω_1, this entity constraint knowledge is expressed as a weighted sum over entity pairs, where the indicator of the k-th entity pair is positive if the pair is semantically similar and negative otherwise, and the weight of each pair is obtained from an attention network whose parameters are ω_1. With this representation, f_1(·) is positive when there are no similar entities between the answer and the sentences related to the question, and negative otherwise.

The content-association consistency constraint function is constructed as follows: a data-driven classifier is adopted, f_2(a,c,y) = F(v_c, v_y; ω_2), where v_c denotes the entities extracted from the passage c, v_y the entities extracted from the question y, and ω_2 the parameters; the function F outputs a positive value when the two sets of entities are semantically similar, and a negative value otherwise. By referring to the performance leaderboard of the evaluation dataset, the Unicorn model, currently the best-performing question-answering model, is selected to predict answers; by penalizing samples whose answers are inconsistent, the model is encouraged to generate results with consistent answers, using a judgment function that verifies the consistency between the predicted result and the annotated answer.

The grammatical accuracy constraint function is constructed as follows: the fluency of the generated result is measured by the perplexity of a language model, where P_LM is based on the pretrained RoBERTa language model and K is the number of question words. For the same answer, the generated result is usually similar to the annotated result in semantic structure and grammatical pattern; the Word Mover's Distance (WMD) is used to measure the semantic similarity between two texts, f_5(a,c,y) = WMD(y, y*)/Length(y), where Length(·) is a normalization function and y* is the annotated result. The similarity of grammatical structure is also computed through dependency parse trees (DPTS), using an attention-vector tree kernel (ACVT) to count the common substructures between two parse trees, yielding the grammatical relatedness f_6(a,c,y) = DPTS_ACVT(y, y*).
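A hedged sketch of the answer-consistency check behind the content-association constraint is shown below; the real system uses the Unicorn question-answering model, whereas the stub QA function and the normalized exact-match judgment here are illustrative stand-ins.

```python
# Hedged sketch of the content-consistency constraint: a stand-in QA model
# predicts an answer for the generated question, and a judgment function scores
# agreement with the annotated answer (+1 consistent / -1 not). The disclosed
# system uses the Unicorn QA model; the stub below is only illustrative.
def normalize(text: str) -> str:
    return " ".join(text.lower().replace("?", "").replace(".", "").split())

def judge_consistency(predicted: str, annotated: str) -> float:
    """Return +1 when the predicted answer matches the annotated one, else -1."""
    return 1.0 if normalize(predicted) == normalize(annotated) else -1.0

def answer_consistency_constraint(question, passage, annotated_answer, qa_model):
    predicted = qa_model(question, passage)
    return judge_consistency(predicted, annotated_answer)

# Toy stand-in for the QA model used during constraint evaluation.
toy_qa = lambda q, c: "washington" if "capital" in q.lower() else "unknown"
print(answer_consistency_constraint("What is the capital of the United States?",
                                    "passage text", "Washington", toy_qa))
```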
Further, in step S3, the KL-divergence optimization objective in formula (6) is treated as a knowledge distillation problem, that is, transferring knowledge from the constraint-aware teacher model d to the student question generation model p_θ; this objective is therefore solved with the Expectation-Maximization (EM) algorithm. In the t-th expectation step, the probability distribution d of the teacher model is computed from the current student distribution and the constraints (cf. formula (7)). Then, in the maximization step, the probability distribution of the student model with parameters θ is updated by formula (8) to approximate the teacher distribution d, where a trade-off factor balances the two terms, o is the annotated question distribution, and E denotes the accumulated expected error; this distillation objective strikes a balance between simulating the soft predictions of d and predicting the real annotated results.

Besides the student question generation model θ, the parameters ω of the constraints f and their confidences λ also need to be learned. From the objective in formula (6) it can be seen that when y is the annotated result, the constraint expectation h(a,c,y;ω) = exp{δ·Σ_l λ_l f_l(a,c,y;ω_l)} should be larger; h(·) is viewed as a likelihood function indicating the quality of the result, which makes the objective similar to a variational lower bound of the corresponding model. Accordingly, ω is trained with a mean-squared-error (MSE) loss against the annotation-based target h*, as in formula (9); in addition, the constraint confidences λ are learned from the probability distribution d of the teacher model, as in formula (10).
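The following toy sketch walks through one EM-style distillation round over a single example with a small candidate set: the E-step builds the constrained teacher distribution, and the M-step mixes the teacher's soft targets with the one-hot annotated question through a trade-off factor. Direct construction of the target distribution replaces gradient descent on real model parameters, purely for illustration.

```python
# Toy sketch of one EM-style distillation round: the E-step forms the
# constrained teacher distribution d, and the M-step builds a target that
# balances d's soft prediction against the one-hot annotated question.
import numpy as np

def e_step(p_theta, f_scores, lambdas, delta=1.0):
    d = p_theta * np.exp(delta * (f_scores @ lambdas))
    return d / d.sum()

def m_step_target(d, one_hot_annotation, gamma=0.5):
    # Distillation target balancing the teacher's soft prediction and the annotation.
    return gamma * d + (1 - gamma) * one_hot_annotation

p_theta = np.array([0.4, 0.4, 0.2])          # student probabilities of 3 candidate questions
f_scores = np.array([[1.0], [-1.0], [0.5]])  # single constraint score per candidate
lambdas = np.array([0.7])
annotation = np.array([1.0, 0.0, 0.0])       # annotated question is candidate 0

d = e_step(p_theta, f_scores, lambdas)
target = m_step_target(d, annotation)
cross_entropy = -np.sum(target * np.log(p_theta))
print(d.round(3), target.round(3), round(float(cross_entropy), 4))
```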
Compared with the prior art, the beneficial effects of the technical solution of the present invention are:

The present invention finds that the contextual associations between sentences help to connect the reasoning clues scattered across them. With the help of the textual context, a chain of reasoning clues is derived backwards from the answer, and various kinds of latent commonsense knowledge are integrated into this chain; the clue chain is then used to guide the generation of questions that can be reasoned about. In addition, the present invention integrates various kinds of grammatical and semantic knowledge into the model as posterior constraints, thereby generating questions with correct grammar and consistent content; by capturing the prior reasoning process over latent commonsense entities and relations, logically reasonable questions are produced; these constraints serve as posterior knowledge, and regularization flexibly integrates the various kinds of linguistic knowledge into the generative model, making the results more fluent, more consistent and more answerable.
Brief Description of the Drawings

Figure 1 is an example of an exam question that requires commonsense reasoning;

Figure 2 is a block diagram of the system of the present invention;

Figure 3 is a flow chart of the method of the present invention.

Detailed Description of the Embodiments

The drawings are for illustrative purposes only and shall not be construed as limiting this patent;

To better illustrate the embodiments, some parts of the drawings may be omitted, enlarged or reduced and do not represent the dimensions of the actual product;

It will be understood by those skilled in the art that certain well-known structures in the drawings and their descriptions may be omitted.

The technical solution of the present invention is further described below with reference to the drawings and embodiments.
Embodiment 1

As shown in Figure 2, a commonsense-reasoning-based reading comprehension exam question generation system comprises:

an inference clue graph extraction module, used to start from the given answer and derive the inference clue graph from the text context: identify all entities and relations of the text input to the system and build an entity graph using contextual dependencies; select the question content from the entity graph, and retrieve related entities from an external commonsense knowledge base to expand the entity graph into an inference clue graph;

a graph-guided question generation model module, used to combine all the text and the entity graph, as well as the multi-hop and commonsense knowledge in the entity graph, to generate high-quality exam questions;

a posterior constraint learning module of linguistic knowledge, used to train the graph-guided question generation model and learn the optimal model parameters.
Embodiment 2

As shown in Figure 3, a commonsense-reasoning-based reading comprehension exam question generation method comprises steps S1 to S3 described above; the implementation of each step (entity graph construction and clue selection in S1, the graph-enhanced encoder and graph-guided decoder with copy mechanism in S2, and the posterior-constrained learning with formulas (1) to (10) in S3) is the same as set forth in the summary above.
Embodiment 3

As shown in Figures 2 and 3, this embodiment combines the system of Embodiment 1 with the method of Embodiment 2: the method of steps S1 to S3 is applied to the commonsense-reasoning-based reading comprehension exam question generation system, with each step implemented as set forth in the summary above.
The same or similar reference numerals correspond to the same or similar components;

The positional relationships depicted in the drawings are for illustrative purposes only and shall not be construed as limiting this patent;

Obviously, the above embodiments of the present invention are merely examples given to clearly illustrate the present invention and are not intended to limit its implementation. For those of ordinary skill in the art, other changes or modifications in different forms can be made on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the claims of the present invention.

Claims (10)

  1. A reading test question generation system based on commonsense reasoning, characterized by comprising:
    a reasoning clue graph extraction module, configured to start from a given answer and derive a reasoning clue graph backwards from the text context: identify all entities and relations in the text input to the system, and construct an entity graph using contextual dependency associations; select the content to ask about from the entity graph, and meanwhile retrieve related entities from an external commonsense knowledge base to expand the entity graph into the reasoning clue graph;
    a graph-guided question generation model module, configured to combine all of the text and the entity graph, together with the multi-hop and commonsense knowledge in the entity graph, to generate high-quality test questions;
    a posterior constraint learning module for linguistic knowledge, configured to train the graph-guided question generation model and learn the optimal model parameters.
  2. A method applied in the reading test question generation system based on commonsense reasoning according to claim 1, characterized by comprising the following steps:
    S1: the reasoning clue graph extraction module starts from a given answer and derives a reasoning clue graph backwards from the text context: it identifies all entities and relations in the text input to the system, constructs an entity graph using contextual dependency associations, selects the content to ask about from the entity graph, and meanwhile retrieves related entities from an external commonsense knowledge base to expand the entity graph into the reasoning clue graph;
    S2: the graph-guided question generation model module combines all of the text and the entity graph, together with the multi-hop and commonsense knowledge in the entity graph, to generate high-quality test questions;
    S3: the posterior constraint learning module for linguistic knowledge trains the graph-guided question generation model to learn the optimal model parameters.
  3. The reading test question generation method based on commonsense reasoning according to claim 2, characterized in that, in step S1, the reasoning clue graph extraction module represents each input text sentence as a parse tree, each parse tree node containing several entities and edges, where the edges represent contextual relations; punctuation and stop words are filtered out of each node, and equivalent nodes and coreferent nodes in the parse trees are aggregated; inter-tree connecting edges are added between similar nodes of adjacent sentences, so as to obtain an entity graph with potential clues.
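(Illustrative sketch, not part of the claims.) A simplified sketch of such an entity-graph construction, using spaCy dependency parses and networkx, is given below; merging nodes by lemma is used here as a crude stand-in for the equivalence and coreference aggregation, and it also makes the inter-sentence links implicit, so this is an assumption rather than the claimed procedure.

    # Simplified sketch of the entity-graph construction described in claim 3.
    import spacy
    import networkx as nx

    nlp = spacy.load("en_core_web_sm")

    def build_entity_graph(passage):
        g = nx.Graph()
        for s_idx, sent in enumerate(nlp(passage).sents):
            for tok in sent:
                if tok.is_punct or tok.is_stop:
                    continue                      # filter punctuation and stop words
                node = tok.lemma_.lower()         # merge "equivalent" nodes via their lemma
                g.add_node(node, sent=s_idx)
                if tok.head is not tok and not (tok.head.is_punct or tok.head.is_stop):
                    g.add_edge(node, tok.head.lemma_.lower(), rel=tok.dep_)  # intra-tree edge
        # inter-tree edges between identical nodes of adjacent sentences are implicit here,
        # because merging by lemma already collapses such nodes into a single graph node
        return g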
  4. The reading test question generation method based on commonsense reasoning according to claim 3, characterized in that, in step S1, the process of selecting the content to ask about from the entity graph is:
    marking two kinds of sentences related to the answer and deleting the remaining sentences from the entity graph:
    sentences containing the answer keywords are identified by exact word matching, or relevant sentences are identified by a classifier based on the Rouge-L similarity metric, where the keywords are the words remaining after stop words are filtered out of the answer.
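(Illustrative sketch, not part of the claims.) The sketch below shows one possible realization of these two selection rules, with a small hand-written stop-word list, exact keyword matching, and a simplified Rouge-L score computed from the longest common subsequence; the threshold value is an arbitrary assumption.

    # Possible realization of the two sentence-selection rules in claim 4.
    STOP = {"the", "a", "an", "of", "to", "is", "are", "in", "on", "and"}

    def lcs_len(x, y):
        dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
        for i, xi in enumerate(x, 1):
            for j, yj in enumerate(y, 1):
                dp[i][j] = dp[i - 1][j - 1] + 1 if xi == yj else max(dp[i - 1][j], dp[i][j - 1])
        return dp[len(x)][len(y)]

    def rouge_l(candidate, reference):
        c, r = candidate.lower().split(), reference.lower().split()
        lcs = lcs_len(c, r)
        return 0.0 if lcs == 0 else 2 * lcs / (len(c) + len(r))   # simplified F-score

    def select_sentences(sentences, answer, threshold=0.3):
        keywords = [w for w in answer.lower().split() if w not in STOP]
        keep = []
        for s in sentences:
            if any(k in s.lower().split() for k in keywords) or rouge_l(s, answer) >= threshold:
                keep.append(s)
        return keep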
  5. The reading test question generation method based on commonsense reasoning according to claim 4, characterized in that, in step S1, the way of retrieving related entities from the external commonsense knowledge base to expand the entity graph is: using the entities of the input text as query conditions, related entities are retrieved from an external open-source commonsense knowledge base through word matching.
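(Illustrative sketch, not part of the claims.) A minimal word-matching lookup over a locally stored, ConceptNet-style triple list could look as follows; the triples shown are invented examples and do not come from any particular knowledge base.

    # Minimal word-matching retrieval over a local, ConceptNet-style triple list.
    TRIPLES = [
        ("storm", "Causes", "flood"),
        ("flood", "HasSubevent", "evacuation"),
        ("umbrella", "UsedFor", "rain"),
    ]

    def retrieve_related(entities):
        related = set()
        for head, rel, tail in TRIPLES:
            for e in entities:
                if e.lower() in (head, tail):      # simple word matching as the query
                    related.add((head, rel, tail))
        return related

    print(retrieve_related(["flood"]))   # -> both triples mentioning "flood"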
  6. The reading test question generation method based on commonsense reasoning according to claim 5, characterized in that, in step S2, a graph-enhanced encoder is designed to combine all of the text and the entity graph and to fuse the heterogeneous features of text and graph; the graph-enhanced encoder consists of six cascaded layers, each of which comprises:
    a text self-attention sub-layer, responsible for encoding the text content by applying a nonlinear transformation to the given text input representation vectors so as to obtain a new vector as the input of the next sub-layer:
    given the text input X=(x_1,…,x_T), where x is the distributed embedding representation of each word, each sub-layer obtains three vectors through linear transformations, including the key vector k_i=x_i W_K, the value vector v_i=x_i W_V and the query vector q_i=x_i W_Q, where W_K, W_V and W_Q are learnable matrix parameters; then the interaction score r_ij between the query vector and the key vector is computed by a dot product, i.e. (Figure PCTCN2022094741-appb-100001); the scores are normalized with the Softmax function and the attention coefficient α_ij is computed by Formula (1), where d_x denotes the dimension of the key vector; the context-aware output z_i is obtained by the attention-weighted sum of the value vectors; the first sub-layer is initialized with the representation vectors of the input text, where the vector of each word is retrieved from a pre-trained vector repository and the word vectors are assembled into a vector representing the text; the outputs of the last layer (Figure PCTCN2022094741-appb-100002) and (Figure PCTCN2022094741-appb-100003) are collected to represent the input text passage and the answer, respectively: (Figure PCTCN2022094741-appb-100004) (Figure PCTCN2022094741-appb-100005);
    a graph attention sub-layer: considering that a graph node contains multiple words, each node is represented by (Figure PCTCN2022094741-appb-100006), where (Figure PCTCN2022094741-appb-100007) is the distributed embedding representation of the j-th word of the i-th node, m and n are respectively the start and end positions of the node's text span, and (Figure PCTCN2022094741-appb-100008) denotes the attention distribution of the node, used to represent the node's importance; (Figure PCTCN2022094741-appb-100009) is defined as softmax(ReLU(W_R[G;w_j])), where g_i is the i-th column of the matrix G and W_R is a trainable parameter; the contextual representation of a node is then enriched by weighted aggregation of the relevant semantics of its neighboring nodes, where the weights are determined dynamically by an attention mechanism; to obtain this structural context information, the relevance score of an edge is obtained by computing the dot product between neighboring nodes i and j, (Figure PCTCN2022094741-appb-100010), where τ_ij denotes the relation between the nodes, learned from the corresponding relation type, and (Figure PCTCN2022094741-appb-100011) and (Figure PCTCN2022094741-appb-100012) are trainable parameters; by normalizing the relevance scores of all edges connected to a node, the attention of each node (Figure PCTCN2022094741-appb-100013) can be computed, see Formula (2), where (Figure PCTCN2022094741-appb-100014) denotes the neighboring nodes of node i; the graph-structure-aware output (Figure PCTCN2022094741-appb-100015) is obtained by the attention-weighted sum, where d_x is the dimension of the key vector and (Figure PCTCN2022094741-appb-100016) and (Figure PCTCN2022094741-appb-100017) denote learnable parameters: (Figure PCTCN2022094741-appb-100018) (Figure PCTCN2022094741-appb-100019);
    a feed-forward sub-layer, fusing the text vector z_i and the graph vector (Figure PCTCN2022094741-appb-100020); since concatenation would introduce a large amount of noise, a gating mechanism is used to obtain salient features and reduce the noise, as in Formula (3), where ⊙ denotes element-wise multiplication, f is a fusion vector, and η is a learnable gate used to selectively control features from different perspectives: (Figure PCTCN2022094741-appb-100021); z_i=η⊙f+(1-η)⊙z_i; (Figure PCTCN2022094741-appb-100022);
    in addition, to strengthen the generalization ability of the model, a nonlinear transformation is introduced, implemented by a two-layer multilayer perceptron (MLP) with ReLU activation, h_i=max(0, z_i W_1+b_1) W_2+b_2, where W_1, W_2, b_1 and b_2 are trainable parameters; the output encoding h_i obtained by this transformation is taken as the input of the next layer, and the final representation is obtained after computation through multiple layers.
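(Illustrative sketch, not part of the claims.) A condensed PyTorch sketch of one such layer is given below, combining scaled dot-product text self-attention, a gated fusion of the text vector with a precomputed graph-attention vector, and the two-layer ReLU MLP; the dimensions, the tanh used for the fusion vector, and the module layout are assumptions introduced for illustration.

    # Condensed sketch of one graph-enhanced encoder layer as described in claim 6.
    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class GraphEnhancedLayer(nn.Module):
        def __init__(self, d=64):
            super().__init__()
            self.W_Q, self.W_K, self.W_V = (nn.Linear(d, d, bias=False) for _ in range(3))
            self.gate = nn.Linear(2 * d, d)            # learnable gate eta
            self.fuse = nn.Linear(2 * d, d)            # fusion vector f
            self.mlp = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))

        def forward(self, x, z_graph):
            q, k, v = self.W_Q(x), self.W_K(x), self.W_V(x)
            attn = F.softmax(q @ k.transpose(-2, -1) / math.sqrt(k.size(-1)), dim=-1)  # cf. Formula (1)
            z = attn @ v                                # context-aware text output z_i
            cat = torch.cat([z, z_graph], dim=-1)
            f, eta = torch.tanh(self.fuse(cat)), torch.sigmoid(self.gate(cat))
            z = eta * f + (1 - eta) * z                 # gated fusion, cf. Formula (3)
            return self.mlp(z)                          # two-layer MLP with ReLU

    layer = GraphEnhancedLayer()
    h = layer(torch.randn(10, 64), torch.randn(10, 64))   # 10 tokens, toy dimensions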
  7. The reading test question generation method based on commonsense reasoning according to claim 6, characterized in that, in step S2, the output matrix h of the last layer of the graph-enhanced encoder is regarded as a representation vector that fuses all of the input content, and another Transformer is used to perform autoregressive decoding to generate the test question; the question is generated word by word according to the probability (Figure PCTCN2022094741-appb-100023); like the graph-enhanced encoder, the decoder consists of multiple cascaded layers; at step t, the output of the l-th layer (Figure PCTCN2022094741-appb-100024) takes two parts into account, including the self-attention over the decoding results of the previous steps (Figure PCTCN2022094741-appb-100025), and the attention representation over the encoded content (Figure PCTCN2022094741-appb-100026), i.e. (Figure PCTCN2022094741-appb-100027); these two parts are fused by a multi-head attention layer, represented as (Figure PCTCN2022094741-appb-100028); after nonlinear transformations through multiple layers, the output vector (Figure PCTCN2022094741-appb-100029) is obtained; normalizing this vector over the predefined vocabulary V with the Softmax function yields the output probability of each word (Figure PCTCN2022094741-appb-100030), where W_o and b_o denote trainable parameters;
    to handle the generation of new words outside the vocabulary, a copy mechanism is adopted that generates question words by transcribing them from the input text according to the distribution (Figure PCTCN2022094741-appb-100031), where α is the attention over the input text; a balancing factor (Figure PCTCN2022094741-appb-100032) is defined, and the words of the question are generated one by one by sampling from this distribution, where k is the balancing factor, f(·) is a feed-forward neural network with a Sigmoid activation function, and y_{t-1} denotes the distributed embedding vector of the word generated at step t-1; to avoid the semantic drift problem, i.e., the question being inconsistent with the answer, the answer encoding (Figure PCTCN2022094741-appb-100033) is used to initialize the decoder; in addition, a special token <eos> is introduced to indicate when the generation process terminates: (Figure PCTCN2022094741-appb-100034).
  8. The reading test question generation method based on commonsense reasoning according to claim 7, characterized in that, in step S3, the graph-guided question generation model is trained with supervised learning, i.e., the optimal model parameters are learned by maximizing the log-likelihood, see Formula (5), where (Figure PCTCN2022094741-appb-100035) denotes a training set containing N samples and K denotes the number of words in the generated question; a supervised, teacher-forcing learning scheme drives the model output towards the human annotations in the samples: (Figure PCTCN2022094741-appb-100036);
    this scheme alone, however, cannot guarantee that the generated results are commonsense-reasoning questions; to solve this problem, the present invention introduces a set of linguistic knowledge as regularization constraints to regularize the output probability distribution, where the regularization is realized through the KL divergence between the desired distribution d(·) that satisfies the constraints and the model output distribution p_θ(·); a mixed objective is adopted that fuses the supervised loss and the posterior loss into Formula (6), where (Figure PCTCN2022094741-appb-100037) is a set of constraints of the form (Figure PCTCN2022094741-appb-100038); φ(·) is a constraint feature function bounded by b; a, c and y denote the answer, the text passage and the question, respectively; and λ is a parameter weighing the confidence of the constraints: (Figure PCTCN2022094741-appb-100039) (Figure PCTCN2022094741-appb-100040);
    since the above optimization objective is convex, it has a closed-form solution, see Formula (7), where Z is a normalization factor, δ is a regularization factor, and f_j(a,c,y)=b_j-φ_j(a,c,y) denotes the constraint function; that is, f_j(·)>0 when (a,c,y) satisfies the constraint; the optimal distribution d*(·) is not only close to the distribution p_θ(·) learned from the annotated training data but also satisfies most of the constraints, and this posterior regularization flexibly injects discrete constraint knowledge into a continuous model: (Figure PCTCN2022094741-appb-100041).
  9. The reading test question generation method based on commonsense reasoning according to claim 8, characterized in that, in step S3, three kinds of constraint functions are designed to improve the generation quality of the model: commonsense answerability, content-relevance consistency, and grammatical accuracy of expression:
    the commonsense answerability constraint function is constructed as follows: the passage sentences most relevant to the question are extracted by similarity matching; the spaCy toolkit is used to extract entities such as verbs and noun phrases from the answer and the extracted sentences, and the extraction results for the question are grouped into pairs; by introducing a learnable parameter ω_1, this entity constraint knowledge is expressed as (Figure PCTCN2022094741-appb-100042); if the k-th entity pair is semantically similar, then (Figure PCTCN2022094741-appb-100043), and vice versa; (Figure PCTCN2022094741-appb-100044) is the weight of each entity pair obtained from an attention network whose parameters are ω_1; under this representation, f_1(·) is positive when there is no similar entity between the answer and the question-relevant sentences, and negative otherwise;
    the content-relevance consistency constraint function is constructed as follows: a data-driven classifier is adopted, f_2(a,c,y)=F(v_c,v_y;ω_2), where v_c denotes the entities extracted from the passage c, v_y denotes the entities extracted from the question y, and ω_2 are its parameters; the function F outputs a positive value when the two groups of entities are semantically similar, and a negative value otherwise; by consulting the performance leaderboard of the evaluation dataset, the currently best-performing question answering model, the Unicorn model, is selected to predict the answer; by penalizing samples whose predicted answers are inconsistent, the model is encouraged to generate answer-consistent results, including (Figure PCTCN2022094741-appb-100045), where (Figure PCTCN2022094741-appb-100046) is a judgment function that verifies the consistency between the predicted result and the annotated answer;
    the grammatical accuracy constraint function is constructed as follows: the fluency of the generated result is measured by the perplexity of a language model, (Figure PCTCN2022094741-appb-100047), where P_LM is based on the pre-trained RoBERTa language model and K is the number of question words; for the same answer, the generated result is usually similar to the annotated result in semantic structure and grammatical patterns, so the Word Mover's Distance (WMD) is used to measure the semantic similarity between the two texts, f_5(a,c,y)=WMD(y,y*)/Length(y), where Length(·) is a normalization function and y* is the annotated result; the similarity of grammatical structure is further computed over dependency parse trees (DPTS), using an attention vector tree kernel (ACVT) to count the common substructures between two parse trees and thereby compute the grammatical relatedness, f_6(a,c,y)=DPTS_ACVT(y,y*).
  10. The reading test question generation method based on commonsense reasoning according to claim 9, characterized in that, in step S3, the KL-divergence optimization objective of Formula (6) is regarded as a knowledge distillation problem, i.e., transferring knowledge from the constraint-aware teacher model d to the student question generation model p_θ, and this objective is therefore solved with the Expectation-Maximization (EM) algorithm; in the t-th expectation step, the probability distribution d of the teacher model is computed by the formula (Figure PCTCN2022094741-appb-100048); then, in the maximization step, the parameters θ of the student model are updated by Formula (8) so that its distribution approximates the teacher distribution d, where (Figure PCTCN2022094741-appb-100049) is a trade-off factor, o is the annotated question distribution, and E is the accumulated expectation error; this distillation objective strikes a balance between imitating the soft predictions of d and predicting the ground-truth results: (Figure PCTCN2022094741-appb-100050);
    besides the student question generation model θ, the parameters (Figure PCTCN2022094741-appb-100051) of the constraints f and their confidences (Figure PCTCN2022094741-appb-100052) also need to be learned; from the objective (Figure PCTCN2022094741-appb-100053) in Formula (6), it can be observed that when y is the annotated result, the constraint expectation h(a,c,y;ω)=exp{δ·∑_l λ_l f_l(a,c,y;ω_l)} should be larger; h(·) is regarded as a likelihood function indicating the quality of the result, which makes the objective resemble a variational lower bound of the corresponding model; the parameters ω are therefore trained on the annotated results h* with a mean squared error (MSE) loss, see Formula (9), and in addition the constraint confidences λ are learned from the teacher distribution d, see Formula (10): (Figure PCTCN2022094741-appb-100054) (Figure PCTCN2022094741-appb-100055).
PCT/CN2022/094741 2022-05-24 2022-05-24 Reading test question generation system and method based on commonsense reasoning WO2023225858A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/094741 WO2023225858A1 (zh) 2022-05-24 2022-05-24 Reading test question generation system and method based on commonsense reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/094741 WO2023225858A1 (zh) 2022-05-24 2022-05-24 Reading test question generation system and method based on commonsense reasoning

Publications (1)

Publication Number Publication Date
WO2023225858A1 true WO2023225858A1 (zh) 2023-11-30

Family

ID=88918231

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/094741 WO2023225858A1 (zh) 2022-05-24 2022-05-24 Reading test question generation system and method based on commonsense reasoning

Country Status (1)

Country Link
WO (1) WO2023225858A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947912A (zh) * 2019-01-25 2019-06-28 Sichuan University Model method based on intra-paragraph reasoning and joint question-answer matching
CN111078836A (zh) * 2019-12-10 2020-04-28 Institute of Automation, Chinese Academy of Sciences Machine reading comprehension method, system and apparatus based on external knowledge enhancement
CN111274800A (zh) * 2020-01-19 2020-06-12 Zhejiang University Reasoning-type reading comprehension method based on relational graph convolutional network
US20210406669A1 (en) * 2020-06-25 2021-12-30 International Business Machines Corporation Learning neuro-symbolic multi-hop reasoning rules over text
WO2022036616A1 (zh) * 2020-08-20 2022-02-24 Sun Yat-sen University Method and apparatus for generating inferable questions based on low-annotation resources
CN112417104A (zh) * 2020-12-04 2021-02-26 Shanxi University Syntactic-relation-enhanced multi-hop reasoning model and method for machine reading comprehension

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BAUER LISA, WANG YICHENG, BANSAL MOHIT: "Commonsense for Generative Multi-Hop Question Answering Tasks", PROCEEDINGS OF THE 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, STROUDSBURG, PA, USA, 1 January 2018 (2018-01-01), Stroudsburg, PA, USA, pages 4220 - 4230, XP093111788, DOI: 10.18653/v1/D18-1454 *
YU JIANXING, SU QINLIANG, QUAN XIAOJUN, YIN JIAN: "Multi-hop Reasoning Question Generation and Its Application", IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, IEEE SERVICE CENTRE , LOS ALAMITOS , CA, US, vol. 35, no. 1, 1 January 2021 (2021-01-01), US , pages 725 - 740, XP093111786, ISSN: 1041-4347, DOI: 10.1109/TKDE.2021.3073227 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117556381A (zh) * 2024-01-04 2024-02-13 Central China Normal University Knowledge-level deep mining method and system for interdisciplinary subjective test questions
CN117556381B (zh) * 2024-01-04 2024-04-02 Central China Normal University Knowledge-level deep mining method and system for interdisciplinary subjective test questions
CN117708336A (zh) * 2024-02-05 2024-03-15 Nanjing University of Posts and Telecommunications Multi-strategy sentiment analysis method based on topic enhancement and knowledge distillation
CN117708336B (zh) * 2024-02-05 2024-04-19 Nanjing University of Posts and Telecommunications Multi-strategy sentiment analysis method based on topic enhancement and knowledge distillation
CN117743315A (zh) * 2024-02-20 2024-03-22 Inspur Software Technology Co., Ltd. Method for providing high-quality data for a multimodal large model system


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22943064

Country of ref document: EP

Kind code of ref document: A1