WO2023225858A1 - A reading-comprehension test question generation system and method based on common sense reasoning
- Publication number: WO2023225858A1 (application PCT/CN2022/094741)
- Authority: WO — WIPO (PCT)
- Prior art keywords: graph, model, vector, text, entity
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Definitions
- the present invention relates to the field of artificial intelligence, and more specifically to a reading-comprehension test question generation system based on common sense reasoning.
- Question generation probes a machine's ability to understand the semantics of a given text from another perspective, and can therefore support many useful applications.
- Question generation can serve as a data augmentation strategy, reducing the cost of manually annotating question-and-answer corpora.
- Interactively posed questions can also open up new topics for cold starts in dialogue systems, and question-based feedback can promote information acquisition in search engines.
- Question generation is a cognitively demanding process that requires varying levels of understanding. Simple questions often touch only the shallow meaning of the text and can be solved well by context-based word matching. In practical applications, complex comprehensive test questions have greater application value.
- Test questions need to be solvable, and the solution must be consistent with the given answer; moreover, the solution process should not be simple literal matching but should involve common sense reasoning.
- Traditional methods lack common sense modeling of this key reasoning process and its implications, and research on how to use such knowledge to guide the direction of questioning is weak. This leads to a logical gap: the machine does not know what to ask or how to ask it, and in the end can only output superficial, simple questions.
- Test questions should also meet many linguistic requirements. For example, the generated results need correct grammar and accurate semantics, and the questions must be valid and solvable. Otherwise, generating test questions riddled with typos, lacking logic, or even unreasonable or unsolvable will lead to a poor user experience.
- These linguistic requirements are expressed in discrete form, and conventional methods find it difficult to integrate them into neural networks with continuous representations.
- the invention provides a reading-type test question generation system based on common sense reasoning, which can generate test questions with correct grammar and consistent content.
- Another object of the present invention is to provide a method for generating reading test questions based on common sense reasoning.
- a reading test question generation system based on common sense reasoning, characterized by comprising:
- an inference clue graph extraction module, used to derive the inference clue graph from the text context starting from the given answer: it identifies all entities and relationships in the input text and uses context dependencies to construct the entity graph; it selects the question content from the entity graph, while retrieving relevant entities from an external common sense database to expand the entity graph into a reasoning clue graph;
- a graph-guided question generation model module, used to generate high-quality test questions by combining the full text and the entity graph, as well as the multi-hop and common sense knowledge in the entity graph;
- a posterior constraint learning module of language knowledge, used to train the graph-guided question generation model and learn the optimal model parameters.
- a reading test question generation method based on common sense reasoning, comprising the following steps:
- S1: the inference clue graph extraction module starts from the given answer and derives the inference clue graph from the text context: it identifies all entities and relationships in the input text and uses context dependencies to construct the entity graph; it selects the question content from the entity graph, while retrieving relevant entities from an external common sense database to expand the entity graph into a reasoning clue graph;
- S2: the graph-guided question generation model module combines the full text and the entity graph, as well as the multi-hop and common sense knowledge in the entity graph, to generate high-quality test questions;
- S3: the posterior constraint learning module of language knowledge trains the graph-guided question generation model to learn the optimal model parameters.
- the inference clue graph extraction module represents each input text sentence as a parsing tree.
- Each parsing tree node contains several entities and edges, where the edges represent contextual relationships; punctuation marks and stop words in each node are filtered out, and equivalent nodes and coreference nodes in the parsing tree are aggregated; inter-tree connecting edges are added between similar nodes in adjacent sentences to obtain an entity graph with potential clues.
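The construction described above can be sketched as follows. This is a simplified, self-contained illustration rather than the patent's implementation: node entities are assumed to be pre-extracted token lists, node equivalence is exact match after cleaning, and "similar" nodes in adjacent sentences are taken to share at least one word (the patent additionally aggregates coreference nodes, which would require a coreference resolver).

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "of", "to", "is"}
PUNCT = set(".,;:!?")

def clean(node):
    """Filter punctuation marks and stop words out of a node's token list."""
    return tuple(t for t in node if t not in STOP_WORDS and t not in PUNCT)

def build_entity_graph(parsed_sentences):
    """parsed_sentences: one list per sentence, each containing that
    sentence's parse-tree node token-lists in order of appearance."""
    graph = defaultdict(set)
    sent_nodes = []
    for nodes in parsed_sentences:
        cleaned = [clean(n) for n in nodes if clean(n)]
        # equivalent nodes collapse automatically: cleaned tuples are hashable
        for a, b in zip(cleaned, cleaned[1:]):      # contextual (in-tree) edges
            if a != b:
                graph[a].add(b); graph[b].add(a)
        sent_nodes.append(cleaned)
    # inter-tree edges between similar nodes in adjacent sentences
    for prev, cur in zip(sent_nodes, sent_nodes[1:]):
        for a in prev:
            for b in cur:
                if a != b and set(a) & set(b):       # share at least one word
                    graph[a].add(b); graph[b].add(a)
    return graph
```

For example, a "dog" node in one sentence and a "dog owner" node in the next are linked by an inter-tree edge, creating the potential clue chain the later modules follow.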
- in step S1, the process of selecting question content from the entity graph is:
- sentences containing answer keywords are identified through exact word matching, or related sentences are identified using a classifier based on the Rouge-L similarity metric, where the keywords are the words remaining after stop words are filtered from the answer.
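The Rouge-L similarity mentioned above is the longest-common-subsequence F-score between two token sequences. A minimal sketch follows; the threshold-based `related_sentences` helper is an assumed stand-in for the classifier the patent mentions.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference, beta=1.2):
    """Rouge-L F-score: F = (1 + b^2) * P * R / (R + b^2 * P)."""
    lcs = lcs_len(candidate, reference)
    if lcs == 0:
        return 0.0
    p, r = lcs / len(candidate), lcs / len(reference)
    return (1 + beta**2) * p * r / (r + beta**2 * p)

def related_sentences(sentences, answer_keywords, threshold=0.3):
    """Pick sentences by exact keyword match or Rouge-L similarity."""
    return [s for s in sentences
            if set(s) & set(answer_keywords)
            or rouge_l(s, answer_keywords) >= threshold]
```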
- in step S1, the method of retrieving relevant entities from an external common sense database to expand the entity graph is: use the entities of the input text as query conditions and retrieve relevant entities from an external open source common sense database through word matching.
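The word-matching retrieval can be illustrated as below. The triple list `commonsense_db` is a hypothetical stand-in for an open-source resource (e.g. a local ConceptNet-style dump); the patent does not name a specific database format.

```python
def expand_entity_graph(graph_entities, commonsense_db):
    """graph_entities: set of entity strings from the input text.
    commonsense_db: list of (head, relation, tail) triples.
    Returns the triples whose head or tail word-matches a graph entity."""
    def words(phrase):
        return set(phrase.lower().split())

    entity_words = set()
    for e in graph_entities:
        entity_words |= words(e)

    return [(h, r, t) for (h, r, t) in commonsense_db
            if words(h) & entity_words or words(t) & entity_words]
```

Retrieved triples are then merged into the entity graph as new nodes and edges, yielding the reasoning clue graph.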
- in step S2, a graph-enhanced encoder is designed to combine the full text and the entity graph, completing the fusion of the heterogeneous features of text and graph.
- the graph-enhanced encoder consists of six cascaded layers, each of which includes:
- a text self-attention sub-layer, responsible for encoding the text content: it performs a non-linear transformation on the given text input representation vector to obtain a new vector as the input of the next sub-layer;
- a graph attention sub-layer: considering that each graph node contains multiple words, each node is characterized by an attention-weighted sum of its word embeddings, where w_ij is the distributed embedding representation of the j-th word of the i-th node, m and n are respectively the start and end positions of the node's text fragment, and the attention distribution, defined as softmax(ReLU(W_R[G; w_j])) with g_i the i-th column of matrix G and W_R a trainable parameter, expresses the importance of the node's words; subsequently, each node's contextual representation is enriched by weighted aggregation of the relevant semantics of its neighboring nodes, where the weights are determined dynamically by an attention mechanism; to obtain this structural context information, the edge correlation score is obtained by calculating the dot product between adjacent nodes i and j, where β_ij represents the relationship between the nodes and is a trainable parameter learned from the corresponding relationship type; the correlation scores are then normalized to yield the aggregation weights;
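A toy sketch of the node characterization and neighbor attention in plain Python. The exact scoring and relation-bias terms are garbled in the source text, so the plain dot-product scoring below is an assumed simplification of the described mechanism.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def node_repr(word_vecs, word_scores):
    """Characterize a node as the attention-weighted sum of the embeddings
    of its words (positions m..n), with weights softmax(word_scores)."""
    alpha = softmax(word_scores)
    dim = len(word_vecs[0])
    return [sum(a * v[d] for a, v in zip(alpha, word_vecs)) for d in range(dim)]

def attend_neighbors(g_i, neighbor_vecs):
    """Enrich node i with its neighbors' semantics: edge scores are dot
    products between adjacent node vectors (the relation term beta_ij of
    the full model is omitted here), softmax-normalized into weights."""
    w = softmax([dot(g_i, g_j) for g_j in neighbor_vecs])
    dim = len(g_i)
    return [sum(wi * g_j[d] for wi, g_j in zip(w, neighbor_vecs)) for d in range(dim)]
```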
- a feedforward sub-layer: the text vector z_i and the graph vector are fused together; since simple concatenation would introduce a lot of noise, a gating mechanism is used to extract salient features and reduce noise, as in formula (3), where ⊙ represents element-wise multiplication, f is the fusion vector, and η is a learnable gate used to selectively control the features from the two views;
- the output code h_i obtained through this transformation is taken as the input of the next layer; after multiple layers of such operations, the final representation is obtained.
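One plausible reading of the gated fusion in formula (3), sketched below; the exact parameterization of the gate η is not recoverable from the source, so the per-dimension sigmoid gate over the two inputs is an assumption.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(z, g, W_eta, b_eta):
    """Fuse text vector z and graph vector g elementwise:
    f_d = eta_d * z_d + (1 - eta_d) * g_d,
    with eta_d = sigmoid(w_z * z_d + w_g * g_d + b_d).
    W_eta: per-dimension [w_z, w_g] pairs; b_eta: per-dimension biases."""
    f = []
    for (zd, gd), (wz, wg), b in zip(zip(z, g), W_eta, b_eta):
        eta = sigmoid(wz * zd + wg * gd + b)   # learnable gate, 0..1
        f.append(eta * zd + (1 - eta) * gd)    # selective elementwise mix
    return f
```

With zero gate parameters the gate is 0.5 everywhere, i.e. a plain average of the two views; training shifts it toward whichever view carries the salient feature.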
- in step S2, the output matrix h of the last layer of the graph-enhanced encoder is taken as a representation vector integrating all input content, and another transformer is used to perform autoregressive decoding to generate the test questions; the test questions are generated word by word according to probability, and, similar to the graph-enhanced encoder, the decoder consists of multiple cascaded layers.
- the output of the l-th layer considers two parts: self-attention over the decoding results of the previous steps, and attention over the encoded content;
- the attention representations of these two parts are fused through a multi-head attention layer; after multiple layers of non-linear transformation the output vector is obtained, and by normalizing this vector over the predefined vocabulary V with the Softmax logistic regression function, the output probability of each word is obtained, where W_o and b_o are trainable parameters.
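The vocabulary projection and Softmax normalization at each decoding step can be sketched as:

```python
import math

def output_distribution(h_t, W_o, b_o):
    """Project decoder state h_t onto vocabulary logits with trainable
    W_o (|V| rows) and b_o, then Softmax-normalize into word probabilities."""
    logits = [sum(w * h for w, h in zip(row, h_t)) + b
              for row, b in zip(W_o, b_o)]
    m = max(logits)                       # stability shift
    es = [math.exp(x - m) for x in logits]
    s = sum(es)
    return [e / s for e in es]
```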
- a copy mechanism is also used: according to a copy distribution, test-question words can be transcribed directly from the input text, where λ is the attention over the input text; a balance factor is defined so that
- the words of the test question can be generated one by one, where k is the balance factor, f(·) is a feedforward neural network with a Sigmoid activation function, and y_{t-1} represents the distributed embedding representation vector of the word generated at step t-1; to avoid the problem of semantic drift, that is, questions whose answers are inconsistent with the question, the answer encoding is used;
- a special tag <eos> is also introduced to indicate the termination of the generation process.
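A sketch of combining the generation and copy distributions with the balance factor k; the sigmoid feedforward network that produces k is omitted, and `copy_distribution` simply scatters the input-text attention λ onto vocabulary ids.

```python
def copy_distribution(attn, src_ids, vocab_size):
    """Scatter input-text attention weights onto the vocabulary: positions
    pointing at the same word id accumulate their attention mass."""
    p = [0.0] * vocab_size
    for a, i in zip(attn, src_ids):
        p[i] += a
    return p

def mix_copy_generate(p_vocab, p_copy, k):
    """Final word distribution: k * generate + (1 - k) * copy, where k is
    the balance factor (in the patent, a sigmoid feedforward network over
    the decoder state and the previous word embedding y_{t-1})."""
    return [k * pv + (1 - k) * pc for pv, pc in zip(p_vocab, p_copy)]
```

Since both inputs are probability distributions and k lies in (0, 1), the mixture is itself a valid distribution.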
- in step S3, supervised learning is used to train the graph-guided question generation model, that is, the optimal model parameters are learned by maximizing the log-likelihood probability, as in formula (5), where the training set contains N samples and K is the number of words in the test question.
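Maximizing the log-likelihood in formula (5) is equivalent to minimizing the per-word negative log-likelihood of the annotated question; a minimal sketch:

```python
import math

def nll_loss(step_distributions, target_ids):
    """Negative log-likelihood of an annotated question: the sum over its
    K words of -log p(word_k), where step_distributions[k] is the model's
    vocabulary distribution at step k and target_ids[k] the gold word id."""
    return -sum(math.log(dist[t])
                for dist, t in zip(step_distributions, target_ids))
```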
- the present invention introduces a series of linguistic knowledge as regularization constraints to standardize the output probability distribution of the results. Regularization is achieved by satisfying the expectations of the constraints, realized through the KL divergence between the constraint distribution d(·) and the model's output distribution p_θ(·); a hybrid objective fuses the supervised loss and the posterior loss, as in formula (6).
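A sketch of the posterior regularization term and the hybrid objective of formula (6); the trade-off weight `lam` between the supervised and posterior losses is an assumed hyperparameter, as the source does not reproduce the formula's exact weighting.

```python
import math

def kl_divergence(d, p, eps=1e-12):
    """KL(d || p): divergence of the model output p from the constraint
    distribution d; zero when the two distributions coincide."""
    return sum(di * math.log((di + eps) / (pi + eps))
               for di, pi in zip(d, p) if di > 0)

def hybrid_objective(supervised_loss, d, p, lam=0.5):
    """Formula (6) sketch: supervised loss plus a weighted posterior
    (constraint-satisfaction) loss."""
    return supervised_loss + lam * kl_divergence(d, p)
```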
- in step S3, three constraint functions are designed to improve the quality of model generation: common sense answerability, content association consistency, and expression grammar accuracy:
- the construction process of the common sense answerability constraint function is: extract the paragraph sentences most relevant to the question by similarity matching; use the Spacy toolkit to extract entities such as verbs and noun phrases from the answer and from the extracted sentences, and group them into pairs; by introducing the learnable parameter θ_1, the entity constraint knowledge is expressed so that the term for the k-th entity pair is positive if the pair is semantically similar and negative otherwise, with the weight of each entity pair obtained from an attention network whose parameter is θ_1; through this representation, f_1(·) is positive when there are no similar entities between the answer and the sentences related to the question, and negative otherwise;
- for the content association consistency constraint function, the judgment function F outputs a positive value when the predicted answer is consistent with the annotated answer, and a negative value otherwise; by referring to the performance rankings on the evaluation data sets, the Unicorn model, currently the best performing question answering model, is selected to predict answers; by penalizing samples with inconsistent answers, the model is prompted to generate results with consistent answers;
- the construction process of the expression grammar accuracy constraint function is: measure the fluency of the generated results by calculating their perplexity under a language model.
- P_LM is based on the pre-trained RoBERTa language model, and K is the number of question words; for the same answer, the generated results are often similar to the annotated results in semantic structure and grammatical pattern.
- the Word Mover's Distance (WMD) is therefore used to measure the similarity between the two passages.
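Perplexity over the K question words, as used by the grammar accuracy constraint, reduces to the exponentiated average negative log-probability; the language model supplying the per-token probabilities (RoBERTa in the patent) is abstracted away here.

```python
import math

def perplexity(token_probs):
    """PPL = exp(-(1/K) * sum_k log p(w_k)); lower values mean the
    language model finds the generated question smoother."""
    k = len(token_probs)
    return math.exp(-sum(math.log(p) for p in token_probs) / k)
```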
- in step S3, the KL divergence optimization objective in formula (6) is treated as a knowledge distillation problem, that is, knowledge is transferred from the teacher model d, which contains the constraints, to the student question generation model p_θ; the objective is therefore solved with the expectation-maximization (EM) algorithm: in the t-th expectation step, the probability distribution d of the teacher model is calculated; then, in the maximization step, the probability distribution of the student model is updated through formula (8) to approximate the probability distribution d of the teacher model, where a trade-off factor balances the terms, o is the annotated question probability distribution, and E is the expected accumulated error.
- this distillation objective strikes a balance between imitating the soft predictions of d and predicting the real results;
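A sketch of the EM-style distillation steps. Since the E-step formula is not reproduced in the source, the exponentiated-constraint teacher below is the standard posterior-regularization form and should be read as an assumption, as should the mixing weight `alpha`.

```python
import math

def normalize(xs):
    s = sum(xs)
    return [x / s for x in xs]

def e_step(p_theta, constraint_scores, c=1.0):
    """Teacher distribution: d proportional to p_theta * exp(c * f(y)),
    reweighting the student's distribution toward constraint-satisfying
    outputs (assumed form of the patent's E-step)."""
    return normalize([p * math.exp(c * f)
                      for p, f in zip(p_theta, constraint_scores)])

def m_step_target(d, annotated, alpha=0.5):
    """M-step training target mixing the teacher's soft prediction d with
    the annotated distribution o, weighted by trade-off factor alpha; the
    student p_theta is then fit to this target (cf. formula (8))."""
    return [alpha * di + (1 - alpha) * oi for di, oi in zip(d, annotated)]
```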
- the present invention finds that the contextual associations between sentences help connect the reasoning clues scattered across them. With the help of the text's context, a chain of reasoning clues is deduced from the answer, and various potential common sense knowledge is integrated into this chain; the clue chain is then used to guide the generation of test questions that can be reasoned about. In addition, the present invention integrates various grammatical and semantic knowledge into the model as posterior constraints, thereby generating test questions with correct grammar and consistent content; by grasping the potential common sense entities, relationships, and reasoning process a priori, logically reasonable test questions are generated; these constraints are used as posterior knowledge, and various language knowledge is flexibly integrated into the generative model through regularization, making the results smoother, more consistent, and more answerable.
- Figure 1 shows an example of test questions that require common sense reasoning.
- Figure 2 is a system block diagram of the present invention.
- Figure 3 is a flow chart of the method of the present invention.
- a reading test question generation system based on common sense reasoning is characterized by:
- the inference clue graph extraction module is used to derive the inference clue graph based on the text context starting from the given answer: identify all entities and relationships of the input system text, and use context dependencies to construct the entity graph; select the question content from the entity graph , while retrieving relevant entities from the external common sense database to expand the entity graph to form a reasoning clue graph;
- the graph-guided question generation model module is used to combine all text and entity graphs, as well as multi-hop and common sense knowledge in entity graphs to generate high-quality test questions;
- the posterior constraint learning module of language knowledge is used to train the graph-guided question generation model and learn the optimal model parameters.
- a reading test question generation method based on common sense reasoning includes the following steps:
- the inference clue graph extraction module starts from the given answer and deduces the inference clue graph according to the text context: identifies all entities and relationships of the input system text, uses context dependencies to construct the entity graph; selects the question content from the entity graph, At the same time, relevant entities are retrieved from the external common sense database to expand the entity graph to form a reasoning clue graph;
- the graph-guided question generation model module combines all text and entity graphs, as well as multi-hop and common sense knowledge in entity graphs to generate high-quality exam questions;
- S3 The posterior constraint learning module of language knowledge trains the graph-guided question generation model to learn the optimal model parameters.
- the reasoning clue graph extraction module represents each input text sentence as a parsing tree.
- Each parsing tree node contains several entities and edges, where the edges represent contextual relationships, and punctuation marks and stop words in each node are filtered out. Aggregate equivalent nodes and coreference nodes in the parse tree; add connecting edges between trees between similar nodes in adjacent sentences to obtain an entity graph with potential clues.
- step S1 the process of selecting question content from the entity diagram is:
- Sentences containing answer keywords are identified through exact word matching or related sentences identified using a classifier based on the Rouge-L similarity metric, where the keywords are the remaining words after filtering the answer to stop words.
- step S1 the method of retrieving relevant entities from the external common sense database to expand the solid graph is: using the entities of the input text as query conditions, and retrieving relevant entities from the external open source common sense database through word matching.
- a graph-enhanced encoder is designed to combine all texts and entity graphs to complete the fusion of heterogeneous features of texts and graphs.
- the graph-enhanced encoder consists of six layers of cascades, each layer including:
- Text self-attention sub-layer responsible for encoding text content and performing non-linear transformation on the given text input representation vector to obtain a new vector as the input of the next sub-layer:
- Graph attention sub-layer Considering that the nodes of the graph contain multiple words, through Characterize each node, where is the distributed embedding representation of the j-th word of the i-th node, m and n are respectively the start and end positions of the text fragment in the node, Represents the attention distribution of nodes and is used to express the importance of nodes; Defined as softmax(ReLU(W R [G; w j ])), where gi is the i-th column of matrix G and WR is a trainable parameter; subsequently, nodes are enriched by weighted aggregation of relevant semantics of neighboring nodes Contextual representation, where the weight is dynamically determined by the attention mechanism; in order to obtain this structural context information, the present invention obtains the edge correlation score by calculating the dot product between adjacent nodes i and j, Among them, ⁇ ij represents the relationship of nodes, which is learned from the corresponding relationship type, and is a trainable parameter; by normalizing the correlation
- Feedforward sublayer convert text vector z i and image vector Fusion together, since splicing will introduce a lot of noise, use a gating mechanism to obtain significant features and reduce noise, such as formula (3), where ⁇ represents element multiplication, f is a fusion vector, and eta is a learnable gate Control, used to selectively control features at different angles:
- the output code h i obtained through this transformation is regarded as the input of the next layer; after multiple layers of operations, the final characterization.
- step S2 the output matrix h of the last layer of the graph-enhanced encoder is regarded as a representation vector that integrates all input contents, and another converter is used to perform autoregressive decoding to generate test questions; the test questions are based on probability Generated word by word, similar to the graph-augmented encoder, the decoder consists of multiple layers cascaded.
- the output of the lth layer will consider both parts, including those from the previous step
- the decoding results in are from the attention, and from the encoding content
- the attention representation of these two parts are fused through a multi-head attention layer and are expressed as After multiple layers of nonlinear transformation, the output vector can be obtained By normalizing this vector on the predefined vocabulary V through the logistic regression function Softmax, the output probability of the word can be obtained.
- W o and bo represent trainable parameters
- a copy mechanism is used according to the distribution Generate test questions by transcribing new words from the input text, where ⁇ is the attention of the input text; define a balance factor
- the words of the test questions can be generated one by one, where k is the balance factor, f( ⁇ ) is the feedforward neural network with Sigmoid activation function, y t-1 represents the generation in the t-1th step
- Distributed embedding representation vectors of words in order to avoid the problem of semantic drift, that is, questions where the answer is inconsistent with the question, answer encoding is used
- a special tag ⁇ eos> is also introduced to indicate the termination time of the generation process:
- step S3 supervised learning is used to train the graph-guided question generation model, that is, the optimal model parameters are learned by maximizing the log-likelihood probability, refer to formula (5), where represents a training set containing N samples, and K represents the size of the words in the test questions.
- the results generated by the model are close to the manual annotation in the samples:
- step S3 three constraint functions, namely common sense answerability, content association consistency and expression grammar accuracy, are designed to improve the quality of model generation:
- the construction process of the common sense answerability constraint function is: extract the paragraph sentences most relevant to the question by matching similarity, use the Spacy toolkit to extract entities such as verbs and noun phrases from the answers and the extracted sentences, and combine the results of the question Group into pairs, and by introducing the learnable parameter ⁇ 1 , the entity constraint knowledge is expressed as If the kth entity pair is semantically similar, then vice versa, is the weight of each entity pair obtained from the attention network, where the network parameter is ⁇ 1 ; through this representation, when there are no similar entities between the answer and the sentence related to the question, then f 1 ( ⁇ ) is positive; Otherwise it is negative;
- the function F outputs a positive value, otherwise it outputs a negative value; by referring to the performance ranking of the evaluation data set,
- the Unicorn model currently the best performing question and answer model, is selected to predict answers. By penalizing samples with inconsistent answers, the model can be prompted to generate results with consistent answers, including in It is a judgment function that verifies the consistency between the prediction results and the annotated answers;
- the construction process of expressing the grammatical accuracy constraint function is to measure the smoothness of the generated results by calculating the perplexity of the language model.
- PLM is based on the pre-trained Roberta language model, and K is the number of question words; for the same answer, the generated results are often similar to the annotated results in terms of semantic structure and grammatical pattern.
- the word movement distance WMD is used to measure the two paragraphs.
- step S3 the KL divergence optimization objective function in formula (6) is regarded as a knowledge extraction problem, that is, transferring knowledge from the constrained teacher model d to the student's question generation model p ⁇ . Therefore, this objective function uses Expectation-maximization EM algorithm to solve; in the tth expectation calculation step, through the formula Calculate the probability distribution d of the teacher model; then, calculate the maximum expectation, that is, update the probability distribution ⁇ of the student model through formula (8) to approximate the probability distribution d of the teacher model, where is the trade-off factor, o is the annotated question probability distribution, and E is the expected accumulated error.
- This distillation goal can strike a balance between the soft prediction of simulated d and the prediction of real results;
- a reading test question generation system based on common sense reasoning is characterized by:
- the inference clue graph extraction module is used to derive the inference clue graph based on the text context starting from the given answer: identify all entities and relationships of the input system text, and use context dependencies to construct the entity graph; select the question content from the entity graph , while retrieving relevant entities from the external common sense database to expand the entity graph to form a reasoning clue graph;
- the graph-guided question generation model module is used to generate high-quality test questions by combining all text and entity graphs, as well as multi-hop and common sense knowledge in the entity graphs;
- the posterior constraint learning module of language knowledge is used to train the graph-guided question generation model and learn the optimal model parameters.
- a method applied to the above-mentioned reading test question generation system based on common sense reasoning includes the following steps:
- the inference clue graph extraction module starts from the given answer and deduces the inference clue graph according to the text context: identifies all entities and relationships of the input system text, uses context dependencies to construct the entity graph; selects the question content from the entity graph, At the same time, relevant entities are retrieved from the external common sense database to expand the entity graph to form a reasoning clue graph;
- the graph-guided question generation model module combines all text and entity graphs, as well as multi-hop and common sense knowledge in entity graphs to generate high-quality exam questions;
- S3 The posterior constraint learning module of language knowledge trains the graph-guided question generation model to learn the optimal model parameters.
- the reasoning clue graph extraction module represents each input text sentence as a parsing tree.
- Each parsing tree node contains several entities and edges, where the edges represent contextual relationships, and punctuation marks and stop words in each node are filtered out. Aggregate equivalent nodes and coreference nodes in the parse tree; add connecting edges between trees between similar nodes in adjacent sentences to obtain an entity graph with potential clues.
- step S1 the process of selecting question content from the entity diagram is:
- Sentences containing answer keywords are identified through exact word matching or related sentences identified using a classifier based on the Rouge-L similarity metric, where the keywords are the remaining words after filtering the answer to stop words.
- step S1 the method of retrieving relevant entities from the external common sense database to expand the solid graph is: using the entities of the input text as query conditions, and retrieving relevant entities from the external open source common sense database through word matching.
- in step S2, a graph-enhanced encoder is designed to combine the full text with the entity graph, fusing the heterogeneous features of text and graph;
- the graph-enhanced encoder is a cascade of six layers, each layer including:
- a text self-attention sub-layer, responsible for encoding the text content: it applies a non-linear transformation to the given text input representation vectors and passes the new vectors to the next sub-layer. Given the input text $X=(x_1,\dots,x_T)$, each sub-layer derives a key vector $k_i=x_iW^K$, a value vector $v_i=x_iW^V$ and a query vector $q_i=x_iW^Q$ from learnable matrices; the interaction score is the dot product $r_{ij}=q_i\cdot k_j$, normalized by the Softmax function into the attention coefficients of formula (1), with $d_x$ the key dimension, and the context-aware output $z_i$ is the attention-weighted sum of the value vectors; the last layer's outputs $h^c$ and $h^a$ represent the input passage and the answer;
- a graph attention sub-layer. Since a graph node spans several words, node $i$ is characterized as $g_i=\sum_{j=m}^{n}\beta_jw_j$, where $w_j$ is the distributed embedding of the $j$-th word of node $i$, $m$ and $n$ are the start and end positions of the node's text span, and $\beta$ is the node attention distribution expressing node importance, defined as $\beta=\mathrm{softmax}(\mathrm{ReLU}(W_R[G;w_j]))$, with $g_i$ the $i$-th column of matrix $G$ and $W_R$ a trainable parameter. Each node's contextual representation is then enriched by weighted aggregation of the relevant semantics of its neighbouring nodes, the weights being determined dynamically by an attention mechanism; to obtain this structural context, an edge relevance score is computed as the dot product between adjacent nodes $i$ and $j$, in which $\tau_{ij}$ represents the node relation, learned from the corresponding relation type, together with trainable projection parameters; normalizing the relevance scores over each node's connected edges gives its attention per formula (2), and the weighted sum yields the graph-structure-aware output $\tilde g_i$;
- a feed-forward sub-layer, which fuses the text vector $z_i$ and the graph vector $\tilde g_i$. Since plain concatenation would introduce much noise, a gating mechanism extracts the salient features and suppresses noise, as in formula (3): $z_i=\eta\odot f+(1-\eta)\odot z_i$, where $\odot$ is element-wise multiplication, $f$ is a fusion vector and $\eta$ a learnable gate that selectively controls the features from the two views;
- the output encoding $h_i$ obtained through this transformation is taken as the input of the next layer; the final representation is obtained after the multi-layer computation.
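The gated fusion of the feed-forward sub-layer (formula (3)) can be sketched as follows. The concrete forms $f=\tanh(W_f[z;g])$ and $\eta=\mathrm{sigmoid}(W_\eta[z;g])$ are assumptions; the text only states that $f$ is a fusion vector and $\eta$ a learnable gate over the concatenated inputs.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(z, g, W_f, W_eta):
    """Gated fusion of a text vector z and a graph vector g (formula (3)):
    output = eta * f + (1 - eta) * z, element-wise. The parameterisations
    of f and eta below are assumptions, not the paper's exact forms."""
    cat = z + g  # vector concatenation [z; g]
    f = [math.tanh(sum(w * x for w, x in zip(row, cat))) for row in W_f]
    eta = [sigmoid(sum(w * x for w, x in zip(row, cat))) for row in W_eta]
    return [e * fi + (1 - e) * zi for e, fi, zi in zip(eta, f, z)]
```

When the gate saturates at 0 the text vector passes through untouched; when it saturates at 1 only the fused features survive, which is exactly the selective control the gate is meant to provide.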
- in step S2, the output matrix $h$ of the last layer of the graph-enhanced encoder is regarded as a representation vector integrating all input content, and a second Transformer performs autoregressive decoding to generate the test question; the question is generated word by word according to the probability $p(y_t\mid y_{<t},h)$, and, like the graph-enhanced encoder, the decoder is a cascade of several layers;
- at step $t$, the output $s_t^{(l)}$ of the $l$-th layer considers two parts: self-attention over the decoding results $s_{<t}^{(l-1)}$ of the previous steps, and attention over the encoded content $h$;
- these two parts are fused through a multi-head attention layer; after the non-linear transformations of the stacked layers, the output vector $s_t^{(L)}$ is obtained, and normalizing it over the predefined vocabulary $V$ with the logistic regression function Softmax gives the output probability of each word;
- $W_o$ and $b_o$ denote trainable parameters;
- to handle words outside the vocabulary, a copy mechanism generates question words by transcribing them from the input text according to the attention distribution $\alpha$ over the input text; a balance factor $k$ is defined to mix the two distributions;
- by sampling the mixed distribution the question words are generated one by one, where $k$ is the balance factor, $f(\cdot)$ is a feed-forward neural network with Sigmoid activation, and $y_{t-1}$ is the distributed embedding of the word generated at step $t-1$;
- to avoid the semantic-drift problem, i.e. questions whose answer is inconsistent with the question, the decoder is initialized with the answer encoding;
- a special tag <eos> is also introduced to indicate the termination of the generation process.
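The mixing of the vocabulary distribution with the copy distribution can be sketched as below. The linear mixture $P(y)=k\,P_{\text{vocab}}(y)+(1-k)\,P_{\text{copy}}(y)$ is an assumption consistent with standard copy mechanisms; the paper only states that $k$ balances generation and transcription.

```python
def output_distribution(p_vocab, attn, src_tokens, k):
    """Final word distribution of the decoder:
    P(y) = k * P_vocab(y) + (1 - k) * P_copy(y), where P_copy(y) sums the
    attention weights alpha of every input position holding word y, and k
    is the balance factor from the sigmoid feed-forward network."""
    p_copy = {}
    for a, tok in zip(attn, src_tokens):
        p_copy[tok] = p_copy.get(tok, 0.0) + a
    words = set(p_vocab) | set(p_copy)
    return {w: k * p_vocab.get(w, 0.0) + (1 - k) * p_copy.get(w, 0.0) for w in words}
```

Because both component distributions sum to one, the mixture is again a proper distribution, and out-of-vocabulary source words receive probability mass through the copy term alone.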
- in step S3, supervised learning is used to train the graph-guided question generation model: the optimal parameters are learned by maximizing the log-likelihood of formula (5), where $\mathcal{D}$ denotes a training set of $N$ samples and $K$ is the number of words in the question;
- this pushes the results generated by the model toward the manual annotations in the samples;
- in step S3, three constraint functions — commonsense answerability, content-association consistency and grammatical accuracy of expression — are designed to improve the quality of model generation;
- the commonsense answerability constraint is constructed as follows: the passage sentences most relevant to the question are extracted by similarity matching; the Spacy toolkit extracts entities such as verbs and noun phrases from the answer and the extracted sentences, which are grouped into pairs with those of the question; with a learnable parameter $\omega_1$, the entity constraint knowledge is expressed as a weighted sum over entity pairs, whose indicator is $-1$ if the $k$-th pair is semantically similar and $+1$ otherwise, the weight of each pair being produced by an attention network with parameters $\omega_1$; under this representation, $f_1(\cdot)$ is positive when there are no similar entities between the answer and the question-related sentences, and negative otherwise;
- the content-association consistency constraint uses a data-driven classifier $f_2(a,c,y)=F(v_c,v_y;\omega_2)$, where $v_c$ are entities extracted from passage $c$, $v_y$ entities from question $y$ and $\omega_2$ the parameters; $F$ outputs a positive value when the two entity sets are semantically similar, and a negative value otherwise;
- referring to the performance leaderboard of the evaluation dataset, the Unicorn model, currently the best-performing question-answering model, is selected to predict answers; penalizing samples with inconsistent answers prompts the model to generate answer-consistent results, using a judgment function that verifies the consistency between the prediction and the annotated answer;
- the grammatical accuracy constraint measures the fluency of the generated result by the perplexity of a language model, where $P_{LM}$ is based on the pre-trained RoBERTa language model and $K$ is the number of question words; for the same answer, generated results tend to resemble the annotated ones in semantic structure and grammatical pattern;
- the Word Mover's Distance (WMD) is used to measure the semantic similarity of the two texts;
- in step S3, the KL-divergence optimization objective of formula (6) is regarded as a knowledge distillation problem, i.e. transferring knowledge from the constrained teacher model $d$ to the student question generation model $p_\theta$, and is therefore solved with the Expectation-Maximization (EM) algorithm: in the $t$-th E-step, the teacher's probability distribution $d$ is computed in closed form; in the M-step, the student's distribution $p_\theta$ is updated by formula (8) to approach $d$, where a trade-off factor balances the terms, $o$ is the annotated question distribution, and $\mathbb{E}$ is the accumulated expected error;
- this distillation goal strikes a balance between imitating the soft predictions of $d$ and predicting the real results.
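The M-step balance just described can be sketched as a distillation loss. The name `gamma` for the unnamed trade-off factor of formula (8) is an assumption; the mixture of a soft cross-entropy against the teacher and a hard cross-entropy against the annotated word follows the stated intent.

```python
import math

def distill_loss(log_p_student, teacher_dist, gold_index, gamma):
    """Distillation-style M-step objective (formula (8) pattern):
    gamma-weighted trade-off between the cross-entropy against the
    teacher's soft distribution d and the cross-entropy against the
    one-hot annotated question o (gamma is an assumed name)."""
    soft = -sum(d * lp for d, lp in zip(teacher_dist, log_p_student))
    hard = -log_p_student[gold_index]
    return gamma * soft + (1 - gamma) * hard
```

A student that concentrates probability on the word favoured by both teacher and annotation achieves a lower loss, which is exactly the balance the distillation target is meant to strike.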
Abstract
The invention provides a commonsense-reasoning-based reading exam question generation system and method. Starting from the answer, the invention first derives, in reverse, a reasoning clue graph together with the related entities and commonsense relations it involves; this context-structured clue graph then serves as prior knowledge to guide question generation, improving the logical soundness of the results. In addition, the invention introduces several kinds of linguistic knowledge as posterior constraints to regularize the generator, raising question quality in terms of commonsense answerability, content relevance and grammatical validity; joint learning with both prior and posterior regularization yields results that are more fluent and more amenable to reasoning.
Description
The invention relates to the field of artificial intelligence, and more specifically to a commonsense-reasoning-based reading exam question generation system.

Setting exam questions is very costly in manpower and resources, especially for exams under intense public scrutiny such as the college entrance examination: question setters are held to strict integrity standards, with zero tolerance for leaks and other violations, and human question setting is also rather subjective. This demand has driven the rapid development of automatic question generation by machine. The task not only greatly reduces labor cost but also offers better confidentiality and objectivity, which benefits exam fairness, and it has gradually become a research hotspot in artificial intelligence and natural language processing. It requires generating coherent, answer-relevant questions from a given text. As the dual task of question answering, question generation probes, from another angle, a machine's ability to understand the semantics of a given text, and can therefore support many useful applications: it can serve as a data augmentation strategy to reduce the manual annotation cost of question-answering corpora; interactive questioning can open new topics for cold starts in dialogue systems; and question-style feedback can facilitate information acquisition in search engines. Asking questions is a cognitively demanding process that requires understanding at different levels. Simple questions usually involve only the shallow meaning of the text and are well solved by context-based word matching. In practical applications, complex comprehensive questions have greater value: in education, for example, to promote well-rounded education the Ministry of Education requires that simple literal-matching questions account for no more than 5% of primary-school exam items and encourages comprehensive questions involving logical thinking, especially commonsense reasoning questions. Compared with simple literal-matching questions, such logic- and commonsense-related questions better assess learning outcomes and stimulate students' autonomous learning. However, automatically generating commonsense reasoning questions is not easy: the machine must think deeply and reason over multiple entity clues scattered through the text, and even understand external world knowledge it lacks and the commonsense conventions of everyday life. In the example of Figure 1, the question asks about a place related to the fountain in the park. Unlike a simple literal-matching question, there is no literal similarity linking the question and the answer. Yet by connecting multiple evidence clues in the given text (the fountain, the Statue of Liberty) with external commonsense relations — (Statue of Liberty, located in, New York City), (New York City, part of, United States), (United States, capital, Washington), (White House, located in, Washington) — the question and the answer can be linked logically. Such multi-hop reasoning chains are crucial both for choosing the questioning direction and for the answering process.

Traditional generation methods mainly rely on hand-crafted rules or templates to convert the input text into questions. These rules and templates are easily over-engineered, leaving the model with poor generality and scalability. Mainstream approaches therefore adopt data-driven neural models that treat generation as a translation-like sequence mapping problem: the mapping patterns are learned from large amounts of training data, and the input text is mapped — in effect translated — into a question. This suits simple literal-matching questions but struggles with commonsense reasoning questions that demand comprehensive understanding, because such a question is not a conversion of the given content into a semantically equivalent form but a generation subject to various grammatical and semantic restrictions. Beyond fluency, the generated question must be answerable and amenable to reasoning: it must be solvable, its solution must agree with the given answer, and the solving process should involve commonsense reasoning rather than simple literal matching. Traditional methods do not model this key reasoning process or the commonsense it entails, and research on using such knowledge to guide the questioning direction is weak; the resulting logical gap means the machine does not know what to ask or how to ask it, and ends up producing only shallow, simple questions. Moreover, exam questions must satisfy many kinds of linguistic requirements — correct grammar, accurate semantics, and a valid, solvable question — otherwise outputs riddled with typos, lacking logic, or even unreadable and unsolvable make for a very poor user experience. These requirements are expressed in discrete form, which conventional methods find hard to integrate into neural networks with continuous representations.
Summary of the Invention
The invention provides a commonsense-reasoning-based reading exam question generation system that can generate grammatically correct and content-consistent exam questions.
A further object of the invention is to provide a commonsense-reasoning-based reading exam question generation method.
To achieve the above technical effects, the technical solution of the invention is as follows:
A commonsense-reasoning-based reading exam question generation system, characterized by comprising:
a reasoning clue graph extraction module, for starting from the given answer and deriving the reasoning clue graph in reverse from the text context: identifying all entities and relations in the input text, building an entity graph from their contextual dependencies, selecting question content from the entity graph, and meanwhile retrieving related entities from an external commonsense knowledge base to expand the entity graph into the reasoning clue graph;
a graph-guided question generation model module, for combining the full text and the entity graph, together with the multi-hop and commonsense knowledge in the graph, to generate high-quality exam questions;
a posterior constraint learning module for linguistic knowledge, for training the graph-guided question generation model to learn the optimal model parameters.
A commonsense-reasoning-based reading exam question generation method, comprising the following steps:
S1: the reasoning clue graph extraction module starts from the given answer and derives the reasoning clue graph in reverse from the text context: it identifies all entities and relations in the input text, builds an entity graph from contextual dependencies, selects question content from the entity graph, and meanwhile retrieves related entities from an external commonsense knowledge base to expand the entity graph into the reasoning clue graph;
S2: the graph-guided question generation model module combines the full text and the entity graph, together with the multi-hop and commonsense knowledge in the graph, to generate high-quality exam questions;
S3: the posterior constraint learning module for linguistic knowledge trains the graph-guided question generation model to learn the optimal model parameters.
Further, in step S1 the reasoning clue graph extraction module represents each input sentence as a parse tree; each parse-tree node contains several entities and edges, where the edges denote contextual relations; punctuation marks and stop words in each node are filtered out, and equivalent and coreferent nodes in the parse trees are merged; inter-tree edges are added between similar nodes of adjacent sentences, obtaining an entity graph with latent clues.
Further, in step S1 question content is selected from the entity graph as follows:
two kinds of answer-related sentences are marked and the remaining sentences are deleted from the entity graph:
sentences containing answer keywords, identified by exact word matching, and related sentences identified by a classifier based on the Rouge-L similarity metric, where the keywords are the answer words left after stop-word filtering.
Further, in step S1 related entities are retrieved from the external commonsense base to expand the entity graph by using the entities of the input text as query conditions and retrieving related entities through word matching from an external open-source commonsense knowledge base.
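The retrieval step in S1 can be sketched as a lookup against a triple store. The toy triples below (taken from the Figure 1 example) and the exact-string matching rule are illustrative assumptions; a real open-source commonsense base would hold millions of triples.

```python
# Toy open-source commonsense store as (head, relation, tail) triples;
# the entries and the matching rule are illustrative assumptions.
KB = [
    ("statue of liberty", "located in", "new york city"),
    ("new york city", "part of", "united states"),
    ("united states", "capital", "washington"),
    ("white house", "located in", "washington"),
]

def expand_entities(entities):
    """Retrieve related triples by word matching the input-text entities
    against triple heads and tails; the retrieved entities and relations
    become new nodes and edges of the reasoning clue graph."""
    hits = []
    for e in entities:
        for h, r, t in KB:
            if e == h or e == t:
                hits.append((h, r, t))
    return hits
```

Each retrieved triple contributes one external entity and one relation edge, which is how the entity graph grows into the reasoning clue graph.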
Further, in step S2 a graph-enhanced encoder is designed to combine the full text with the entity graph and fuse their heterogeneous features. The encoder is a cascade of six layers, each comprising:

A text self-attention sub-layer, responsible for encoding the text content: it applies a non-linear transformation to the given text input representation vectors, producing a new vector as the input of the next sub-layer. Given the input text $X=(x_1,\dots,x_T)$, where each $x$ is the distributed embedding of a word, each sub-layer derives three vectors by linear transformation: the key vector $k_i=x_iW^K$, the value vector $v_i=x_iW^V$ and the query vector $q_i=x_iW^Q$, where $W^K$, $W^V$ and $W^Q$ are learnable matrix parameters. The interaction score between query and key is the dot product $r_{ij}=q_i\cdot k_j$; the scores are normalized with the logistic regression function Softmax into the attention coefficients of formula (1),

$$\alpha_{ij}=\frac{\exp(r_{ij}/\sqrt{d_x})}{\sum_{j'}\exp(r_{ij'}/\sqrt{d_x})},\qquad(1)$$

where $d_x$ is the key-vector dimension; the context-aware output is the weighted sum $z_i=\sum_j\alpha_{ij}v_j$. The first sub-layer is initialized with the representation vectors of the input text, where each word's vector is retrieved from a pre-trained vector store and the word vectors together represent the text; the outputs $h^c$ and $h^a$ of the last layer represent the input passage and the answer, respectively.

A graph attention sub-layer. Since a graph node contains several words, node $i$ is characterized as $g_i=\sum_{j=m}^{n}\beta_jw_j$, where $w_j$ is the distributed embedding of the $j$-th word of node $i$, $m$ and $n$ are the start and end positions of the node's text span, and $\beta$ is the node attention distribution expressing node importance, defined as $\beta=\mathrm{softmax}(\mathrm{ReLU}(W_R[G;w_j]))$, with $g_i$ the $i$-th column of matrix $G$ and $W_R$ a trainable parameter. Each node's contextual representation is then enriched by weighted aggregation of the relevant semantics of its neighbouring nodes, the weights being determined dynamically by an attention mechanism. To obtain this structural context information, the invention computes an edge relevance score $s_{ij}$ as a dot product between adjacent nodes $i$ and $j$, in which $\tau_{ij}$ represents the node relation, learned from the corresponding relation type, together with trainable projection parameters. Normalizing the relevance scores over all connected edges of a node yields its attention per formula (2),

$$\tilde\alpha_{ij}=\frac{\exp(s_{ij}/\sqrt{d_x})}{\sum_{j'\in\mathcal{N}_i}\exp(s_{ij'}/\sqrt{d_x})},\qquad(2)$$

where $\mathcal{N}_i$ denotes the neighbours of node $i$ and $d_x$ is the key-vector dimension; the attention-weighted sum, with learnable projection parameters, gives the graph-structure-aware output $\tilde g_i$.

A feed-forward sub-layer, which fuses the text vector $z_i$ and the graph vector $\tilde g_i$. Since direct concatenation would introduce much noise, a gating mechanism is used to extract the salient features and suppress noise, as in formula (3):

$$z_i=\eta\odot f+(1-\eta)\odot z_i,\qquad(3)$$

where $\odot$ denotes element-wise multiplication, $f$ is a fusion vector, and $\eta$ is a learnable gate that selectively controls the features from the two views. In addition, to strengthen the generalization ability of the model, a non-linear transformation is introduced, implemented by a two-layer multi-layer perceptron (MLP) with ReLU activation, $h_i=\max(0,z_iW_1+b_1)W_2+b_2$, where $W_1$, $W_2$, $b_1$ and $b_2$ are trainable parameters; the output encoding $h_i$ of this transformation is taken as the input of the next layer, and the final representation is obtained after the multi-layer computation.
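The scaled dot-product attention of formula (1) can be sketched in plain Python (no deep learning framework) to make the computation concrete; the rows of `Q`, `K`, `V` play the roles of $q_i$, $k_i$, $v_i$ after the linear projections.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(Q, K, V):
    """Scaled dot-product self-attention following formula (1):
    r_ij = q_i . k_j, alpha_ij = softmax_j(r_ij / sqrt(d_x)),
    z_i = sum_j alpha_ij v_j (rows of Q, K, V are per-token vectors)."""
    d_x = len(K[0])
    Z = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_x) for k in K]
        alpha = softmax(scores)
        Z.append([sum(a * v[t] for a, v in zip(alpha, V)) for t in range(len(V[0]))])
    return Z
```

The graph attention sub-layer follows the same pattern, with the sum restricted to the neighbourhood $\mathcal{N}_i$ and the relation term $\tau_{ij}$ added to the score.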
Further, in step S2 the output matrix $h$ of the last layer of the graph-enhanced encoder is regarded as the representation vector integrating all input content, and another Transformer performs autoregressive decoding to generate the exam question. The question is generated word by word according to the probability $p(y_t\mid y_{<t},h)$. Like the graph-enhanced encoder, the decoder is a cascade of several layers. At step $t$, the output $s_t^{(l)}$ of the $l$-th layer considers two parts: self-attention over the decoding results $s_{<t}^{(l-1)}$ of the previous steps, and attention over the encoded content $h$; the two parts are fused through a multi-head attention layer. After the non-linear transformations of the stacked layers, the output vector $s_t^{(L)}$ is obtained, and normalizing it over the predefined vocabulary $V$ with the logistic regression function Softmax gives the output probability of each word, where $W_o$ and $b_o$ denote trainable parameters.

To handle the generation of new words outside the vocabulary, a copy mechanism generates question words by transcribing them from the input text according to the distribution formed by $\alpha$, the attention over the input text. A balance factor $k$ is defined, and by sampling the resulting mixed distribution the question words are generated one by one, where $k$ is the balance factor, $f(\cdot)$ is a feed-forward neural network with Sigmoid activation, and $y_{t-1}$ denotes the distributed embedding of the word generated at step $t-1$. To avoid the semantic-drift problem, i.e. an answer inconsistent with the question, the decoder is initialized with the answer encoding $h^a$; a special tag <eos> is also introduced to indicate when the generation process terminates.
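The answer-initialized, <eos>-terminated decoding loop can be sketched as follows. The `step_fn(state, prev_word) -> (new_state, dist)` interface is an assumed stand-in for the stacked self- and encoder-attention layers, and greedy selection replaces sampling for determinism.

```python
def generate(step_fn, answer_state, max_len=20, eos="<eos>"):
    """Greedy autoregressive decoding sketch: the initial state is the
    answer encoding (to curb semantic drift) and generation stops at the
    special <eos> tag. step_fn is an assumed interface, not the paper's
    exact parameterisation."""
    words, state, prev = [], answer_state, "<bos>"
    for _ in range(max_len):
        state, dist = step_fn(state, prev)
        prev = max(dist, key=dist.get)  # greedy choice instead of sampling
        if prev == eos:
            break
        words.append(prev)
    return words
```

A `max_len` cap guards against the degenerate case where the model never emits <eos>.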
Further, in step S3 the graph-guided question generation model is trained with supervised learning, i.e. the optimal model parameters are learned by maximizing the log-likelihood of formula (5), where $\mathcal{D}$ denotes a training set of $N$ samples and $K$ is the number of words in the question; this supervised, teacher-guided learning pushes the model's output toward the manual annotations in the samples.

This alone, however, cannot guarantee that the generated results are commonsense-reasonable questions. To solve this problem, the invention introduces a series of linguistic knowledge as regularization constraints on the output probability distribution; the regularization is realized through the KL divergence between the constraint-satisfying expected distribution $d(\cdot)$ and the model's output distribution $p_\theta(\cdot)$. A hybrid objective fuses the supervised loss and the posterior loss into formula (6), where the constraint set consists of constraints of the form $\mathbb{E}[\phi(a,c,y)]\le b$; $\phi(\cdot)$ is a constraint feature function bounded by $b$; $a$, $c$, $y$ denote the answer, the text passage and the question, respectively; and $\lambda$ is a parameter weighing the confidence.

Since the above optimization objective is convex, it has a closed-form solution, formula (7), where $Z$ is a normalization factor, $\delta$ a regularization factor, and $f_j(a,c,y)=b_j-\phi_j(a,c,y)$ the constraint function; that is, $f_j(\cdot)>0$ when $(a,c,y)$ satisfies the constraint. The optimal distribution $d^*(\cdot)$ is both close to the distribution $p_\theta(\cdot)$ learned from the annotated training data and satisfies most of the constraints; this posterior regularization flexibly injects discrete constraint knowledge into a continuous model.
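Over a finite candidate set, the closed-form optimum of formula (7) can be sketched directly; exponentiating the weighted constraint scores and renormalizing gives a distribution that stays close to $p_\theta$ while boosting constraint-satisfying candidates. (Scoring every candidate exhaustively is an illustrative simplification.)

```python
import math

def constrained_posterior(p_theta, f_scores, lambdas, delta=1.0):
    """Closed-form optimum of the posterior-regularized objective
    (formula (7) pattern): d*(y) ∝ p_theta(y) * exp{delta * Σ_j λ_j f_j(y)},
    normalized by Z over a candidate set. p_theta: candidate -> model
    probability; f_scores: candidate -> list of constraint values f_j."""
    unnorm = {y: p * math.exp(delta * sum(l * f for l, f in zip(lambdas, f_scores[y])))
              for y, p in p_theta.items()}
    Z = sum(unnorm.values())
    return {y: u / Z for y, u in unnorm.items()}
```

Candidates with $f_j(\cdot)>0$ (constraints satisfied) gain probability mass relative to $p_\theta$, and those violating constraints lose it, which is exactly the behaviour required of $d^*(\cdot)$.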
Further, in step S3 three constraint functions — commonsense answerability, content-association consistency and grammatical accuracy of expression — are designed to improve the quality of model generation:

The commonsense answerability constraint is constructed as follows: the passage sentences most relevant to the question are extracted by similarity matching; the Spacy toolkit extracts entities such as verbs and noun phrases from the answer and the extracted sentences, and they are grouped into pairs with those of the question. By introducing a learnable parameter $\omega_1$, the entity constraint knowledge is expressed as a weighted sum over entity pairs, whose indicator is $-1$ if the $k$-th entity pair is semantically similar and $+1$ otherwise, the weight of each pair being obtained from an attention network whose parameters are $\omega_1$. Under this representation, $f_1(\cdot)$ is positive when there are no similar entities between the answer and the question-related sentences, and negative otherwise.

The content-association consistency constraint uses a data-driven classifier, $f_2(a,c,y)=F(v_c,v_y;\omega_2)$, where $v_c$ denotes the entities extracted from passage $c$, $v_y$ those extracted from question $y$, and $\omega_2$ the parameters; $F$ outputs a positive value when the two entity sets are semantically similar, and a negative value otherwise. Referring to the performance leaderboard of the evaluation dataset, the currently best-performing question-answering model, Unicorn, is selected to predict answers; penalizing samples with inconsistent answers prompts the model to generate answer-consistent results, using a judgment function that verifies the consistency between the predicted and annotated answers.

The grammatical accuracy constraint measures the fluency of the generated result by computing the perplexity of a language model, where $P_{LM}$ is based on the pre-trained RoBERTa language model and $K$ is the number of question words. For the same answer, generated results tend to resemble the annotated ones in semantic structure and grammatical pattern, so the Word Mover's Distance (WMD) is used to measure the semantic similarity between the two texts, $f_5(a,c,y)=\mathrm{WMD}(y,y^*)/\mathrm{Length}(y)$, where $\mathrm{Length}(\cdot)$ is a normalization function and $y^*$ is the annotated result. Grammatical-structure similarity is also computed from dependency parse trees (DPTS): an attention-vector tree kernel (ACVT) counts the common substructures between the two parse trees, giving the grammatical relevance $f_6(a,c,y)=\mathrm{DPTS}_{\mathrm{ACVT}}(y,y^*)$.
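The perplexity measure used by the fluency constraint reduces to a one-liner once per-token log-probabilities are available; in the paper these would come from the pre-trained RoBERTa model, while here they are simply passed in.

```python
import math

def perplexity(token_logprobs):
    """Fluency measure of the grammatical-accuracy constraint:
    PPL = exp(-(1/K) * Σ_t log P_LM(y_t | y_<t)), with K question words.
    The log-probabilities would come from the pre-trained RoBERTa model;
    here they are given directly."""
    K = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / K)
```

A question whose every token has probability 0.25 under the language model has perplexity 4; lower perplexity means a more fluent, better-formed question, which is what the constraint rewards.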
Further, in step S3 the KL-divergence optimization objective of formula (6) is viewed as a knowledge distillation problem, i.e. transferring knowledge from the constrained teacher model $d$ to the student question generation model $p_\theta$; the objective is therefore solved with the Expectation-Maximization (EM) algorithm. In the $t$-th E-step, the probability distribution $d$ of the teacher model is computed by formula (7); then, in the M-step, the student's distribution $p_\theta$ is updated by formula (8) to approach the teacher distribution $d$, where a trade-off factor balances the terms, $o$ is the annotated question distribution, and $\mathbb{E}$ is the accumulated expected error. This distillation objective strikes a balance between imitating the soft predictions of $d$ and predicting the true results.

Besides the student question generation model $\theta$, the parameters $\omega$ of the constraints $f$ and their confidences $\lambda$ must also be learned. From the objective in formula (6) it can be seen that when $y$ is the annotated result, the constraint expectation $h(a,c,y;\omega)=\exp\{\delta\cdot\sum_l\lambda_lf_l(a,c,y;\omega_l)\}$ should be larger; $h(\cdot)$ is viewed as a likelihood function indicating result quality, which makes the objective resemble a variational lower bound of the corresponding model. Accordingly, $\omega$ is trained against the annotated results $h^*$ with a mean-squared-error (MSE) loss, per formula (9); in addition, the constraint confidences $\lambda$ are learned from the teacher model's probability distribution $d$, per formula (10):
Compared with the prior art, the beneficial effects of the technical solution of the invention are:
The invention observes that contextual associations between sentences help connect the reasoning clues scattered across them. With the help of the text context, it derives the reasoning clue chain backwards from the answer and merges various kinds of latent commonsense knowledge into this chain, then uses the clue chain to guide the generation of reasonable questions. In addition, the invention incorporates various grammatical and semantic knowledge into the model as posterior constraints, so that grammatically correct and content-consistent questions are generated. Mastering the prior reasoning process — latent commonsense entities and relations — produces logically sound questions, while treating these constraints as posterior knowledge flexibly integrates diverse linguistic knowledge into the generation model through regularization, making the results more fluent, more consistent and more answerable.
Figure 1 is an example of an exam question requiring commonsense reasoning;
Figure 2 is a block diagram of the system of the invention;
Figure 3 is a flow chart of the method of the invention.
The drawings are for illustration only and shall not be construed as limiting this patent;
to better illustrate the embodiments, some parts in the drawings may be omitted, enlarged or reduced and do not represent the size of the actual product;
for those skilled in the art, the omission of certain well-known structures and their descriptions in the drawings is understandable.
The technical solution of the invention is further described below with reference to the drawings and embodiments.
Embodiment 1
As shown in Figure 2, a commonsense-reasoning-based reading exam question generation system comprises:
a reasoning clue graph extraction module, for starting from the given answer and deriving the reasoning clue graph in reverse from the text context: identifying all entities and relations in the input text, building an entity graph from contextual dependencies, selecting question content from the entity graph, and meanwhile retrieving related entities from an external commonsense knowledge base to expand the entity graph into the reasoning clue graph;
a graph-guided question generation model module, for combining the full text and the entity graph, together with the multi-hop and commonsense knowledge in the graph, to generate high-quality exam questions;
a posterior constraint learning module for linguistic knowledge, for training the graph-guided question generation model to learn the optimal model parameters.
Embodiment 2
As shown in Figure 3, a commonsense-reasoning-based reading exam question generation method comprises the following steps:
S1: the reasoning clue graph extraction module starts from the given answer and derives the reasoning clue graph in reverse from the text context, as above;
S2: the graph-guided question generation model module combines the full text and the entity graph, together with the multi-hop and commonsense knowledge in the graph, to generate high-quality exam questions;
S3: the posterior constraint learning module for linguistic knowledge trains the graph-guided question generation model to learn the optimal model parameters.
The detailed procedures of steps S1 to S3 in this embodiment are the same as those described above and are not repeated here.
Embodiment 3
As shown in Figures 2 and 3, this embodiment applies the method of Embodiment 2 to the system of Embodiment 1; the modules, steps S1 to S3 and their detailed procedures are the same as those described above and are not repeated here.
The same or similar reference numerals denote the same or similar components;
the positional relationships described in the drawings are for illustration only and shall not be construed as limiting this patent;
obviously, the above embodiments are merely examples given for clearly illustrating the invention and are not a limitation on its implementations. For those of ordinary skill in the art, changes or variations of other forms may be made on the basis of the above description; it is neither necessary nor possible to enumerate all implementations here. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the invention shall be included within the protection scope of the claims of the invention.
Claims (10)
- A commonsense-reasoning-based reading exam question generation system, characterized by comprising: a reasoning clue graph extraction module, for starting from the given answer and deriving the reasoning clue graph in reverse from the text context — identifying all entities and relations in the input text, building an entity graph from their contextual dependencies, selecting question content from the entity graph, and meanwhile retrieving related entities from an external commonsense knowledge base to expand the entity graph into the reasoning clue graph; a graph-guided question generation model module, for combining the full text and the entity graph, together with the multi-hop and commonsense knowledge in the graph, to generate high-quality exam questions; and a posterior constraint learning module for linguistic knowledge, for training the graph-guided question generation model to learn the optimal model parameters.
- A method applied in the commonsense-reasoning-based reading exam question generation system of claim 1, characterized by comprising the following steps: S1: the reasoning clue graph extraction module starts from the given answer and derives the reasoning clue graph in reverse from the text context: it identifies all entities and relations in the input text, builds an entity graph from contextual dependencies, selects question content from the entity graph, and meanwhile retrieves related entities from an external commonsense knowledge base to expand the entity graph into the reasoning clue graph; S2: the graph-guided question generation model module combines the full text and the entity graph, together with the multi-hop and commonsense knowledge in the graph, to generate high-quality exam questions; S3: the posterior constraint learning module for linguistic knowledge trains the graph-guided question generation model to learn the optimal model parameters.
- The commonsense-reasoning-based reading exam question generation method of claim 2, characterized in that in step S1 the reasoning clue graph extraction module represents each input sentence as a parse tree; each parse-tree node contains several entities and edges, the edges denoting contextual relations; punctuation marks and stop words in each node are filtered out, and equivalent and coreferent nodes in the parse trees are merged; inter-tree edges are added between similar nodes of adjacent sentences, obtaining an entity graph with latent clues.
- The commonsense-reasoning-based reading exam question generation method of claim 3, characterized in that in step S1 question content is selected from the entity graph by marking two kinds of answer-related sentences and deleting the remaining sentences from the graph: sentences containing answer keywords identified by exact word matching, and related sentences identified by a classifier based on the Rouge-L similarity metric, the keywords being the answer words left after stop-word filtering.
- The commonsense-reasoning-based reading exam question generation method of claim 4, characterized in that in step S1 related entities are retrieved from the external commonsense base to expand the entity graph by using the entities of the input text as query conditions and retrieving related entities through word matching from an external open-source commonsense knowledge base.
- The commonsense-reasoning-based reading exam question generation method of claim 5, characterized in that in step S2 a graph-enhanced encoder of six cascaded layers combines the full text and the entity graph, each layer comprising: a text self-attention sub-layer with key, value and query vectors $k_i=x_iW^K$, $v_i=x_iW^V$, $q_i=x_iW^Q$ over the input $X=(x_1,\dots,x_T)$, interaction scores $r_{ij}=q_i\cdot k_j$ normalized by Softmax into the attention coefficients of formula (1) with key dimension $d_x$, the context-aware output $z_i$ being the attention-weighted sum of the value vectors, the first sub-layer being initialized from pre-trained word vectors and the last layer's outputs $h^c$ and $h^a$ representing the input passage and the answer; a graph attention sub-layer that represents node $i$ as $g_i=\sum_{j=m}^{n}\beta_jw_j$ with node attention $\beta=\mathrm{softmax}(\mathrm{ReLU}(W_R[G;w_j]))$, computes relation-aware edge relevance scores by dot products between adjacent nodes $i$ and $j$ using the learned relation $\tau_{ij}$ and trainable projections, normalizes them over each node's connected edges per formula (2) over the neighbourhood $\mathcal{N}_i$, and aggregates the neighbours into the graph-structure-aware output; and a feed-forward sub-layer that fuses $z_i$ and the graph vector through the gate of formula (3), $z_i=\eta\odot f+(1-\eta)\odot z_i$, followed by a two-layer ReLU MLP $h_i=\max(0,z_iW_1+b_1)W_2+b_2$ with trainable $W_1$, $W_2$, $b_1$, $b_2$, whose output $h_i$ is the input of the next layer, the final representation being obtained after the multi-layer computation.
- The commonsense-reasoning-based reading exam question generation method of claim 6, characterized in that in step S2 the output matrix $h$ of the last encoder layer is regarded as the representation integrating all input content and another Transformer performs autoregressive decoding word by word by probability; at step $t$ the $l$-th layer fuses self-attention over the earlier decoding results and attention over the encoded content $h$ through a multi-head attention layer; Softmax over the predefined vocabulary $V$ with trainable parameters $W_o$ and $b_o$ gives the word output probabilities; a copy mechanism transcribes new words from the input text according to the input attention $\alpha$, mixed through a balance factor $k$ produced by a Sigmoid-activated feed-forward network $f(\cdot)$ with the embedding $y_{t-1}$ of the previously generated word; the decoder is initialized with the answer encoding to avoid semantic drift, and a special tag <eos> indicates the termination of generation.
- The commonsense-reasoning-based reading exam question generation method of claim 7, characterized in that in step S3 the model is trained by supervised learning maximizing the log-likelihood of formula (5) over a training set of $N$ samples with $K$ question words, pushing the generated results toward the manual annotations; since this cannot guarantee commonsense-reasonable questions, a series of linguistic knowledge is introduced as regularization constraints on the output distribution, realized through the KL divergence between the constraint-satisfying expected distribution $d(\cdot)$ and the model distribution $p_\theta(\cdot)$; a hybrid objective fuses the supervised and posterior losses into formula (6), with constraint feature functions $\phi(\cdot)$ bounded by $b$, $a$, $c$, $y$ denoting the answer, passage and question, and $\lambda$ the confidence weights; the convex objective has the closed-form solution of formula (7), with normalization factor $Z$, regularization factor $\delta$ and constraint functions $f_j(a,c,y)=b_j-\phi_j(a,c,y)$, so that $f_j(\cdot)>0$ when the constraint holds; the optimal $d^*(\cdot)$ is close to $p_\theta(\cdot)$ while satisfying most constraints, flexibly injecting discrete constraint knowledge into the continuous model.
- The commonsense-reasoning-based reading exam question generation method of claim 8, characterized in that in step S3 three constraint functions — commonsense answerability, content-association consistency and grammatical accuracy of expression — are designed: the commonsense answerability constraint extracts the passage sentences most relevant to the question by similarity matching, extracts entities such as verbs and noun phrases from the answer and the extracted sentences with the Spacy toolkit, groups them into pairs with those of the question and, with learnable parameter $\omega_1$, expresses the constraint as a weighted sum over entity pairs (indicator $-1$ if the $k$-th pair is semantically similar, $+1$ otherwise, with weights from an attention network), so that $f_1(\cdot)$ is positive when the answer shares no similar entities with the question-related sentences and negative otherwise; the content-association consistency constraint uses a data-driven classifier $f_2(a,c,y)=F(v_c,v_y;\omega_2)$ over entities $v_c$ and $v_y$ from passage and question, positive when the entity sets are semantically similar, and selects the Unicorn model, currently the best-performing question-answering model on the evaluation leaderboard, to predict answers, penalizing answer-inconsistent samples with a judgment function that verifies consistency with the annotated answer; the grammatical accuracy constraint measures fluency by the perplexity of the pre-trained RoBERTa language model $P_{LM}$ over the $K$ question words, measures semantic similarity to the annotated result $y^*$ with the Word Mover's Distance, $f_5(a,c,y)=\mathrm{WMD}(y,y^*)/\mathrm{Length}(y)$, and computes grammatical relevance from dependency parse trees with an attention-vector tree kernel, $f_6(a,c,y)=\mathrm{DPTS}_{\mathrm{ACVT}}(y,y^*)$.
- The commonsense-reasoning-based reading exam question generation method of claim 9, characterized in that in step S3 the KL-divergence objective of formula (6) is viewed as knowledge distillation from the constrained teacher model $d$ to the student question generation model $p_\theta$ and solved with the EM algorithm: the $t$-th E-step computes the teacher distribution $d$ by formula (7), and the M-step updates $p_\theta$ by formula (8) to approach $d$, with a trade-off factor, the annotated question distribution $o$ and accumulated expected error $\mathbb{E}$, balancing imitation of the soft predictions of $d$ against prediction of the true results; besides $\theta$, the constraint parameters $\omega$ and confidences $\lambda$ are learned — the constraint expectation $h(a,c,y;\omega)=\exp\{\delta\cdot\sum_l\lambda_lf_l(a,c,y;\omega_l)\}$ should be larger when $y$ is the annotated result, so $h(\cdot)$ is viewed as a likelihood indicating result quality, making the objective resemble a variational lower bound; $\omega$ is trained against the annotated results $h^*$ with an MSE loss per formula (9), and $\lambda$ is learned from the teacher distribution $d$ per formula (10).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/094741 WO2023225858A1 (zh) | 2022-05-24 | 2022-05-24 | 一种基于常识推理的阅读型考题生成系统及方法 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023225858A1 true WO2023225858A1 (zh) | 2023-11-30 |
Family
ID=88918231
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117556381A (zh) * | 2024-01-04 | 2024-02-13 | 华中师范大学 | 一种面向跨学科主观试题的知识水平深度挖掘方法及系统 |
CN117708336A (zh) * | 2024-02-05 | 2024-03-15 | 南京邮电大学 | 一种基于主题增强和知识蒸馏的多策略情感分析方法 |
CN117743315A (zh) * | 2024-02-20 | 2024-03-22 | 浪潮软件科技有限公司 | 一种为多模态大模型系统提供高质量数据的方法 |
CN117708336B (zh) * | 2024-02-05 | 2024-04-19 | 南京邮电大学 | 一种基于主题增强和知识蒸馏的多策略情感分析方法 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109947912A (zh) * | 2019-01-25 | 2019-06-28 | 四川大学 | 一种基于段落内部推理和联合问题答案匹配的模型方法 |
CN111078836A (zh) * | 2019-12-10 | 2020-04-28 | 中国科学院自动化研究所 | 基于外部知识增强的机器阅读理解方法、系统、装置 |
CN111274800A (zh) * | 2020-01-19 | 2020-06-12 | 浙江大学 | 基于关系图卷积网络的推理型阅读理解方法 |
CN112417104A (zh) * | 2020-12-04 | 2021-02-26 | 山西大学 | 一种句法关系增强的机器阅读理解多跳推理模型及方法 |
US20210406669A1 (en) * | 2020-06-25 | 2021-12-30 | International Business Machines Corporation | Learning neuro-symbolic multi-hop reasoning rules over text |
WO2022036616A1 (zh) * | 2020-08-20 | 2022-02-24 | 中山大学 | 一种基于低标注资源生成可推理问题的方法和装置 |
Non-Patent Citations (2)
Title |
---|
BAUER LISA, WANG YICHENG, BANSAL MOHIT: "Commonsense for Generative Multi-Hop Question Answering Tasks", PROCEEDINGS OF THE 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, STROUDSBURG, PA, USA, 1 January 2018 (2018-01-01), Stroudsburg, PA, USA, pages 4220 - 4230, XP093111788, DOI: 10.18653/v1/D18-1454 * |
YU JIANXING, SU QINLIANG, QUAN XIAOJUN, YIN JIAN: "Multi-hop Reasoning Question Generation and Its Application", IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, IEEE SERVICE CENTRE , LOS ALAMITOS , CA, US, vol. 35, no. 1, 1 January 2021 (2021-01-01), US , pages 725 - 740, XP093111786, ISSN: 1041-4347, DOI: 10.1109/TKDE.2021.3073227 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22943064; Country of ref document: EP; Kind code of ref document: A1 |