CN112613323A - Syntax-dependency-enhanced math word problem semantic recognition and inference method and system - Google Patents
Syntax-dependency-enhanced math word problem semantic recognition and inference method and system
- Publication number
- CN112613323A (application number CN202011517409.2A)
- Authority
- CN
- China
- Prior art keywords
- semantic
- clause
- node
- symbol
- representation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Data Mining & Analysis (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a syntax-dependency-enhanced method and system for semantic recognition and inference of math word problems. In addition, the semantic vectors produced by the encoder can provide technical support for intelligent-education practice on many education platforms, such as better question representation, automatic question labeling and personalized question recommendation, and can bring certain potential economic benefits.
Description
Technical Field
The invention relates to the technical fields of machine learning, artificial intelligence and intelligent education, and in particular to a syntax-dependency-enhanced method and system for semantic recognition and inference of math word problems.
Background
In intelligent education, automatically reasoning about and solving math word problems is a challenging task. A math word problem consists of a story described in natural language and an associated mathematical question. The question is posed in natural language and the answer is obtained by mathematical reasoning, so an algorithm for solving math word problems must have both natural-language understanding ability and mathematical reasoning ability.
Existing methods reason about a math word problem by converting it into a computable mathematical expression, in combination with the logical structure of that expression. However, these works focus only on making the computer perform mathematical reasoning; they do not deeply understand the semantics of the problem text, they ignore the syntactic dependency relations among the elements of the problem (e.g., between a number and the entity it modifies), and they have difficulty accurately identifying the details of the problem. As a result they cannot effectively describe the semantics and details of the problem, which leads to poor performance in practical intelligent-education applications.
Disclosure of Invention
The invention aims to provide a syntax-dependency-enhanced method and system for semantic recognition and inference of math word problems, which can accurately identify the details of a math word problem, effectively describe its semantics and details, and improve performance in practical intelligent-education applications.
The purpose of the invention is realized by the following technical scheme:
a semantic recognition and inference method for a mathematic application topic with enhanced grammar dependence comprises the following steps:
dividing the problem text into a plurality of clauses; learning a local-to-global semantic representation of the problem with a hierarchical word-clause-problem encoder; enhancing clause semantics with intra-clause associations; building a syntactic dependency tree to preserve the structure of the problem text and modeling the structural dependencies among the elements within each clause; and finally obtaining a representation of the global semantic information of the problem text;
and recursively generating the mathematical expression corresponding to the problem text with a tree-structured decoder, combined with the representation of the global semantic information.
Compared with traditional sequential processing methods, the technical scheme provided by the invention describes the semantics and details of a problem more effectively, thereby improving the inference accuracy of the model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
FIG. 1 is a diagram illustrating a math word problem according to an embodiment of the present invention;
FIG. 2 is a framework diagram of the syntax-dependency-enhanced semantic recognition and inference method for math word problems according to an embodiment of the present invention;
FIG. 3 is a block diagram of the syntax-dependency-enhanced clause representation module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the syntax-dependency-enhanced semantic recognition and inference system for math word problems according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a syntax-dependency-enhanced semantic recognition and inference method for math word problems, which mainly comprises the following steps:
dividing the problem text into a plurality of clauses; learning a local-to-global semantic representation of the problem with a hierarchical word-clause-problem encoder; enhancing clause semantics with intra-clause associations; building a syntactic dependency tree to preserve the problem structure and modeling the structural dependencies among the elements within each clause; and finally obtaining a representation of the global semantic information of the problem text;
recursively generating the mathematical expression corresponding to the problem text with a tree-structured decoder, combined with the representation of the global semantic information.
In the scheme of the embodiment of the invention, clauses are taken as semantic units; a syntactic dependency tree expresses sentence structure; the semantic information of the problem text is modeled hierarchically; and a tree-structured recursive sequence model captures the structural dependencies among the elements of a mathematical expression (such as numbers and operators). Fusing the local details, the global objective and the dependencies among the elements of the problem text improves the accuracy of the algorithm's mathematical reasoning.
The above scheme actually covers two phases:
the first stage is a topic understanding stage: the reading and understanding habits of human beings are simulated, the topic text is divided into a plurality of clauses, and then a hierarchical word-clause-topic encoder is designed to learn the semantic information representation of the topic text from local to global. Furthermore, in order to enhance sentence semantics by using intra-sentence association, in a sentence understanding stage, a grammar dependency tree is established to store a question text structure and a module based on a dependency tree is designed to model structural dependency among elements in a sentence.
The second stage is inference: a tree-structured decoder recursively generates the mathematical expression. It first uses a hierarchical attention mechanism to explicitly synthesize information from different levels, enhancing the semantics and extracting the relevant context, and then uses a pointer-generator network to guide the model in copying existing information and inferring additional knowledge so as to predict the symbol at each position.
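As a minimal illustration of the clause division used in the understanding stage above, problem text can be split at commas and periods; the delimiter set and the function name are assumptions of this sketch, not part of the claimed method:

```python
import re

def split_clauses(problem_text):
    """Split the problem text into clauses at commas and periods
    (ASCII and full-width), mimicking the local reading units
    used by the hierarchical encoder."""
    parts = re.split(r"[,.，。]", problem_text)
    return [p.strip() for p in parts if p.strip()]
```

On the rectangle problem of FIG. 1, this yields three clauses, the last one being the question part.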
For the sake of understanding, the above-described scheme will be described in detail with reference to the accompanying drawings.
In the practice of the present invention, math word problems in a broad sense are used as the data set. FIG. 1 is a description diagram of a math word problem. The problem text must contain natural-language text (i.e., words) describing the problem information and numeric values (e.g., "3" and "4" in FIG. 1) describing quantity information. Each data item comprises the problem text (the "Problem" part of FIG. 1), a mathematical expression (the "Expression" part) and a numeric answer (the "Answer" part). The problem text is necessary for the scheme; the mathematical expression is used only for training, and the numeric answer only for evaluation; in actual use only the problem text is needed. Examples of such data are the open-source math word problem data set Math23K published by Tencent and the open-source data set MAWPS published by Microsoft. In addition, an input data set can be obtained by collecting the homework or examination problem sets of primary- and secondary-school students, through web crawling or offline.
The invention aims to model the semantic information of the text so as to perform deep reasoning and generate a mathematical expression. The semantic vectors obtained at this stage can provide many education platforms with better services such as question representation, automatic question labeling and personalized question recommendation, bringing certain potential economic benefits. Meanwhile, the answer can be computed from the generated mathematical expression.
In the embodiment of the present invention, each problem text P is represented as a sequence P = {p_1, p_2, …, p_n} of n words or numeric values, where each p_s (s = 1, …, n) is either a word (e.g., "length" and "width" in FIG. 1) or a numeric value (e.g., "3" and "4") in the problem text. The set of numeric values of the problem text P is denoted N_P (e.g., {3, 4}). Since the invention is not concerned with the specific magnitudes of these numbers, all values can be mapped to the special word "NUM", so every p_s in the sequence P may be treated as a word.
The expression E_P corresponding to problem P is defined as a sequence E_P = {y_1, y_2, …, y_G} of G operators, constants or numeric values from the problem text, where each y_i is an element of the decoding symbol set V_P of problem P. The symbol set V_P consists of the operator set V_O (e.g., {+, -, ×, ÷}), the constant set V_C (e.g., {1, 2, π}) and the value set N_P, i.e., V_P = V_O ∪ V_C ∪ N_P (e.g., {+, -, ×, ÷, 1, 2, π, 3, 4}). Since N_P differs from problem to problem, V_P also differs for each problem.
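A minimal sketch of this setup, assuming whitespace-tokenized input; the operator and constant lists and all function names here are illustrative, not part of the claimed method:

```python
import re

OPERATORS = ["+", "-", "*", "/"]   # V_O (illustrative)
CONSTANTS = ["1", "2", "3.14"]     # V_C, e.g. 1, 2 and pi (illustrative)

def preprocess(problem_tokens):
    """Replace every numeric token with the special word NUM and
    collect the problem's value set N_P in order of appearance."""
    values, masked = [], []
    for tok in problem_tokens:
        if re.fullmatch(r"\d+(\.\d+)?", tok):
            values.append(tok)
            masked.append("NUM")
        else:
            masked.append(tok)
    return masked, values

def symbol_set(values):
    """V_P = V_O ∪ V_C ∪ N_P; it differs per problem because N_P does."""
    return OPERATORS + CONSTANTS + values
```

Because the decoder's candidate symbols include the problem's own numbers, the symbol set must be rebuilt for every problem.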
Given the input sequence of a problem P, the goal of math word problem semantic recognition and inference is to learn a model that reads the tokens of P as input and predicts the corresponding expression E_P through semantic recognition and reasoning.
The embodiment of the invention constructs a network model to perform semantic recognition and inference on math word problems.
The network model is trained on a pre-collected data set. To ensure the effectiveness of the model, the collected data set must be pre-processed: 1) data filtering: the embodiment of the invention mainly targets the scenario in which a math word problem contains only one question, whose answer is a numeric value that can be computed directly from a single mathematical expression; therefore only problems with exactly one mathematical expression and a numeric answer are selected, and problems lacking an expression, or having two or more expressions or answers, are filtered out; 2) sampling: random sampling is performed within each data set, and a subset of the original data set is selected to train the model.
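The filtering and sampling steps above can be sketched as follows; the data-item keys ("expressions", "answer") and the function names are assumptions of this sketch:

```python
import random

def filter_problems(dataset):
    """Keep only problems with exactly one expression and a numeric answer."""
    kept = []
    for item in dataset:
        exprs = item.get("expressions", [])
        ans = item.get("answer")
        if len(exprs) == 1 and isinstance(ans, (int, float)):
            kept.append(item)
    return kept

def sample_subset(dataset, fraction, seed=0):
    """Select a random subset of the data set for training (seeded
    for reproducibility)."""
    rng = random.Random(seed)
    k = max(1, int(len(dataset) * fraction))
    return rng.sample(dataset, k)
```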
Furthermore, similar to human problem solving, semantic recognition and inference for math word problems requires the machine to have two key abilities: natural-language understanding, to grasp the relations among the many elements of the problem text; and logical reasoning, to generate a correct mathematical expression through fine-grained inference. Most existing work focuses only on the mathematical reasoning needed to generate expressions, modeling the problem text simply as a word sequence; it attends only to the direct influence of the relative positions of words (e.g., "longer" after "cm" and "than" in FIG. 1) but ignores the complex, highly semantically related structure among words (e.g., the relations among "3", "longer" and "width"). This is inconsistent with the way humans accurately understand problem text, and it lacks accuracy in understanding and reasoning over the deep semantics of the text. In fact, accurately understanding the semantics of a problem and improving inference by simulating human reading habits presents several difficulties and challenges: (1) how to simulate the process by which a human reads each part (e.g., clause) of the problem text, attends to local details (e.g., the values in FIG. 1), and then combines them into a global objective (e.g., the perimeter of the rectangle); (2) humans easily grasp the semantic dependencies between words to understand detail descriptions and local semantics (e.g., in FIG. 1, "3" describes the degree to which the "length" is "longer" than the "width"), but this is difficult for a machine; (3) how to convert local logic into the corresponding expression (e.g., "3 cm longer" in the problem text corresponds to "+ 3" in the expression) is an important task in inference; (4) how to exploit human mathematical knowledge beyond the problem text (e.g., in FIG. 1, besides the "length" and "width" in the problem text, the rectangle perimeter formula "2 × (NUM + NUM)" is also needed, in which the "2" cannot be generated directly from the problem text) remains an open question.
The network model provided by the embodiment of the invention better simulates the human reading process, improves semantic understanding and enhances inference. The whole method is implemented as a network model that understands a math word problem and infers a mathematical expression based on a sequence-to-sequence (Seq2Seq) framework. As shown in FIGS. 2 to 3, the network model mainly comprises three parts: a Hierarchical Encoder Module, a Dependency-enhanced Clause Module (the syntax-dependency-enhanced clause representation module) and a Tree-based Decoder Module.
The hierarchical encoder module divides the problem text into clauses and learns a local-to-global semantic representation of the text. The syntax-dependency-enhanced clause representation module learns, for the hierarchical encoder, a representation of each clause that incorporates the structural dependencies within the clause. The tree-structured decoder module generates a mathematical expression by recursion over a tree-structured neural network on top of the semantic representation of the problem text; it uses a hierarchical attention mechanism to integrate semantic information from different levels, and a pointer-generator network to guide the model in copying existing information and inferring additional knowledge, so as to accurately predict the symbol at each node. The modules are described in detail below:
First, the hierarchical encoder module.
The hierarchical encoder module is shown in the upper dashed box of FIG. 2. Given the problem sequence P = {p_1, p_2, …, p_n}, and imitating the local-to-global reading habit of humans, the problem text is divided into m clauses {C_1, C_2, …, C_m} at commas and periods, each serving as a unit of local semantic representation; each clause C_k (k = 1, …, m) is a word subsequence of the problem text P. On the basis of this clause division, the encoder learns the semantic representations of the words, the clauses and the whole problem text in turn, in a bottom-up, local-to-global manner:
1. Context-enhanced word representation.
At the word level (the "Word-Level" layer of FIG. 2), each word p_s is mapped to a word vector x_s of equal dimension (obtained by word2vec pre-training), and the semantic representation h_s of each word p_s is obtained by a bidirectional gated recurrent unit (GRU):

h_s^f = GRU_f(x_s, h_{s-1}^f),  h_s^b = GRU_b(x_s, h_{s+1}^b),  h_s = [h_s^f, h_s^b]

where GRU_f and GRU_b are gated recurrent units modeling the preceding and following context respectively; h_s^f and h_s^b are the representations of word p_s containing preceding and following information respectively; and h_{s-1}^f and h_{s+1}^b are the representations of the previous and next word required by GRU_f and GRU_b. The representation h_s therefore contains the semantic information of the word p_s together with its context in both directions.
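For illustration, a minimal pure-Python bidirectional GRU over pre-computed word vectors can be sketched as follows; it is randomly initialized, unbatched and untrained, and any practical embodiment would use a deep-learning framework instead:

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """Minimal unbatched GRU cell; weights are random and untrained."""
    def __init__(self, in_dim, hid_dim, seed=0):
        rng = random.Random(seed)
        def mat(r, c):
            return [[rng.uniform(-0.1, 0.1) for _ in range(c)] for _ in range(r)]
        # update gate z, reset gate r, candidate state; each sees [x, h]
        self.Wz, self.Wr, self.Wh = (mat(hid_dim, in_dim + hid_dim) for _ in range(3))
        self.hid = hid_dim

    @staticmethod
    def _mv(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]

    def step(self, x, h):
        xh = x + h                                   # concatenation [x, h]
        z = [sigmoid(a) for a in self._mv(self.Wz, xh)]
        r = [sigmoid(a) for a in self._mv(self.Wr, xh)]
        xrh = x + [ri * hi for ri, hi in zip(r, h)]  # [x, r ⊙ h]
        h_tilde = [math.tanh(a) for a in self._mv(self.Wh, xrh)]
        return [(1 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]

def bigru(xs, fwd, bwd):
    """h_s = [forward state ; backward state] for every position s."""
    h = [0.0] * fwd.hid
    f_states = []
    for x in xs:
        h = fwd.step(x, h)
        f_states.append(h)
    h = [0.0] * bwd.hid
    b_states = [None] * len(xs)
    for i in range(len(xs) - 1, -1, -1):
        h = bwd.step(xs[i], h)
        b_states[i] = h
    return [f + b for f, b in zip(f_states, b_states)]
```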
2. Syntax-dependency-enhanced clause representation.
At the clause level (the "Clause-Level" layer of FIG. 2), according to the clause division, the semantic representation h_s of each word p_s in each clause C_k is collected; intra-clause associations are used to enhance the clause semantics; a syntactic dependency tree is built to preserve the problem structure; and the structural dependencies among the elements within the clause are modeled. Combining these structural dependencies yields the semantic representation c_k of clause C_k. This stage is implemented by the syntax-dependency-enhanced clause representation module, which is described in detail later.
3. Inter-clause-association-enhanced problem text representation.
At the problem level (the "Problem-Level" layer of FIG. 2), starting from the semantic representation c_k of each clause C_k, the clause semantics are enhanced and a representation of the global semantic information of the problem text is obtained by combining the semantic dependencies between clauses (descriptions of different aspects of the same entity, e.g., in FIG. 1, the value of the "width" in the first clause and the relation between "width" and "length" in the second clause).
First, for the semantic representation c_k of each clause C_k, a position-enhanced clause representation u_k is obtained through a learnable position code PE that models the order relation between the clauses:

u_k = c_k + PE(k)

where PE(k) is the code vector of the position of the k-th clause. Then relevant semantic information is extracted from the other clauses through a self-attention mechanism to enhance the clause semantics. The correlation S_a(k, s) between the representation u_k of a clause and the representation u_s of every other clause is modeled by a neural network:

S_a(k, s) = W_sa · ReLU(W_ss [u_k, u_s]^T) + b_sa

where ReLU is the rectified linear unit, [·] denotes the concatenation of several vectors, T is the vector transpose, and W_ss, W_sa and b_sa are learnable parameters of a neural-network linear module (W a weight parameter, b a bias parameter); during model training the learnable matrices/vectors of the linear module are adjusted to fit the training data.
Those skilled in the art will understand that a neural network comprises linear modules and non-linear modules: a linear module contains a weight parameter W and a bias parameter b, and the non-linear module is mainly the ReLU. The neural networks involved in the various parts of the invention may be identical or different in structure.
According to the inter-clause correlations, relevant semantic information r_k is extracted from the representations u_s of the other clauses with the corresponding weights:

α_s = exp(S_a(k, s)) / Σ_{k'} exp(S_a(k, k')),  r_k = Σ_s α_s u_s

where s indexes the clauses from which semantic information is extracted, α_s is the weight for extracting semantic information from the s-th clause, and k' is the clause index needed to compute the normalization.
The semantic representation u_k of clause C_k is then enhanced with the related semantic information r_k, yielding a clause representation v_k that carries both the local semantics of the clause and the global semantics of the other clauses:

v_k = ReLU(W_so [u_k, r_k]^T + b_so)

where W_so and b_so are learnable parameters of a neural-network linear module. The representation v_m of the last clause C_m is taken as the representation of the global semantic information of the problem text, since the last clause is usually the question part of the word problem, i.e., its solution target.
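A simplified sketch of the position-enhanced inter-clause attention: a fixed sinusoidal position code stands in for the learnable PE, a single score vector stands in for the learned scoring module, and concatenation stands in for the learned fusion; all names are illustrative, and at least two clauses are assumed:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def relu(v):
    return [max(0.0, x) for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def position_encode(clauses):
    """u_k = c_k + PE(k); a fixed sinusoidal code stands in for the
    learnable position code of the embodiment."""
    out = []
    for k, c in enumerate(clauses):
        pe = [math.sin((k + 1) / (10 ** (i / len(c)))) for i in range(len(c))]
        out.append([ci + pi for ci, pi in zip(c, pe)])
    return out

def enhance_clauses(clauses, w_score):
    """For each clause, attend over the other clauses and append the
    attended summary (a stand-in for the learned fusion)."""
    encoded = position_encode(clauses)
    enhanced = []
    for k, ck in enumerate(encoded):
        others = [c for s, c in enumerate(encoded) if s != k]
        scores = [dot(w_score, relu(ck + cs)) for cs in others]  # S_a(k, s)
        alphas = softmax(scores)
        summary = [sum(a * c[i] for a, c in zip(alphas, others))
                   for i in range(len(ck))]                       # r_k
        enhanced.append(ck + summary)                             # v_k stand-in
    return enhanced
```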
Second, the syntax-dependency-enhanced clause representation module.
This module can be understood either as an independent module or as an internal sub-module of the hierarchical encoder module; neither reading affects the implementation of the invention.
FIG. 3 is a block diagram of the syntax-dependency-enhanced clause representation module. Given each clause C_k and the semantic representation h_s of each word p_s in the clause, a syntactic dependency tree T_k = {t_k1, t_k2, …, t_kL} of clause C_k is built with a syntactic parsing tool to capture the syntactic dependencies among the words of the clause (upper left corner of FIG. 3). Each node t_kl of the dependency tree T_k represents one word p_s of the clause, where L is the number of words in clause C_k and k1 to kL index the nodes of the dependency tree T_k of the k-th clause. Parent-child relations are defined among the nodes: a child node depends on its parent node and provides a detail description for it. For example, in FIG. 3 "rectangle" depends on the parent node "perimeter" and describes the type of the "perimeter".
In the embodiment of the invention, based on the syntactic dependency tree T_k, the details provided by the child nodes are used to enhance the semantics of their parent node.

For a leaf node t_f (e.g., "rectangle"), f ∈ [k1, kL], the semantic representation h_f of the corresponding word is taken directly as its enhanced node semantic representation: ĥ_f = h_f.

For a non-leaf node t_p (e.g., "perimeter"), p ∈ [k1, kL], its node semantic representation is initialized with the semantic representation h_p of the corresponding word, and relevant detail descriptions are extracted from its child nodes through an attention mechanism to enhance it.

First, the semantic relevance S_c(p, c) between the non-leaf node t_p and the enhanced node semantic representation ĥ_c of each of its child nodes t_c is computed:

S_c(p, c) = W_ca · ReLU(W_cs [h_p, ĥ_c]^T) + b_ca

where W_cs, W_ca and b_ca are learnable parameters of a neural-network linear module. Based on the semantic relevance S_c, the relevant detail semantics d_p are extracted from the child nodes with the corresponding weights:

α_c = exp(S_c(p, c)) / Σ_{p'} exp(S_c(p, p')),  d_p = Σ_c α_c ĥ_c

where c indexes the child nodes from which detail semantics are extracted, α_c is the weight for extracting detail semantics from the c-th child node, and p' is the child-node index needed to compute the normalization.

The detail semantics of the child nodes are then fused with the initialized node representation to obtain the enhanced node semantic representation ĥ_p:

ĥ_p = ReLU(W_co [h_p, d_p]^T + b_co)

where W_co and b_co are learnable parameters of a neural-network linear module. Following the syntactic dependency tree T_k, the enhanced node semantic representation of every node is computed bottom-up, from the leaf nodes to the root node. Finally, the enhanced representation of the root node (e.g., "is" in FIG. 3) carries the local semantics of all the nodes of the clause together with the structural information of the syntactic dependency tree, and is taken as the local semantic representation c_k of the clause.
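The bottom-up enhancement over the dependency tree can be sketched as follows, with uniform attention standing in for the learned relevance scores and element-wise addition standing in for the learned fusion; encoding the tree as a parent-to-children dict is an assumption of this sketch:

```python
def enhance_tree(node, children, h, attend):
    """Bottom-up pass over a dependency tree: a leaf keeps its word
    representation; a non-leaf fuses its word representation with an
    attention summary of its already-enhanced children."""
    kids = children.get(node, [])
    if not kids:
        return h[node]                       # leaf: ĥ_f = h_f
    kid_reprs = [enhance_tree(c, children, h, attend) for c in kids]
    detail = attend(h[node], kid_reprs)      # d_p
    # element-wise addition stands in for the learned W_co fusion
    return [a + b for a, b in zip(h[node], detail)]

def mean_attend(parent, kid_reprs):
    """Uniform attention, an illustrative stand-in for the learned scores."""
    n = len(kid_reprs)
    return [sum(k[i] for k in kid_reprs) / n for i in range(len(parent))]
```

On the chain rectangle → perimeter → is of FIG. 3, the root representation accumulates the detail semantics of both descendants.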
Third, the tree-structured decoder module.
The tree-structured decoder module is shown in the lower two dashed boxes of FIG. 2; its input is the semantic representation of each level of the problem text P. Since every valid mathematical expression can be uniquely converted into a corresponding expression tree, the decoder turns the prediction and generation of a mathematical expression into the prediction and generation of its expression tree. First, each node of the required expression tree (including the symbol on the node) is predicted and generated recursively top-down, from the root node to the leaf nodes, forming a complete and valid expression tree: the symbol on each leaf node is a predicted constant or a numeric value from the problem text, and the symbol on each non-leaf node is a predicted operator representing the operation over its two child nodes. Then the symbols on the nodes, arranged in generation order, yield the required prefix mathematical expression equivalent to the tree.
In the embodiment of the invention, each node contains its symbol, a target vector q_p and a context vector c_p. The target vector of the root node is initialized with the representation of the global semantic information of the problem text; the target vectors of the remaining nodes are generated by their parent nodes. The context vector c_p is obtained from the target vector q_p through the hierarchical attention mechanism. The symbol of each node is predicted from its target vector q_p and context vector c_p, and the predicted symbols of all nodes, arranged in order, give the prefix mathematical expression. Preferred embodiments of these parts are as follows:
1. Target vector generation.
The target vector of the root node is initialized with the representation of the global semantic information of the problem text; the target vector of every other node is generated by its parent node, as follows. For any parent node with target vector q_p (by the generation order of the nodes, q_p has already been produced), the decoder first generates the node's context vector c_p from q_p and predicts its symbol (this process is described below). If the predicted symbol is an operator, the node is expanded by target decomposition, generating the target vectors of its two child nodes from q_p: first, a gated neural network generates the target vector q_l of the left child from the node's target vector q_p, its context vector c_p and the word vector of its symbol, and the symbol prediction and expansion of the left child proceed recursively. After the left subtree has been fully predicted, a gated neural network combines all nodes of the left subtree bottom-up into a semantic representation vector t_l of the partial mathematical expression corresponding to the left subtree; then the target vector q_r of the right child is generated by a gated neural network from the target vector q_p, the context vector c_p, the word vector of the symbol and the left-subtree representation t_l, and the right child is predicted and expanded.

When the symbol predicted at a node is not an operator (i.e., it is a constant or a numeric value), the target decomposition at that node ends. The node then completes the left subtree of some parent node (its own parent if it is a left child, or a higher-level ancestor if it is a right child). Once all predictions of that left subtree are finished, the parent node is located by backtracking the target-decomposition process, the semantic representation of the left subtree is built as described above, and the prediction and expansion of the parent's right subtree begins.
2. Context extraction.
Based on the node's target vector q_p, relevant local semantic information is extracted from the semantic representations of the different levels of the problem through a hierarchical attention mechanism, yielding a context vector c_p that synthesizes the relevant clause and word semantic information.
First, the correlation S_c(p, k) between the target vector q_p and the representation v_k of each clause of the problem text is evaluated:

S_c(p, k) = W_ac · ReLU(W_sc [q_p, v_k]^T) + b_ac

where W_sc, W_ac and b_ac are learnable parameters of a neural-network linear module. Then the correlation S_w(p, (k, t)) between the target vector q_p and the semantic representation h_{k,t} of each word in each clause C_k (the word representations come from the word layer of the hierarchical encoder module, as indicated by the arrows in FIG. 2) is evaluated:

S_w(p, (k, t)) = W_aw · ReLU(W_sw [q_p, h_{k,t}]^T) + b_aw

where W_sw, W_aw and b_aw are learnable parameters of a neural-network linear module. Based on the correlations of the target vector q_p with each clause C_k and with each word, the semantic representations of the relevant clauses and words are extracted from the different clauses with different weights, generating the context vector c_p that synthesizes the relevant clause and word information:

β_k = exp(S_c(p, k)) / Σ_{k'} exp(S_c(p, k')),  γ_{k,t} = exp(S_w(p, (k, t))) / Σ_{t'} exp(S_w(p, (k, t'))),  c_p = Σ_k β_k Σ_t γ_{k,t} h_{k,t}

where β_k is the weight for extracting semantic information from the k-th clause and γ_{k,t} the weight for extracting it from the t-th word of the k-th clause; k indexes the clauses when extracting semantic information and k' is the clause index needed to compute the normalization; t indexes the words within clause C_k and t' is the word index within C_k needed to compute the normalization.
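The two-level attention can be sketched as follows, with a plain dot product standing in for the learned correlation modules; a word's effective weight is its clause weight times its within-clause weight, so the result is a proper weighted average over all words:

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def hierarchical_context(goal, clause_reprs, word_reprs, score):
    """Two-level attention: weight clauses against the goal, then weight
    each word inside its clause; accumulate the doubly-weighted words."""
    a_clause = softmax([score(goal, c) for c in clause_reprs])
    ctx = [0.0] * len(goal)
    for ac, words in zip(a_clause, word_reprs):
        a_word = softmax([score(goal, w) for w in words])
        for aw, w in zip(a_word, words):
            for i in range(len(ctx)):
                ctx[i] += ac * aw * w[i]
    return ctx
```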
3. Symbol prediction.
Based on the node's target vector q_p and context vector c_p, the symbol ŷ of the node is predicted. Specifically, a pointer-generator network (Pointer Generator) either copies a numerical variable from the problem text (e.g., the "3" and "4" appearing in both the Problem and Expression parts in FIG. 1) or infers additional knowledge, i.e., a constant or operator from the external symbol set (e.g., the "2" and "x" appearing only in the Expression part).
First, the symbol ŷ to be predicted is either a numerical variable in the problem text or a constant or operator in the external symbol set; a value probability P_c is computed for each possible value of ŷ from the following three probabilities:
The probability P_gen that the symbol to be predicted is a constant or operator from the external symbol set is computed as:

P_gen = σ(W_pg[q_p, c_p] + b_pg)

In the first case: if the symbol ŷ to be predicted is a numerical variable from the problem text, its probability distribution over the numerical variables of the text is computed as:

s_copy(z) = W_pa^T · ReLU(W_ps[q_p, c_p, e(z)] + b_pa)
P_copy(ŷ = z) = exp(s_copy(z)) / Σ_{z'} exp(s_copy(z'))

In the second case: if the symbol ŷ to be predicted is a constant or operator from the external symbol set, its probability distribution over the external symbols is computed as:

s_gen(z) = W_ga^T · ReLU(W_gs[q_p, c_p, e(z)] + b_ga)
P_ext(ŷ = z) = exp(s_gen(z)) / Σ_{z'} exp(s_gen(z'))

wherein e(z) is the word vector of a possible value z of the symbol ŷ to be predicted (a numerical variable of the problem text in the first case, a constant or operator of the external symbol set in the second case), and W_pg, b_pg, W_ps, W_pa, b_pa, W_gs, W_ga and b_ga are learnable parameters of the neural-network linear modules;
the model thus computes a value probability P_c for each possible value of the symbol ŷ to be predicted (problem variables, constants and operators), using P_gen to combine the probability of copying a variable from the text with the probability of generating a symbol from the external set. During training, the model adjusts the learnable parameters to maximize the probability P_c(y) of the correct symbol value y (i.e., the true symbol in the training data), making it more likely to be selected in prediction; in prediction, the symbol of the node is chosen according to the value probabilities of all possible symbols, generally by selecting the symbol with the highest probability, or by selecting a symbol according to probability with a beam search.
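The combination of the copy and generate distributions through P_gen can be illustrated as follows. The scores here are stand-ins for the outputs of the learned linear modules, and all names are hypothetical.

```python
# Illustrative pointer-generator mixture: the copy branch covers numerical
# variables from the problem text, the generate branch covers the external
# symbol set, and p_gen weights the two branches.
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def symbol_distribution(copy_scores, gen_scores, p_gen,
                        text_numbers, external_symbols):
    """Return P_c over all candidate symbols."""
    copy_dist = softmax(copy_scores)
    gen_dist = softmax(gen_scores)
    p = {}
    for sym, pr in zip(text_numbers, copy_dist):
        p[sym] = (1.0 - p_gen) * pr
    for sym, pr in zip(external_symbols, gen_dist):
        p[sym] = p.get(sym, 0.0) + p_gen * pr
    return p

# Greedy prediction with made-up scores: pick the most probable symbol.
dist = symbol_distribution([2.0, 0.5], [1.0, 1.0, 3.0], 0.3,
                           ["3", "4"], ["2", "+", "*"])
best = max(dist, key=dist.get)
```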
4. Expression generation.
After all nodes of the expression tree have been predicted, the symbols of the predicted nodes are assembled, and the mathematical expression in prefix form is generated according to the positions of the nodes, i.e., the order in which the nodes were generated.
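For a quick check of a generated prefix expression, it can be evaluated recursively. This helper is illustrative only and not part of the embodiment.

```python
# Illustrative evaluator for a prefix (Polish-notation) expression:
# an operator consumes the next two recursively evaluated operands.
def eval_prefix(tokens):
    ops = {"+": lambda a, b: a + b, "-": lambda a, b: a - b,
           "*": lambda a, b: a * b, "/": lambda a, b: a / b}
    def helper(it):
        tok = next(it)
        if tok in ops:
            left = helper(it)
            right = helper(it)
            return ops[tok](left, right)
        return float(tok)
    return helper(iter(tokens))

# "+ * 3 4 2" corresponds to 3 * 4 + 2
```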
The above is the main principle of the scheme of the embodiment of the present invention. In addition, it is pointed out that, for a problem text P and the corresponding target expression E_P = {y_1, y_2, …, y_G} in the data set, the model corresponding to the whole method is trained by minimizing the following loss function L:

L = − Σ_{g=1}^{G} log P_c(y_g | y_1, y_2, …, y_{g−1}, P)

The loss function represents the sum of the negative logarithms of the probabilities computed by the model for the true symbol at each position of the target expression, where each y_g represents the true symbol at one position of the target expression, and P_c(y_g | y_1, y_2, …, y_{g−1}, P) denotes the probability computed by the model at the g-th position for the true symbol y_g, given the problem text P and the first g−1 symbols y_1, y_2, …, y_{g−1}. The objective of training is therefore to maximize the probability of the correct symbols (i.e., the true symbols in the training data), making them more likely to be selected in prediction. When predicting with the trained model, only the problem text P is used: the symbols are predicted from the probabilities computed by the model and assembled into a mathematical expression, while the corresponding true answers in the data set are used to evaluate the accuracy of the model's predictions.
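A sketch of this loss computation, with stand-in probabilities in place of the model's computed values for the true symbols:

```python
# Illustrative negative log-likelihood over the target prefix expression:
# step_probs[g] stands in for the probability the model assigns to the
# true symbol y_g at position g.
import math

def nll_loss(step_probs):
    return -sum(math.log(p) for p in step_probs)

loss = nll_loss([0.9, 0.8, 0.95])   # three-symbol target expression
```

Minimizing this sum drives each true-symbol probability toward 1, which is exactly the training objective stated above.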
Those skilled in the art will appreciate that the training phase may optimize learnable parameters in the model, such as the weight parameter W and the bias parameter b in each of the neural network linear modules mentioned above, according to a loss function.
According to the above scheme, information in the problem text is extracted through the tree structure, and the local and global semantics of the problem text are fused, which strengthens the semantic-understanding and reasoning capability of the method. In addition, the semantic vectors produced by the encoder can provide technical support for intelligent-education practices on many education platforms, such as better test-question representation, automatic test-question labeling and personalized test-question recommendation, and may bring potential economic benefits.
Another embodiment of the present invention further provides a grammar-dependence-enhanced mathematical application problem semantic recognition and inference system, as shown in FIG. 4, which mainly includes: a processing device and a display device; wherein:
the processing equipment adopts the method of any one of claims 1 to 9 to carry out semantic recognition and reasoning on the mathematical application questions;
the display equipment is used for displaying results obtained at each stage in the reasoning and solving process of the mathematical application problem.
In the embodiment of the present invention, the display device may be a touch screen, which not only can display results obtained at each stage, but also can output a control command to the processing device, for example, click a relevant button to perform data preprocessing, or click a relevant button to control a model to work.
In the embodiment of the present invention, the semantic recognition and inference process related to the processing device has been described in detail in the previous embodiment, and therefore, the description thereof is omitted.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to perform all or part of the above described functions.
Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A semantic recognition and inference method for a mathematic application topic with enhanced grammar dependence is characterized by comprising the following steps:
dividing a topic text into a plurality of clauses, learning semantic information representation of the topic from local to global by adopting a hierarchical word-clause-topic encoder, enhancing clause semantics by utilizing intra-clause association, establishing a syntax dependency tree to store a topic text structure, modeling structural dependency among elements in the clauses, and finally obtaining representation of global semantic information corresponding to the topic text;
and recursively generating a mathematical expression corresponding to the title text by combining a decoder based on a tree structure with the representation of the global semantic information.
2. The method of claim 1, wherein the dividing of the topic text into a plurality of clauses, and learning the semantic information representation of the topic from local to global using a hierarchical word-clause-topic encoder comprises:
for a problem text P, represented as a sequence of n words P = {p_1, p_2, …, p_n}, wherein each p_s is a word, s = 1, …, n, the problem P is divided into m clauses {C_1, C_2, …, C_m} according to commas and periods, each clause C_k comprising a number of the words p_s, k = 1, …, m, where m is the number of clauses;
at the word level, each word p_s is mapped to a word vector x_s of equal dimension, and the semantic representation h_s of each word p_s is obtained through a bidirectional gated recurrent unit:

h_s^f = GRU_f(x_s, h_{s−1}^f)
h_s^b = GRU_b(x_s, h_{s+1}^b)
h_s = [h_s^f, h_s^b]

wherein GRU_f and GRU_b are the gated recurrent units modeling the preceding and following information respectively, h_s^f and h_s^b are the semantic representations of the word p_s containing the preceding and following information respectively, and h_{s−1}^f and h_{s+1}^b are the forward semantic representation of the previous word and the backward semantic representation of the next word used in the gated-recurrent-unit computation; the semantic representation h_s thus contains the semantic information of p_s together with its context information in both directions;
at the clause level, the semantic representation h_s of each word p_s in each clause C_k is taken according to the clause division; the clause semantics are enhanced through intra-clause association, a syntactic dependency tree is built to preserve the problem structure and to model the structural dependencies among the elements of the clause, and the semantic representation h^{C_k} of clause C_k is obtained by combining the structural dependencies;

at the problem level, based on the semantic representation h^{C_k} of each clause C_k, the clause semantics are enhanced through the semantic dependencies among the clauses, and the representation of the global semantic information corresponding to the problem text is obtained; first, for the semantic representation h^{C_k} of each clause C_k, a position-enhanced clause representation u_k is obtained through a learnable positional encoding PE that models the order relationship among the clauses:

u_k = h^{C_k} + PE(k)
wherein PE(k) is the encoding vector representing the position of the k-th clause;
then, related semantic information is extracted from the other clauses through a self-attention mechanism to enhance the clause semantics: the relevance S_as between the clause representation u_k and the representation u_s of each other clause is modeled as:

S_as = W_sa^T · ReLU(W_ss[u_k, u_s] + b_sa)

wherein s = 1, …, m; ReLU is the linear rectification function, [·,·] denotes the concatenation of vectors, T is the vector transpose operation, and W_ss, W_sa and b_sa are learnable parameters;
the related semantic information g_k is extracted from the other clauses according to the inter-clause relevance:

α_s = exp(S_as) / Σ_{k'} exp(S_{ak'})
g_k = Σ_s α_s · u_s

wherein s is the index of each clause when extracting semantic information, α_s is the weight for extracting semantic information from the s-th clause, and k' is the clause index required for computing the weights;
the semantic representation of clause C_k is enhanced with the related semantic information g_k, obtaining a clause representation z_k that contains both the local semantic information inside the clause and the global semantic information of the other clauses:

z_k = ReLU(W_so[u_k, g_k] + b_so)

wherein W_so and b_so are learnable parameters;
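An illustrative sketch of this inter-clause enhancement follows; dot-product relevance stands in for the learned ReLU scorer, vector addition stands in for the learned fusion layer, and the function name is hypothetical.

```python
# Illustrative inter-clause self-attention: each clause representation is
# enhanced with an attention-weighted sum over all clause representations.
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    z = sum(e)
    return [v / z for v in e]

def enhance_clauses(clause_vecs):
    dim = len(clause_vecs[0])
    out = []
    for k, h_k in enumerate(clause_vecs):
        # relevance of clause k to every clause s
        scores = [sum(a * b for a, b in zip(h_k, h_s)) for h_s in clause_vecs]
        alpha = softmax(scores)
        related = [sum(alpha[s] * clause_vecs[s][d]
                       for s in range(len(clause_vecs)))
                   for d in range(dim)]
        # fuse the clause's own semantics with the related information
        out.append([h_k[d] + related[d] for d in range(dim)])
    return out
```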
3. The grammar-dependence-enhanced mathematical application problem semantic recognition and inference method according to claim 1 or 2, wherein enhancing the clause semantics through intra-clause association, building a syntactic dependency tree to preserve the problem structure and to model the structural dependencies among the elements of the clause, and obtaining the semantic representation of clause C_k by combining the structural dependencies comprises:

given each clause C_k and the semantic representation h_s of each word p_s in the clause, a syntactic dependency tree T_k = {t_{k1}, t_{k2}, …, t_{kL}} is built for each clause C_k by a syntactic parsing tool to capture the grammatical dependencies among the words of the clause; each node t_{kl} of the dependency tree T_k represents a word p_s of the clause, l = 1, …, L, where L is the number of words of clause C_k; parent-child relationships are defined among the nodes, a child node depending on its parent node and providing a detailed description for the parent node;
for leaf node tf,f∈[k1,kL]Symbolizing h by the corresponding wordfNode semantic representation as enhancement
For non-leaf nodes tp,p∈[k1,kL]To characterize h in the semantics of the corresponding wordpInitializing node semantic representations, and extracting relevant detail description semantic enhancement node semantic representations from child nodes through an attention mechanism:
first, the semantic relevance S_ac between the non-leaf node t_p and the enhanced node semantic representation e_c of each of its child nodes t_c is computed:

S_ac = W_ca^T · ReLU(W_cs[h_p, e_c] + b_ca)

wherein W_cs, W_ca and b_ca are learnable parameters; the detail semantics d_p are then extracted from the child nodes according to this relevance:

α_c = exp(S_ac) / Σ_{p'} exp(S_{ap'})
d_p = Σ_c α_c · e_c

wherein c is the index of each child node when extracting the detail semantics, α_c is the weight for extracting detail semantics from the c-th child node, and p' is the child-node index required for computing the weights;
the detail semantics of the child nodes are fused with the initialized node semantic representation to obtain the enhanced node semantic representation e_p:

e_p = ReLU(W_co[h_p, d_p] + b_co)

wherein W_co and b_co are learnable parameters;
based on the syntactic dependency tree T_k, the enhanced node semantic representation of each node is computed bottom-up, from the leaf nodes to the root node; finally, the enhanced node semantic representation of the root node carries the local semantics of all the nodes of the clause together with the structural information of the syntactic dependency tree, and is taken as the local semantic representation h^{C_k} of the clause.
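The bottom-up computation over the dependency tree can be sketched as a post-order traversal. In this illustrative sketch, simple averaging stands in for the learned attention and fusion of child semantics, and the tree encoding and function name are assumptions.

```python
# Illustrative bottom-up enhancement over a dependency tree:
# a node is (vector, children); leaves keep their word semantics, and each
# non-leaf mixes the averaged child semantics into its own vector.
def enhance_tree(node):
    """Return the enhanced semantic vector of the subtree rooted at node."""
    vec, children = node
    if not children:
        return list(vec)                      # leaf node
    child_vecs = [enhance_tree(c) for c in children]
    dim = len(vec)
    detail = [sum(cv[d] for cv in child_vecs) / len(child_vecs)
              for d in range(dim)]
    return [vec[d] + detail[d] for d in range(dim)]
```

Calling this on the root yields the clause-level representation described above: the root vector enriched, level by level, with the detail semantics of all its descendants.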
4. The grammar-dependence-enhanced mathematical application problem semantic recognition and inference method according to claim 1, wherein recursively generating the mathematical expression corresponding to the problem text through a tree-structure-based decoder, combined with the representation of the global semantic information and the local semantic representations of the respective clauses, comprises:

each node of the required expression tree is predicted and generated recursively, from top to bottom and in order from the root node to the leaf nodes, so as to compose a complete and valid expression tree; the symbol on each leaf node of the expression tree is a predicted constant or a numerical value from the problem text, and the symbol on each non-leaf node is a predicted operator representing the operation result of its two child nodes; arranging the symbols of the nodes in the order of generation yields the required prefix mathematical expression equivalent to the expression tree.
5. The grammar-dependence-enhanced mathematical application problem semantic recognition and inference method according to claim 4, wherein each node contains its symbol ŷ, a target vector q_p and a context vector c_p; the target vector of the root node is initialized with the representation of the global semantic information corresponding to the problem text, the target vectors of the remaining nodes are generated by their parent nodes, the context vector c_p is obtained from the target vector q_p combined with the hierarchical attention mechanism, and the symbol ŷ of the node is predicted from the target vector q_p and the context vector c_p.
6. The grammar-dependence-enhanced mathematical application problem semantic recognition and inference method according to claim 5, wherein the manner of generating the node target vectors comprises:

denoting a parent node whose target vector is q_p, the context vector c_p is generated from the target vector q_p and the symbol ŷ of the node is predicted; if the predicted symbol ŷ is an operator, two child nodes are generated and goal decomposition is performed, the goal decomposition generating the target vectors of the two child nodes from the target vector q_p; first, the target vector q_l of the left child node is generated through a gated neural network from the node's target vector q_p, its context vector c_p and the word vector of the symbol ŷ, and symbol prediction and goal decomposition are performed recursively on the left child node to obtain the left subtree, the semantic representation vector t_l of the partial mathematical expression corresponding to the left subtree being generated bottom-up through a gated neural network; then, the target vector q_r of the right child node is generated through a gated neural network from the target vector q_p, the context vector c_p, the word vector of the symbol ŷ and the left-subtree semantic representation vector t_l, and symbol prediction and goal decomposition are performed on the right child node; when the symbol predicted at a node is not an operator, the goal decomposition of the corresponding left subtree ends and the goal decomposition of the right subtree begins.
7. The grammar-dependence-enhanced mathematical application problem semantic recognition and inference method according to claim 5 or 6, wherein extracting related local semantic information from the semantic representations at the different levels of the problem text through the hierarchical attention mechanism, based on the node's target vector q_p, to obtain a context vector c_p integrating the related clause and word semantics comprises:

evaluating the relevance S_c(k) between the target vector q_p and the representation h^{C_k} of each clause of the problem text:

S_c(k) = W_ac^T · ReLU(W_sc[q_p, h^{C_k}] + b_ac)

wherein W_sc, W_ac and b_ac are learnable parameters;

evaluating the relevance S_w(k,t) between the target vector q_p and the semantic representation h_{k,t} of each word in each clause C_k of the problem text:

S_w(k,t) = W_aw^T · ReLU(W_sw[q_p, h_{k,t}] + b_aw)

wherein W_sw, W_aw and b_aw are learnable parameters;

based on the relevance of the target vector q_p to each clause C_k and to its words, extracting the semantic representations of the related clauses and words from the different clauses with different weights to generate a context vector c_p integrating the related clause and word semantic information:

α_k = exp(S_c(k)) / Σ_{k'} exp(S_c(k'))
α_{k,t} = exp(S_w(k,t)) / Σ_{t'} exp(S_w(k,t'))
c_p = Σ_k α_k · Σ_t α_{k,t} · h_{k,t}

wherein α_k is the weight for extracting semantic information from the k-th clause and α_{k,t} is the weight for extracting the representation h_{k,t} of the t-th word of the k-th clause; k is the index of the clause when extracting semantic information from each clause, k' is the clause index required for computing the weights, t is the index of the word when extracting semantic information from each word of clause C_k, and t' is the word index in clause C_k required for computing the weights.
8. The grammar-dependence-enhanced mathematical application problem semantic recognition and inference method according to claim 5 or 6, wherein predicting the symbol ŷ based on the node's target vector q_p and context vector c_p comprises:

based on the node's target vector q_p and context vector c_p, copying numerical variables from the problem text through a pointer-generator network while inferring additional knowledge, so as to predict the symbol ŷ of the node;

first, the symbol ŷ to be predicted is either a numerical variable in the problem text or a constant or operator in the external symbol set, and a value probability P_c is computed for each possible value of ŷ from the following three probabilities:

computing the probability that the symbol ŷ to be predicted is a constant or operator in the external symbol set:

P_gen = σ(W_pg[q_p, c_p] + b_pg)

in the first case: if the symbol ŷ to be predicted is a numerical variable from the problem text, computing its probability distribution over the numerical variables of the text:

s_copy(z) = W_pa^T · ReLU(W_ps[q_p, c_p, e(z)] + b_pa)
P_copy(ŷ = z) = exp(s_copy(z)) / Σ_{z'} exp(s_copy(z'))

in the second case: if the symbol ŷ to be predicted is a constant or operator from the external symbol set, computing its probability distribution over the external symbols:

s_gen(z) = W_ga^T · ReLU(W_gs[q_p, c_p, e(z)] + b_ga)
P_ext(ŷ = z) = exp(s_gen(z)) / Σ_{z'} exp(s_gen(z'))

wherein e(z) is the word vector of a possible value z of the symbol ŷ to be predicted (a numerical variable of the problem text in the first case, a constant or operator of the external symbol set in the second case), and W_pg, b_pg, W_ps, W_pa, b_pa, W_gs, W_ga and b_ga are learnable parameters of the neural-network linear modules;
9. The grammar-dependence-enhanced mathematical application problem semantic recognition and inference method according to claim 1, wherein, for a problem text P and the corresponding target expression E_P = {y_1, y_2, …, y_G}, the model corresponding to the whole method is trained by minimizing the following loss function L:

L = − Σ_{g=1}^{G} log P_c(y_g | y_1, y_2, …, y_{g−1}, P)

the loss function represents the sum of the negative logarithms of the probabilities computed by the model for the true symbol at each position of the target expression, where each y_g represents a true symbol of the target expression, and P_c(y_g | y_1, y_2, …, y_{g−1}, P) denotes the probability computed by the model at the g-th position for the true symbol y_g, given the problem text P and the first g−1 symbols y_1, y_2, …, y_{g−1};
10. A syntax dependence enhanced mathematics application topic semantic recognition and reasoning system is characterized by comprising a processing device and a display device; wherein:
the processing equipment adopts the method of any one of claims 1 to 9 to carry out semantic recognition and reasoning on the mathematical application questions;
the display equipment is used for displaying results obtained at each stage in the reasoning and solving process of the mathematical application problem.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011517409.2A CN112613323A (en) | 2020-12-21 | 2020-12-21 | Grammar dependence enhanced mathematic application topic semantic recognition and inference method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011517409.2A CN112613323A (en) | 2020-12-21 | 2020-12-21 | Grammar dependence enhanced mathematic application topic semantic recognition and inference method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112613323A true CN112613323A (en) | 2021-04-06 |
Family
ID=75243727
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011517409.2A Pending CN112613323A (en) | 2020-12-21 | 2020-12-21 | Grammar dependence enhanced mathematic application topic semantic recognition and inference method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112613323A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113139657A (en) * | 2021-04-08 | 2021-07-20 | 北京泰豪智能工程有限公司 | Method and device for realizing machine thinking |
CN113257063A (en) * | 2021-06-08 | 2021-08-13 | 北京字节跳动网络技术有限公司 | Interaction method and terminal equipment |
CN113420543A (en) * | 2021-05-11 | 2021-09-21 | 江苏大学 | Automatic mathematical test question labeling method based on improved Seq2Seq model |
CN113553835A (en) * | 2021-08-11 | 2021-10-26 | 桂林电子科技大学 | Method for automatically correcting sentence grammar errors in English text |
CN115049062A (en) * | 2022-08-16 | 2022-09-13 | 中国科学技术大学 | Intelligent problem solving method and system for mathematic application problem based on knowledge learning |
CN116680502A (en) * | 2023-08-02 | 2023-09-01 | 中国科学技术大学 | Intelligent solving method, system, equipment and storage medium for mathematics application questions |
CN117033847A (en) * | 2023-07-20 | 2023-11-10 | 华中师范大学 | Mathematical application problem solving method and system based on hierarchical recursive tree decoding model |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113139657B (en) * | 2021-04-08 | 2024-03-29 | 北京泰豪智能工程有限公司 | Machine thinking realization method and device |
CN113139657A (en) * | 2021-04-08 | 2021-07-20 | 北京泰豪智能工程有限公司 | Method and device for realizing machine thinking |
CN113420543B (en) * | 2021-05-11 | 2024-03-22 | 江苏大学 | Mathematical test question automatic labeling method based on improved Seq2Seq model |
CN113420543A (en) * | 2021-05-11 | 2021-09-21 | 江苏大学 | Automatic mathematical test question labeling method based on improved Seq2Seq model |
CN113257063A (en) * | 2021-06-08 | 2021-08-13 | 北京字节跳动网络技术有限公司 | Interaction method and terminal equipment |
CN113553835A (en) * | 2021-08-11 | 2021-10-26 | 桂林电子科技大学 | Method for automatically correcting sentence grammar errors in English text |
CN113553835B (en) * | 2021-08-11 | 2022-12-09 | 桂林电子科技大学 | Method for automatically correcting sentence grammar errors in English text |
CN115049062B (en) * | 2022-08-16 | 2022-12-30 | 中国科学技术大学 | Intelligent problem solving method and system for mathematic application problem based on knowledge learning |
CN115049062A (en) * | 2022-08-16 | 2022-09-13 | 中国科学技术大学 | Intelligent problem solving method and system for mathematic application problem based on knowledge learning |
CN117033847A (en) * | 2023-07-20 | 2023-11-10 | 华中师范大学 | Mathematical application problem solving method and system based on hierarchical recursive tree decoding model |
CN117033847B (en) * | 2023-07-20 | 2024-04-19 | 华中师范大学 | Mathematical application problem solving method and system based on hierarchical recursive tree decoding model |
CN116680502B (en) * | 2023-08-02 | 2023-11-28 | 中国科学技术大学 | Intelligent solving method, system, equipment and storage medium for mathematics application questions |
CN116680502A (en) * | 2023-08-02 | 2023-09-01 | 中国科学技术大学 | Intelligent solving method, system, equipment and storage medium for mathematics application questions |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112613323A (en) | Grammar dependence enhanced mathematic application topic semantic recognition and inference method and system | |
Martins et al. | Findings on teaching machine learning in high school: A ten-year systematic literature review | |
Tong et al. | Exercise hierarchical feature enhanced knowledge tracing | |
CN107633111A (en) | A kind of situation simulation support system towards mathematical education | |
CN115357719A (en) | Power audit text classification method and device based on improved BERT model | |
CN110888989A (en) | Intelligent learning platform and construction method thereof | |
CN113591482A (en) | Text generation method, device, equipment and computer readable storage medium | |
CN110765241B (en) | Super-outline detection method and device for recommendation questions, electronic equipment and storage medium | |
CN115114974A (en) | Model distillation method, device, computer equipment and storage medium | |
Romero et al. | Conceptualizing the e-learning assessment domain using an ontology network | |
Xie et al. | Virtual reality primary school mathematics teaching system based on GIS data fusion | |
Sein | Conceptual models in training novice users of computer systems: effectiveness of abstract vs. analogical models and influence of individual differences | |
CN111897955A (en) | Comment generation method, device and equipment based on coding and decoding and storage medium | |
Zhang et al. | Knowledge tracing with exercise-enhanced key-value memory networks | |
CN113157932B (en) | Metaphor calculation and device based on knowledge graph representation learning | |
Bod | The data-oriented parsing approach: Theory and application | |
Arnicans et al. | Transformation of the software testing glossary into a browsable concept map | |
CN113821610A (en) | Information matching method, device, equipment and storage medium | |
Luo et al. | Dagkt: Difficulty and attempts boosted graph-based knowledge tracing | |
Chetoui et al. | Course recommendation model based on Knowledge Graph Embedding | |
Jantke et al. | Decision Support By Learning-On-Demand. | |
Wang et al. | Textbook Enhanced Student Learning Outcome Prediction | |
Khandait et al. | Refined-Para Forming Question Generation System Using Lamma | |
Ilkou | EduKnow: A Framework for Structuring Educational Material | |
Geeganage et al. | Sentence based mathematical problem solving approach via ontology modeling |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |