CN112613323A

CN112613323A - Grammar dependence enhanced mathematic application topic semantic recognition and inference method and system

Info

Publication number: CN112613323A
Application number: CN202011517409.2A
Authority: CN
Inventors: 陈恩红; 刘淇; 林鑫; 黄振亚; 王皓
Original assignee: University of Science and Technology of China USTC
Current assignee: University of Science and Technology of China USTC
Priority date: 2020-12-21
Filing date: 2020-12-21
Publication date: 2021-04-06

Abstract

The invention discloses a method and a system for recognizing and reasoning semantics of a mathematic application question with enhanced grammar dependence. In addition, semantic vectors generated by the encoder can provide certain technical support for practices in the intelligent education field such as better test question representation, automatic test question labeling, personalized test question recommendation and the like for a plurality of education platforms, and certain potential economic benefits can be brought.

Description

Grammar dependence enhanced mathematic application topic semantic recognition and inference method and system

Technical Field

The invention relates to the technical field of machine learning, artificial intelligence and intelligent education, in particular to a method and a system for recognizing and reasoning semantics of a mathematic application problem with enhanced grammar dependence.

Background

In intelligent education, automatic reasoning and solving of mathematical application problems is a challenging task. The math application topic comprises a story described in a natural language and an associated math topic. The problem of the mathematic application problem is given by natural language, and the answer is obtained by mathematic reasoning, so that the algorithm for solving the mathematic application problem is required to have semantic understanding capability and mathematic reasoning capability on the natural language.

In the existing method, when reasoning is carried out on a mathematic application question, the question is converted into a mathematical expression which can be calculated and solved by combining with a logic structure of the mathematical expression. However, these works only focus on how to make a computer perform mathematical reasoning, but do not deeply understand the semantics of the topic text, neglect the syntax dependency relationships (e.g., the number and the modified entities) among the elements in the topic, and are difficult to accurately identify the details of the mathematical application topic, and cannot effectively describe the semantics and details of the topic, resulting in poor effect in practical application of intelligent education.

Disclosure of Invention

The invention aims to provide a method and a system for recognizing and reasoning the semantics of a mathematic application question with enhanced grammar dependence, which can accurately recognize the details of the mathematic application question, effectively describe the semantics and the details of the question and improve the effect in the practical application of intelligent education.

The purpose of the invention is realized by the following technical scheme:

a semantic recognition and inference method for a mathematic application topic with enhanced grammar dependence comprises the following steps:

dividing a topic text into a plurality of clauses, learning semantic information representation of the topic from local to global by adopting a hierarchical word-clause-topic encoder, enhancing clause semantics by utilizing intra-clause association, establishing a syntax dependency tree to store a topic text structure, modeling structural dependency among elements in the clauses, and finally obtaining representation of global semantic information corresponding to the topic text;

and recursively generating a mathematical expression corresponding to the title text by combining a decoder based on a tree structure with the representation of the global semantic information.

Compared with the traditional serialization processing method, the technical scheme provided by the invention can more effectively describe the semantics and details of the questions, thereby improving the reasoning accuracy of the model.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.

FIG. 1 is a diagram illustrating a mathematical problem according to an embodiment of the present invention;

FIG. 2 is a frame diagram of a semantic recognition and inference method for mathematics application problems with enhanced grammar dependence according to an embodiment of the present invention;

FIG. 3 is a block diagram of a syntax dependency enhancement clause representation module according to an embodiment of the present invention;

fig. 4 is a schematic diagram of a semantic recognition and inference system for math application questions with enhanced grammar dependence according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.

The embodiment of the invention provides a semantic recognition and reasoning method for a mathematic application topic with enhanced grammar dependence, which mainly comprises the following steps:

dividing a topic text into a plurality of clauses, learning semantic information representation of the topic from local to global by adopting a hierarchical word-clause-topic encoder, enhancing the clause semantics by utilizing intra-clause association, establishing a syntax dependency tree to store a topic structure, modeling the structural dependency among elements in the clauses, and finally obtaining the representation of the global semantic information corresponding to the topic text;

In the scheme of the embodiment of the invention, clauses are taken as semantic units, a syntax dependent tree is used for expressing a sentence structure and modeling semantic information of a question text in a layering manner, and a recursion sequence model based on the tree structure is used for capturing the structural dependence relationship among elements (such as numbers and operators) of a mathematical expression. The accuracy of the mathematical reasoning of the algorithm is improved by fusing the local details, the global target and the dependency relationship among all elements of the topic text.

The above scheme actually covers two phases:

the first stage is a topic understanding stage: the reading and understanding habits of human beings are simulated, the topic text is divided into a plurality of clauses, and then a hierarchical word-clause-topic encoder is designed to learn the semantic information representation of the topic text from local to global. Furthermore, in order to enhance sentence semantics by using intra-sentence association, in a sentence understanding stage, a grammar dependency tree is established to store a question text structure and a module based on a dependency tree is designed to model structural dependency among elements in a sentence.

The second phase is the inference phase: a decoder based on a tree structure recursively generates a mathematical expression, firstly, a hierarchical attention mechanism is utilized to explicitly synthesize information enhancement semantics of different levels and extract related context information, and then a pointer-generation network guidance model is utilized to extract existing information and infer additional knowledge to predict each position symbol.

For the sake of understanding, the above-described scheme will be described in detail with reference to the accompanying drawings.

In the practice of the present invention, a mathematical application problem in a broad sense is used as the data set. FIG. 1 is a mathematical problem description diagram; the text content of the title needs to contain natural text (such as words) for describing title information and numerical values (such as "3" and "4" in fig. 1) for describing quantity information, each piece of data comprises title text (such as "recipe" part in fig. 1), mathematical expressions (such as "Expression" part in fig. 1) and numerical answers (such as "Answer" part in fig. 1), wherein the title text is necessary for the scheme, the mathematical expressions are only used for training of the scheme, and the numerical answers are only used for effect evaluation of the scheme, and only the title text is needed in actual use. Such data are for example the open source Math application topic data set (Math23K) published by Tencent and the open source data set (MAWPS) published by Microsoft. In addition, the input data set can also be obtained by collecting the homework or examination math question sets of primary and secondary school students through network crawling or offline.

The invention aims to model semantic information according to texts, so as to carry out deep reasoning and further generate a mathematical expression; the semantic vector in the stage can provide better services such as test question representation, test question automatic labeling, personalized test question recommendation and the like for a plurality of education platforms, and certain potential economic benefits are brought. Meanwhile, the answer may be solved according to the generated mathematical expression.

In the embodiment of the present invention, each topic text P is represented as a sequence P ═ P consisting of n words or numeric values₁,p₂,…,p_nWherein p is_sAre either a word (e.g., "length" and "width" in fig. 1) or a numerical value (e.g., "3" and "4") in the subject text, s is 1, …, n; the set of values of the topic text P is defined as N_P(e.g., {3, 4 }). Since the invention is not concerned with the specific values of these numbers, all values can be mapped to the special word "NUM", and thus P in the sequence P described above_sMay be understood as a word.

Expression E corresponding to topic P_PCan be defined as one composed of G operators, constants or questionsSequence E of numerical values in the target text_P＝{y₁,y₂,…,y_GWherein each y_iSet of decoded symbols V all of title P_PAn element of (1). Set of symbols V_PSet of operators V_O(e.g., { +, -, ×, ÷ div }), and a set of constants V_C(e.g., {1,2, π }) and a set of values N_PComposition, defined as V_P＝V_O∪V_C∪N_P(e.g., { +, -, ×, div, 1,2, π,3,4}), due to N per topic_PIn contrast, V for each topic_PAnd also different.

Given the input sequence of the question P, the goal of the semantic recognition and reasoning of the mathematic application question is to learn a model, read the token of the P as input, and predict the corresponding expression through the semantic recognition and reasoning

The embodiment of the invention constructs a network model to carry out semantic recognition and reasoning on the mathematical application questions.

Training a network model, wherein training data is a pre-collected data set, and the collected data set needs to be pre-processed for ensuring the effect of the model: 1) and (3) data filtering: the embodiment of the invention mainly aims at the application scene that only one question is contained in the mathematic application questions, the answer of the question is numerical value and can be directly calculated through one mathematic expression, so that the question which only contains one mathematic expression and numerical value answer in the data set needs to be selected, and the question which lacks the expression or has two or more expressions and answers is filtered. 2) Sampling: and respectively carrying out random sampling in each data set, and selecting a subset training model of the original data set.

Furthermore, similar to human problem solving, semantic recognition and reasoning for math application problems requires machines with two key capabilities: the natural language understanding ability is used for understanding the relation among a plurality of elements in the text semantics of the title, and the logical reasoning ability is used for generating a correct mathematical expression through fine-grained logical reasoning. Most of the existing research works only focus on mathematical reasoning ability to generate expressions, but only simply model topic texts as a word sequence, only focus on the direct influence of the relative positions of words (e.g.: in FIG. 1, "ringer" behind "cm" and "than" in front of), but ignore complex structures (e.g.: the relationship between "3", "ringer" and "width") that are highly semantically related among words. This is inconsistent with the way that human beings accurately understand the subject text, and lacks the accuracy of understanding and reasoning on the deep semantics of the subject text. In fact, there are many difficulties and challenges to accurately understand the text semantics of a question and improve the reasoning effect by simulating the reading habit of human beings: (1) it is a challenging problem how to simulate the process by human beings reading each part (e.g. sentence) of the subject text to pay attention to the local detail (e.g. the value in fig. 1) and then combining it to obtain the global object (e.g. the perimeter of the rectangle); (2) human beings can easily grasp semantic dependency between words to understand the detailed description and local semantics (for example, in FIG. 1, "3" describes the degree of "length" to "width" and "range"), but the machine is difficult to understand; (3) how to convert local logic into a corresponding expression (e.g., "3 cmringer" in the title text corresponds to "+ 3" in the expression) is an important task in reasoning. (4) How to utilize the human mathematical knowledge (for example, in fig. 1, besides "length" and "width" in the title text, a rectangular perimeter formula "2 x (NUM + NUM)" is also needed, wherein "2" cannot be directly generated from the title text), which is an unsolved problem, in addition to the question text.

The network model provided by the embodiment of the invention can better simulate the human reading process, improve the semantic understanding ability and enhance the reasoning effect. The whole method of the invention is used as a network model, and a mathematical application problem and a reasoning mathematical expression are understood based on a Sequence-to-Sequence (Seq 2Seq) framework, as shown in figures 2 to 3, the network model mainly comprises three parts: a Hierarchical Encoder Module (Hierarchical Encoder Module), a syntax-dependent enhanced Clause representation Module (Dependency-enhanced class Module), and a Tree-based Decoder Module (Tree-based Decoder Module).

The hierarchical encoder module is responsible for dividing the title text into clauses and learning semantic representation of the title text from local to global, the clause representation module with enhanced grammar dependence is responsible for learning representation of each clause for the hierarchical encoder module by combining structure dependence in the clauses, the decoder module based on the tree structure generates a mathematical expression through neural network recursion based on the tree structure on the basis of the title text semantic representation, and extracts existing information and deduces additional knowledge by integrating semantic information of different hierarchies and a pointer-generation network guidance model by using a hierarchical attention mechanism, so that symbols of each node are accurately predicted. The modules are specifically introduced as follows:

the hierarchical encoder module.

The layered encoder block is shown in the upper dashed box portion of FIG. 2, giving the title sequence P ═ P₁,p₂,…,p_nAnd (5) imitating the reading and understanding habit of human from local to global, and dividing the title text into m clauses according to commas and periods

As a unit of local semantic representation, each clause C_kIs a word subsequence of topic text P (containing a number of P_s) K is 1, …, and m is the number of clauses. Then based on sentence division, the encoder learns the semantic representation of words, sentences and the whole subject text in turn through a bottom-up method from local to whole:

1. a context-enhanced word representation.

At the Word Level (the "Word-Level" Level of FIG. 2), each Word p is added_sWord vector x mapped to equal dimensions_s(the word vector is obtained by word2vec pre-training), and each word p is obtained by a bidirectional Gated loop Unit (GRU)_sSemantic representation of h_s：

Wherein, GRU_fAnd GRU_bThe gated loop units that model the preceding and following information respectively,

and

respectively, a word p containing preceding information and following information_sThe semantic representation of (a) is represented,

and

respectively calculate for GRU

Semantic characterization and computation of the last word required

Semantic representation of the next word required, hence semantic representation h_sIn contains the word p_sSemantic information of the user and context information in the front and back directions.

2. Grammars rely on enhanced clause representation.

At the Clause Level (the "class-Level" Level of FIG. 2), each Clause C is extracted according to the division of the Clause_kEach word in p_sSemantic representation of h_sUsing the intra-sentence association to enhance the sentence semantics, establishing a grammar dependency tree to store the question structure, modeling the structural dependency between elements in the sentence, and combining the structural dependency relationship to obtain the sentence C_kSemantic characterization of

This stage is implemented by a syntax-dependent enhanced clause representation module, which will be described in detail laterAnd (4) explanation.

3. And associating the enhanced title text representation among the clauses.

At the topic Level ("Proble-Level" Level in FIG. 2), each clause C is based on_kSemantic characterization of

Sentence semantics are enhanced and representation of global semantic information corresponding to the topic text is obtained by combining semantic dependency relationships between sentences (description on different aspects of the same entity, such as the value of "width" in the first sentence and the relationship between "width" and "length" in the second sentence in fig. 1).

First, for each clause C_kSemantic characterization of

Obtaining position-enhanced clause representations by a learnable position-coded PE modeling an order relationship between clauses

Wherein, pe (k) is a coding vector representing the position of the kth clause, and then relevant semantic information is extracted from other clauses by a self-attention mechanism to enhance clause semantics: sentence characterization using neural network modeling

Is characterized by each of the other clauses

Correlation between S_as：

Wherein, ReLU is a linear rectification function, [.]Representing a stitching operation of a plurality of vectors, T being a vector transpose operation, W_ss、W_saAnd b_saAnd fitting training data by adjusting the contents of learnable matrixes/vectors in the linear module during model training for learnable parameters (W is a weight parameter and b is a bias parameter) of the linear module of the neural network.

Those skilled in the art will understand that the neural network comprises a linear module and a non-linear module, wherein the linear module comprises a weight parameter W and a bias parameter b, and the non-linear module mainly comprises a ReLU; the neural networks involved in the respective portions of the present invention may be identical or different in structure.

Characterizing from each other clause according to inter-clause correlations

Extracting related semantic information according to a certain weight

Wherein s is an index of each clause when extracting semantic information, and alpha_sTo extract the weight of semantic information from the s-th clause, k' is the clause index needed to calculate the weight.

By correlating semantic information

Enhanced clause C_kSemantic characterization of

Obtaining partial semantic information and others in clausesClause representation of global semantic information of clauses

Wherein, W_soAnd b_soIs a learnable parameter of the neural network linear module, with the last clause C_mAs the representation of global semantic information corresponding to the topic text

The last clause is usually the problem part of the application topic, i.e. the solution target of the application topic.

And secondly, a sentence characterization module with enhanced grammar dependence.

The module can be understood as an independent module and can also be understood as an internal sub-module of a hierarchical encoder module, and neither understanding mode influences the implementation of the invention.

Fig. 3 is a block diagram of a syntax-dependent enhanced clause representation module. Given each clause C_kAnd each word p in the clause_sSemantic representation of h_sEach clause C is built by a syntactic analysis tool_kSyntax dependency tree T of_k＝{t_k1,t_k2,…,t_kLCapturing the grammar dependency relationship among words in clauses (the upper left corner of figure 3), and a dependency tree T_kIn each node t_klAll represent a word p in a clause_skL k1, … kL, L being clause C_kK 1-kL represent the dependency tree T of the kth clause_kAn index of each node in the set; defining the parent-child relationship among the nodes, wherein the child nodes depend on the parent nodes and provide detailed description for the parent nodes; such as: "retangle" in fig. 3 depends on the parent node "perimeter", and describes the type of "perimeter".

In the embodiment of the invention, the dependency tree T is based on grammar_kUsing child nodesThe details provided enhance the semantics of the parent node.

For leaf node t_f(e.g., "retangle"), f ∈ [ k1, kL]Symbolizing h by the corresponding word_fNode semantic representation as enhancement

For non-leaf nodes t_p(e.g., "perimeter"), p ∈ [ k1, kL]To characterize h in the semantics of the corresponding word_pInitializing node semantic representations, and extracting relevant detail description semantic enhancement node semantic representations from child nodes through an attention mechanism:

first, a non-leaf node t is calculated_pWith each of its child nodes t_cEnhanced node semantic representation

Semantic relevance of S_ac：

Wherein W_cs、W_caAnd b_caIs a learnable parameter of a neural network linear module, based on semantic relevance S_acExtracting relevant detail semantics from child nodes according to certain weight

Wherein c is the index of each child node when extracting detail semantics, alpha_cFor extracting the weight of detail semantic from the c sub-nodeAnd p' is the index of the child node required to compute the weight.

Fusing the detail semantics of the child nodes with the initialized node semantic representations to obtain enhanced node semantic representations

Wherein, W_coAnd b_coLearnable parameters for neural network linear modules based on a syntax dependency tree T_kThe enhanced representation node semantic representation of each node is calculated according to the bottom-up sequence from the leaf node to the root node; finally, enhanced node semantic representation of the root node (e.g., "is" in FIG. 3)

With local semantics of all nodes in clauses and structural information of the syntactic dependency tree, characterized by enhanced node semantics of the root node

Local semantic representation as clauses

And thirdly, a decoder module based on a tree structure.

The tree structure based decoder module is shown in the lower two dotted box parts of fig. 2, and its input is semantic representation of each level of the title text P. Since each valid mathematical expression can be uniquely converted into a corresponding expression tree, the decoder converts the prediction and generation equivalence of the mathematical expression into the prediction and generation of the expression tree. First, each node of a desired expression tree is predicted and generated sequentially from top to bottom recursively in order from a root node to leaf nodes by a tree structure-based method

(including symbols on nodes)

) Form a complete and effective expression tree

The symbol on each leaf node in the expression tree is a predicted constant or a numerical value in a title text, and the symbol on each non-leaf node is a predicted operator and represents the operation result of two child nodes; then, the symbols on each node are arranged in the order of generation

Is obtained by

Equivalent required prefix mathematical expression

In the embodiment of the invention, each node

Containing the corresponding symbols

Target vector q_pAnd a context vector c_p(ii) a Representation h of global semantic information corresponding to target vector of root node through topic text^sInitialization, target vectors of the remaining nodes being generated by their parent nodes, context vector c_pBy means of a target vector q_pObtaining the symbol of the node by combining the hierarchy attention mechanism

The target vector q through the node_pAnd the context toQuantity c_pPredicting, comprehensively predicting symbols of all nodes, and arranging according to sequence to obtain prefix mathematical expression

The preferred embodiments of the above parts are as follows:

1. and generating a target vector.

Representation h of global semantic information corresponding to target vector of root node through topic text^sInitializing, wherein the target vectors of the other nodes are generated by the father node of the other nodes, and the process of generating the target vectors of the child nodes by the father node is as follows: for any parent node

The target vector is denoted as q_pConsidering the order of node generation, q_pHas already been made by

Is generated. The decoder first bases on the target vector q_pGenerating a node

Context vector c of_PAnd predict the sign thereof

(this process will be described below) if the predicted sign

Is an operator, then generate

And performing target decomposition (called node expansion) according to the target vector q_pGenerating target vectors for two child nodes: firstly, according to nodes through a gated neural network

Target vector q of_pContext vector c_pSymbol, symbol

Generating a target vector q of the left child node_lSymbol prediction and expansion of the left child node are carried out, and the process is carried out recursively; after the left subtree is completely predicted, generating a semantic representation vector t of a part of mathematical expression corresponding to the left subtree by using all nodes on the left subtree in a bottom-up mode through a gated neural network_l，t_lGenerating a target vector for a parent node right child node; then according to

Target vector q_pContext vector c_pSymbol, symbol

Semantic representation vector t of word vector and left subtree_lGeneration of target vector q of right child node by gated neural network_rAnd predicting and expanding the right child node. When the symbol predicted by a node is not an operator (constant or numerical value), the target decomposition process on the node is finished, the node can be corresponding to a left sub-tree of a certain father node (if the node is a left sub-node, the node corresponds to a left sub-tree of the father node of the node, if the node is a right sub-node, the node corresponds to a left sub-tree of a higher-level father node), and all predictions of the left sub-tree are finished, the father node is positioned through a backtracking target decomposition process, a semantic representation of the left sub-tree is established according to the process, and the prediction and expansion of the right sub-tree of the father node are started.

2. And extracting the context.

Node-based target vector q_pExtracting related local semantic information from semantic representations of different levels of a title through a Hierarchical Attention mechanism (Hierarchical Attention) to obtain a context vector c of comprehensive related clauses and word meaning information_p。

First, a target vector q is evaluated_pIs characterized by each clause in the subject text

Correlation of (2)_c：

Wherein, W_sc、W_acAnd b_acIs a learnable parameter of the neural network linear model, and then, the target vector q is evaluated_pWith each clause C of the subject text_kSemantic representation h of each word in_k,t(semantic representations of words are derived by the word layer of the hierarchical encoder module, as indicated by the arrows in FIG. 2) relevance S_w：

Wherein, W_sw、W_awAnd b_awIs a learnable parameter of a neural network linear module based on a target vector q_pWith each clause C_kThe semantic representation of the related clauses and words is extracted from different clauses by different weights to generate a context vector c integrating the information of the related clauses and the semantic representation of the words_p：

Wherein the content of the first and second substances,

to characterize from the kth clause

The weight of the semantic information is extracted,

characterising h for the t word from the kth clause_k,tWherein k is the index of the clause when semantic information is extracted from each clause, k' is the clause index required for calculating the weight, and t is the clause C_kThe term index when extracting semantic information from each term, t' is a clause C required for calculating the weight_kThe word index in (1) is.

3. And (4) symbol prediction.

Node-based target vector q_pAnd a context vector c_pPredictive symbol

In particular, a node-based target vector q_pAnd a context vector c for predicting the symbol of the node by copying the numerical variables (e.g. "3" and "4" appearing in both the Problem and Expression parts in FIG. 1) in the subject text through a Pointer-generating network (Pointer generator) while inferring additional knowledge (constants and operators in the external symbol set, e.g. "2" and "x" appearing only in the Expression part)

First, a symbol to be predicted

Possibly a numerical variable in the subject text or a constant or operator in an external symbol set, and calculating a value probability P for each possible value of the symbol to be predicted by calculating the following three probabilities_c：

Calculating a prediction to beSymbol

Probability of being a constant or operator in the outer symbol set:

P_gen＝σ(W_pg[q_p,c_p]+b_pg)

in the first case: if the symbol to be predicted is

Is a numerical variable from the subject text, calculating the symbol to be predicted in this case

Probability distribution of (2):

in the second case: if the symbol to be predicted is

Is a constant or operator from an external set of symbols, the symbol to be predicted in this case being calculated

Probability distribution of (2):

wherein the content of the first and second substances,

is the symbol to be predicted in the first case

Is used to characterize the word that is possible to take,

is the symbol to be predicted in the second case

A possible valued word vector of, W_pg、b_pg、W_ps、W_pa、b_pa、W_gs、W_gaAnd b_gaIs a learnable parameter of the neural network linear module;

then the total probability

Comprises the following steps:

the model being a symbol to be predicted

Respectively calculates a probability P for each possible symbol value (problem variable, constant and operator)_cDuring training, the model adjusts learnable parameters to maximize the probability P of correct symbol values (i.e., true symbols in the training data) y_c(y) making it more likely to be selected in the prediction; probability of taking value in prediction according to all possible symbols

Predicting the sign of a node

The symbol with the highest probability is generally selected or one symbol is selected according to probability using a bundle search method.

4. And generating an expression.

After all the nodes of the expression tree are predicted, the symbols of all the predicted nodes are synthesized, and the mathematical expression in the form of prefix expression is generated according to the positions of the nodes or the sequence of node generation

The above is the main principle of the above scheme of the embodiment of the present invention, and in addition, the target expression E corresponding to the topic text P and the data set is pointed out_P＝{y₁,y₂,…,y_GAnd training a model corresponding to the whole method by minimizing the following loss function L:

the penalty function represents the sum of the negative logarithms of the probabilities of the model-computed true signs at each location of the target expression, where each y represents the true sign in one of the target expressions, P_c(y_g|y₁,y₂,…,y_g-1P) denotes the first g-1 symbols y in a given title text P₁,y₂,…,y_g-1In the case of (1), the true symbol y calculated by the model at the g-th position_gThe objective of training is therefore to maximize the probability of the correct symbol (i.e., the true symbol in the training data), making it more likely to be selected in the prediction. When using trained model prediction, only the topic text P is used and the symbols are predicted with probabilities calculated from the model

And generating a mathematical expression, and simultaneously evaluating the accuracy of model prediction by using the corresponding real answers in the data set.

Those skilled in the art will appreciate that the training phase may optimize learnable parameters in the model, such as the weight parameter W and the bias parameter b in each of the neural network linear modules mentioned above, according to a loss function.

According to the scheme, information in the topic text is extracted through the tree structure, local semantics and global semantics of the topic text are fused, and the semantic understanding capability and the reasoning capability of the method are enhanced; in addition, semantic vectors generated by the encoder can provide certain technical support for practices in the intelligent education field such as better test question representation, automatic test question labeling, personalized test question recommendation and the like for a plurality of education platforms, and certain potential economic benefits can be brought.

Another embodiment of the present invention further provides a hierarchical mathematic application topic semantic recognition and inference system with enhanced grammar dependence, as shown in fig. 4, which mainly includes: a processing device and a display device; wherein:

the processing equipment adopts the method of any one of claims 1 to 9 to carry out semantic recognition and reasoning on the mathematical application questions;

the display equipment is used for displaying results obtained at each stage in the reasoning and solving process of the mathematical application problem.

In the embodiment of the present invention, the display device may be a touch screen, which not only can display results obtained at each stage, but also can output a control command to the processing device, for example, click a relevant button to perform data preprocessing, or click a relevant button to control a model to work.

In the embodiment of the present invention, the semantic recognition and inference process related to the processing device has been described in detail in the previous embodiment, and therefore, the description thereof is omitted.

It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the system is divided into different functional modules to perform all or part of the above described functions.

Through the above description of the embodiments, it is clear to those skilled in the art that the above embodiments can be implemented by software, and can also be implemented by software plus a necessary general hardware platform. With this understanding, the technical solutions of the embodiments can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.), and includes several instructions for enabling a computer device (which can be a personal computer, a server, or a network device, etc.) to execute the methods according to the embodiments of the present invention.

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A semantic recognition and inference method for a mathematic application topic with enhanced grammar dependence is characterized by comprising the following steps:

2. The method of claim 1, wherein the dividing of the topic text into a plurality of clauses, and learning the semantic information representation of the topic from local to global using a hierarchical word-clause-topic encoder comprises:

for topic text P, the sequence represented as n words P ═ P₁，p₂，...，p_nWherein p is_sAre all a word, s 1.., n; dividing title P into m clauses according to comma and sentence number

Each clause C_kComprising a number of words p_sK is 1,., m, m is the number of clauses;

at the word level, each word p_sWord vector x mapped to equal dimensions_sAnd obtaining each word p by a bidirectional gating cyclic unit_sSemantic representation of h_s：

and

and

respectively for gated loop unit calculation

Semantic representation of the last word and semantic representation of the next word, semantic representation h_sIn (a) contains p_sSemantic information of the user and context information in the front and back directions;

at the sentence level, according to the scoreDividing sentences to extract each clause C_kEach word in p_sSemantic representation of h_sBuilding a grammar dependency tree to store the question structure and modeling the structural dependency among elements in the clause by enhancing the clause semantics through the association in the clause, and obtaining the clause C by combining the structural dependency relationship_kSemantic characterization of

At the topic level, based on each clause C_kSemantic characterization of

Enhancing sentence semantics by combining semantic dependency relations among the sentences and obtaining the representation of global semantic information corresponding to the subject text; first, for each clause C_kSemantic characterization of

Wherein, pe (k) is a coding vector representing the position of the kth clause;

then, relevant semantic information is extracted from other clauses through a self-attention mechanism to enhance clause semantics: modeling inter-sentence characterization

Is characterized by each of the other clauses

Correlation of (2)S_as：

Wherein, s is 1.. multidot.m; ReLU is a linear rectification function, [.]Representing a splicing operation of vectors, T being a vector transpose operation, W_ss、W_saAnd b_saIs a learnable parameter;

extracting relevant semantic information from other clauses according to the relativity between clauses

Wherein s is an index of each clause when extracting semantic information, and alpha_sExtracting the weight of semantic information from the s-th clause, wherein k' is a clause index required by weight calculation;

by correlating semantic information

Enhanced clause C_kSemantic characterization of

Sentence representation for obtaining global semantic information including local semantic information in sentences and other sentences

Wherein, W_soAnd b_soIs a learnable parameter;

with the last clause C_mAs the representation of global semantic information corresponding to the topic text

3. The method for recognizing and reasoning semantics of mathematic application questions with enhanced grammar dependence according to claim 1 or 2, characterized in that sentence semantics are enhanced by using intra-sentence association, a grammar dependence tree is established to store a question structure and model structural dependence among elements in a sentence, and a sentence C is obtained by combining the structural dependence relationship_kThe semantic representations of (a) include:

given each clause C_kAnd each word p in the clause_sSemantic representation of h_sEach clause C is built by a syntactic analysis tool_kSyntax dependency tree T of_k＝{t_k1，t_k2，...，t_kLCapturing the grammar dependency relationship among words in the clause, and a dependency tree T_kIn each node t_klAll represent a word p in a clause_skL k1_kThe number of words of; defining the parent-child relationship among the nodes, wherein the child nodes depend on the parent nodes and provide detailed description for the parent nodes;

for leaf node t_f，f∈[k1，kL]Symbolizing h by the corresponding word_fNode semantic representation as enhancement

For non-leaf nodes t_p，p∈[k1，kL]To characterize h in the semantics of the corresponding word_pInitializing node semantic representations, and extracting relevant detail description semantic enhancement node semantic representations from child nodes through an attention mechanism:

Semantic relevance of S_ac：

Wherein, W_cs、W_caAnd b_caIs a learnable parameter;

according to semantic relevance S_acExtracting relevant detail semantics from child nodes

Wherein c is the index of each child node when extracting detail semantics, alpha_cExtracting the weight of detail semantics from the c-th child node, wherein p' is the index of the child node required for calculating the weight;

Wherein, W_coAnd b_coIs a learnable parameter;

based on the syntax dependency tree T_kThe enhanced representation node semantic representation of each node is calculated according to the bottom-up sequence from the leaf node to the root node; finally, enhanced node semantic representation of root nodes

Local semantic representation as clauses

4. The syntax dependence enhanced mathematical application topic semantic recognition and inference method according to claim 1, wherein the recursively generating the mathematical expression corresponding to the topic text by using a tree structure-based decoder in combination with the representation of the global semantic information and the local semantic representations of the respective clauses comprises:

sequentially predicting and generating each node of the required expression tree from top to bottom recursively in order from the root node to the leaf nodes by a tree structure-based method

Composing a complete and efficient expression tree

The symbol on each leaf node in the expression tree is a predicted constant or a numerical value in a title text, and the symbol on each non-leaf node is a predicted operator and represents the operation result of two child nodes; arranged according to the generated orderEach node of the column

Symbol of

Get and expression tree

Equivalent required prefix mathematical expression

5. The semantic recognition and inference method for mathematics application questions with enhanced grammar dependence of claim 4, characterized in that each node

Containing the corresponding symbols

The target vector q through the node_pAnd a context vector c_pAnd (6) obtaining a prediction.

6. The semantic recognition and inference method for math application problems with grammar dependency enhancement as claimed in claim 5, wherein the manner of generating node target vectors comprises:

note the book

For a parent node, the target vector is denoted as q_pFrom the target vector q_pGenerating a context vector c_PAnd predict the sign thereof

If the predicted sign

If the operator is an operator, generating two child nodes and carrying out target decomposition, wherein the target decomposition is carried out according to a target vector q_pGenerating target vectors of two child nodes; firstly, a target vector q according to nodes is obtained through a gated neural network_pContext vector c_pSymbol, symbol

Generating a target vector q of the left child node_lPerforming symbol prediction and target decomposition of the left child node, performing symbol prediction and target decomposition of the left child node recursively to obtain a left sub-tree, and generating a semantic representation vector t of a part of mathematical expression corresponding to the left sub-tree in a bottom-up manner through a gated neural network_lThen according to the target vector q_pContext vector c_pSymbol, symbol

Word vector and left sub-tree semantic representation vector t_lGeneration of target vector q of right child node by gated neural network_rPredicting and decomposing the target of the right child node; when the symbol predicted by the node is not an operator, the target decomposition of the corresponding left sub-tree is ended, and the target decomposition of the right sub-tree is started.

7. The method of claim 5 or 6 wherein said syntax dependent enhanced mathematical application problem semantic recognition and inferenceThe method is characterized in that the target vector q is based on nodes_pExtracting related local semantic information from semantic representations of different levels of the title through a level attention mechanism to obtain a context vector c integrating related clauses and word meaning information_pThe method comprises the following steps:

evaluating a target vector q_pIs characterized by each clause in the subject text

Correlation of (2)_c：

Wherein, W_sc、W_acAnd b_acIs a learnable parameter;

evaluating a target vector q_pWith each clause C of the subject text_kSemantic representation h of each word in_k，tCorrelation of (2)_w：

Wherein, W_sw、W_awAnd b_awIs a learnable parameter;

based on the target vector q_pWith each clause C_kThe semantic representation of the related clauses and words is extracted from different clauses by different weights to generate a context vector c integrating the information of the related clauses and the semantic representation of the words_p：

Wherein the content of the first and second substances,

to characterize from the kth clause

The weight of the semantic information is extracted,

characterising h for the t word from the kth clause_k，tWherein k is the index of the clause when semantic information is extracted from each clause, k' is the clause index required for calculating the weight, and t is the clause C_kThe term index when extracting semantic information from each term, t' is a clause C required for calculating the weight_kThe word index in (1).

8. The semantic recognition and inference method for mathematics application questions with grammar dependence enhancement as claimed in claim 5 or 6, characterized by that, the target vector q based on node is q_pAnd a context vector c, a predicted symbol

The method comprises the following steps:

node-based target vector q_pContext vector c, numerical variables in the network copy topic text are generated through pointers, and the symbols of the nodes are predicted by reasoning additional knowledge

First, a symbol to be predicted

Calculating a value probability P for each possible value of the symbol to be predicted for the value in the subject text or the constant or operator in the external symbol set by calculating the following three probabilities_c：

Computing symbols to be predicted

Probability of being a constant or operator in the outer symbol set:

P_gen＝σ(W_pg[q_p，c]+b_pg)

in the first case: if the symbol to be predicted is

Is a numerical variable from the subject text, calculates the symbol to be predicted

Probability distribution of (2):

in the second case: if the symbol to be predicted is

Is a constant or operator from an external symbol set, calculates the symbol to be predicted

Probability distribution of (2):

wherein the content of the first and second substances,

is the symbol to be predicted in the first case

Is used to characterize the word that is possible to take,

is the symbol to be predicted in the second case

then, the total probability is calculated

Finally, according to the probability of all possible values

Predicting the sign of a node

9. The semantic recognition and inference method for mathematics application questions with enhanced grammar dependence as claimed in claim 1, characterized in that, aiming at the question P and the corresponding target expression E_P＝{y₁，y₂，...，y_GThe corresponding model of the whole method is obtained by minimizing the following lossTraining a loss function L:

the penalty function represents the sum of the negative logarithms of the probabilities of the model-computed true signs at each location of the target expression, where each y represents a true sign in one of the target expressions, P_c(y_g|y₁，y₂，...，y_g-1P) denotes the first g-1 symbols y in a given title text P₁，y₂，...，y_g-1In the case of (1), the true symbol y calculated by the model at the g-th position_gThe probability of (d);

when using trained model prediction, only the topic text P is used, and the predicted symbol is used

A mathematical expression is generated.

10. A syntax dependence enhanced mathematics application topic semantic recognition and reasoning system is characterized by comprising a processing device and a display device; wherein: