CN116401624A - Non-autoregressive mathematical problem solver based on multi-tree structure - Google Patents
- Publication number: CN116401624A
- Application number: CN202310433743.7A
- Authority: CN (China)
- Prior art keywords: target, mathematical, module, autoregressive, sub
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/25 — Pattern recognition; Analysing; Fusion techniques
- G06F17/11 — Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Pattern recognition; Classification techniques
- G06N3/04 — Neural networks; Architecture, e.g. interconnection topology
- G06N3/08 — Neural networks; Learning methods
Abstract
The invention discloses a non-autoregressive mathematical problem solver based on a multi-way tree structure. On top of the existing problem encoding module and target-driven multi-way tree generation module, a non-autoregressive target decomposition module is added to handle the unordered multi-branch decomposition work. A parent target E_g is first fused with I position encodings p_i, i = 1, 2, …, I; the fused target is then processed with a multi-head self-attention mechanism, and a multi-head mutual attention module connects the output Ẽ of the self-attention module with the candidate set E_c to obtain a contextual representation Ê. Finally, according to the contextual representation Ê, the most relevant candidate characters in the candidate set E_c are selected through a pointer network, yielding a probability distribution matrix Ptr that is used to compute the cross-entropy loss during training and to produce the predicted mathematical problem expression during prediction. By using the non-autoregressive target decomposition module to handle the unordered multi-branch decomposition work and to explore and capture the relationships between numbers, the invention improves the performance of the mathematical problem solver.
Description
Technical Field
The invention belongs to the technical field of mathematical problem solvers, and particularly relates to a non-autoregressive mathematical problem solver based on a multi-tree structure.
Background
Automatic solving of mathematical problems (Math Word Problem, MWP) is an important sub-problem of machine reasoning. A mathematical problem solver needs to generate a solution equation that conforms to specifications and is computable, according to a given problem description and mathematical prior knowledge. More specifically, given a mathematical problem consisting of a text description containing the numbers q_1, q_2, …, q_n and some mathematical definitions, the model must automatically give an expression that solves the problem and calculate the final answer from this expression. The mathematical problem solving task involves core problems of artificial intelligence research, such as deep understanding of natural language text, the reasoning capability of machine intelligence, and interpretability. As an important test benchmark for machine intelligence, mathematical problem solving has attracted many researchers.
The mathematical problem solving task has undergone several stages since it was proposed, including early template-based methods, statistics-based methods, and later deep-learning-based methods. Methods based on statistics and templates rely heavily on manual labeling and statistics, and the resulting models lack generalization. With the assistance of high-performance computing equipment and massive Internet data, deep learning has made great progress; neural-network-based methods have also been applied to mathematical problem solving, and a large number of models have achieved striking results on the task. Since the tree decoder was proposed in 2019, tree decoders have been widely used in mathematical problem solving tasks and have become the dominant method.
The tree structure naturally lends itself to representing mathematical expressions: the depth of the tree can correspond to the priority of the operations in the expression, so that operations with higher priority are placed at lower levels, and the root node of the tree is the operator with the lowest priority. The tree structure thus carries the structural information of mathematical expressions, and a tree decoder can generate the target expression in a top-down manner, forming an expression tree. The top-down method can be understood as continuously splitting the problem into sub-problems and finally obtaining the answer by solving the sub-problems; this solution method conforms to human intuition and has good interpretability. The sub-expressions generated during the splitting process are in fact intermediate variables of the problem, so a tree decoder can represent the structural information of mathematical problems well.
Existing mathematical problem solvers mostly follow the encoder-decoder architecture, and the solving process of such architecture is a process of translating a natural language-based problem description into a symbolic language-based mathematical description (solving expression). Under such architecture, the improvement of the mathematical problem solver can be mainly summarized into two major categories of improvement of the encoder and improvement of the decoder.
For encoders, it is critical to enhance the neural network model's understanding of the textual description of the problem. In addition, mathematical problems require a great deal of external mathematical and common-sense prior knowledge; how to integrate such prior knowledge into the encoding so as to help the model better understand the problem is another important aspect of improving the model's encoder.
For the decoder, the key problem is to extract the mathematical relations between variables from text features and to search for the solution expression over these relations, so reducing the search space of solution expressions is an effective idea for improving performance. Furthermore, the expression tree has some drawbacks. For example, the one-to-one correspondence between a binary expression tree and a mathematical expression makes it highly dependent on the form of the expression: a − b + c and c − b + a are expressions with the same semantics via the commutative law, yet they generate two different binary trees. Subsequently, expression trees with a multi-way tree structure appeared, which place the parts of a mathematical expression with the same priority on the same level, effectively resolving this semantic inconsistency. However, an existing solver based on the multi-way tree structure solves the problem on the basis of codes rather than directly on the multi-way tree: it splits the tree into paths (a path of the multi-way tree is a connection from the root node to a leaf node), takes all paths of the training set as the retrieval space, retrieves all paths in a problem, and reconstructs the multi-way tree. Such a solver learns only at the path level and cannot fully exploit the structural information of the multi-way tree; the resulting mathematical problem solver lacks generalization, cannot generate paths that do not appear in the training set, and the performance and effectiveness of automatic mathematical problem solvers remain to be improved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a non-autoregressive mathematical problem solver based on a multi-tree structure so as to explore and capture the relation between numbers and improve the performance of the mathematical problem solver.
To achieve the above object, the present invention provides a non-autoregressive mathematical problem solver based on a multi-tree structure, which adopts an encoder-decoder structure, comprising:
the topic encoding module, which encodes a given natural-language mathematical problem P = {w_1, w_2, …, w_N}, where w_n denotes the n-th word, n = 1, 2, …, N, into distributed representations E_p and E_V containing its context information, where E_p is the topic representation of the entire mathematical problem and E_V is the numeric representation of the mathematical problem;
the target-driven multi-way tree generation module, a top-down multi-way tree generator adopting a target-driven mechanism, which uses the topic representation E_p as the root target of the multi-way tree and recursively generates sub-targets in a top-down manner; each sub-target is classified as an operand or an operator: when the sub-target is an operand, the result of the current sub-target is obtained directly, and when the sub-target is an operator, the result of the sub-target cannot yet be obtained and decomposition continues downward until the sub-targets are operands;
characterized in that it further comprises
A non-autoregressive target decomposition module for processing unordered multi-branch decomposition work:
first for the parent object E in the child object decomposition process g First, father target E g P is coded with I position i I=1, 2, I fusion, resulting in a fused target E pos :
E pos =[E g +p 1 ;E g +p 2 ;…,E g +p I ];
Then, the multi-head self-attention mechanism is adopted for processing, namely, the target E pos Inputting into a multi-head self-attention module for processing to obtain an output
Then, a multi-head mutual attention module connects the output Ẽ of the multi-head self-attention module with the candidate set E_c: the output Ẽ serves as the Q matrix of the multi-head mutual attention module, and the candidate set E_c, after passing through a feed-forward neural network, is multiplied by trainable parameters to obtain the K and V matrices of the multi-head mutual attention module, giving the output

Ê = softmax(Q K^T / √d_k) V

where d_k is the dimension of the encoding vector, and the candidate set E_c is

E_c = [E_V; E_op; E_con; E_N]

i.e. numbers, operators, constants and special characters, where E_V is the output of the topic encoding module, the remaining E_op, E_con, E_N are trainable encodings, and N_b is a special character used to represent the number of child nodes;
output ofFor contextual representation, for selecting the most relevant candidate set E through a pointer network c Probability ω of selecting the j-th candidate character at position i ij The method comprises the following steps:
wherein W is p And W is b For learning parameters, u is the column weight vector,and->Are respectively->A vector representation of the i-th position and a vector representation of the j-th character of the candidate set;
thus, the probability distribution Ptr of all the candidate characters at the ith position is obtained i :
Ptr i =softmax(ω i )
Wherein omega i ={ω i1 ,ω i2 ,...,ω ij J is candidate set E c The number of candidate characters in (a);
all probability distribution Ptr i Forming a probability distribution matrix Ptr by rows; in the training process, ptr probability distribution matrix is used for calculating cross entropy loss with a real mathematical problem expression to train a non-autoregressive mathematical problem solver based on a multi-tree structure; in the prediction process, the Ptr probability distribution matrix is used for acquiring characters with the highest probability on each position, and the characters are used as predicted characters of each position to obtain a predicted mathematical problem expression.
The object of the present invention is thus achieved.
The invention relates to a non-autoregressive mathematical problem solver based on a multi-way tree structure, in which a non-autoregressive target decomposition module is added on top of the existing problem encoding module and target-driven multi-way tree generation module to handle the unordered multi-branch decomposition work. A parent target E_g is first fused with I position encodings p_i, i = 1, 2, …, I; the fused target is then processed with a multi-head self-attention mechanism, and a multi-head mutual attention module connects the output Ẽ of the self-attention module with the candidate set E_c to obtain a contextual representation Ê. Finally, according to the contextual representation Ê, the most relevant candidate characters in the candidate set E_c are selected through a pointer network, yielding a probability distribution matrix Ptr that is used to compute the cross-entropy loss during training and to produce the predicted mathematical problem expression during prediction. By using the non-autoregressive target decomposition module to handle the unordered multi-branch decomposition work and to explore and capture the relationships between numbers, the invention improves the performance of the mathematical problem solver.
Drawings
FIG. 1 is a diagram of one embodiment of an MTree tree structure;
FIG. 2 is a schematic diagram of an embodiment of the non-autoregressive mathematical problem solver of the present invention based on a multi-way tree structure.
Detailed Description
The following description of the embodiments of the invention is presented in conjunction with the accompanying drawings to provide those skilled in the art with a better understanding of the invention. It should be expressly noted that, in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.
Given a mathematical problem composed in natural language whose text contains the numbers Q = {q_1, q_2, …, q_M}, where q_m denotes the m-th number, m = 1, 2, …, M, the mathematical problem solver is required to give a mathematical expression O_s = {e_1, e_2, …, e_K}, where e_k ∈ {+, −, ×, /} ∪ Q ∪ C and C is a constant set such as {1, π, …}.
The invention designs a non-autoregressive mathematical problem solver based on a multi-way tree structure (hereinafter referred to as MTrees) and adopts a target-driven top-down strategy to generate an expression tree, thereby obtaining the problem-solving expression.
1. MTree tree structure
The MTree structure was introduced into MWP in 2022 to unify expression tree structures. MTree is a multi-way tree whose internal nodes are operators and whose external nodes are numbers or constants appearing in the problem. The child nodes of an internal node in an MTree may be exchanged with each other. For a mathematical expression O_s = {e_1, e_2, …, e_M}, where e_m ∈ {+, −, ×, /} ∪ Q ∪ C, m = 1, 2, …, M, the operands of {+, ×} may be interchanged, while the operands of {−, /} may not. To solve this problem, the MTree structure introduces two new operators {×−, +/} as replacements: ×− denotes the opposite of the product of the numbers, e.g., ×−{2, 3, 4} equals −(2 × 3 × 4), and +/ denotes the reciprocal of the sum of the operands, e.g., +/{2, 3, 4} equals 1/(2 + 3 + 4). Furthermore, the number n at a leaf node may take several forms, including n, −n, 1/n and −1/n. The MTree structure is shown in fig. 1.
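The two extra operators can be illustrated with a minimal sketch (not part of the patent; the function name `mtree_op` and the plain-list operand representation are assumptions for illustration):

```python
from functools import reduce

def mtree_op(op, operands):
    """Evaluate one MTree internal node over its (unordered) children.
    '*-' is the negated product; '+/' is the reciprocal of the sum."""
    if op == "+":
        return float(sum(operands))
    if op == "*":
        return float(reduce(lambda a, b: a * b, operands, 1.0))
    if op == "*-":   # e.g. ×-{2,3,4} = -(2*3*4) = -24
        return -float(reduce(lambda a, b: a * b, operands, 1.0))
    if op == "+/":   # e.g. +/{2,3,4} = 1/(2+3+4) = 1/9
        return 1.0 / sum(operands)
    raise ValueError(f"unknown MTree operator: {op}")
```

Because the children are unordered, the result is invariant under any permutation of `operands`, which is exactly the exchangeability property the MTree structure exploits.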
2. Non-autoregressive mathematical problem solver based on multi-way tree structure
In this embodiment, as shown in fig. 2, the non-autoregressive mathematical problem solver based on the multi-way tree structure of the present invention adopts an encoder-decoder structure and includes a topic encoding module 1, a target-driven multi-way tree generation module 2, and a non-autoregressive target decomposition module 3.
The topic encoding module 1 encodes a given natural-language mathematical problem P = {w_1, w_2, …, w_N}, where w_n denotes the n-th word, n = 1, 2, …, N, into distributed representations E_p and E_V containing its context information, where E_p is the topic representation of the entire mathematical problem and E_V is the numeric representation of the mathematical problem.
The topic encoding module 1 uses a language model to encode the natural-language problem description into a distributed representation containing its context information. Two types of language models are commonly used in the prior art: recurrent neural networks (RNNs), such as LSTM or GRU, and pre-trained language models (PLMs), such as BERT and RoBERTa. Inspired by the superior representation capability of pre-trained models, recent work tends to use a pre-trained model as the problem encoder. In the present invention, the topic encoding module 1 uses RoBERTa to obtain the topic representation and the numeric representations. More specifically, given a natural-language mathematical problem P = {w_1, w_2, …, w_N}, the topic encoding module 1 adds the special characters [CLS] and [SEP] before and after the problem, feeds the result into RoBERTa, and takes the output at the [CLS] position as the representation of the entire problem. The topic encoding module 1 can be expressed by the following formula:

E_p, E_V = RoBERTa([CLS]; P; [SEP])    (1)

where E_p is the topic representation of the entire mathematical problem and E_V is the numeric representation of the mathematical problem. RoBERTa is fine-tuned during training.
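The split of the encoder output into the topic representation E_p and the numeric representations E_V can be sketched as follows (a minimal sketch assuming the encoder output is already available as a `(seq_len, d)` array with [CLS] at index 0; the function name and the explicit `number_positions` argument are illustrative assumptions, not the patent's API):

```python
import numpy as np

def split_representations(hidden_states, number_positions):
    """hidden_states: (seq_len, d) final-layer encoder output, [CLS] first.
    Returns E_p (the [CLS] vector, representing the whole problem) and
    E_V (the rows at the token positions of the numbers q_1..q_M)."""
    E_p = hidden_states[0]
    E_V = hidden_states[np.asarray(number_positions)]
    return E_p, E_V
```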
Expression tree decoders have been well studied for solving mathematical problems. The target-driven mechanism gradually decomposes the whole problem into sub-problems, intuitively matching the way humans work through a problem, and a top-down MTree generator is realized by adopting this mechanism. Specifically, the target-driven multi-way tree generation module 2 is a top-down multi-way tree generator employing a target-driven mechanism: it uses the topic representation E_p as the root target of the multi-way tree and recursively generates sub-targets in a top-down manner. Sub-targets are classified as operands or operators; when a sub-target is an operand, the result of the current sub-target is obtained directly, and when a sub-target is an operator, its result cannot yet be obtained and decomposition continues downward until the sub-targets are operands.
In the MTree structure, multiple child nodes of the same parent node are unordered, so a new non-autoregressive target decomposition module (NAGD) is designed, based on the non-autoregressive Transformer, to handle this unordered multi-branch decomposition work. Specifically:
First, during the child-target decomposition, the parent target E_g is fused with I position encodings p_i, i = 1, 2, …, I, to obtain the fused target E_pos:

E_pos = [E_g + p_1; E_g + p_2; …; E_g + p_I].
In this embodiment, the position encodings are generated using the sine and cosine position encodings of the Transformer:

p_{i,2l} = sin(i / 10000^{2l/d_k}),  p_{i,2l+1} = cos(i / 10000^{2l/d_k})

where i denotes the position, l indexes the dimensions of the encoding p_i, and d_k is the dimension of the encoding vector.
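The sine/cosine position encodings and the fusion E_pos = [E_g + p_1; …; E_g + p_I] can be sketched numerically as follows (a sketch assuming an even encoding dimension d_k; the function names are illustrative):

```python
import numpy as np

def sinusoidal_positions(I, d_k):
    """Standard Transformer position encodings:
    p[i, 2l] = sin(i / 10000^(2l/d_k)), p[i, 2l+1] = cos(i / 10000^(2l/d_k)).
    Assumes d_k is even."""
    pos = np.arange(I, dtype=float)[:, None]            # (I, 1)
    two_l = np.arange(0, d_k, 2, dtype=float)[None, :]  # (1, d_k/2)
    angle = pos / np.power(10000.0, two_l / d_k)
    p = np.zeros((I, d_k))
    p[:, 0::2] = np.sin(angle)
    p[:, 1::2] = np.cos(angle)
    return p

def fuse_target(E_g, I):
    """E_pos = [E_g + p_1; ...; E_g + p_I]: broadcast the parent target
    over I positions and add the position encodings."""
    return E_g[None, :] + sinusoidal_positions(I, E_g.shape[0])
```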
Then, a multi-head self-attention mechanism is applied to fuse information across positions: the fused target E_pos is input into a multi-head self-attention module to obtain the output Ẽ.
Then, a multi-head mutual attention module connects the output Ẽ of the multi-head self-attention module with the candidate set E_c: the output Ẽ serves as the Q matrix of the multi-head mutual attention module, and the candidate set E_c, after passing through a feed-forward neural network, is multiplied by trainable parameters to obtain the K and V matrices of the multi-head mutual attention module, giving the output

Ê = softmax(Q K^T / √d_k) V

where d_k is the dimension of the encoding vector, and the candidate set E_c is

E_c = [E_V; E_op; E_con; E_N]

i.e. numbers, operators, constants and special characters, where E_V is the output of the topic encoding module, the remaining E_op, E_con, E_N are trainable encodings, and N_b is a special character used to represent the number of child nodes.
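A single-head sketch of this mutual attention (cross-attention) step follows; multi-head splitting and the candidate feed-forward network are omitted for brevity, and all names are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(E_tilde, E_c, W_k, W_v):
    """Q comes from the self-attention output E_tilde (I, d);
    K and V come from the projected candidate set E_c (J, d)."""
    d_k = E_tilde.shape[1]
    Q, K, V = E_tilde, E_c @ W_k, E_c @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (I, J), each row sums to 1
    return attn @ V                          # contextual representation (I, d)
```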
The output Ê is the contextual representation and is used to select the most relevant candidate characters from the candidate set E_c through a pointer network. The probability ω_ij of selecting the j-th candidate character at position i is:

ω_ij = u^T tanh(W_p Ê_i + W_b E_c,j)

where W_p and W_b are learnable parameters, u is a column weight vector, and Ê_i and E_c,j are the vector representation of the i-th position of Ê and the vector representation of the j-th character of the candidate set, respectively.
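The additive pointer-network scoring can be sketched as follows (shapes and names are assumptions for illustration):

```python
import numpy as np

def pointer_scores(H_hat, E_c, W_p, W_b, u):
    """omega[i, j] = u^T tanh(W_p h_i + W_b e_j):
    score every candidate character e_j for every decode position h_i."""
    a = H_hat @ W_p.T                                   # (I, d_h)
    b = E_c @ W_b.T                                     # (J, d_h)
    return np.tanh(a[:, None, :] + b[None, :, :]) @ u   # (I, J)
```

A row-wise softmax over these scores then gives the per-position distributions over the candidate set.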
Thus the probability distribution Ptr_i over all candidate characters at the i-th position is obtained:

Ptr_i = softmax(ω_i)

where ω_i = {ω_i1, ω_i2, …, ω_iJ} and J is the number of candidate characters in the candidate set E_c.
All probability distributions Ptr_i are stacked row by row to form the probability distribution matrix Ptr. During training, the Ptr probability distribution matrix is used to compute the cross-entropy loss against the ground-truth mathematical problem expression so as to train the non-autoregressive mathematical problem solver based on the multi-way tree structure; during prediction, the Ptr probability distribution matrix is used to take the character with the highest probability at each position as that position's predicted character, yielding the predicted mathematical problem expression.
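The two uses of the Ptr matrix — cross-entropy loss in training and per-position argmax in prediction — can be sketched as follows (the function and argument names are illustrative assumptions):

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ptr_loss_and_decode(omega, target_idx=None):
    """omega: (I, J) pointer scores.  Returns (loss, predictions):
    loss is the mean cross-entropy vs. gold indices (None at inference),
    predictions are the argmax candidate indices per position."""
    Ptr = softmax_rows(omega)        # row i: distribution over candidates
    pred = Ptr.argmax(axis=1)        # prediction: highest probability wins
    loss = None
    if target_idx is not None:       # training: cross-entropy vs. gold
        loss = -np.log(Ptr[np.arange(Ptr.shape[0]), target_idx]).mean()
    return loss, pred
```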
To calculate the loss between the predicted nodes and the ground-truth nodes, the predicted nodes need to be aligned with the ground-truth nodes. Therefore, a pseudo-order is defined for the real child nodes of each target: the contextual representation output is sorted so that operators come first, followed by the constants and the operands appearing in the problem. For the example in fig. 2, the pseudo-order of the child nodes is [×, ×, −40]. After predicting all child nodes, the mathematical problem solver also needs to indicate the form type of each operand; a simple feed-forward neural network is designed to classify these types, and the type classification loss is combined with the preceding node-prediction classification loss to jointly train the entire mathematical problem solver.
3. MTree accuracy and MTree-based IoU
The MTree structure well resolves the drawback that the same mathematical expression has many forms. For the solution expressions generated by various mathematical problem solvers, directly matching characters may yield false negatives, e.g., a − b + c versus c − b + a. Therefore, the invention proposes MTree accuracy and MTree IoU to evaluate the accuracy of solution expressions more precisely. Specifically, MTree accuracy compares the solution expressions generated by different solvers with the ground-truth expressions on the MTree. Note, however, that such an evaluation over the entire expression tree does not measure the partial correctness of an expression, which is also an important way of evaluating solver capability, just as for human capability. In the example of fig. 1, 13 × (10 + 3) + 40 and (13 × 10 + 3 − 40) are two erroneous expressions; the former uses only an erroneous "+" operation while the other parts are correct. For this purpose, the invention proposes MTree IoU, which computes the accuracy of the paths connecting the root and the leaves to measure the partial correctness of an expression. MTree IoU is calculated as follows:

MTreeIoU = |P_p ∩ P_g| / |P_p ∪ P_g|

where |·| denotes the number of elements in a set, and P_p and P_g are the sets of all paths in the predicted MTree and the ground-truth MTree, respectively.
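Under the assumption that each root-to-leaf path is represented as a hashable tuple of node labels, the metric reduces to a set intersection-over-union:

```python
def mtree_iou(pred_paths, gold_paths):
    """P_p, P_g: root-to-leaf path sets of the predicted and ground-truth
    MTree; IoU = |intersection| / |union| of the two path sets."""
    P_p, P_g = set(pred_paths), set(gold_paths)
    if not (P_p | P_g):          # two empty trees: define as a perfect match
        return 1.0
    return len(P_p & P_g) / len(P_p | P_g)
```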
4. Experimental results
In order to evaluate the MTree mathematical problem solver designed by the present invention, a number of experiments were performed on two commonly used public data sets, Math23K and MAWPS. Math23K is a Chinese data set containing 23162 mathematical problems, and MAWPS contains 2300 mathematical problems.
The effectiveness of the MTree solver was first evaluated and analyzed, and its performance was compared with the state of the art; the comparison results are shown in Table 1.
TABLE 1
Table 1 shows the performance of different mainstream mathematical problem solvers. As can be seen from Table 1, the present invention outperforms all baseline models, reaching a new state of the art on both data sets, which demonstrates its effectiveness and superiority.
The SUMC-Solver uses path prediction to reconstruct the expression tree; its performance is far lower than that of the present invention, because path prediction may make the nodes independent, breaking the arithmetic relationships between the nodes. The solver of the present invention, which implements an attention mechanism, is able to explore and capture the relationships between numbers and produce better results. The performance of DeductReasoner, which implements complex relational modeling and deductive reasoning, is very close to that of this work, which may mean that introducing deductive reasoning into the MTree structure would bring some new insight.
Ablation studies were also performed in the experiments to investigate the effectiveness of the proposed cross-target attention; the results are shown in Table 2.
TABLE 2
Table 2 shows the ablation experiments on cross-target attention. As can be seen from Table 2, the model improves significantly when cross-target attention is added, e.g., from 83.2 to 84.4 on Math23K. This shows that information belonging to different targets can be delivered and aggregated by the cross-target attention mechanism; this cross-target information integration significantly improves the accuracy of single-target decomposition.
To investigate the MTree accuracy and the effectiveness of the MTree IoU proposed by the present invention, this study used five representative mathematical problem solvers with open-source code and compared them with the present invention on Math23K. The results are shown in Table 3.
TABLE 3
Table 3 shows the comparison of MTree accuracy and MTree IoU for the mainstream mathematical problem solvers with open-source code. In principle, expression accuracy should be consistent with value accuracy, but as can be seen from Table 3, expression accuracy is far lower than value accuracy, while MTree accuracy is only slightly lower than value accuracy, which is intuitive. As can also be seen from Table 3, the present invention achieves the highest value accuracy, MTree accuracy and MTree IoU, improving the performance of the mathematical problem solver.
While the foregoing describes illustrative embodiments of the present invention to facilitate an understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, but is to be construed as protected by the accompanying claims insofar as various changes are within the spirit and scope of the present invention as defined and defined by the appended claims.
Claims (1)
1. A non-autoregressive mathematical problem solver based on a multi-way tree structure employing an encoder-decoder structure comprising:
a topic encoding module for encoding a given natural-language mathematical problem P = {w_1, w_2, …, w_N}, where w_n denotes the n-th word, n = 1, 2, …, N, into distributed representations E_p and E_V containing its context information, where E_p is the topic representation of the entire mathematical problem and E_V is the numeric representation of the mathematical problem;
the target-driven multi-way tree generation module, a top-down multi-way tree generator adopting a target-driven mechanism, which uses the title representation E_p as the root target of the multi-way tree and recursively generates sub-targets in a top-down manner, each sub-target being classified as an operand or an operator; when a sub-target is an operand, its result is obtained directly; when a sub-target is an operator, its result cannot yet be obtained, and decomposition continues until every sub-target is an operand;
characterized in that it further comprises
A non-autoregressive target decomposition module for processing unordered multi-branch decomposition work:
in the sub-target decomposition process, the parent target E_g is first fused with I position codes p_i, i = 1, 2, …, I, to obtain the fused target E_pos:

E_pos = [E_g + p_1; E_g + p_2; …; E_g + p_I];
then, multi-head self-attention is applied, i.e., the fused target E_pos is input into a multi-head self-attention module to obtain an output E'_pos;

next, a multi-head mutual-attention module connects the self-attention output E'_pos with the candidate set E_c: E'_pos serves as the Q matrix of the multi-head mutual-attention module, while the candidate set E_c, after passing through a feed-forward neural network, is multiplied by trainable parameters W_K and W_V to obtain the K matrix and V matrix of the multi-head mutual-attention module, giving the output H:

H = softmax(Q K^T / √d_k) V
wherein d_k is the encoding-vector dimension, and E_c is the candidate set:

E_c = [E_V; E_op; E_con; E_N],

i.e., numbers, operators, constants and special characters, where E_V is the output of the title coding module, the remaining E_op, E_con and E_N are trainable codes, and N_b is a special character representing the number of child nodes;
the output H of the multi-head mutual-attention module is a contextual representation used to select the most relevant candidates from E_c through a pointer network; the score ω_ij for selecting the j-th candidate character at position i is:

ω_ij = u^T tanh(W_p H_i + W_b (E_c)_j)

wherein W_p and W_b are learnable parameters, u is a column weight vector, and H_i and (E_c)_j are respectively the vector representation of the i-th position of H and the vector representation of the j-th character of the candidate set;
thus, the probability distribution Ptr of all the candidate characters at the ith position is obtained i :
Ptr i =softmax(ω i )
Wherein omega i ={ω i1 ,ω i2 ,…,ω iJ J is candidate set E c The number of candidate characters in (a);
all probability distributions Ptr_i are stacked row-wise into a probability distribution matrix Ptr; during training, the Ptr matrix is used to compute the cross-entropy loss against the ground-truth mathematical expression so as to train the non-autoregressive mathematical problem solver based on the multi-way tree structure; during prediction, the Ptr matrix is used to take the highest-probability character at each position as that position's predicted character, yielding the predicted mathematical expression.
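The decomposition pipeline of the claim can be illustrated with a minimal NumPy sketch. This is not the patented implementation: multi-head attention is reduced to a single head, the feed-forward network on E_c is omitted, and all weights (W_K, W_V, W_p, W_b, u) and input representations are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, I, J = 16, 4, 10           # vector dimension d_k, positions I, candidates J

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

# random stand-ins for learned representations
E_g = rng.standard_normal(d)            # parent-target vector
P   = rng.standard_normal((I, d))       # position codes p_1 .. p_I
E_c = rng.standard_normal((J, d))       # candidate set [E_V; E_op; E_con; E_N]

# 1) fuse the parent target with each position code:
#    E_pos = [E_g+p_1; ...; E_g+p_I]
E_pos = E_g + P                         # (I, d) by broadcasting

# 2) self-attention over the fused targets (single head for brevity)
H_self = attention(E_pos, E_pos, E_pos)

# 3) mutual attention: Q from the self-attention output, K and V from the
#    candidate set projected by trainable parameters (random matrices here)
W_K = rng.standard_normal((d, d))
W_V = rng.standard_normal((d, d))
H = attention(H_self, E_c @ W_K, E_c @ W_V)   # (I, d) contextual representation

# 4) pointer-network score of candidate j at position i:
#    omega_ij = u^T tanh(W_p H_i + W_b (E_c)_j)
W_p = rng.standard_normal((d, d))
W_b = rng.standard_normal((d, d))
u   = rng.standard_normal(d)
omega = np.tanh((H @ W_p.T)[:, None, :] + (E_c @ W_b.T)[None, :, :]) @ u  # (I, J)

# 5) per-position distribution and non-autoregressive prediction
Ptr  = softmax(omega, axis=-1)          # each row sums to 1
pred = Ptr.argmax(axis=-1)              # predicted candidate index per position
print(Ptr.shape, pred.shape)            # (4, 10) (4,)
```

Note how all I positions are scored in one pass: unlike an autoregressive decoder, no position waits for the character predicted at an earlier position, which is what makes the decomposition non-autoregressive.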
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310433743.7A CN116401624A (en) | 2023-04-21 | 2023-04-21 | Non-autoregressive mathematical problem solver based on multi-tree structure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116401624A true CN116401624A (en) | 2023-07-07 |
Family
ID=87012231
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116401624A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116680502A (en) * | 2023-08-02 | 2023-09-01 | 中国科学技术大学 | Intelligent solving method, system, equipment and storage medium for mathematics application questions |
CN116680502B (en) * | 2023-08-02 | 2023-11-28 | 中国科学技术大学 | Intelligent solving method, system, equipment and storage medium for mathematics application questions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||