CN116401624A - Non-autoregressive mathematical problem solver based on multi-tree structure - Google Patents

Non-autoregressive mathematical problem solver based on multi-tree structure Download PDF

Info

Publication number
CN116401624A
CN116401624A (application CN202310433743.7A)
Authority
CN
China
Prior art keywords
target, mathematical, module, autoregressive, sub
Prior art date
Legal status: Pending
Application number
CN202310433743.7A
Other languages
Chinese (zh)
Inventor
杨阳
宾燚
韩孟群
史文浩
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority: CN202310433743.7A
Publication: CN116401624A
Legal status: Pending


Classifications

    • G06F18/25 — Pattern recognition; analysing; fusion techniques
    • G06F17/11 — Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F18/214 — Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/24 — Pattern recognition; classification techniques
    • G06N3/04 — Neural networks; architecture, e.g. interconnection topology
    • G06N3/08 — Neural networks; learning methods


Abstract

The invention discloses a non-autoregressive mathematical problem solver based on a multi-way tree structure. On top of the existing problem encoding module and target-driven multi-way tree generation module, a non-autoregressive target decomposition module is added to process the unordered multi-branch decomposition work. The parent target E_g is first fused with I position encodings p_i, i = 1, 2, …, I, and the result is processed with a multi-head self-attention mechanism. A multi-head cross-attention module then connects the output E_sa of the multi-head self-attention module with the candidate set E_c to obtain the contextual representation E_ctx. Finally, according to the contextual representation E_ctx and the candidate set E_c, the most relevant candidate characters in E_c are selected through a pointer network, yielding a probability distribution matrix Ptr that is used to calculate the cross-entropy loss during training and the predicted mathematical expression during prediction. By using the non-autoregressive target decomposition module to process the unordered multi-branch decomposition work and to explore and capture the relationships between numbers, the invention improves the performance of the mathematical problem solver.

Description

Non-autoregressive mathematical problem solver based on multi-tree structure
Technical Field
The invention belongs to the technical field of mathematical problem solvers, and particularly relates to a non-autoregressive mathematical problem solver based on a multi-tree structure.
Background
Automatic solving of math word problems (Math Word Problem, MWP) is an important sub-problem of machine reasoning. A mathematical problem solver needs to generate a well-formed, computable solution equation from a given problem description and mathematical prior knowledge. More specifically, given a mathematical problem consisting of a textual description containing the numbers q_1, q_2, …, q_n and some mathematical definitions, the model must automatically give an expression that solves the problem and calculate the final answer from that expression. The task of solving mathematical problems involves core problems of artificial intelligence research such as deep understanding of natural language text, the reasoning capability of machine intelligence, and interpretability. As an important test benchmark for machine intelligence, mathematical problem solving has attracted many researchers.
The mathematical problem solving task has gone through several stages since it was proposed: early template-based methods, statistics-based methods, and later deep-learning-based methods. Methods based on statistics and templates rely heavily on manual labeling and statistics, and the resulting models lack generalization. With the assistance of high-performance computing equipment and massive Internet data, deep learning has made great progress; neural-network-based methods have been applied to mathematical problem solving, and a large number of models have achieved striking results on the task. Since the tree decoder was proposed in 2019, it has been widely used in mathematical problem solving and has become the dominant method.
The tree structure is naturally suited to representing mathematical expressions: the depth of the tree can correspond to the priority of the operations in the expression, so operations with higher priority are placed at lower levels, and the root node of the tree is the operator with the lowest priority. The tree structure thus carries the structural information of the mathematical expression, and a tree decoder can generate the target expression in a top-down manner, forming an expression tree. The top-down method can be understood as continually splitting the question into sub-questions and finally obtaining the answer by solving the sub-questions; this is a solution method that matches human intuition and has good interpretability. The sub-expressions generated during splitting are in fact intermediate variables of the problem, so the tree decoder can represent the structural information of a mathematical problem well.
Existing mathematical problem solvers mostly follow the encoder-decoder architecture, and the solving process of such architecture is a process of translating a natural language-based problem description into a symbolic language-based mathematical description (solving expression). Under such architecture, the improvement of the mathematical problem solver can be mainly summarized into two major categories of improvement of the encoder and improvement of the decoder.
For the encoder, the key is to enhance the neural network model's understanding of the textual description of the problem. In addition, mathematical problems require a great deal of external mathematical and common-sense prior knowledge, and integrating such prior knowledge into the encoding to help the model better understand the problem is another important aspect of improving the encoder.
For the decoder, the key problem is to extract the mathematical relations between variables from the text features and to search for the solution expression based on those relations, so reducing the search space of solution expressions is an effective way to improve performance. Furthermore, the expression tree has some drawbacks. For example, the one-to-one correspondence between a binary expression tree and a mathematical expression makes it highly dependent on the form of the expression: a − b + c and c − b + a have the same semantics under the commutative law, but generate two different binary trees. Expression trees with a multi-way tree structure subsequently appeared, which place the parts of a mathematical expression with the same priority on the same layer, effectively resolving this semantic inconsistency. However, existing solvers based on the multi-way tree structure do not solve directly on the multi-way tree: the tree is split into paths (a path of the multi-way tree is a line connecting the root node to a leaf node), all paths of the training set are taken as a retrieval space, and the paths of a problem are retrieved and the multi-way tree is reconstructed. Such a solver only learns at the path level, the structural information of the multi-way tree cannot be fully utilized, the solver lacks generalization, paths that do not appear in the training set cannot be generated, and the performance and effectiveness of the automatic mathematical problem solver remain to be improved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a non-autoregressive mathematical problem solver based on a multi-tree structure so as to explore and capture the relation between numbers and improve the performance of the mathematical problem solver.
To achieve the above object, the present invention provides a non-autoregressive mathematical problem solver based on a multi-tree structure, which adopts an encoder-decoder structure, comprising:
the problem encoding module, used for encoding a given mathematical problem P = {w_1, w_2, …, w_N} described in natural language, where w_n represents the nth word, n = 1, 2, …, N, into distributed representations E_p and E_V containing its context information, where E_p is the representation of the entire mathematical problem and E_V is the representation of the numbers in the problem;
the target-driven multi-way tree generation module, a top-down multi-way tree generator adopting a target-driven mechanism, which uses the problem representation E_p as the root target of the multi-way tree and recursively generates sub-targets in a top-down manner; a sub-target is classified as an operand or an operator: when the sub-target is an operand, the result of the current sub-target is obtained directly; when the sub-target is an operator, the result of the sub-target cannot yet be obtained, and decomposition continues downward until the sub-target is an operand;
characterized in that it also comprises
A non-autoregressive target decomposition module for processing unordered multi-branch decomposition work:
first for the parent object E in the child object decomposition process g First, father target E g P is coded with I position i I=1, 2, I fusion, resulting in a fused target E pos
E pos =[E g +p 1 ;E g +p 2 ;…,E g +p I ];
then, a multi-head self-attention mechanism is adopted for processing, i.e., the fused target E_pos is input into a multi-head self-attention module to obtain the output E_sa:
E_sa = MultiHeadSelfAttention(E_pos·W^Q, E_pos·W^K, E_pos·W^V)
where W^Q, W^K and W^V are trainable parameter matrices in the multi-head self-attention module;
then, the outputs of the multi-head self-attention modules are connected through a multi-head mutual attention module
Figure BDA0004191344760000034
And candidate set Ec, will output +.>
Figure BDA0004191344760000035
As Q matrix of multi-head mutual attention module, candidate set E c After passing through a feedforward neural network and trainable parameters
Figure BDA0004191344760000036
Multiplying to obtain K matrix and V matrix of multi-head mutual attention module, thus obtaining output +.>
Figure BDA0004191344760000037
Figure BDA0004191344760000038
Wherein d k For coding vector dimensions, E c The candidate sets are:
Figure BDA0004191344760000039
i.e. numbers, operators, constants and special characters, where E V Is the output of the topic coding module, the rest E op 、E con 、E N For trainable codes, N b Is a special character for representing the number of child nodes;
output of
Figure BDA00041913447600000310
For contextual representation, for selecting the most relevant candidate set E through a pointer network c Probability ω of selecting the j-th candidate character at position i ij The method comprises the following steps:
Figure BDA0004191344760000041
wherein W is p And W is b For learning parameters, u is the column weight vector,
Figure BDA0004191344760000042
and->
Figure BDA0004191344760000043
Are respectively->
Figure BDA0004191344760000044
A vector representation of the i-th position and a vector representation of the j-th character of the candidate set;
thus, the probability distribution Ptr of all the candidate characters at the ith position is obtained i
Ptr i =softmax(ω i )
Wherein omega i ={ω i1 ,ω i2 ,...,ω ij J is candidate set E c The number of candidate characters in (a);
all probability distribution Ptr i Forming a probability distribution matrix Ptr by rows; in the training process, ptr probability distribution matrix is used for calculating cross entropy loss with a real mathematical problem expression to train a non-autoregressive mathematical problem solver based on a multi-tree structure; in the prediction process, the Ptr probability distribution matrix is used for acquiring characters with the highest probability on each position, and the characters are used as predicted characters of each position to obtain a predicted mathematical problem expression.
The object of the present invention is thus achieved.
The invention relates to a non-autoregressive mathematical problem solver based on a multi-way tree structure. On top of the existing problem encoding module and target-driven multi-way tree generation module, a non-autoregressive target decomposition module is added to process the unordered multi-branch decomposition work. The parent target E_g is first fused with I position encodings p_i, i = 1, 2, …, I, and the result is processed with a multi-head self-attention mechanism. A multi-head cross-attention module then connects the output E_sa of the multi-head self-attention module with the candidate set E_c to obtain the contextual representation E_ctx. Finally, according to the contextual representation E_ctx and the candidate set E_c, the most relevant candidate characters in E_c are selected through a pointer network, yielding a probability distribution matrix Ptr that is used to calculate the cross-entropy loss during training and the predicted mathematical expression during prediction. By using the non-autoregressive target decomposition module to process the unordered multi-branch decomposition work and to explore and capture the relationships between numbers, the invention improves the performance of the mathematical problem solver.
Drawings
FIG. 1 is a diagram of one embodiment of an MTree tree structure;
FIG. 2 is a schematic diagram of an embodiment of the non-autoregressive mathematical problem solver of the present invention based on a multi-way tree structure.
Detailed Description
The following description of the embodiments of the invention is presented in conjunction with the accompanying drawings to give those skilled in the art a better understanding of the invention. It is expressly noted that in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.
Given a mathematical problem P composed of natural language, whose text contains the numbers Q = {q_1, q_2, …, q_M}, where q_m represents the mth number, m = 1, 2, …, M, the mathematical problem solver is required to give a mathematical expression O_s = {e_1, e_2, …, e_K}, where e_k ∈ {+, −, ×, /} ∪ Q ∪ C and C is a constant set such as {1, π}.
The invention designs a non-autoregressive mathematical problem solver based on a multi-way tree structure (hereinafter referred to as MTrees), and adopts a target-driven top-down strategy to generate the expression tree and obtain the solution expression.
1. MTree tree structure
The MTree structure was introduced into MWP in 2022 to unify expression tree structures. MTree is a multi-way tree whose internal nodes are operators and whose external nodes are the numbers or constants appearing in the problem. The child nodes of an internal node in MTree may be exchanged with each other. For a mathematical expression O_s = {e_1, e_2, …, e_M}, where e_m ∈ {+, −, ×, /} ∪ Q ∪ C, m = 1, 2, …, M, the operands of {+, ×} may be interchanged while the operands of {−, /} may not. To solve this problem, the MTree structure introduces two new operators {×−, +/} as replacements: ×− denotes the opposite of the product of the numbers, e.g., ×−{2, 3, 4} equals −(2 × 3 × 4), and +/ denotes the reciprocal of the sum of the operands, e.g., +/{2, 3, 4} equals 1/(2 + 3 + 4). Furthermore, the number n at a leaf node may take several forms, including n, −n, 1/n and −1/n. The MTree structure is shown in FIG. 1.
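The semantics of the two extra operators can be sketched in a few lines (the helper names are illustrative; the patent only defines what the operators compute):

```python
# Sketch of the two extra MTree operators (helper names are illustrative).

def mtree_neg_prod(operands):
    """The ×− operator: the opposite (negative) of the product of the operands."""
    result = 1
    for x in operands:
        result *= x
    return -result

def mtree_recip_sum(operands):
    """The +/ operator: the reciprocal of the sum of the operands."""
    return 1 / sum(operands)

# Examples from the description:
# ×−{2, 3, 4} = −(2 × 3 × 4) = −24
# +/{2, 3, 4} = 1 / (2 + 3 + 4) = 1/9
neg_prod = mtree_neg_prod([2, 3, 4])
recip_sum = mtree_recip_sum([2, 3, 4])
```

With these two operators, the order of operands under any internal node no longer affects the value, which is what makes the child nodes of an MTree freely exchangeable.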
2. Non-autoregressive mathematical problem solver based on multi-way tree structure
In this embodiment, as shown in fig. 2, the non-autoregressive mathematical problem solver based on the multi-way tree structure of the present invention adopts an encoder-decoder structure and includes a problem encoding module 1, a target-driven multi-way tree generation module 2, and a non-autoregressive target decomposition module 3.
The problem encoding module 1 is used for encoding a given mathematical problem P = {w_1, w_2, …, w_N} described in natural language, where w_n represents the nth word, n = 1, 2, …, N, into distributed representations E_p and E_V containing its context information, where E_p is the representation of the entire mathematical problem and E_V is the representation of the numbers in the problem.
The problem encoding module 1 encodes the natural-language problem description into a distributed representation containing its context information using a language model. Two types of language models are common in the prior art: recurrent neural networks (RNNs) such as LSTM or GRU, and pre-trained language models (PLMs) such as BERT and RoBERTa. Inspired by the superior representation capability of pre-trained models, recent work tends to use a pre-trained model as the problem encoder. In the present invention, the problem encoding module 1 uses RoBERTa to obtain the problem representation as well as the number representations. More specifically, given a mathematical problem P = {w_1, w_2, …, w_N} described in natural language, the problem encoding module 1 adds the special characters [CLS] and [SEP] before and after the problem, feeds the result into RoBERTa, and takes the output at the [CLS] position as the representation of the entire problem. The problem encoding module 1 can be expressed by the following formula:
E_p, E_V = RoBERTa([CLS]; P; [SEP])    (1)
where E_p is the representation of the entire problem and E_V is the representation of the numbers. RoBERTa is fine-tuned during training.
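The slicing behind formula (1) can be sketched as follows. The encoder itself is stood in for by random hidden states (in the described solver it would be a fine-tuned RoBERTa), and the number positions are hypothetical; the point is only how E_p and E_V are read out of the per-token outputs:

```python
import numpy as np

# Sketch: slicing the problem representation E_p and the number representations E_V
# out of encoder hidden states (formula (1)). Random states stand in for the
# encoder output; positions of the numbers q_m in the token sequence are made up.
rng = np.random.default_rng(0)

seq_len, d_model = 12, 16                     # [CLS] w_1 ... w_N [SEP]
hidden = rng.normal(size=(seq_len, d_model))  # one vector per token

number_positions = [3, 7]                     # hypothetical token positions of q_1, q_2

E_p = hidden[0]                  # [CLS] output = representation of the whole problem
E_V = hidden[number_positions]   # one row per number appearing in the problem
```

E_V later enters the candidate set E_c of the decoder, so each number in the problem text contributes exactly one context-dependent vector.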
Expression tree decoders have been well studied for solving mathematical problems. The target-driven mechanism gradually decomposes the whole problem into sub-problems, which intuitively matches how humans approach a problem, and a top-down MTree generator is realized by adopting the target-driven mechanism. Specifically, the target-driven multi-way tree generation module 2 is a top-down multi-way tree generator employing a target-driven mechanism: it uses the problem representation E_p as the root target of the multi-way tree and recursively generates sub-targets in a top-down manner. A sub-target is classified as an operand or an operator; when the sub-target is an operand, the result of the current sub-target is obtained directly, and when the sub-target is an operator, the result cannot yet be obtained and decomposition continues downward until the sub-target is an operand.
In the MTrees structure, multiple child nodes of the same parent node are unordered, and a new non-autoregressive target decomposition module (NAGD) is designed based on the non-autoregressive Transformer to process the unordered multi-branch decomposition work. Specifically:
First, in the sub-target decomposition process, the parent target E_g is fused with I position encodings p_i, i = 1, 2, …, I, to obtain the fused target E_pos:
E_pos = [E_g + p_1; E_g + p_2; …; E_g + p_I].
In this embodiment, the position encodings are generated using the sine-cosine position encoding of the Transformer:
p_i(2l) = sin(i / 10000^(2l/d_k))
p_i(2l+1) = cos(i / 10000^(2l/d_k))
where i denotes the position, l indexes the dimensions of the encoding p_i at position i, and d_k is the dimension of the encoding vector.
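A minimal sketch of this sine-cosine scheme (shapes are illustrative):

```python
import numpy as np

# Sketch of the Transformer sine-cosine position encoding used for p_i:
# even dimensions 2l get sin(i / 10000^(2l/d_k)); the following odd dimension
# 2l+1 gets the matching cos with the same frequency.
def position_encodings(num_positions, d_k):
    pe = np.zeros((num_positions, d_k))
    pos = np.arange(num_positions)[:, None]              # position index i
    div = np.power(10000.0, np.arange(0, d_k, 2) / d_k)  # 10000^(2l/d_k)
    pe[:, 0::2] = np.sin(pos / div)
    pe[:, 1::2] = np.cos(pos / div)
    return pe

p = position_encodings(num_positions=8, d_k=16)   # one row per child-node slot
```

Each row p[i] is then added to the (shared) parent target E_g, which is what distinguishes the I otherwise-identical copies in E_pos.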
Then, a multi-head self-attention mechanism is adopted to fuse the information of different positions, i.e., the fused target E_pos is input into a multi-head self-attention module to obtain the output E_sa:
E_sa = MultiHeadSelfAttention(E_pos·W^Q, E_pos·W^K, E_pos·W^V)
where W^Q, W^K and W^V are trainable parameter matrices in the multi-head self-attention module.
Then, a multi-head cross-attention module connects the output E_sa of the multi-head self-attention module with the candidate set E_c: the output E_sa serves as the Q matrix of the multi-head cross-attention module, and the candidate set E_c, after passing through a feed-forward neural network and being multiplied by trainable parameters, yields the K matrix and V matrix of the multi-head cross-attention module, giving the output E_ctx:
E_ctx = softmax(Q·K^T/√d_k)·V
where d_k is the dimension of the encoding vector and E_c is the candidate set:
E_c = [E_V; E_op; E_con; E_Nb]
i.e. numbers, operators, constants and special characters, where E_V is the output of the problem encoding module, the remaining E_op, E_con, E_Nb are trainable encodings, and N_b is a special character used to represent the number of child nodes.
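The cross-attention step above can be sketched as follows. A single head is shown with random weights (the module in the description is multi-head, and the K/V projection here stands in for the feed-forward network plus trainable parameter matrices):

```python
import numpy as np

# Sketch of the cross-attention step: the self-attention output serves as Q,
# the candidate set (after a projection) supplies K and V. Single head, random
# weights; I target positions attend over J candidate characters.
rng = np.random.default_rng(1)
I, J, d_k = 4, 10, 16

E_sa = rng.normal(size=(I, d_k))   # self-attention output -> Q
E_c = rng.normal(size=(J, d_k))    # candidate set (numbers, operators, constants, N_b)

W_K = rng.normal(size=(d_k, d_k))  # trainable projections producing K and V
W_V = rng.normal(size=(d_k, d_k))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

Q, K, V = E_sa, E_c @ W_K, E_c @ W_V
attn = softmax(Q @ K.T / np.sqrt(d_k))  # (I, J): attention of each position over candidates
E_ctx = attn @ V                        # contextual representation, one row per position
```

Each row of E_ctx mixes candidate information according to how relevant each candidate character is to that decomposition position.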
The output E_ctx is the contextual representation, used to select the most relevant candidate characters in the candidate set E_c through a pointer network. The probability ω_ij of selecting the jth candidate character at position i is:
ω_ij = u^T·tanh(W_p·E_ctx,i + W_b·E_c,j)
where W_p and W_b are learnable parameters, u is a column weight vector, and E_ctx,i and E_c,j are the vector representation of the ith position of E_ctx and the vector representation of the jth character of the candidate set, respectively.
Thus the probability distribution Ptr_i over all candidate characters at the ith position is obtained:
Ptr_i = softmax(ω_i)
where ω_i = {ω_i1, ω_i2, …, ω_iJ} and J is the number of candidate characters in the candidate set E_c.
All probability distributions Ptr_i are stacked by rows to form the probability distribution matrix Ptr. During training, the probability distribution matrix Ptr is used to calculate the cross-entropy loss against the ground-truth mathematical expression to train the non-autoregressive mathematical problem solver based on the multi-way tree structure; during prediction, the probability distribution matrix Ptr is used to take the character with the highest probability at each position as the predicted character of that position, yielding the predicted mathematical expression.
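The pointer-network scoring and the row-wise softmax can be sketched as follows (random weights; all dimensions illustrative):

```python
import numpy as np

# Sketch of the pointer network: ω_ij = uᵀ·tanh(W_p·E_ctx[i] + W_b·E_c[j]);
# a row-wise softmax then gives Ptr_i, and the argmax of each row is the
# predicted candidate character for that position.
rng = np.random.default_rng(2)
I, J, d = 4, 10, 16

E_ctx = rng.normal(size=(I, d))  # contextual representation per position
E_c = rng.normal(size=(J, d))    # candidate character embeddings
W_p = rng.normal(size=(d, d))
W_b = rng.normal(size=(d, d))
u = rng.normal(size=d)

# scores[i, j]: broadcast the two projections against each other, then project onto u
scores = np.tanh((E_ctx @ W_p.T)[:, None, :] + (E_c @ W_b.T)[None, :, :]) @ u

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

Ptr = softmax(scores, axis=1)    # probability distribution matrix, shape (I, J)
predicted = Ptr.argmax(axis=1)   # predicted candidate index at each position
```

Because every position is scored against the candidates independently, all I children of a target are predicted in parallel, which is what makes the decomposition non-autoregressive.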
To calculate the loss between the predicted nodes and the real nodes, the predicted nodes need to be aligned with the real nodes. Therefore, a pseudo-order is defined for the real child nodes of each target: the child nodes are sorted so that operators come first, followed by the constants and operands in the problem. For the example in FIG. 2, the pseudo-order of the child nodes is [×, ×, −40]. After predicting all child nodes, the mathematical problem solver also needs to indicate the type of each leaf number (one of the forms n, −n, 1/n, −1/n). A simple feed-forward neural network is designed to classify the types, and the type classification loss is combined with the preceding node prediction classification loss to jointly train the entire mathematical problem solver.
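The pseudo-ordering used for alignment can be sketched as a stable sort that moves operators to the front (the token representation and operator set here are illustrative, not the solver's actual vocabulary):

```python
# Sketch of the pseudo-order used to align predicted and real child nodes:
# operators first, then the constants/operands, each group keeping its
# original relative order. Tokens are illustrative strings.
OPERATORS = {"+", "-", "*", "/", "*-", "+/"}

def pseudo_order(children):
    ops = [c for c in children if c in OPERATORS]
    rest = [c for c in children if c not in OPERATORS]
    return ops + rest

# For a target with real children [×, −40, ×] the pseudo-order is [×, ×, −40]:
ordered = pseudo_order(["*", "-40", "*"])
```

The cross-entropy loss at position i is then taken against the ith element of this pseudo-ordered list, giving each unordered child a fixed slot during training.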
3. MTree accuracy and MTree-based IoU
The MTree structure resolves well the drawback that the same mathematical expression has many forms. For the solution expressions generated by various mathematical problem solvers, directly matching characters may produce false negatives, such as a − b + c versus c − b + a. Therefore, the invention proposes MTree accuracy and MTree IoU to evaluate the accuracy of solution expressions more precisely. Specifically, for MTree accuracy, the solution expressions generated by different solvers are compared with the ground-truth expressions on the MTree. Note also that evaluating the entire expression tree in this way does not measure the partial correctness of an expression, which is an important way of evaluating solver ability, as it is of human ability. In the example of FIG. 1, 13 × (10 + 3) + 40 and 13 × 10 + 3 − 40 are two erroneous expressions; the former only uses an erroneous "+" operation while the other parts are correct. For this purpose, the invention proposes MTree IoU, which calculates the accuracy of the paths connecting the root and the leaves to measure the partial correctness of an expression. MTree IoU is calculated as follows:
Figure BDA0004191344760000083
where I·| represents the number of elements in the set, E p And P g All paths in the predicted mtre and the true mtre, respectively.
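The metric reduces to a set intersection-over-union on root-to-leaf paths, which can be sketched directly (paths are shown as illustrative strings; any hashable path representation works):

```python
# Sketch of the MTree IoU metric over root-to-leaf path sets.
def mtree_iou(predicted_paths, gold_paths):
    p, g = set(predicted_paths), set(gold_paths)
    if not p | g:
        return 1.0  # two empty trees agree trivially
    return len(p & g) / len(p | g)

# Two trees sharing 2 of 4 distinct paths give an IoU of 0.5:
iou = mtree_iou({"+/x/13", "+/x/10", "+/40"},
                {"+/x/13", "+/x/10", "+/-40"})
```

Unlike whole-tree matching, a prediction that gets most paths right still receives partial credit, which is exactly the partial-correctness signal motivated above.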
4. Experimental results
In order to evaluate the MTree mathematical problem solver designed by the present invention, a number of experiments were performed on two commonly used public data sets, Math23K and MAWPS. Math23K is a Chinese data set containing 23,162 mathematical problems, and MAWPS contains 2,300 mathematical problems.
The effectiveness of the MTree solver was first evaluated and analyzed and its performance was compared with the latest technology, and the result of the comparison is shown in table (1).
TABLE 1
Table 1 shows the performance of different mainstream mathematical problem solvers. From Table 1 it can be seen that the present invention is superior to all baseline models and reaches a new level on both data sets, demonstrating the effectiveness and superiority of the invention.
The SUMC-Solver uses path prediction to reconstruct the expression tree; its performance is far lower than that of the present invention because path prediction may make nodes independent, breaking the arithmetic relationships between nodes. The attention mechanism implemented by the solver of the present invention is able to explore and capture the relationships between numbers and produces better results. The performance of DeductReasoner, which implements complex relational modeling and deductive reasoning, is very close to this work, which may mean that introducing deductive reasoning into the MTree structure would bring new insight.
Ablation studies were also performed in the experiments to investigate the effectiveness of the proposed cross-target attention, the results are shown in table (2).
TABLE 2
Table 2 reports the ablation experiments on cross-target attention. As can be seen from Table 2, the model improves significantly when cross-target attention is added, e.g., from 83.2 to 84.4 on Math23K. This shows that information belonging to different targets can be delivered and aggregated by the cross-target attention mechanism, and this cross-target information integration significantly improves the accuracy of single-target decomposition.
To investigate the MTree accuracy and MTree IoU metrics proposed by the present invention, this study used 5 representative mathematical problem solvers with open-source code and compared them with the present invention on Math23K. The results are shown in Table 3.
TABLE 3
Table 3 shows the comparison of MTree accuracy and MTree IoU for the mainstream mathematical problem solvers with open-source code. Intuitively, expression accuracy should be consistent with value accuracy, but as can be seen in Table 3, expression accuracy is far lower than value accuracy, while MTree accuracy is only slightly lower than value accuracy, which matches intuition. As can also be seen from Table 3, the value accuracy, MTree accuracy and MTree IoU of the present invention are all the highest, showing that the performance of the mathematical problem solver is improved.
While the foregoing describes illustrative embodiments of the present invention to facilitate understanding by those skilled in the art, it should be understood that the invention is not limited to the scope of these embodiments; various changes are protected insofar as they fall within the spirit and scope of the invention as defined by the appended claims.

Claims (1)

1. A non-autoregressive mathematical problem solver based on a multi-way tree structure, employing an encoder-decoder structure and comprising:
a title coding module, used for encoding a given natural-language mathematical problem P = {w_1, w_2, …, w_N}, where w_n represents the nth word, n = 1, 2, …, N, into distributed representations E_p and E_V containing its context information, wherein E_p is the representation of the entire mathematical problem text and E_V is the representation of the numbers in the mathematical problem;
a target-driven multi-way tree generation module, a top-down multi-way tree generator adopting a target-driven mechanism, which uses the problem representation E_p as the root target of the multi-way tree and recursively generates sub-targets in a top-down manner; each sub-target is classified as an operand or an operator; when a sub-target is an operand, its result is obtained directly; when a sub-target is an operator, its result cannot yet be obtained and decomposition continues until every sub-target is an operand;
characterized in that it also comprises
A non-autoregressive target decomposition module for processing unordered multi-branch decomposition work:
first, for the parent target E_g in the sub-target decomposition process, the parent target E_g is fused with I position codes p_i, i = 1, 2, …, I, to obtain the fused target E_pos:
E_pos = [E_g + p_1; E_g + p_2; …; E_g + p_I];
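The fusion step amounts to broadcasting the parent-target vector over the I position codes; a minimal sketch with illustrative shapes:

```python
import numpy as np

d, I = 8, 4
rng = np.random.default_rng(0)
E_g = rng.standard_normal(d)     # parent target E_g
P = rng.standard_normal((I, d))  # position codes p_1 .. p_I
E_pos = E_g[None, :] + P         # (I, d): row i is E_g + p_i
print(E_pos.shape)  # (4, 8)
```

Each row of E_pos represents "the parent goal, viewed from child position i", which is what lets all child positions be decoded in parallel rather than autoregressively.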
then, a multi-head self-attention mechanism is adopted for processing, i.e., the fused target E_pos is input into a multi-head self-attention module to obtain the output Ẽ_pos:
Ẽ_pos = MultiHeadSelfAttn(E_pos W_Q, E_pos W_K, E_pos W_V),
where W_Q, W_K and W_V are trainable parameter matrices in the multi-head self-attention module;
then, the output Ẽ_pos of the multi-head self-attention module and the candidate set E_c are connected through a multi-head mutual-attention module: the output Ẽ_pos serves as the Q matrix of the multi-head mutual-attention module, and the candidate set E_c, after passing through a feedforward neural network and being multiplied by trainable parameters W_K and W_V, yields the K matrix and V matrix of the multi-head mutual-attention module, thus obtaining the output H:
H = softmax(Q K^T / √d_k) V,
where d_k is the encoding vector dimension, and E_c is the candidate set:
E_c = [E_V; E_op; E_con; E_N],
i.e. numbers, operators, constants and special characters, where E_V is the output of the title coding module, the rest E_op, E_con, E_N are trainable codes, and N_b is a special character for representing the number of child nodes;
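The mutual-attention step can be sketched as follows (single head for brevity): the self-attention output supplies Q, the candidate set supplies K and V after trainable projections, and the output is softmax(Q K^T / √d_k) V. The weights and shapes here are random illustrative stand-ins, not the trained parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
I, C, d_k = 4, 10, 16               # positions, candidate-set size, key dimension
Q = rng.standard_normal((I, d_k))   # from the self-attention output
E_c = rng.standard_normal((C, d_k)) # candidate set after the feedforward network
W_K = rng.standard_normal((d_k, d_k))
W_V = rng.standard_normal((d_k, d_k))
K, V = E_c @ W_K, E_c @ W_V
H = softmax(Q @ K.T / np.sqrt(d_k)) @ V  # (I, d_k) contextual representation
print(H.shape)  # (4, 16)
```

Each row of H summarizes, for one child position, which candidate characters that position is most compatible with.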
the output H is a contextual representation used to select the most relevant candidates from the candidate set E_c through a pointer network; the probability ω_ij of selecting the jth candidate character at position i is:
ω_ij = u^T tanh(W_p h_i + W_b e_j),
where W_p and W_b are learnable parameters, u is a column weight vector, and h_i and e_j are respectively the vector representation of the ith position of H and the vector representation of the jth character of the candidate set;
thus the probability distribution Ptr_i over all candidate characters at the ith position is obtained:
Ptr_i = softmax(ω_i),
where ω_i = {ω_i1, ω_i2, …, ω_iJ} and J is the number of candidate characters in the candidate set E_c;
all probability distributions Ptr_i are stacked by rows to form a probability distribution matrix Ptr; during training, the Ptr probability distribution matrix is used to compute the cross-entropy loss against the ground-truth mathematical expression so as to train the non-autoregressive mathematical problem solver based on the multi-way tree structure; during prediction, the Ptr probability distribution matrix is used to take the character with the highest probability at each position as the predicted character for that position, yielding the predicted mathematical expression.
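The pointer-network selection described in the claim can be sketched end to end: score each candidate j at each position i with u^T tanh(W_p h_i + W_b e_j), softmax over candidates, then take the argmax per position at prediction time. All weights and dimensions below are random illustrative stand-ins.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
I, J, d = 3, 6, 8                    # positions, candidate characters, hidden dim
H = rng.standard_normal((I, d))      # contextual representations h_i
E_c = rng.standard_normal((J, d))    # candidate embeddings e_j
W_p = rng.standard_normal((d, d))
W_b = rng.standard_normal((d, d))
u = rng.standard_normal(d)

A = H @ W_p.T                        # (I, d): W_p h_i for every position
B = E_c @ W_b.T                      # (J, d): W_b e_j for every candidate
scores = np.tanh(A[:, None, :] + B[None, :, :]) @ u  # (I, J): omega_ij
Ptr = softmax(scores, axis=-1)       # row i is the distribution Ptr_i
pred = Ptr.argmax(axis=-1)           # predicted candidate index per position
print(pred.shape)  # (3,)
```

Because every row of Ptr is computed independently, all positions of the expression are decoded in one shot, which is the non-autoregressive property the claim relies on.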
CN202310433743.7A 2023-04-21 2023-04-21 Non-autoregressive mathematical problem solver based on multi-tree structure Pending CN116401624A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310433743.7A CN116401624A (en) 2023-04-21 2023-04-21 Non-autoregressive mathematical problem solver based on multi-tree structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310433743.7A CN116401624A (en) 2023-04-21 2023-04-21 Non-autoregressive mathematical problem solver based on multi-tree structure

Publications (1)

Publication Number Publication Date
CN116401624A true CN116401624A (en) 2023-07-07

Family

ID=87012231

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310433743.7A Pending CN116401624A (en) 2023-04-21 2023-04-21 Non-autoregressive mathematical problem solver based on multi-tree structure

Country Status (1)

Country Link
CN (1) CN116401624A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116680502A (en) * 2023-08-02 2023-09-01 中国科学技术大学 Intelligent solving method, system, equipment and storage medium for mathematics application questions
CN116680502B (en) * 2023-08-02 2023-11-28 中国科学技术大学 Intelligent solving method, system, equipment and storage medium for mathematics application questions

Similar Documents

Publication Publication Date Title
CN113642330B (en) Rail transit standard entity identification method based on catalogue theme classification
CN111985245A (en) Attention cycle gating graph convolution network-based relation extraction method and system
CN110990564B (en) Negative news identification method based on emotion calculation and multi-head attention mechanism
CN107562792A (en) A kind of question and answer matching process based on deep learning
CN105938485A (en) Image description method based on convolution cyclic hybrid model
CN112749562A (en) Named entity identification method, device, storage medium and electronic equipment
CN113157886B (en) Automatic question and answer generation method, system, terminal and readable storage medium
CN112100397A (en) Electric power plan knowledge graph construction method and system based on bidirectional gating circulation unit
CN109241199B (en) Financial knowledge graph discovery method
CN115099338A (en) Power grid master equipment-oriented multi-source heterogeneous quality information fusion processing method and system
CN110765277A (en) Online equipment fault diagnosis platform of mobile terminal based on knowledge graph
CN113704437A (en) Knowledge base question-answering method integrating multi-head attention mechanism and relative position coding
CN115526236A (en) Text network graph classification method based on multi-modal comparative learning
CN116401624A (en) Non-autoregressive mathematical problem solver based on multi-tree structure
CN113988075A (en) Network security field text data entity relation extraction method based on multi-task learning
Xiong et al. Transferable natural language interface to structured queries aided by adversarial generation
CN115906816A (en) Text emotion analysis method of two-channel Attention model based on Bert
CN115496072A (en) Relation extraction method based on comparison learning
CN117592563A (en) Power large model training and adjusting method with field knowledge enhancement
Huang et al. Design knowledge graph-aided conceptual product design approach based on joint entity and relation extraction
CN117151222A (en) Domain knowledge guided emergency case entity attribute and relation extraction method thereof, electronic equipment and storage medium
Qiu et al. NeuroSPE: A neuro‐net spatial relation extractor for natural language text fusing gazetteers and pretrained models
CN113901172B (en) Case-related microblog evaluation object extraction method based on keyword structural coding
CN113361615B (en) Text classification method based on semantic relevance
CN113326371B (en) Event extraction method integrating pre-training language model and anti-noise interference remote supervision information

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination