CN116401624A - Non-autoregressive mathematical problem solver based on multi-tree structure - Google Patents
- Publication number: CN116401624A
- Application number: CN202310433743.7A
- Authority: CN (China)
- Prior art keywords: target, mathematical, module, autoregressive, sub
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/25 — Pattern recognition; Analysing; Fusion techniques
- G06F17/11 — Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
- G06F18/214 — Pattern recognition; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Pattern recognition; Classification techniques
- G06N3/04 — Neural networks; Architecture, e.g. interconnection topology
- G06N3/08 — Neural networks; Learning methods
Abstract
The invention discloses a non-autoregressive mathematical problem solver based on a multi-way tree structure. On top of the existing problem encoding module and target-driven multi-way tree generation module, a non-autoregressive target decomposition module is added to handle the unordered multi-branch decomposition work. A parent target E_g is first fused with I position encodings p_i, i = 1, 2, …, I; the fused target is then processed with a multi-head self-attention mechanism, and a multi-head mutual attention module connects the output Ẽ of the self-attention module with the candidate set E_c to obtain a contextual representation Ê. Finally, according to the contextual representation Ê, the most relevant candidate characters in the candidate set E_c are selected through a pointer network, yielding a probability distribution matrix Ptr that is used to compute the cross-entropy loss during training and to produce the predicted mathematical problem expression during prediction. By using the non-autoregressive target decomposition module to handle the unordered multi-branch decomposition work and to explore and capture the relationships between numbers, the invention improves the performance of the mathematical problem solver.
Description
Technical Field
The invention belongs to the technical field of mathematical problem solvers, and particularly relates to a non-autoregressive mathematical problem solver based on a multi-tree structure.
Background
Automatic solving of mathematical problems (Math Word Problem, MWP) is an important sub-problem of machine reasoning. A mathematical problem solver needs to generate a solution equation that conforms to specifications and is computable, according to a given problem description and mathematical prior knowledge. More specifically, given a mathematical problem consisting of a text description containing the numbers q_1, q_2, …, q_n and some mathematical definitions, the model must automatically give an expression that solves the problem and calculate the final answer from this expression. The mathematical problem solving task involves core problems of artificial intelligence research, such as deep understanding of natural language text, the reasoning capability of machine intelligence, and interpretability. As an important test benchmark for machine intelligence, mathematical problem solving has attracted many researchers.
The mathematical problem solving task has undergone several stages since it was proposed, including early template-based methods, statistics-based methods, and later deep-learning-based methods. Methods based on statistics and templates rely heavily on manual labeling and statistics, and the resulting models lack generalization. With the assistance of high-performance computing equipment and massive Internet data, deep learning has made great progress; neural-network-based methods have also been applied to mathematical problem solving, and a large number of models have achieved striking results on the task. Since the tree decoder was proposed in 2019, tree decoders have been widely used in mathematical problem solving tasks and have become the dominant method.
The tree structure naturally lends itself to representing mathematical expressions: the depth of the tree can correspond to the priority of the operations in the expression, so that operations with higher priority are placed at lower levels, and the root node of the tree is the operator with the lowest priority. The tree structure thus carries the structural information of mathematical expressions, and a tree decoder can generate the target expression in a top-down manner, forming an expression tree. The top-down method can be understood as continuously splitting the problem into sub-problems and finally obtaining the answer by solving the sub-problems; this solution method conforms to human intuition and has good interpretability. The sub-expressions generated during the splitting process are in fact intermediate variables of the problem, so a tree decoder can represent the structural information of mathematical problems well.
Existing mathematical problem solvers mostly follow the encoder-decoder architecture, and the solving process of such architecture is a process of translating a natural language-based problem description into a symbolic language-based mathematical description (solving expression). Under such architecture, the improvement of the mathematical problem solver can be mainly summarized into two major categories of improvement of the encoder and improvement of the decoder.
For encoders, it is critical to enhance the neural network model's understanding of the textual description of the problem. In addition, mathematical problems require a great deal of external mathematical and common-sense prior knowledge; how to integrate such prior knowledge into the encoding so as to help the model better understand the problem is another important aspect of improving the model's encoder.
For the decoder, the key problem is to extract the mathematical relations between variables from text features and to search for the solution expression over these relations, so reducing the search space of solution expressions is an effective idea for improving performance. Furthermore, the expression tree has some drawbacks. For example, the one-to-one correspondence between a binary expression tree and a mathematical expression makes it highly dependent on the form of the expression: a − b + c and c − b + a are expressions with the same semantics via the commutative law, yet they generate two different binary trees. Subsequently, expression trees with a multi-way tree structure appeared, which place the parts of a mathematical expression with the same priority on the same level, effectively resolving this semantic inconsistency. However, an existing solver based on the multi-way tree structure solves the problem on the basis of codes rather than directly on the multi-way tree: it splits the tree into paths (a path of the multi-way tree is a connection from the root node to a leaf node), takes all paths of the training set as the retrieval space, retrieves all paths in a problem, and reconstructs the multi-way tree. Such a solver learns only at the path level and cannot fully exploit the structural information of the multi-way tree; the resulting mathematical problem solver lacks generalization, cannot generate paths that do not appear in the training set, and the performance and effectiveness of automatic mathematical problem solvers remain to be improved.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a non-autoregressive mathematical problem solver based on a multi-tree structure so as to explore and capture the relation between numbers and improve the performance of the mathematical problem solver.
To achieve the above object, the present invention provides a non-autoregressive mathematical problem solver based on a multi-tree structure, which adopts an encoder-decoder structure, comprising:
the topic encoding module, which encodes a given natural-language mathematical problem P = {w_1, w_2, …, w_N}, where w_n denotes the n-th word, n = 1, 2, …, N, into distributed representations E_p and E_V containing its context information, where E_p is the topic representation of the entire mathematical problem and E_V is the numeric representation of the mathematical problem;
the target-driven multi-way tree generation module, a top-down multi-way tree generator adopting a target-driven mechanism, which uses the topic representation E_p as the root target of the multi-way tree and recursively generates sub-targets in a top-down manner; each sub-target is classified as an operand or an operator: when the sub-target is an operand, the result of the current sub-target is obtained directly, and when the sub-target is an operator, the result of the sub-target cannot yet be obtained and decomposition continues downward until the sub-targets are operands;
characterized in that it further comprises
A non-autoregressive target decomposition module for processing unordered multi-branch decomposition work:
first for the parent object E in the child object decomposition process g First, father target E g P is coded with I position i I=1, 2, I fusion, resulting in a fused target E pos :
E pos =[E g +p 1 ;E g +p 2 ;…,E g +p I ];
Then, the multi-head self-attention mechanism is adopted for processing, namely, the target E pos Inputting into a multi-head self-attention module for processing to obtain an output
Then, a multi-head mutual attention module connects the output Ẽ of the multi-head self-attention module with the candidate set E_c: the output Ẽ serves as the Q matrix of the multi-head mutual attention module, and the candidate set E_c, after passing through a feed-forward neural network, is multiplied by trainable parameters to obtain the K and V matrices of the multi-head mutual attention module, giving the output

Ê = softmax(Q K^T / √d_k) V

where d_k is the dimension of the encoding vector, and the candidate set E_c is

E_c = [E_V; E_op; E_con; E_N]

i.e. numbers, operators, constants and special characters, where E_V is the output of the topic encoding module, the remaining E_op, E_con, E_N are trainable encodings, and N_b is a special character used to represent the number of child nodes;
output ofFor contextual representation, for selecting the most relevant candidate set E through a pointer network c Probability ω of selecting the j-th candidate character at position i ij The method comprises the following steps:
wherein W is p And W is b For learning parameters, u is the column weight vector,and->Are respectively->A vector representation of the i-th position and a vector representation of the j-th character of the candidate set;
thus, the probability distribution Ptr of all the candidate characters at the ith position is obtained i :
Ptr i =softmax(ω i )
Wherein omega i ={ω i1 ,ω i2 ,...,ω ij J is candidate set E c The number of candidate characters in (a);
all probability distribution Ptr i Forming a probability distribution matrix Ptr by rows; in the training process, ptr probability distribution matrix is used for calculating cross entropy loss with a real mathematical problem expression to train a non-autoregressive mathematical problem solver based on a multi-tree structure; in the prediction process, the Ptr probability distribution matrix is used for acquiring characters with the highest probability on each position, and the characters are used as predicted characters of each position to obtain a predicted mathematical problem expression.
The object of the present invention is thus achieved.
The invention relates to a non-autoregressive mathematical problem solver based on a multi-way tree structure, in which a non-autoregressive target decomposition module is added on top of the existing problem encoding module and target-driven multi-way tree generation module to handle the unordered multi-branch decomposition work. A parent target E_g is first fused with I position encodings p_i, i = 1, 2, …, I; the fused target is then processed with a multi-head self-attention mechanism, and a multi-head mutual attention module connects the output Ẽ of the self-attention module with the candidate set E_c to obtain a contextual representation Ê. Finally, according to the contextual representation Ê, the most relevant candidate characters in the candidate set E_c are selected through a pointer network, yielding a probability distribution matrix Ptr that is used to compute the cross-entropy loss during training and to produce the predicted mathematical problem expression during prediction. By using the non-autoregressive target decomposition module to handle the unordered multi-branch decomposition work and to explore and capture the relationships between numbers, the invention improves the performance of the mathematical problem solver.
Drawings
FIG. 1 is a diagram of one embodiment of an MTree tree structure;
FIG. 2 is a schematic diagram of an embodiment of the non-autoregressive mathematical problem solver of the present invention based on a multi-way tree structure.
Detailed Description
The following description of the embodiments of the invention is presented in conjunction with the accompanying drawings to provide those skilled in the art with a better understanding of the invention. It should be expressly noted that, in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the present invention.
Given a mathematical problem composed in natural language whose text contains the numbers Q = {q_1, q_2, …, q_M}, where q_m denotes the m-th number, m = 1, 2, …, M, the mathematical problem solver is required to give a mathematical expression O_s = {e_1, e_2, …, e_K}, where e_k ∈ {+, −, ×, /} ∪ Q ∪ C and C is a constant set such as {1, π, …}.
The invention designs a non-autoregressive mathematical problem solver based on a multi-way tree structure (hereinafter referred to as MTrees) and adopts a target-driven top-down strategy to generate an expression tree, thereby obtaining the problem-solving expression.
1. MTree tree structure
The MTree structure was introduced into MWP in 2022 to unify expression tree structures. MTree is a multi-way tree whose internal nodes are operators and whose external nodes are numbers or constants appearing in the problem. The child nodes of an internal node in an MTree may be exchanged with each other. For a mathematical expression O_s = {e_1, e_2, …, e_M}, where e_m ∈ {+, −, ×, /} ∪ Q ∪ C, m = 1, 2, …, M, the operands of {+, ×} may be interchanged, while the operands of {−, /} may not. To solve this problem, the MTree structure introduces two new operators {×−, +/} as replacements: ×− denotes the opposite of the product of the numbers, e.g., ×−{2, 3, 4} equals −(2 × 3 × 4), and +/ denotes the reciprocal of the sum of the operands, e.g., +/{2, 3, 4} equals 1/(2 + 3 + 4). Furthermore, the number n at a leaf node may take several forms, including n, −n, 1/n and −1/n. The MTree structure is shown in fig. 1.
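The two extra operators can be illustrated with a minimal sketch (not part of the patent; the function name `mtree_op` and the plain-list operand representation are assumptions for illustration):

```python
from functools import reduce

def mtree_op(op, operands):
    """Evaluate one MTree internal node over its (unordered) children.
    '*-' is the negated product; '+/' is the reciprocal of the sum."""
    if op == "+":
        return float(sum(operands))
    if op == "*":
        return float(reduce(lambda a, b: a * b, operands, 1.0))
    if op == "*-":   # e.g. ×-{2,3,4} = -(2*3*4) = -24
        return -float(reduce(lambda a, b: a * b, operands, 1.0))
    if op == "+/":   # e.g. +/{2,3,4} = 1/(2+3+4) = 1/9
        return 1.0 / sum(operands)
    raise ValueError(f"unknown MTree operator: {op}")
```

Because the children are unordered, the result is invariant under any permutation of `operands`, which is exactly the exchangeability property the MTree structure exploits.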
2. Non-autoregressive mathematical problem solver based on multi-way tree structure
In this embodiment, as shown in fig. 2, the non-autoregressive mathematical problem solver based on the multi-way tree structure of the present invention adopts an encoder-decoder structure and includes a topic encoding module 1, a target-driven multi-way tree generation module 2, and a non-autoregressive target decomposition module 3.
The topic encoding module 1 encodes a given natural-language mathematical problem P = {w_1, w_2, …, w_N}, where w_n denotes the n-th word, n = 1, 2, …, N, into distributed representations E_p and E_V containing its context information, where E_p is the topic representation of the entire mathematical problem and E_V is the numeric representation of the mathematical problem.
The topic encoding module 1 uses a language model to encode the natural-language problem description into a distributed representation containing its context information. Two types of language models are commonly used in the prior art: recurrent neural networks (RNNs), such as LSTM or GRU, and pre-trained language models (PLMs), such as BERT and RoBERTa. Inspired by the superior representation capability of pre-trained models, recent work tends to use a pre-trained model as the problem encoder. In the present invention, the topic encoding module 1 uses RoBERTa to obtain the topic representation and the numeric representations. More specifically, given a natural-language mathematical problem P = {w_1, w_2, …, w_N}, the topic encoding module 1 adds the special characters [CLS] and [SEP] before and after the problem, feeds the result into RoBERTa, and takes the output at the [CLS] position as the representation of the entire problem. The topic encoding module 1 can be expressed by the following formula:

E_p, E_V = RoBERTa([CLS]; P; [SEP])    (1)

where E_p is the topic representation of the entire mathematical problem and E_V is the numeric representation of the mathematical problem. RoBERTa is fine-tuned during training.
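The split of the encoder output into the topic representation E_p and the numeric representations E_V can be sketched as follows (a minimal sketch assuming the encoder output is already available as a `(seq_len, d)` array with [CLS] at index 0; the function name and the explicit `number_positions` argument are illustrative assumptions, not the patent's API):

```python
import numpy as np

def split_representations(hidden_states, number_positions):
    """hidden_states: (seq_len, d) final-layer encoder output, [CLS] first.
    Returns E_p (the [CLS] vector, representing the whole problem) and
    E_V (the rows at the token positions of the numbers q_1..q_M)."""
    E_p = hidden_states[0]
    E_V = hidden_states[np.asarray(number_positions)]
    return E_p, E_V
```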
Expression tree decoders have been well studied for solving mathematical problems. The target-driven mechanism gradually decomposes the whole problem into sub-problems, intuitively matching the way humans work through a problem, and a top-down MTree generator is realized by adopting this mechanism. Specifically, the target-driven multi-way tree generation module 2 is a top-down multi-way tree generator employing a target-driven mechanism: it uses the topic representation E_p as the root target of the multi-way tree and recursively generates sub-targets in a top-down manner. Sub-targets are classified as operands or operators; when a sub-target is an operand, the result of the current sub-target is obtained directly, and when a sub-target is an operator, its result cannot yet be obtained and decomposition continues downward until the sub-targets are operands.
In the MTree structure, multiple child nodes of the same parent node are unordered, so a new non-autoregressive target decomposition module (NAGD) is designed, based on the non-autoregressive Transformer, to handle this unordered multi-branch decomposition work. Specifically:
First, during the child-target decomposition, the parent target E_g is fused with I position encodings p_i, i = 1, 2, …, I, to obtain the fused target E_pos:

E_pos = [E_g + p_1; E_g + p_2; …; E_g + p_I].
In this embodiment, the position encodings are generated using the sine and cosine position encodings of the Transformer:

p_{i,2l} = sin(i / 10000^{2l/d_k}),  p_{i,2l+1} = cos(i / 10000^{2l/d_k})

where i denotes the position, l indexes the dimensions of the encoding p_i, and d_k is the dimension of the encoding vector.
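The sine/cosine position encodings and the fusion E_pos = [E_g + p_1; …; E_g + p_I] can be sketched numerically as follows (a sketch assuming an even encoding dimension d_k; the function names are illustrative):

```python
import numpy as np

def sinusoidal_positions(I, d_k):
    """Standard Transformer position encodings:
    p[i, 2l] = sin(i / 10000^(2l/d_k)), p[i, 2l+1] = cos(i / 10000^(2l/d_k)).
    Assumes d_k is even."""
    pos = np.arange(I, dtype=float)[:, None]            # (I, 1)
    two_l = np.arange(0, d_k, 2, dtype=float)[None, :]  # (1, d_k/2)
    angle = pos / np.power(10000.0, two_l / d_k)
    p = np.zeros((I, d_k))
    p[:, 0::2] = np.sin(angle)
    p[:, 1::2] = np.cos(angle)
    return p

def fuse_target(E_g, I):
    """E_pos = [E_g + p_1; ...; E_g + p_I]: broadcast the parent target
    over I positions and add the position encodings."""
    return E_g[None, :] + sinusoidal_positions(I, E_g.shape[0])
```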
Then, a multi-head self-attention mechanism is applied to fuse information across positions: the fused target E_pos is input into a multi-head self-attention module to obtain the output Ẽ.
Then, a multi-head mutual attention module connects the output Ẽ of the multi-head self-attention module with the candidate set E_c: the output Ẽ serves as the Q matrix of the multi-head mutual attention module, and the candidate set E_c, after passing through a feed-forward neural network, is multiplied by trainable parameters to obtain the K and V matrices of the multi-head mutual attention module, giving the output

Ê = softmax(Q K^T / √d_k) V

where d_k is the dimension of the encoding vector, and the candidate set E_c is

E_c = [E_V; E_op; E_con; E_N]

i.e. numbers, operators, constants and special characters, where E_V is the output of the topic encoding module, the remaining E_op, E_con, E_N are trainable encodings, and N_b is a special character used to represent the number of child nodes.
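A single-head sketch of this mutual attention (cross-attention) step follows; multi-head splitting and the candidate feed-forward network are omitted for brevity, and all names are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mutual_attention(E_tilde, E_c, W_k, W_v):
    """Q comes from the self-attention output E_tilde (I, d);
    K and V come from the projected candidate set E_c (J, d)."""
    d_k = E_tilde.shape[1]
    Q, K, V = E_tilde, E_c @ W_k, E_c @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))   # (I, J), each row sums to 1
    return attn @ V                          # contextual representation (I, d)
```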
The output Ê is the contextual representation and is used to select the most relevant candidate characters from the candidate set E_c through a pointer network. The probability ω_ij of selecting the j-th candidate character at position i is:

ω_ij = u^T tanh(W_p Ê_i + W_b E_c,j)

where W_p and W_b are learnable parameters, u is a column weight vector, and Ê_i and E_c,j are the vector representation of the i-th position of Ê and the vector representation of the j-th character of the candidate set, respectively.
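The additive pointer-network scoring can be sketched as follows (shapes and names are assumptions for illustration):

```python
import numpy as np

def pointer_scores(H_hat, E_c, W_p, W_b, u):
    """omega[i, j] = u^T tanh(W_p h_i + W_b e_j):
    score every candidate character e_j for every decode position h_i."""
    a = H_hat @ W_p.T                                   # (I, d_h)
    b = E_c @ W_b.T                                     # (J, d_h)
    return np.tanh(a[:, None, :] + b[None, :, :]) @ u   # (I, J)
```

A row-wise softmax over these scores then gives the per-position distributions over the candidate set.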
Thus the probability distribution Ptr_i over all candidate characters at the i-th position is obtained:

Ptr_i = softmax(ω_i)

where ω_i = {ω_i1, ω_i2, …, ω_iJ} and J is the number of candidate characters in the candidate set E_c.
All probability distributions Ptr_i are stacked row by row to form the probability distribution matrix Ptr. During training, the Ptr probability distribution matrix is used to compute the cross-entropy loss against the ground-truth mathematical problem expression so as to train the non-autoregressive mathematical problem solver based on the multi-way tree structure; during prediction, the Ptr probability distribution matrix is used to take the character with the highest probability at each position as that position's predicted character, yielding the predicted mathematical problem expression.
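The two uses of the Ptr matrix — cross-entropy loss in training and per-position argmax in prediction — can be sketched as follows (the function and argument names are illustrative assumptions):

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def ptr_loss_and_decode(omega, target_idx=None):
    """omega: (I, J) pointer scores.  Returns (loss, predictions):
    loss is the mean cross-entropy vs. gold indices (None at inference),
    predictions are the argmax candidate indices per position."""
    Ptr = softmax_rows(omega)        # row i: distribution over candidates
    pred = Ptr.argmax(axis=1)        # prediction: highest probability wins
    loss = None
    if target_idx is not None:       # training: cross-entropy vs. gold
        loss = -np.log(Ptr[np.arange(Ptr.shape[0]), target_idx]).mean()
    return loss, pred
```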
To calculate the loss between the predicted nodes and the ground-truth nodes, the predicted nodes need to be aligned with the ground-truth nodes. Therefore, a pseudo-order is defined for the real child nodes of each target: the contextual representation output is sorted so that operators come first, followed by the constants and the operands appearing in the problem. For the example in fig. 2, the pseudo-order of the child nodes is [×, ×, −40]. After predicting all child nodes, the mathematical problem solver also needs to indicate the form type of each operand; a simple feed-forward neural network is designed to classify these types, and the type classification loss is combined with the preceding node-prediction classification loss to jointly train the entire mathematical problem solver.
3. MTree accuracy and MTree-based IoU
The MTree structure well resolves the drawback that the same mathematical expression has many forms. For the solution expressions generated by various mathematical problem solvers, directly matching characters may yield false negatives, e.g., a − b + c versus c − b + a. Therefore, the invention proposes MTree accuracy and MTree IoU to evaluate the accuracy of solution expressions more precisely. Specifically, MTree accuracy compares the solution expressions generated by different solvers with the ground-truth expressions on the MTree. Note, however, that such an evaluation over the entire expression tree does not measure the partial correctness of an expression, which is also an important way of evaluating solver capability, just as for human capability. In the example of fig. 1, 13 × (10 + 3) + 40 and (13 × 10 + 3 − 40) are two erroneous expressions; the former uses only an erroneous "+" operation while the other parts are correct. For this purpose, the invention proposes MTree IoU, which computes the accuracy of the paths connecting the root and the leaves to measure the partial correctness of an expression. MTree IoU is calculated as follows:

MTreeIoU = |P_p ∩ P_g| / |P_p ∪ P_g|

where |·| denotes the number of elements in a set, and P_p and P_g are the sets of all paths in the predicted MTree and the ground-truth MTree, respectively.
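Under the assumption that each root-to-leaf path is represented as a hashable tuple of node labels, the metric reduces to a set intersection-over-union:

```python
def mtree_iou(pred_paths, gold_paths):
    """P_p, P_g: root-to-leaf path sets of the predicted and ground-truth
    MTree; IoU = |intersection| / |union| of the two path sets."""
    P_p, P_g = set(pred_paths), set(gold_paths)
    if not (P_p | P_g):          # two empty trees: define as a perfect match
        return 1.0
    return len(P_p & P_g) / len(P_p | P_g)
```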
4. Experimental results
In order to evaluate the MTree mathematical problem solver designed by the present invention, a number of experiments were performed on two commonly used public data sets, Math23K and MAWPS. Math23K is a Chinese data set containing 23162 mathematical problems, and MAWPS contains 2300 mathematical problems.
The effectiveness of the MTree solver was first evaluated and analyzed, and its performance was compared with the state of the art; the comparison results are shown in Table 1.
TABLE 1
Table 1 shows the performance of different mainstream mathematical problem solvers. As can be seen from Table 1, the present invention outperforms all baseline models, reaching a new state of the art on both data sets, which demonstrates its effectiveness and superiority.
The SUMC-Solver uses path prediction to reconstruct the expression tree; its performance is far lower than that of the present invention, because path prediction may make the nodes independent, breaking the arithmetic relationships between the nodes. The solver of the present invention, which implements an attention mechanism, is able to explore and capture the relationships between numbers and produce better results. The performance of DeductReasoner, which implements complex relational modeling and deductive reasoning, is very close to that of this work, which may mean that introducing deductive reasoning into the MTree structure would bring some new insight.
Ablation studies were also performed in the experiments to investigate the effectiveness of the proposed cross-target attention; the results are shown in Table 2.
TABLE 2
Table 2 shows the ablation experiments on cross-target attention. As can be seen from Table 2, the model improves significantly when cross-target attention is added, e.g., from 83.2 to 84.4 on Math23K. This shows that information belonging to different targets can be delivered and aggregated by the cross-target attention mechanism; this cross-target information integration significantly improves the accuracy of single-target decomposition.
To investigate the MTree accuracy and the effectiveness of the MTree IoU proposed by the present invention, this study used five representative mathematical problem solvers with open-source code and compared them with the present invention on Math23K. The results are shown in Table 3.
TABLE 3
Table 3 shows the comparison of MTree accuracy and MTree IoU for the mainstream mathematical problem solvers with open-source code. In principle, expression accuracy should be consistent with value accuracy, but as can be seen from Table 3, expression accuracy is far lower than value accuracy, while MTree accuracy is only slightly lower than value accuracy, which is intuitive. As can also be seen from Table 3, the present invention achieves the highest value accuracy, MTree accuracy and MTree IoU, improving the performance of the mathematical problem solver.
While the foregoing describes illustrative embodiments of the present invention to facilitate an understanding of the present invention by those skilled in the art, it should be understood that the present invention is not limited to the scope of the embodiments, but is to be construed as protected by the accompanying claims insofar as various changes are within the spirit and scope of the present invention as defined and defined by the appended claims.
Claims (1)
1. A non-autoregressive mathematical problem solver based on a multi-way tree structure employing an encoder-decoder structure comprising:
a topic encoding module for encoding a given natural-language mathematical problem P = {w_1, w_2, …, w_N}, where w_n denotes the n-th word, n = 1, 2, …, N, into distributed representations E_p and E_V containing its context information, where E_p is the topic representation of the entire mathematical problem and E_V is the numeric representation of the mathematical problem;
the target-driven multi-way tree generation module, a top-down multi-way tree generator adopting a target-driven mechanism, which uses the title representation E_p as the root target of the multi-way tree and recursively generates sub-targets in a top-down manner, each sub-target being classified as an operand or an operator; when a sub-target is an operand, its result is obtained directly; when a sub-target is an operator, its result cannot yet be obtained, and decomposition continues until every sub-target is an operand;
characterized in that it further comprises
A non-autoregressive target decomposition module for processing unordered multi-branch decomposition work:
in the sub-target decomposition process, the parent target E_g is first fused with I position codes p_i, i = 1, 2, …, I, to obtain the fused target E_pos:

E_pos = [E_g + p_1; E_g + p_2; …; E_g + p_I];
then, multi-head self-attention is applied, i.e., the fused target E_pos is input into a multi-head self-attention module to obtain an output E'_pos;

next, a multi-head mutual-attention module connects the self-attention output E'_pos with the candidate set E_c: E'_pos serves as the Q matrix of the multi-head mutual-attention module, while the candidate set E_c, after passing through a feed-forward neural network, is multiplied by trainable parameters W_K and W_V to obtain the K matrix and V matrix of the multi-head mutual-attention module, giving the output H:

H = softmax(Q K^T / √d_k) V
wherein d_k is the encoding-vector dimension, and E_c is the candidate set:

E_c = [E_V; E_op; E_con; E_N],

i.e., numbers, operators, constants and special characters, where E_V is the output of the title coding module, the remaining E_op, E_con and E_N are trainable codes, and N_b is a special character representing the number of child nodes;
the output H of the multi-head mutual-attention module is a contextual representation used to select the most relevant candidates from E_c through a pointer network; the score ω_ij for selecting the j-th candidate character at position i is:

ω_ij = u^T tanh(W_p H_i + W_b (E_c)_j)

wherein W_p and W_b are learnable parameters, u is a column weight vector, and H_i and (E_c)_j are respectively the vector representation of the i-th position of H and the vector representation of the j-th character of the candidate set;
thus, the probability distribution Ptr of all the candidate characters at the ith position is obtained i :
Ptr i =softmax(ω i )
Wherein omega i ={ω i1 ,ω i2 ,…,ω iJ J is candidate set E c The number of candidate characters in (a);
all probability distributions Ptr_i are stacked row-wise into a probability distribution matrix Ptr; during training, the Ptr matrix is used to compute the cross-entropy loss against the ground-truth mathematical expression so as to train the non-autoregressive mathematical problem solver based on the multi-way tree structure; during prediction, the Ptr matrix is used to take the highest-probability character at each position as that position's predicted character, yielding the predicted mathematical expression.
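The decomposition pipeline of the claim can be illustrated with a minimal NumPy sketch. This is not the patented implementation: multi-head attention is reduced to a single head, the feed-forward network on E_c is omitted, and all weights (W_K, W_V, W_p, W_b, u) and input representations are random stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d, I, J = 16, 4, 10           # vector dimension d_k, positions I, candidates J

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

# random stand-ins for learned representations
E_g = rng.standard_normal(d)            # parent-target vector
P   = rng.standard_normal((I, d))       # position codes p_1 .. p_I
E_c = rng.standard_normal((J, d))       # candidate set [E_V; E_op; E_con; E_N]

# 1) fuse the parent target with each position code:
#    E_pos = [E_g+p_1; ...; E_g+p_I]
E_pos = E_g + P                         # (I, d) by broadcasting

# 2) self-attention over the fused targets (single head for brevity)
H_self = attention(E_pos, E_pos, E_pos)

# 3) mutual attention: Q from the self-attention output, K and V from the
#    candidate set projected by trainable parameters (random matrices here)
W_K = rng.standard_normal((d, d))
W_V = rng.standard_normal((d, d))
H = attention(H_self, E_c @ W_K, E_c @ W_V)   # (I, d) contextual representation

# 4) pointer-network score of candidate j at position i:
#    omega_ij = u^T tanh(W_p H_i + W_b (E_c)_j)
W_p = rng.standard_normal((d, d))
W_b = rng.standard_normal((d, d))
u   = rng.standard_normal(d)
omega = np.tanh((H @ W_p.T)[:, None, :] + (E_c @ W_b.T)[None, :, :]) @ u  # (I, J)

# 5) per-position distribution and non-autoregressive prediction
Ptr  = softmax(omega, axis=-1)          # each row sums to 1
pred = Ptr.argmax(axis=-1)              # predicted candidate index per position
print(Ptr.shape, pred.shape)            # (4, 10) (4,)
```

Note how all I positions are scored in one pass: unlike an autoregressive decoder, no position waits for the character predicted at an earlier position, which is what makes the decomposition non-autoregressive.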
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310433743.7A CN116401624A (en) | 2023-04-21 | 2023-04-21 | Non-autoregressive mathematical problem solver based on multi-tree structure |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116401624A true CN116401624A (en) | 2023-07-07 |
Family
ID=87012231
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116401624A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116680502A (en) * | 2023-08-02 | 2023-09-01 | 中国科学技术大学 | Intelligent solving method, system, equipment and storage medium for mathematics application questions |
CN116680502B (en) * | 2023-08-02 | 2023-11-28 | 中国科学技术大学 | Intelligent solving method, system, equipment and storage medium for mathematics application questions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||