CN113988300A - Topic structure reasoning method and system - Google Patents

Topic structure reasoning method and system

Info

Publication number
CN113988300A
CN113988300A
Authority
CN
China
Prior art keywords
reasoning
sentence
level
word
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111281369.0A
Other languages
Chinese (zh)
Inventor
余伟江
卢宇彤
郑馥丹
文英鹏
陈志广
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202111281369.0A
Publication of CN113988300A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a topic structure reasoning method and system. The method comprises the following steps: receiving an input text and constructing the interrelations among its contents to obtain semantic relations; acquiring prior knowledge and fusing it with the input text to obtain implicit knowledge; combining the implicit knowledge and the semantic relations to generate a tree structure target; and analyzing the tree structure target according to a preset traversal sequence to obtain an equation expression. The system comprises a hierarchical reasoning encoder, a knowledge encoder, a tree structure encoder and a tree structure decoder. The invention realizes the construction of a mathematical expression from the problem description. The topic structure reasoning method and system can be widely applied in the field of natural language processing.

Description

Topic structure reasoning method and system
Technical Field
The invention relates to the field of natural language processing, in particular to a topic structure reasoning method and system.
Background
The math word problem task is a reasoning task that answers a mathematical query based on a problem description, and it is a cross-disciplinary research topic connecting mathematics and natural language processing. A problem is described by a brief narrative and poses a question about an unknown quantity. In recent years, research on such problems based on deep learning methods has received increasing attention. Typically, to solve this kind of mathematical reasoning task, the solver not only needs to parse the question and understand the context, but also needs to exercise external knowledge. However, previous approaches only learn textual descriptions from brief and limited narratives, without using any background knowledge that does not appear in the description, which limits the ability of the model to reason about mathematical problems from a global perspective.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a topic structure reasoning method and system that construct a mathematical expression from a problem description.
The first technical scheme adopted by the invention is as follows: a topic structure reasoning method comprises the following steps:
receiving an input text and constructing the mutual relation among contents to obtain a semantic relation;
acquiring prior knowledge and fusing the prior knowledge with an input text to obtain implicit knowledge;
combining the implicit knowledge and the semantic relation and generating a tree structure target;
and analyzing the tree structure target according to a preset traversal sequence to obtain an equation expression.
Further, the interrelation between the contents includes a word-level reasoning relation and a sentence-level reasoning relation, and the step of receiving the input text and constructing the interrelation between the contents to obtain the semantic relation specifically includes:
receiving an input text;
constructing a word level reasoning relation based on the word level reasoning layer;
constructing a sentence-level reasoning relation based on the sentence-level reasoning layer;
the word-level reasoning layer and the sentence-level reasoning layer both adopt sequence coding based on GRU.
Further, the step of constructing a word-level inference relationship based on the word-level inference layer specifically includes:
encoding the words based on the bidirectional GRU;
merging the context information into word-level representation to generate word representation;
an attention mechanism is introduced to extract important words, and the importance of the words is measured by using word-level context vectors to obtain word-level reasoning relations;
the important word tokens are aggregated into a sentence vector.
Further, the step of constructing a sentence-level inference relationship based on the sentence-level inference layer specifically includes:
encoding the sentence based on the bidirectional GRU;
merging the information of adjacent sentences into sentence-level representation to generate sentence representation;
and introducing an attention mechanism to measure the importance of the sentence by using the sentence-level context vector, so as to obtain a sentence-level reasoning relation.
Further, the step of obtaining the prior knowledge and fusing the prior knowledge with the input text to obtain the implicit knowledge specifically includes:
acquiring priori knowledge based on a Chinese pre-training model on a large corpus;
and fusing the prior knowledge and the input text, and extracting to obtain the implicit knowledge of the input text.
Further, the step of combining implicit knowledge and semantic relations and generating a tree structure target specifically includes:
combining implicit knowledge and semantic relations based on dot product self-adaptation, and generating enhanced representation through a linear mapping function;
dividing nodes of the tree structure into mathematical operators, common sense numerical values and numbers;
initializing a root node vector according to the enhanced representation;
iteratively using a root node vector with a trainable vector in conjunction with a target vocabulary pre-prepared with candidate words to predict probabilities belonging to nodes;
and generating tree nodes according to the prediction probability to obtain a tree structure target.
Further, the expression of the enhanced representation is as follows:
Y = F([w_p Y_p, w_h Y_h])
in the above formula, Y represents the enhanced representation, Y_p represents the implicit knowledge, Y_h represents the semantic relation, w_p and w_h represent the corresponding importance weights, [·,·] represents a concatenation operation, and F is a linear mapping function.
Further, the step of analyzing the tree structure target according to a predetermined traversal order to obtain an equation expression specifically includes:
analyzing the tree structure target;
generating an intermediate operator according to the topmost node of the tree structure target;
and recursively completing the analysis of all the nodes from the left child node to the right child node to obtain an equation expression.
The second technical scheme adopted by the invention is as follows: a topic structure reasoning system comprising:
the hierarchical reasoning encoder is used for receiving an input text and constructing the mutual relation among contents to obtain a semantic relation;
the knowledge encoder is used for acquiring the prior knowledge and fusing the prior knowledge with the input text to obtain the implicit knowledge;
the tree structure encoder is used for combining the implicit knowledge and the semantic relation and generating a tree structure target;
the tree structure decoder is used for analyzing the tree structure target according to a preset traversal sequence to obtain an equation expression.
The method and the system have the following beneficial effects: based on the knowledge encoder, implicit knowledge is effectively fused into the model, which helps the model correctly resolve the semantics of words in complex texts; based on the hierarchical reasoning encoder, the relations between words and between sentences are constructed, connecting the entity domain and the context domain; together these realize the construction of a mathematical expression from the problem description.
Drawings
FIG. 1 is a flow chart of the steps of a topic structure inference method of the present invention;
FIG. 2 is a block diagram of a topic structure inference system of the present invention;
FIG. 3 is a schematic flow chart of an embodiment of the present invention.
FIG. 4 is a diagram of hierarchical inference according to an embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
A math word problem (MWP) can be represented as (P, E), where P is the problem text and E is the solution expression. Suppose the description of an MWP has L sentences s_i, each containing T_i words, where w_it (t ∈ [1, T]) denotes the t-th word of the i-th sentence. Our proposed encoder projects the original problem description into a vector representation, on the basis of which a tree-structured decoder is constructed to predict the mathematical expression.
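To make the (P, E) formulation concrete, the following minimal sketch shows one way such a pair could be stored; the problem text, the expression, and its prefix encoding are invented for illustration and are not taken from the patent.

```python
# Hypothetical example of a math word problem (MWP) stored as a (P, E) pair.
# P is the problem description; E is the solution expression, here written in
# prefix (pre-order) form to match the tree-structured decoding described below.
mwp = {
    "P": "A farmer keeps 3 rabbits. How many legs do the rabbits have in total?",
    "E": ["*", "3", "4"],  # 3 * 4; the "4" (legs per rabbit) is a common-sense constant
}
```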
Referring to fig. 1 and 3, the present invention provides a topic structure inference method, including the steps of:
S1, receiving the input text and constructing the mutual relation among the contents to obtain the semantic relation;
S2, acquiring prior knowledge and fusing the prior knowledge with the input text to obtain implicit knowledge;
S3, combining the implicit knowledge and the semantic relation and generating a tree structure target;
and S4, analyzing the tree structure target according to a preset traversal sequence to obtain an equation expression.
As a further preferred embodiment of the method, the interrelation between the contents includes a word-level inference relationship and a sentence-level inference relationship, and the step of receiving the input text and constructing the interrelation between the contents to obtain the semantic relationship specifically includes:
S11, receiving an input text;
S12, constructing a word level reasoning relation based on the word level reasoning layer;
S13, constructing sentence-level reasoning relation based on the sentence-level reasoning layer;
the word-level reasoning layer and the sentence-level reasoning layer both adopt sequence coding based on GRU.
Specifically, this step is implemented by a hierarchical inference encoder, a schematic diagram of which is shown in FIG. 4. The design takes into account that different parts of the mathematical description do not carry equally relevant information, and that determining the relevant parts involves modeling the interactions between words, not just their isolated presence in the text. Therefore, the model includes a two-level reasoning mechanism: word-level reasoning and sentence-level reasoning, which allows the model to focus more or less on individual words and sentences when building the representation of the whole description. The hierarchical inference encoder consists of two layers: the first is a word-level inference layer and the second is a sentence-level inference layer. Both layers use GRU-based sequence encoding.
In GRU-based sequence encoding, the GRU uses a gating mechanism to track the state of the sequence without a separate memory cell. There are two gates: a reset gate r_t and an update gate z_t, which together control how information is updated into the state. At time t, the GRU computes the new state as

h_t = (1 − z_t) ⊙ h_{t−1} + z_t ⊙ h̃_t,

which is a linear interpolation between the previous state h_{t−1} and the candidate state h̃_t computed from the new sequence information (⊙ denotes element-wise multiplication). The update gate z_t decides how much past information to keep and how much new information to add, and is computed as

z_t = σ(W_z x_t + U_z h_{t−1} + b_z),

where x_t is the sequence vector at time t. The candidate state h̃_t is computed as

h̃_t = tanh(W_h x_t + r_t ⊙ (U_h h_{t−1}) + b_h),

where the reset gate r_t controls the contribution of the previous state to the candidate state; if r_t is 0, the previous state is forgotten entirely. r_t is computed as

r_t = σ(W_r x_t + U_r h_{t−1} + b_r).

Here the W and U matrices are learnable weights, and the b terms are learnable bias vectors.
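As a minimal sketch, the gating equations above can be written directly as a single GRU step in PyTorch. The candidate-state formula follows the standard GRU form (the original renders it only as an image), and all tensor shapes are illustrative assumptions.

```python
import torch

def gru_step(x_t, h_prev, W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h):
    """One GRU step following the gating equations in the text.

    x_t:    sequence vector at time t, shape (d_in,)
    h_prev: previous state h_{t-1}, shape (d_h,)
    W_*, U_*: learnable weight matrices; b_*: learnable bias vectors.
    """
    z_t = torch.sigmoid(W_z @ x_t + U_z @ h_prev + b_z)            # update gate
    r_t = torch.sigmoid(W_r @ x_t + U_r @ h_prev + b_r)            # reset gate
    h_cand = torch.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)    # candidate state (standard GRU form)
    h_t = (1.0 - z_t) * h_prev + z_t * h_cand                      # interpolate old state and candidate
    return h_t
```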
Further, as a preferred embodiment of the method, the step of constructing the word-level inference relationship based on the word-level inference layer specifically includes:
S121, coding the words based on the bidirectional GRU;
S122, combining the context information into word level representation to generate word representation;
S123, extracting important words by introducing an attention mechanism, and measuring the importance of the words by using word-level context vectors to obtain word-level reasoning relations;
and S124, aggregating the important word characteristics into a sentence vector.
Specifically, at the word level, the model uses a bidirectional GRU to generate word representations that integrate information from both directions, thereby incorporating context into the word-level representation. Given the words w_it, t ∈ [1, T], of sentence s_i and an embedding matrix W_e, the bidirectional GRU consists of a forward GRU that reads sentence s_i from w_i1 to w_iT and a backward GRU that reads it from w_iT to w_i1:

x_it = W_e w_it, t ∈ [1, T],
→h_it = →GRU(x_it), t ∈ [1, T],
←h_it = ←GRU(x_it), t ∈ [T, 1].

The representation of a given word w_it is obtained by concatenating the forward hidden state and the backward hidden state, i.e. h_it = [→h_it, ←h_it], which summarizes the information of the whole sentence centered on w_it. Note that not all words contribute equally to the meaning of a sentence. Therefore, we introduce an attention mechanism to extract the words that are important to the sentence and aggregate the representations of these informative words into a sentence vector. Specifically,

u_it = tanh(W_w h_it + b_w),
α_it = exp(u_it^T u_w) / Σ_t exp(u_it^T u_w),
s_i = Σ_t α_it h_it.

We first feed the word-level feature h_it through a one-layer MLP to obtain u_it as a hidden representation of h_it. We then measure the importance of each word with the word-level context vector u_w and obtain a normalized importance weight α_it through a softmax function. Finally, we compute the sentence vector s_i as a weighted sum of the word representations with these weights. During training, the word-level context vector u_w is randomly initialized and jointly learned.
Further, as a preferred embodiment of the method, the step of constructing the sentence-level inference relationship based on the sentence-level inference layer specifically includes:
S131, coding the sentence based on the bidirectional GRU;
S132, merging the information of the adjacent sentences into sentence level representation to generate sentence representation;
S133, an attention mechanism is introduced, and the importance of the sentence is measured by using the sentence-level context vector, so that a sentence-level reasoning relation is obtained.
Specifically, at the sentence level, given the sentence vectors s_i, we obtain the problem description vector in a similar way. We encode the sentences with a bidirectional GRU:

→h_i = →GRU(s_i), i ∈ [1, L],
←h_i = ←GRU(s_i), i ∈ [L, 1],

where →GRU and ←GRU denote the forward and backward GRU, respectively. We concatenate →h_i and ←h_i to obtain the representation of sentence i, h_i = [→h_i, ←h_i]. h_i integrates the information of the sentences neighboring sentence i, but still focuses on sentence i. To reward the sentences that are relevant to correctly parsing the problem description, we again use an attention mechanism and introduce a sentence-level context vector u_s to measure the importance of each sentence:

u_i = tanh(W_s h_i + b_s),
α_i = exp(u_i^T u_s) / Σ_i exp(u_i^T u_s),
v = Σ_i α_i h_i,

where v is the global text vector that summarizes all the information of the sentences in one description. Likewise, the sentence-level context vector u_s is randomly initialized and jointly learned during training.
Further, as a preferred embodiment of the method, the step of obtaining the prior knowledge and fusing the prior knowledge with the input text to obtain the implicit knowledge specifically includes:
S21, acquiring prior knowledge based on a Chinese pre-trained model on a large corpus;
and S22, fusing the prior knowledge and the input text, and extracting to obtain the implicit knowledge of the input text.
Specifically, this step is implemented by a pre-trained knowledge encoder. A Transformer-based language model is adopted as the encoder, and a RoBERTa model pre-trained on the large corpora BooksCorpus and Wikipedia is used to capture implicit knowledge. As in BERT, we tokenize the description with WordPiece to obtain a token sequence, embed the tokens with the pre-trained RoBERTa embeddings, and modify the position encoding of RoBERTa to obtain a series of d-dimensional token representations. We feed these representations into the Transformer-based pre-trained knowledge encoder and fine-tune them during training. We average the outputs of all Transformer steps to obtain the combined implicit knowledge representation Y_p.
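A hedged sketch of the pre-trained knowledge encoder using the HuggingFace transformers library: the checkpoint name is an assumption (the patent only states that a pre-trained RoBERTa model is used), and averaging the token outputs stands in for the "average over all Transformer steps" that yields Y_p.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hypothetical checkpoint; any pre-trained RoBERTa-style model plays the same role.
CKPT = "hfl/chinese-roberta-wwm-ext"

tokenizer = AutoTokenizer.from_pretrained(CKPT)
encoder = AutoModel.from_pretrained(CKPT)

def implicit_knowledge(problem_text: str) -> torch.Tensor:
    """Tokenize the problem description and average all token outputs to obtain Y_p."""
    inputs = tokenizer(problem_text, return_tensors="pt", truncation=True)
    outputs = encoder(**inputs)                      # fine-tuned jointly with the rest of the model
    y_p = outputs.last_hidden_state.mean(dim=1)      # average over the token positions
    return y_p                                       # shape (1, d)
```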
Further, as a preferred embodiment of the method, the step of combining implicit knowledge and semantic relations and generating a tree structure target specifically includes:
S31, combining implicit knowledge and semantic relations based on dot product self-adaptation, and generating an enhanced representation through a linear mapping function;
specifically, the result Y is obtained by a pre-training knowledge encoder and a hierarchical inference encoder, respectivelypAnd YhLater, we adaptively merge Y using the two encoder-end parsers as shown in FIG. 1pAnd YhAn enhanced representation Y is obtained for final decoding. The enhancement representation Y may be expressed as:
Y=F([wpYp,whYh]),
wherein wpAnd whFrom YpAnd YhDerived, to calculate the importance of the task. [. the]Indicating a linking operation. We use a simple dot product to merge YpAnd YhThese two representations. The enhanced representation Y is then generated for final decoding using a linear mapping function F, such as a fully connected layer. w is apAnd whCan be calculated as:
Figure BDA0003331199620000065
Figure BDA0003331199620000071
wherein, WpAnd WhAre all trainable weighting matrices that are used in a training process,
Figure BDA0003331199620000072
and
Figure BDA0003331199620000073
representing different MLPs.
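A minimal sketch of the adaptive fusion Y = F([w_p Y_p, w_h Y_h]). The exact form of the two importance MLPs appears only as images in the original, so the sigmoid-scored MLPs below are assumptions; F is realized as a fully connected layer as the text suggests.

```python
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Merge implicit knowledge Y_p and semantic relation Y_h into an enhanced representation Y."""

    def __init__(self, d: int):
        super().__init__()
        self.mlp_p = nn.Sequential(nn.Linear(d, 1), nn.Sigmoid())   # importance w_p (assumed form)
        self.mlp_h = nn.Sequential(nn.Linear(d, 1), nn.Sigmoid())   # importance w_h (assumed form)
        self.F = nn.Linear(2 * d, d)                                 # linear mapping function F

    def forward(self, y_p: torch.Tensor, y_h: torch.Tensor) -> torch.Tensor:
        w_p = self.mlp_p(y_p)                                        # scalar weight derived from Y_p
        w_h = self.mlp_h(y_h)                                        # scalar weight derived from Y_h
        y = torch.cat([w_p * y_p, w_h * y_h], dim=-1)                # [w_p Y_p, w_h Y_h]
        return self.F(y)                                             # enhanced representation Y
```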
S32, dividing the nodes of the tree structure into mathematical operators, common sense numerical values and numbers;
S33, initializing a root node vector according to the enhanced representation;
S34, combining a target vocabulary prepared with candidate words, and predicting the probability of each node by iteratively using the root node vector with a trainable vector;
And S35, generating tree nodes according to the prediction probability to obtain the tree structure target.
Specifically, our model initializes the root node vector from the global context representation Y produced by the two encoders. The expression tree in the decoder contains three types of nodes: mathematical operators V_op, common-sense values V_con that appear in the target expression but not in the problem text (e.g., a rabbit has 4 legs), and the numbers n_P that appear in the problem P. The token embedding over the target vocabulary V_tar = V_op ∪ V_con ∪ n_P is defined as follows: M_op and M_con are two trainable word embedding matrices, independent of the specific problem, that embed the operators and the common-sense values; for a number in n_P, we adopt the corresponding hidden state h_loc(y,P) from the encoder as its token embedding, where loc(y, P) is the index position of the value y in P. Mathematical operators V_op occupy the non-leaf positions. The representation of n_P thus depends on the specific MWP description, since y must take its hidden state from the encoder output, whereas V_op and V_con are represented by the two embedding matrices M_op and M_con and are obtained independently of the problem.
The tree-structured module includes a tree encoder and a tree decoder. As with the tree encoder, in the tree decoder we prepare candidate words for the operators and numbers in the target vocabulary, and then iteratively use the root vector together with a trainable vector to predict the probability of each node token y in the target vocabulary. The specific y with the highest probability is then instantiated as a tree node according to the node-type rule defined above.
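A sketch of how a decoding step can score the current node vector against the target vocabulary V_tar: operators and common-sense constants are embedded by the problem-independent matrices M_op and M_con, while numbers from the problem reuse their encoder hidden states h_loc(y,P). The dot-product scoring is an assumption made for illustration.

```python
import torch
import torch.nn as nn

class NodePredictor(nn.Module):
    """Predict a distribution over V_tar = operators + common-sense constants + problem numbers."""

    def __init__(self, n_ops: int, n_consts: int, d: int):
        super().__init__()
        self.M_op = nn.Embedding(n_ops, d)       # trainable operator embeddings
        self.M_con = nn.Embedding(n_consts, d)   # trainable common-sense constant embeddings

    def forward(self, q: torch.Tensor, number_states: torch.Tensor) -> torch.Tensor:
        # q: current node (root) vector, shape (d,)
        # number_states: encoder hidden states h_loc(y, P) of the numbers in P, shape (k, d)
        candidates = torch.cat([self.M_op.weight, self.M_con.weight, number_states], dim=0)
        scores = candidates @ q                  # one score per token in V_tar
        return torch.softmax(scores, dim=0)      # probability of each candidate node token
```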
Further as a preferred embodiment of the method, the expression of the enhanced representation is as follows:
Y = F([w_p Y_p, w_h Y_h])
in the above formula, Y represents the enhanced representation, Y_p represents the implicit knowledge, Y_h represents the semantic relation, w_p and w_h represent the corresponding importance weights, [·,·] represents a concatenation operation, and F is a linear mapping function.
Further, as a preferred embodiment of the method, the step of analyzing the tree structure target according to a predetermined traversal order to obtain an equation expression specifically includes:
S41, analyzing the tree structure target;
S42, generating an intermediate operator according to the topmost node of the tree structure target;
and S43, recursively completing the analysis of all the nodes from the left child node to the right child node to obtain an equation expression.
Specifically, mathematical equations are usually composed of operators and variables. Variables are defined as leaf nodes, and each operator node requires two child nodes. The tree structure decoder then parses the equation expression in a predetermined traversal order: the topmost intermediate operator is generated first, followed by the left child node; the generation recurses until the last leaf node is produced, and the right child node is then generated in the same way.
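The generation order described above mirrors a pre-order walk of the finished expression tree, which can be read back out as an equation. A minimal sketch with an assumed node class (the patent does not prescribe this data structure):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    token: str                        # operator, common-sense constant, or number
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def to_expression(node: Node) -> str:
    """Resolve the node, then its left child, then its right child (pre-order), emitting infix text."""
    if node.left is None and node.right is None:
        return node.token                                   # leaf: a number or constant
    return f"({to_expression(node.left)} {node.token} {to_expression(node.right)})"

# Hypothetical tree for the expression 3 * 4
tree = Node("*", Node("3"), Node("4"))
print(to_expression(tree))            # -> (3 * 4)
```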
Further, as a preferred embodiment of the method, since the MWP task can be expressed as (P, E), we define the loss function as the sum of the negative log-likelihoods of predicting the token y_t of the t-th node. Formally, the objective function of the training optimizer is:

Loss(E, P) = Σ_{t=1}^{m} −log p(y_t | q_t, Y_t, P),

where m is the size of E, and q_t and Y_t are the target vector of the t-th node and its context vector, respectively. p is calculated using a probability distribution function in the target-driven tree structure.
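A minimal sketch of this training objective: the loss sums, over the m nodes of the target expression, the negative log-probability that the decoder assigns to the ground-truth token y_t. The per-step probability vectors are whatever the tree-structured decoder produces; their computation is not repeated here.

```python
import torch

def expression_loss(step_probs, target_ids):
    """Sum of -log p(y_t | q_t, Y_t, P) over the m nodes of the target expression E.

    step_probs: list of probability vectors over V_tar, one per decoding step.
    target_ids: list of indices of the ground-truth token y_t at each step.
    """
    loss = torch.zeros(())
    for probs, y_t in zip(step_probs, target_ids):
        loss = loss - torch.log(probs[y_t] + 1e-12)   # small epsilon for numerical stability
    return loss
```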
As shown in fig. 2, a topic structure inference system includes:
the hierarchical reasoning encoder is used for receiving an input text and constructing the mutual relation among contents to obtain a semantic relation;
the knowledge encoder is used for acquiring the prior knowledge and fusing the prior knowledge with the input text to obtain the implicit knowledge;
the tree structure encoder is used for combining the implicit knowledge and the semantic relation and generating a tree structure target;
the tree structure decoder is used for analyzing the tree structure target according to a preset traversal sequence to obtain an equation expression.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (9)

1. A topic structure reasoning method is characterized by comprising the following steps:
receiving an input text and constructing the mutual relation among contents to obtain a semantic relation;
acquiring prior knowledge and fusing the prior knowledge with an input text to obtain implicit knowledge;
combining the implicit knowledge and the semantic relation and generating a tree structure target;
and analyzing the tree structure target according to a preset traversal sequence to obtain an equation expression.
2. The topic structure reasoning method according to claim 1, wherein the interrelations between the contents include a word-level reasoning relationship and a sentence-level reasoning relationship, and the step of receiving the input text and constructing the interrelations between the contents to obtain the semantic relationship specifically comprises:
receiving an input text;
constructing a word level reasoning relation based on the word level reasoning layer;
constructing a sentence-level reasoning relation based on the sentence-level reasoning layer;
combining the word level reasoning relation and the sentence level reasoning relation to obtain a semantic relation;
the word-level reasoning layer and the sentence-level reasoning layer both adopt sequence coding based on GRU.
3. The topic structure reasoning method of claim 2, wherein the step of constructing the word-level reasoning relationship based on the word-level reasoning layer specifically comprises:
encoding the words based on the bidirectional GRU;
merging the context information into word-level representation to generate word representation;
an attention mechanism is introduced to extract important words, and the importance of the words is measured by using word-level context vectors to obtain word-level reasoning relations;
the important word tokens are aggregated into a sentence vector.
4. The topic structure reasoning method of claim 3, wherein the step of constructing a sentence-level reasoning relationship based on a sentence-level reasoning layer specifically comprises:
encoding the sentence based on the bidirectional GRU;
merging the information of adjacent sentences into sentence-level representation to generate sentence representation;
and introducing an attention mechanism to measure the importance of the sentence by using the sentence-level context vector, so as to obtain a sentence-level reasoning relation.
5. The topic structure reasoning method of claim 2, wherein the step of obtaining the prior knowledge and fusing the prior knowledge with the input text to obtain the implicit knowledge specifically comprises:
acquiring priori knowledge based on a Chinese pre-training model on a large corpus;
and fusing the prior knowledge and the input text, and extracting to obtain the implicit knowledge of the input text.
6. The topic structure reasoning method according to claim 3, wherein the step of combining implicit knowledge and semantic relations and generating a tree structure target specifically comprises:
combining implicit knowledge and semantic relations based on dot product self-adaptation, and generating enhanced representation through a linear mapping function;
dividing nodes of the tree structure into mathematical operators, common sense numerical values and numbers;
initializing a root node vector according to the enhanced representation;
iteratively using a root node vector with a trainable vector in conjunction with a target vocabulary pre-prepared with candidate words to predict probabilities belonging to nodes;
and generating tree nodes according to the prediction probability to obtain a tree structure target.
7. The topic structure reasoning method of claim 6, wherein the expression of the enhanced representation is as follows:
Y = F([w_p Y_p, w_h Y_h])
in the above formula, Y represents the enhanced representation, Y_p represents the implicit knowledge, Y_h represents the semantic relation, w_p and w_h represent the corresponding importance weights, [·,·] represents a concatenation operation, and F is a linear mapping function.
8. The topic structure reasoning method of claim 7, wherein the step of resolving the tree structure target according to a predetermined traversal order to obtain an equation expression specifically comprises:
analyzing the tree structure target;
generating an intermediate operator according to the topmost node of the tree structure target;
and recursively completing the analysis of all the nodes from the left child node to the right child node to obtain an equation expression.
9. A topic structure reasoning system, comprising:
the hierarchical reasoning encoder is used for receiving an input text and constructing the mutual relation among contents to obtain a semantic relation;
the knowledge encoder is used for acquiring the prior knowledge and fusing the prior knowledge with the input text to obtain the implicit knowledge;
the tree structure encoder is used for combining the implicit knowledge and the semantic relation and generating a tree structure target;
the tree structure decoder is used for analyzing the tree structure target according to a preset traversal sequence to obtain an equation expression.
CN202111281369.0A 2021-11-01 2021-11-01 Topic structure reasoning method and system Pending CN113988300A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111281369.0A CN113988300A (en) 2021-11-01 2021-11-01 Topic structure reasoning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111281369.0A CN113988300A (en) 2021-11-01 2021-11-01 Topic structure reasoning method and system

Publications (1)

Publication Number Publication Date
CN113988300A true CN113988300A (en) 2022-01-28

Family

ID=79745229

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111281369.0A Pending CN113988300A (en) 2021-11-01 2021-11-01 Topic structure reasoning method and system

Country Status (1)

Country Link
CN (1) CN113988300A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943276A (en) * 2022-04-16 2022-08-26 西北工业大学 Depth knowledge tracking method based on tree attention mechanism
CN114943276B (en) * 2022-04-16 2024-03-05 西北工业大学 Depth knowledge tracking method based on tree-type attention mechanism
CN116680502A (en) * 2023-08-02 2023-09-01 中国科学技术大学 Intelligent solving method, system, equipment and storage medium for mathematics application questions
CN116680502B (en) * 2023-08-02 2023-11-28 中国科学技术大学 Intelligent solving method, system, equipment and storage medium for mathematics application questions
CN117272992A (en) * 2023-08-21 2023-12-22 华中师范大学 Mathematical application problem machine answering system and method integrating physical property knowledge prompt

Similar Documents

Publication Publication Date Title
Neubig Neural machine translation and sequence-to-sequence models: A tutorial
CN110334354B (en) Chinese relation extraction method
CN110929515B (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
CN110597947B (en) Reading understanding system and method based on global and local attention interaction
CN113988300A (en) Topic structure reasoning method and system
US20180329884A1 (en) Neural contextual conversation learning
CN108062388A (en) Interactive reply generation method and device
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
CN110390397B (en) Text inclusion recognition method and device
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
US20180144234A1 (en) Sentence Embedding for Sequence-To-Sequence Matching in a Question-Answer System
CN106875940B (en) Machine self-learning construction knowledge graph training method based on neural network
CN111241807B (en) Machine reading understanding method based on knowledge-guided attention
CN112527966B (en) Network text emotion analysis method based on Bi-GRU neural network and self-attention mechanism
CN113569932A (en) Image description generation method based on text hierarchical structure
CN112115687A (en) Problem generation method combining triples and entity types in knowledge base
CN112784532B (en) Multi-head attention memory system for short text sentiment classification
Tang et al. Modelling student behavior using granular large scale action data from a MOOC
CN111259632A (en) Semantic alignment-based tree structure mathematical application problem solving method and system
CN113254581B (en) Financial text formula extraction method and device based on neural semantic analysis
US20220129450A1 (en) System and method for transferable natural language interface
CN112069827B (en) Data-to-text generation method based on fine-grained subject modeling
CN115186147B (en) Dialogue content generation method and device, storage medium and terminal
CN113591482A (en) Text generation method, device, equipment and computer readable storage medium
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination