CN114818698A - Mixed word embedding method of natural language text and mathematical language text - Google Patents

Mixed word embedding method of natural language text and mathematical language text Download PDF

Info

Publication number
CN114818698A
CN114818698A (application CN202210469691.4A)
Authority
CN
China
Prior art keywords
mathematical
language text
expression
relative position
natural language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210469691.4A
Other languages
Chinese (zh)
Other versions
CN114818698B (en)
Inventor
董石
唐家玉
陶雪云
王志锋
田元
陈加
陈迪
左明章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University filed Critical Central China Normal University
Priority to CN202210469691.4A priority Critical patent/CN114818698B/en
Publication of CN114818698A publication Critical patent/CN114818698A/en
Application granted granted Critical
Publication of CN114818698B publication Critical patent/CN114818698B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a mixed word embedding method for natural language text and mathematical language text. The method comprises: recognizing and preprocessing the mixed text to obtain a mathematical resource data set consisting of text and mathematical expressions; position-encoding the tree-structured mathematical expressions so that relative positions in the tree are translation-invariant; applying a unified position encoding to the linearly structured text and the tree-structured mathematical expressions; and feeding the relative position codes into the attention module of a pre-training model, which is pre-trained on the mathematical resources with the two standard pre-training tasks of masked language modeling and next-sentence prediction. After pre-training, each symbol obtains an embedding vector rich in context information.

Description

Mixed word embedding method of natural language text and mathematical language text
Technical Field
The invention relates to the technical field of natural language processing, in particular to a mixed word embedding method of a natural language text and a mathematical language text.
Background
Mathematical text refers to natural language text containing mathematical expressions; it is ambiguous and polymorphic and appears widely in STEM disciplines and higher education. Natural language text has a linear structure, whereas mathematical expressions have a tree structure, and word embedding representations of such mixed text play a vital role in fields related to mathematical text. Traditional word embedding techniques are suited to processing text with linear characteristics and have difficulty handling mathematical expressions with tree-structured characteristics.
A mathematical expression can be represented by two principal tree structures. One is the Symbol Layout Tree (SLT), which is constructed from the written layout of the expression and captures its appearance information; the other is the Operator Tree (OPT), which is constructed from the operator hierarchy of the expression and captures its mathematical semantic information. In 2021, Peng et al. of Peking University proposed MathBERT, a BERT-based pre-training model for mathematical expressions that can obtain word embeddings of mixed text. The authors take the LaTeX sequence of the expression, the in-order traversal sequence of its OPT, and the context text sequence as BERT inputs, and extract the structural information of the OPT with an attention masking matrix so that adjacent nodes of the tree are mutually visible in the mask. A masked structure prediction task is added to the masked language model and context prediction tasks to train the BERT model. However, this approach artificially limits the range of the attention computation and makes it difficult to capture word embedding information that depends on long distances. In the same year, Shen et al. of Penn State University proposed a MathBERT model for mathematics education and fine-tuned BERT with an automatic scoring task and a knowledge-tracing prediction task. However, the authors use a simple linear sequence of the mathematical text as input, ignoring the tree structure of mathematical expressions, so the resulting word embeddings lack mathematical semantic information.
Disclosure of Invention
Aiming at the technical problems that mathematical text is widely ambiguous and polymorphic depending on context, that existing methods yield word embeddings that are not sufficiently comprehensive or accurate, and that they struggle to extract long-distance semantic relations within mathematical expressions, the invention position-encodes tree-structured mathematical expressions according to the positional representation principle of mathematical structures and the structural characteristics of mixed natural language and mathematical language text, unifies the position encoding of the linearly ordered text and the tree-structured mathematical expressions, and obtains word embeddings of the mixed natural language and mathematical language text by fine-tuning a pre-training model on mathematical language processing tasks.
In order to achieve the above object, the present invention provides a mixed word embedding method of a natural language text and a mathematical language text, comprising:
S1: preprocessing learning resources containing natural language text and mathematical language text to obtain a mathematical resource data set, wherein the mathematical language text consists of mathematical expressions with a tree structure and the natural language text is context with linear sequence characteristics;
S2: performing absolute position encoding of the tree-structured mathematical expressions with a branch-based position encoding scheme, and computing the relative position code of two nodes in the tree structure from the absolute position encoding result;
S3: encoding the context having linear sequence characteristics with negative integer positions represented in two's complement, then taking the root node of the tree structure as the first node of the linear sequence to realize a unified position encoding of the mathematical expression and its context, and then computing the relative position code of any two nodes of the tree structure and the linear sequence from the unified position encoding;
S4: inputting the mathematical resource data set obtained in step S1 into a BERT pre-training model provided with a position encoding module and an attention module, inputting the unified position codes obtained in step S3 into the position encoding module, feeding the relative position codes of any two nodes of the tree structure and the linear sequence computed in step S3 into the attention module of the BERT pre-training model for training, and pre-training on the mathematical resources with the two standard pre-training tasks of masked language modeling and next-sentence prediction to obtain a trained word embedding model;
S5: processing the natural language text and the mathematical language text with the trained word embedding model to obtain the final mixed word embedding representation.
In one embodiment, the step S1 of preprocessing the learning resource containing the natural language text and the mathematical language text includes:
processing a learning resource containing natural language text and mathematical language text into a symbol sequence, wherein the mathematical expressions are in LaTeX format and the mathematical resource data set is the set of mathematical resources $L = \{L_1, L_2, \ldots, L_i, \ldots, L_{N'}\}$, with $L_i$ denoting the i-th mathematical resource.
In one embodiment, processing a learning resource containing natural language text and mathematical language text into a sequence of symbols includes:
segmenting each LaTeX-format mathematical expression with the im2markup segmentation tool to obtain the symbol sequence of the expression's segmentation result, converting the LaTeX-format expression into an operator tree (OPT) with the Tangent-S tool, and performing a depth-first traversal of the OPT to obtain the symbol sequence of the expression's tree-structure traversal result, the j-th mathematical expression of the i-th mathematical resource being represented as
$M_{i,j} = \{s^{L}_{j,1}, \ldots, s^{L}_{j,n'}, \ldots;\; s^{O}_{j,1}, \ldots, s^{O}_{j,k}, \ldots\}$,
where $s^{L}_{j,n'}$ denotes the n'-th symbol of the j-th mathematical expression after LaTeX-format segmentation and $s^{O}_{j,k}$ denotes the k-th symbol obtained by the depth-first traversal of the OPT of the j-th mathematical expression; each mathematical resource consists of natural language text and mathematical expressions, the natural language text being the context of the mathematical expressions, and the context of the mathematical expression $M_{i,j}$ is $C_{i,j} = \{t_z \mid t_z \in L_i,\; |z - p_{ij}| \le R\}$, where $t_z$ denotes the z-th natural language word, $p_{ij}$ is the position of the mathematical expression $M_{i,j}$ as a whole in the sequence, and R is at most 64;
obtaining the representation of each mathematical resource from the symbolic forms of the natural language and the mathematical expressions, the i-th mathematical resource being represented as
$L_i = \{t_1, \ldots, M_{i,1}, \ldots, t_z, \ldots, M_{i,j}, \ldots, t_{N_T}\}$,
where $N_T$ is the total length of the natural language text;
when the mathematical expression $M_{i,j}$ consists of several consecutive equalities or inequalities, it is split at the equality and inequality signs into sub-expressions $M_{i,j} = \{E_{i,j,1}, \ldots, E_{i,j,w}, \ldots\}$;
obtaining the mathematical resource data set from the representation of each mathematical resource as the pre-training model data set $D = \{(C_{i,j}, E_{i,j,w})\}$, where i is the learning resource index, j is the mathematical expression index, and w is the sub-expression index.
In one embodiment, S2 introduces a shift operation into the absolute position encoding; the mathematical expression is an N-ary tree and the root node is defined as $p_{0,N}$ (the all-zero code of N bits). Any subsequent child node is encoded as follows:
S2.1: the child nodes of all branches are represented by one-hot codes of N bits; for the child node of the r-th branch, the r-th bit of the one-hot code counted from the right is 1 and the remaining bits are 0;
S2.2: the position code of the parent node is shifted left by N bits and then added to the one-hot code of the branch child node to obtain the final absolute position code of that node; any node of the expression tree is finally represented as $p_{D_n, l_n}$, where n is the absolute position code of the node, $D_n$ is the decimal representation of the absolute position code, and $l_n$ is the length of the binary code of $D_n$; the relative position of nodes in the tree is computed as
$PE_T(p_{D_m, l_m}, p_{D_n, l_n}) = D_m - (D_n \ll (l_m - l_n))$,
where PE denotes the relative position calculation function, T denotes the tree, $PE_T(\cdot,\cdot)$ is the relative position calculation function of node $p_{D_m,l_m}$ and node $p_{D_n,l_n}$ in the mathematical expression tree, $D_m$ is the decimal value of the absolute position code of node $p_{D_m,l_m}$, $l_m$ is the length of the binary code of $D_m$, and $\ll$ denotes the left shift operator.
In one embodiment, step S3 includes:
for natural language text with linear sequence characteristics, relative position encoding is used, wherein the relative position between two words is defined as the difference of their absolute positions, $PE_S(a, b) = b - a$, a and b being the absolute positions and $PE_S(\cdot,\cdot)$ the relative position calculation function of the sequence; the positions of the linear sequence are encoded with negative integers represented in two's complement, and the length of the linear-sequence position code is $L_S = n_T \times l_T$, where $n_T$ denotes the maximum branching factor of the tree structure and $l_T$ the maximum number of layers of the tree structure;
the root node of the tree structure is taken as the head node of the linear sequence so that the two structures share a unified position encoding, and the relative position under the unified encoding is computed as
$PE(p^m, p^n) = \begin{cases} PE_S(p^m, p^n), & p^m, p^n \in \text{linear sequence} \\ PE_T(p^m, p^n), & p^m, p^n \in \text{tree structure} \\ PE_S(p^m, p^{root}) + PE_T(p^{root}, p^n), & p^m \in \text{linear sequence},\; p^n \in \text{tree structure} \\ PE_T(p^m, p^{root}) + PE_S(p^{root}, p^n), & p^m \in \text{tree structure},\; p^n \in \text{linear sequence} \end{cases}$
where $PE(p^m, p^n)$ denotes the relative position calculation function between any two nodes $p^m$ and $p^n$, $PE_S(\cdot,\cdot)$ denotes the relative position calculation function between nodes of the linear sequence (S denotes the sequence), $PE_T(\cdot,\cdot)$ denotes the relative position calculation function between nodes of the tree structure (T denotes the tree), and $p^{root}$ denotes the root node joining the linear sequence and the tree structure.
In one embodiment, when the relative position codes of any two nodes of the tree structure and the linear sequence are fed into the attention module of the BERT pre-training model for training, relative position encoding acts as follows:
$\tilde{\alpha}^{l}_{A,B} = \dfrac{x^{l}_{A} W^{Q,l} \left( x^{l}_{B} W^{K,l} + r^{l}_{A,B} \right)^{\mathsf{T}}}{\sqrt{d}}$,
where $r^{l}_{A,B}$ is the relative position embedding vector of positions A and B in the l-th Transformer layer of the BERT model, $x^{l}_{A}$ is the A-th word embedding vector of the l-th layer, $x^{l}_{B}$ is the B-th word embedding vector of the l-th layer, $W^{Q,l}$ is the Query matrix of the l-th layer, $W^{K,l}$ is the Key matrix of the l-th layer, d is the word vector dimension, and $\tilde{\alpha}^{l}_{A,B}$ is the unnormalized attention weight.
In one embodiment, the final mixed word embedding is computed as
$x^{l+1}_{A} = \sum_{B=1}^{n_1} \operatorname{softmax}\!\left(\tilde{\alpha}^{l}_{A,B}\right)\, x^{l}_{B} W^{V,l}$,
where $\operatorname{softmax}(\tilde{\alpha}^{l}_{A,B})$ denotes the normalization of $\tilde{\alpha}^{l}_{A,B}$, $x^{l}_{B}$ denotes the B-th word embedding vector of the l-th layer, $W^{V,l}$ is the Value matrix of the l-th layer, $n_1$ is the total number of words, and when layer l is the last layer, $x^{l+1}_{A}$, the embedding of the A-th word output by that layer, is taken as the final mixed word embedding of the expression.
One or more technical solutions in the embodiments of the present application have at least one or more of the following technical effects:
the invention provides a mixed word embedding method of a natural language text and a mathematical language text, which comprises the following steps: identifying and preprocessing the mixed text to obtain a mathematical resource data set consisting of the text and a mathematical expression; absolute position coding is carried out on a mathematical expression with a tree structure, relative position coding is calculated according to the absolute position coding result, and the relative position translation of the tree structure is kept unchanged; carrying out unified position coding on a text with linear structure characteristics and a mathematical expression with tree structure characteristics; and the relative position codes are sent to an attention module of a pre-training model, a masking language model and a next sentence prediction two standard pre-training tasks are adopted to pre-train mathematical resources, and after the pre-training is finished, each symbol can be represented by an embedded vector rich in context information, so that the information contained in the final word embedded expression is richer, and the expression is more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a flowchart of a mixed word embedding method for natural language text and mathematical language text according to an embodiment of the present invention;
FIG. 2 is a flow chart of data preprocessing for a word embedding method in an embodiment of the present invention;
FIG. 3 is a diagram of a mathematical expression tree in an embodiment of the present invention;
FIG. 4 is a schematic diagram of tree position coding in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a unified position code in an embodiment of the present invention;
FIG. 6 is a diagram of a pre-training model in an embodiment of the invention.
Detailed Description
The invention provides a mixed word embedding method for natural language text and mathematical language text, comprising: recognizing and preprocessing the mixed text to obtain a mathematical resource data set consisting of text and mathematical expressions; position-encoding the tree-structured mathematical expressions so that relative positions in the tree are translation-invariant; applying a unified position encoding to the linearly structured text and the tree-structured mathematical expressions; and feeding the relative position codes into the attention module of a pre-training model, which is pre-trained on the mathematical resources with the two standard pre-training tasks of masked language modeling and next-sentence prediction. After pre-training, each symbol obtains an embedding vector rich in context information.
Compared with the prior art, the invention position-encodes the tree structure of the mathematical expression, ensures that the tree-structure position codes are translation-invariant with respect to relative position, applies a unified position encoding to the text with linear structure characteristics and the mathematical expressions with tree structure characteristics, and uses it in a BERT pre-training model to extract the word embedding representation.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a mixed word embedding method of a natural language text and a mathematical language text, which comprises the following steps:
S1: preprocessing learning resources containing natural language text and mathematical language text to obtain a mathematical resource data set, wherein the mathematical language text consists of mathematical expressions with a tree structure and the natural language text is context with linear sequence characteristics;
S2: performing absolute position encoding of the tree-structured mathematical expressions with a branch-based position encoding scheme, and computing the relative position code of two nodes in the tree structure from the absolute position encoding result;
S3: encoding the context having linear sequence characteristics with negative integer positions represented in two's complement, then taking the root node of the tree structure as the first node of the linear sequence to realize a unified position encoding of the mathematical expression and its context, and then computing the relative position code of any two nodes of the tree structure and the linear sequence from the unified position encoding;
S4: inputting the mathematical resource data set obtained in step S1 into a BERT pre-training model provided with a position encoding module and an attention module, inputting the unified position codes obtained in step S3 into the position encoding module, feeding the relative position codes of any two nodes of the tree structure and the linear sequence computed in step S3 into the attention module of the BERT pre-training model for training, and pre-training on the mathematical resources with the two standard pre-training tasks of masked language modeling and next-sentence prediction to obtain a trained word embedding model;
S5: processing the natural language text and the mathematical language text with the trained word embedding model to obtain the final mixed word embedding representation.
Please refer to fig. 1, which is a flowchart illustrating a method for embedding mixed words of natural language text and mathematical language text according to an embodiment of the present invention.
Specifically, S1 preprocesses the learning resources; S2 encodes the mathematical expressions that have a tree structure; S3 position-encodes the context and realizes the unified position encoding of the mathematical expressions and their context; S4 trains the BERT pre-training model with the unified position encoding; and S5 applies the trained word embedding model.
In one embodiment, the step S1 of preprocessing the learning resource containing the natural language text and the mathematical language text includes:
processing a learning resource containing natural language text and mathematical language text into a symbol sequence, wherein the mathematical expressions are in LaTeX format and the mathematical resource data set is the set of mathematical resources $L = \{L_1, L_2, \ldots, L_i, \ldots, L_{N'}\}$, with $L_i$ denoting the i-th mathematical resource.
N' represents the total number of mathematical resources.
In one embodiment, processing a learning resource containing natural language text and mathematical language text into a sequence of symbols includes:
segmenting each LaTeX-format mathematical expression with the im2markup segmentation tool to obtain the symbol sequence of the expression's segmentation result, converting the LaTeX-format expression into an operator tree (OPT) with the Tangent-S tool, and performing a depth-first traversal of the OPT to obtain the symbol sequence of the expression's tree-structure traversal result, the j-th mathematical expression of the i-th mathematical resource being represented as
$M_{i,j} = \{s^{L}_{j,1}, \ldots, s^{L}_{j,n'}, \ldots;\; s^{O}_{j,1}, \ldots, s^{O}_{j,k}, \ldots\}$,
where $s^{L}_{j,n'}$ denotes the n'-th symbol of the j-th mathematical expression after LaTeX-format segmentation and $s^{O}_{j,k}$ denotes the k-th symbol obtained by the depth-first traversal of the OPT of the j-th mathematical expression; each mathematical resource consists of natural language text and mathematical expressions, the natural language text being the context of the mathematical expressions, and the context of the mathematical expression $M_{i,j}$ is $C_{i,j} = \{t_z \mid t_z \in L_i,\; |z - p_{ij}| \le R\}$, where $t_z$ denotes the z-th natural language word, $p_{ij}$ is the position of the mathematical expression $M_{i,j}$ as a whole in the sequence, and R is at most 64;
obtaining the representation of each mathematical resource from the symbolic forms of the natural language and the mathematical expressions, the i-th mathematical resource being represented as
$L_i = \{t_1, \ldots, M_{i,1}, \ldots, t_z, \ldots, M_{i,j}, \ldots, t_{N_T}\}$,
where $N_T$ is the total length of the natural language text;
when the mathematical expression $M_{i,j}$ consists of several consecutive equalities or inequalities, it is split at the equality and inequality signs into sub-expressions $M_{i,j} = \{E_{i,j,1}, \ldots, E_{i,j,w}, \ldots\}$;
obtaining the mathematical resource data set from the representation of each mathematical resource as the pre-training model data set $D = \{(C_{i,j}, E_{i,j,w})\}$, where i is the learning resource index, j is the mathematical expression index, and w is the sub-expression index.
In a specific implementation, as shown in Fig. 2, the data are preprocessed to obtain the mathematical resource data set. The learning resources containing natural language and mathematical language, including answer processes, are processed into symbol sequences through the Mathpix OCR interface to obtain the set of mathematical resources. The Mathpix OCR interface extracts text and mathematical expressions from images and converts the mathematical formulas into LaTeX format. The whole set of mathematical resources is denoted $L = \{L_1, L_2, \ldots, L_i, \ldots, L_{N'}\}$, where $L_i$ denotes the i-th mathematical resource.
For the mathematical expressions of each resource, the LaTeX-format expression is segmented with the im2markup segmentation tool to obtain the LaTeX symbol sequence of the expression. With the Tangent-S tool, the LaTeX-format expression is converted into an operator tree (OPT), as shown in Fig. 3, and a depth-first traversal of the OPT yields the tree-traversal symbol sequence. One mathematical expression therefore yields two symbol sequences, and the j-th expression of the i-th resource can be expressed as
$M_{i,j} = \{s^{L}_{j,1}, \ldots, s^{L}_{j,n'}, \ldots;\; s^{O}_{j,1}, \ldots, s^{O}_{j,k}, \ldots\}$,
where $s^{L}_{j,n'}$ denotes the n'-th symbol of the j-th mathematical expression after LaTeX-format segmentation and $s^{O}_{j,k}$ denotes the k-th operator or operand obtained by the depth-first traversal of the OPT of the j-th mathematical expression.
Each mathematical resource consists of text and mathematical expressions, the text being the context of the expressions; the context may contain description and interpretation of an expression and is the key to the semantic association between mathematical symbols and natural language. Since the length of the input data is limited, the context of an expression must be bounded. The context of expression $M_{i,j}$ is defined as $C_{i,j} = \{t_z \mid t_z \in L_i,\; |z - p_{ij}| \le R\}$, where $t_z$ denotes the z-th natural language word, $p_{ij}$ is the position of expression $M_{i,j}$ as a whole in the sequence, and R is at most 64; that is, the 64 natural language symbols before and the 64 after the expression, 128 symbols in total, are taken as its context.
in summary, for the ith mathematical resource, it can be expressed as:
Figure BDA0003621858900000094
N T is the total length of the natural language text.
In learning resources, a mathematical expression $M_{i,j}$ often contains a multi-step derivation, i.e., several consecutive equalities or inequalities together form one expression. Such an expression is further split, taking the equality and inequality signs as markers, into sub-expressions $M_{i,j} = \{E_{i,j,1}, \ldots, E_{i,j,w}, \ldots\}$, where each sub-expression $E_{i,j,w}$ contains exactly one equality or inequality sign, i.e., exactly one derivation step, and the sub-expressions of a multi-step derivation share one context.
Finally, all learning resources are processed as above to form the pre-training model data set $D = \{(C_{i,j}, E_{i,j,w})\}$, where i is the learning resource index, j is the mathematical expression index, and w is the sub-expression index.
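A minimal sketch of the context windowing and sub-expression splitting described above is given below, assuming a simple token list with an expression placeholder; the helper names, the placeholder convention, and the splitting regex are illustrative assumptions rather than the patented preprocessing pipeline.

```python
import re
from typing import List, Tuple

def context_window(tokens: List[str], expr_pos: int, radius: int = 64) -> List[str]:
    """Context C_{i,j}: up to `radius` natural-language symbols on each side of the
    expression's position in the resource (at most 128 symbols in total)."""
    lo = max(0, expr_pos - radius)
    hi = min(len(tokens), expr_pos + radius + 1)
    return tokens[lo:expr_pos] + tokens[expr_pos + 1:hi]

def split_subexpressions(latex: str) -> List[str]:
    """Split a multi-step derivation at (in)equality signs so that every sub-expression
    keeps exactly one relation symbol; the sub-expressions share one context."""
    parts = re.split(r"(=|\\leq|\\geq|<|>|\\neq)", latex)
    subs, left = [], parts[0]
    for rel, right in zip(parts[1::2], parts[2::2]):
        subs.append(left + rel + right)
        left = right
    return subs or [latex]

# Hypothetical resource: token list with one expression placeholder at index 3.
tokens = ["Solve", "the", "equation", "<EXPR>", "for", "x", "."]
latex = r"x+1=2=3-1"
dataset: List[Tuple[List[str], str]] = [
    (context_window(tokens, 3), sub) for sub in split_subexpressions(latex)
]
print(dataset)
```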
In one embodiment, S2 introduces a shift operation into the absolute position encoding; the mathematical expression is an N-ary tree and the root node is defined as $p_{0,N}$ (the all-zero code of N bits). Any subsequent child node is encoded as follows:
S2.1: the child nodes of all branches are represented by one-hot codes of N bits; for the child node of the r-th branch, the r-th bit of the one-hot code counted from the right is 1 and the remaining bits are 0;
S2.2: the position code of the parent node is shifted left by N bits and then added to the one-hot code of the branch child node to obtain the final absolute position code of that node; any node of the expression tree is finally represented as $p_{D_n, l_n}$, where n is the absolute position code of the node, $D_n$ is the decimal representation of the absolute position code, and $l_n$ is the length of the binary code of $D_n$; the relative position of nodes in the tree is computed as
$PE_T(p_{D_m, l_m}, p_{D_n, l_n}) = D_m - (D_n \ll (l_m - l_n))$,
where PE denotes the relative position calculation function, T denotes the tree, $PE_T(\cdot,\cdot)$ is the relative position calculation function of node $p_{D_m,l_m}$ and node $p_{D_n,l_n}$ in the mathematical expression tree, $D_m$ is the decimal value of the absolute position code of node $p_{D_m,l_m}$, $l_m$ is the length of the binary code of $D_m$, and $\ll$ denotes the left shift operator.
In a specific implementation, step S2 position-encodes the mathematical expressions having a tree structure and ensures that the tree-structure position codes are translation-invariant with respect to relative position. Relative position reflects the relation between words: in a linear sequence it is defined as the difference of absolute positions, and translation invariance means that whatever the absolute positions of two words, their positional offset is the same as long as their relative position is the same. Because the relative position is unchanged, a word can always be semantically associated with the words at fixed relative positions no matter where it appears, so the training process can capture the semantic relation.
To ensure that the position codes of the tree structure are translation-invariant with respect to relative position, the invention adopts a branch-based encoding scheme and introduces a shift operation when computing relative positions.
As shown in Fig. 4, assume the mathematical expression tree is a 3-ary tree and the root node is defined as (000). For the child node of the first branch, the root code is shifted left by 3 bits and the one-hot code (001) is added, so that this child node is finally represented as (000001); similarly, the child node of the 2nd branch is (000010) and that of the 3rd branch is (000100).
Any node of the expression tree can be represented as $p_{D_n, l_n}$, where $D_n$ is the decimal representation of its absolute position code n and $l_n$ is the length of its binary code. The relative position of two nodes $p_{D_m,l_m}$ and $p_{D_n,l_n}$ in the tree is computed as
$PE_T(p_{D_m, l_m}, p_{D_n, l_n}) = D_m - (D_n \ll (l_m - l_n))$,
and by this formula the relative position between any two nodes of the tree is independent of their absolute positions. As shown in Fig. 4, the relative position of node $p_{20,9}$ and node $p_{0,3}$ is 20, and that of node $p_{84,12}$ and node $p_{1,6}$ is also 20.
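The branch-based encoding and the relative position computation can be sketched in Python as follows; the (D, l) tuple layout and the direction convention of the relative-position function are assumptions consistent with the example above, not a definitive implementation.

```python
from typing import Tuple

# A tree position is stored as (D, l): D is the decimal value of the binary
# position code and l is its length in bits; the root of an N-ary tree is (0, N).

def child_position(parent: Tuple[int, int], branch: int, n_ary: int) -> Tuple[int, int]:
    """Shift the parent's code left by N bits, then add the one-hot code of the
    r-th branch (bit r counted from the right set to 1)."""
    d, length = parent
    return (d << n_ary) | (1 << (branch - 1)), length + n_ary

def tree_relative_position(m: Tuple[int, int], n: Tuple[int, int]) -> int:
    """PE_T as reconstructed above: align the shorter code to the longer one by a
    left shift, then take the difference (assumed direction: m is the deeper node)."""
    d_m, l_m = m
    d_n, l_n = n
    return d_m - (d_n << (l_m - l_n))

root = (0, 3)                                    # (000) for a 3-ary tree
c1 = child_position(root, 1, 3)                  # (000 001) -> (1, 6)
print(c1)
print(tree_relative_position((20, 9), (0, 3)))   # 20, as in the example above
print(tree_relative_position((84, 12), (1, 6)))  # also 20: translation invariance
```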
In one embodiment, step S3 includes:
for natural language text with linear sequence characteristics, relative position encoding is used, wherein the relative position between two words is defined as the difference of their absolute positions, $PE_S(a, b) = b - a$, a and b being the absolute positions and $PE_S(\cdot,\cdot)$ the relative position calculation function of the sequence; the positions of the linear sequence are encoded with negative integers represented in two's complement, and the length of the linear-sequence position code is $L_S = n_T \times l_T$, where $n_T$ denotes the maximum branching factor of the tree structure and $l_T$ the maximum number of layers of the tree structure;
the root node of the tree structure is taken as the head node of the linear sequence so that the two structures share a unified position encoding, and the relative position under the unified encoding is computed as
$PE(p^m, p^n) = \begin{cases} PE_S(p^m, p^n), & p^m, p^n \in \text{linear sequence} \\ PE_T(p^m, p^n), & p^m, p^n \in \text{tree structure} \\ PE_S(p^m, p^{root}) + PE_T(p^{root}, p^n), & p^m \in \text{linear sequence},\; p^n \in \text{tree structure} \\ PE_T(p^m, p^{root}) + PE_S(p^{root}, p^n), & p^m \in \text{tree structure},\; p^n \in \text{linear sequence} \end{cases}$
where $PE(p^m, p^n)$ denotes the relative position calculation function between any two nodes $p^m$ and $p^n$, $PE_S(\cdot,\cdot)$ denotes the relative position calculation function between nodes of the linear sequence (S denotes the sequence), $PE_T(\cdot,\cdot)$ denotes the relative position calculation function between nodes of the tree structure (T denotes the tree), and $p^{root}$ denotes the root node joining the linear sequence and the tree structure.
Specifically, step S3 fuses the linear sequence of the contextual natural language with the encoding model of S2. For natural language text with a linear sequence, the relative position between words is defined as the difference of their absolute positions, $PE_S(a, b) = b - a$, where a and b are absolute positions; translation invariance of the relative position is then satisfied trivially, since $PE_S(a + k, b + k) = PE_S(a, b)$ for any offset k.
To unify the position codes of the linear sequence and the tree structure, the positions of the linear sequence are encoded with negative integers represented in two's complement, so that the highest bit of every natural language word's position code is 1 while the highest bit of every tree-node position code is 0. The length of the linear-sequence position code is $L_S = n_T \times l_T$, where $n_T$ is the maximum branching factor of the tree structure and $l_T$ its maximum number of layers. Finally, the root node of the tree structure is taken as the head node of the linear sequence to obtain a unified representation of the two structures, as shown in Fig. 5. The relative position under the unified encoding is computed by the formula above.
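The following sketch illustrates one possible reading of the unified relative position computation, routing mixed sequence/tree pairs through the shared root node; the ("seq", index) / ("tree", (D, l)) position layout and the case ordering are assumptions made for illustration, not the patent's exact data format.

```python
from typing import Tuple, Union

Position = Tuple[str, Union[int, Tuple[int, int]]]
ROOT_CODE = (0, 3)  # root of a 3-ary expression tree, also sequence position 0

def pe_seq(a: int, b: int) -> int:
    """Relative position in the linear sequence: difference of absolute positions."""
    return b - a

def pe_tree(m: Tuple[int, int], n: Tuple[int, int]) -> int:
    """Relative position inside the tree (see the earlier sketch)."""
    (d_m, l_m), (d_n, l_n) = m, n
    return d_m - (d_n << (l_m - l_n))

def pe_unified(p_m: Position, p_n: Position) -> int:
    """Case analysis of the unified relative position: pairs that mix the linear
    sequence and the tree are routed through the root node joining the two."""
    kind_m, pos_m = p_m
    kind_n, pos_n = p_n
    if kind_m == "seq" and kind_n == "seq":
        return pe_seq(pos_m, pos_n)
    if kind_m == "tree" and kind_n == "tree":
        return pe_tree(pos_m, pos_n)
    if kind_m == "seq":  # sequence word and tree node, via the root
        return pe_seq(pos_m, 0) + pe_tree(pos_n, ROOT_CODE)
    return pe_tree(pos_m, ROOT_CODE) + pe_seq(0, pos_n)

print(pe_unified(("seq", -3), ("seq", -1)))        # 2: purely linear case
print(pe_unified(("seq", -3), ("tree", (20, 9))))  # mixed case, crosses the root
```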
In one embodiment, when the relative position codes of any two nodes of the tree structure and the linear sequence are fed into the attention module of the BERT pre-training model for training, relative position encoding acts as follows:
$\tilde{\alpha}^{l}_{A,B} = \dfrac{x^{l}_{A} W^{Q,l} \left( x^{l}_{B} W^{K,l} + r^{l}_{A,B} \right)^{\mathsf{T}}}{\sqrt{d}}$,
where $r^{l}_{A,B}$ is the relative position embedding vector of positions A and B in the l-th Transformer layer of the BERT model, $x^{l}_{A}$ is the A-th word embedding vector of the l-th layer, $x^{l}_{B}$ is the B-th word embedding vector of the l-th layer, $W^{Q,l}$ is the Query matrix of the l-th layer, $W^{K,l}$ is the Key matrix of the l-th layer, d is the word vector dimension, and $\tilde{\alpha}^{l}_{A,B}$ is the unnormalized attention weight.
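Assuming the relative-position attention takes the form reconstructed above, a small NumPy sketch of the unnormalized attention computation might look as follows; the tensor shapes and random initialization are placeholders, not the model's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 4, 8  # tiny sizes for illustration

x = rng.normal(size=(n_tokens, d))            # layer-l word embeddings x_A^l
W_Q = rng.normal(size=(d, d))                 # Query matrix W^{Q,l}
W_K = rng.normal(size=(d, d))                 # Key matrix W^{K,l}
r = rng.normal(size=(n_tokens, n_tokens, d))  # relative position embeddings r_{A,B}^l

q = x @ W_Q  # queries
k = x @ W_K  # keys

# Unnormalized attention weights: a content term plus a relative-position term,
# the assumed form of the formula above; scores[A, B] corresponds to alpha~_{A,B}^l.
scores = (q @ k.T + np.einsum("ad,abd->ab", q, r)) / np.sqrt(d)
print(scores.shape)  # (4, 4)
```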
In one embodiment, the final mixed word embedding is computed as
$x^{l+1}_{A} = \sum_{B=1}^{n_1} \operatorname{softmax}\!\left(\tilde{\alpha}^{l}_{A,B}\right)\, x^{l}_{B} W^{V,l}$,
where $\operatorname{softmax}(\tilde{\alpha}^{l}_{A,B})$ denotes the normalization of $\tilde{\alpha}^{l}_{A,B}$, $x^{l}_{B}$ denotes the B-th word embedding vector of the l-th layer, $W^{V,l}$ is the Value matrix of the l-th layer, $n_1$ is the total number of words, and when layer l is the last layer, $x^{l+1}_{A}$, the embedding of the A-th word output by that layer, is taken as the final mixed word embedding of the expression.
Specifically, through the foregoing steps the learning resources are converted into the data set $D = \{(C_{i,j}, E_{i,j,w})\}$, in which the context sequence $C_{i,j}$ and the mathematical expression $E_{i,j,w}$ are segmented, encoded with the unified position encoding, and fed into the BERT pre-training model. As shown in Fig. 6, the input of the BERT pre-training model consists of three parts: the context sequence, the LaTeX sequence of the mathematical expression, and the depth-first traversal sequence of the expression's OPT. The relative position codes are then fed into the attention module of the pre-training model BERT for training; in the attention module, relative position encoding acts as shown above, and the final mixed word embedding is obtained by the calculation formula given above.
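A corresponding sketch of the value aggregation that yields the mixed word embedding of each symbol is given below, again with placeholder random inputs standing in for the trained BERT parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 4, 8

x = rng.normal(size=(n_tokens, d))              # layer-l word embeddings x_B^l
W_V = rng.normal(size=(d, d))                   # Value matrix W^{V,l}
scores = rng.normal(size=(n_tokens, n_tokens))  # unnormalized weights alpha~_{A,B}^l

# Softmax-normalize the weights over B and aggregate the value-projected embeddings;
# row A of `z` is the mixed embedding of the A-th symbol, and at the last layer it is
# taken as the final mixed word embedding.
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)
z = weights @ (x @ W_V)
print(z.shape)  # (4, 8): one d-dimensional embedding per symbol
```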
After the pre-training model has been adapted in this way, the two standard pre-training tasks, masked language modeling (MLM) and next sentence prediction (NSP), are used to obtain the mixed word embeddings of the natural language text and the mathematical language text.
In the MLM task, 15% of the symbols are randomly drawn from the three input sequences; of the drawn symbols, 80% are replaced by the [MASK] label, 10% are replaced by random other symbols, and 10% are left unchanged. The MLM task uses cross entropy as its loss function, expressed in this embodiment as
$\mathcal{L}_{MLM} = -\sum_{x \in \mathcal{M}} p(x) \log \hat{p}(x)$,
where $\mathcal{M}$ is the set of masked symbols, $\hat{p}(x)$ is the estimated probability of the masked symbol x after linear classification and Softmax regression, and $p(x)$ is its original distribution, i.e., its one-hot vector.
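A minimal sketch of this 15% / 80-10-10 masking scheme, with a toy vocabulary assumed for the random-replacement case, is shown below; it illustrates the corruption step only, not the loss computation.

```python
import random

random.seed(0)
TOY_VOCAB = ["x", "y", "+", "=", "2", "the", "equation", "solve"]  # assumed vocabulary

def mask_for_mlm(tokens, mask_rate=0.15):
    """Draw ~15% of the symbols; of those, replace 80% with [MASK], 10% with a random
    other symbol, and leave 10% unchanged. Returns the corrupted sequence and the
    prediction targets (original symbol at selected positions, None elsewhere)."""
    corrupted, targets = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if random.random() >= mask_rate:
            continue
        targets[i] = tok
        roll = random.random()
        if roll < 0.8:
            corrupted[i] = "[MASK]"
        elif roll < 0.9:
            corrupted[i] = random.choice(TOY_VOCAB)
        # else: the symbol is kept as is
    return corrupted, targets

print(mask_for_mlm(["solve", "the", "equation", "x", "+", "2", "=", "y"]))
```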
In the NSP task, 50% of the sub-expressions $E_{i,j,w}$ are randomly selected and their context $C_{i,j}$ is replaced by the context of a randomly chosen formula. The NSP task likewise uses cross entropy as its loss function, expressed in this embodiment as
$\mathcal{L}_{NSP} = -\left[ p \log \hat{p} + (1 - p) \log (1 - \hat{p}) \right]$,
where p = 1 if the context has not been replaced, p = 0 if it has been replaced, and $\hat{p}$ is the estimated probability that the context matches the formula.
In summary, compared with the prior art, the invention position-encodes the tree structure of mathematical expressions, ensures that the tree-structure position codes are translation-invariant with respect to relative position, unifies the position encoding of the text with linear structure characteristics and the mathematical expressions with tree structure characteristics, and uses it in a BERT pre-training model to extract the word embedding representation, so that the final word embedding carries more information and is expressed more accurately.
The specific embodiments described herein are merely illustrative of the spirit of the invention. Those skilled in the art may make various modifications, additions, or substitutions to the described embodiments without departing from the spirit of the invention or exceeding the scope of the appended claims.

Claims (7)

1. A method for embedding mixed words of a natural language text and a mathematical language text, comprising:
S1: preprocessing learning resources containing natural language text and mathematical language text to obtain a mathematical resource data set, wherein the mathematical language text consists of mathematical expressions with a tree structure and the natural language text is context with linear sequence characteristics;
S2: performing absolute position encoding of the tree-structured mathematical expressions with a branch-based position encoding scheme, and computing the relative position code of two nodes in the tree structure from the absolute position encoding result;
S3: encoding the context having linear sequence characteristics with negative integer positions represented in two's complement, then taking the root node of the tree structure as the first node of the linear sequence to realize a unified position encoding of the mathematical expression and its context, and then computing the relative position code of any two nodes of the tree structure and the linear sequence from the unified position encoding;
S4: inputting the mathematical resource data set obtained in step S1 into a BERT pre-training model provided with a position encoding module and an attention module, inputting the unified position codes obtained in step S3 into the position encoding module, feeding the relative position codes of any two nodes of the tree structure and the linear sequence computed in step S3 into the attention module of the BERT pre-training model for training, and pre-training on the mathematical resources with the two standard pre-training tasks of masked language modeling and next-sentence prediction to obtain a trained word embedding model;
S5: processing the natural language text and the mathematical language text with the trained word embedding model to obtain the final mixed word embedding representation.
2. The mixed word embedding method of the natural language text and the mathematical language text as claimed in claim 1, wherein the step S1 of preprocessing the learning resource containing the natural language text and the mathematical language text includes:
processing a learning resource containing natural language text and mathematical language text into a symbol sequence, wherein the mathematical expressions are in LaTeX format and the mathematical resource data set is the set of mathematical resources $L = \{L_1, L_2, \ldots, L_i, \ldots, L_{N'}\}$, with $L_i$ denoting the i-th mathematical resource.
3. The method of mixed word embedding of natural language text and mathematical language text as claimed in claim 2, wherein processing a learning resource containing natural language text and mathematical language text into a sequence of symbols comprises:
segmenting each LaTeX-format mathematical expression with the im2markup segmentation tool to obtain the symbol sequence of the expression's segmentation result, converting the LaTeX-format expression into an operator tree (OPT) with the Tangent-S tool, and performing a depth-first traversal of the OPT to obtain the symbol sequence of the expression's tree-structure traversal result, the j-th mathematical expression of the i-th mathematical resource being represented as
$M_{i,j} = \{s^{L}_{j,1}, \ldots, s^{L}_{j,n'}, \ldots;\; s^{O}_{j,1}, \ldots, s^{O}_{j,k}, \ldots\}$,
wherein $s^{L}_{j,n'}$ denotes the n'-th symbol of the j-th mathematical expression after LaTeX-format segmentation and $s^{O}_{j,k}$ denotes the k-th symbol obtained by the depth-first traversal of the OPT of the j-th mathematical expression; each mathematical resource consists of natural language text and mathematical expressions, the natural language text being the context of the mathematical expressions, and the context of the mathematical expression $M_{i,j}$ is $C_{i,j} = \{t_z \mid t_z \in L_i,\; |z - p_{ij}| \le R\}$, wherein $t_z$ denotes the z-th natural language word, $p_{ij}$ is the position of the mathematical expression $M_{i,j}$ as a whole in the sequence, and R is at most 64;
obtaining the representation of each mathematical resource from the symbolic forms of the natural language and the mathematical expressions, the i-th mathematical resource being represented as
$L_i = \{t_1, \ldots, M_{i,1}, \ldots, t_z, \ldots, M_{i,j}, \ldots, t_{N_T}\}$,
wherein $N_T$ is the total length of the natural language text;
when the mathematical expression $M_{i,j}$ consists of several consecutive equalities or inequalities, it is split at the equality and inequality signs into sub-expressions $M_{i,j} = \{E_{i,j,1}, \ldots, E_{i,j,w}, \ldots\}$;
obtaining the mathematical resource data set from the representation of each mathematical resource as the pre-training model data set $D = \{(C_{i,j}, E_{i,j,w})\}$, wherein i is the learning resource index, j is the mathematical expression index, and w is the sub-expression index.
4. The mixed word embedding method of natural language text and mathematical language text as claimed in claim 1, wherein S2 introduces a shift operation into the absolute position encoding, the mathematical expression is an N-ary tree, and the root node is defined as $p_{0,N}$ (the all-zero code of N bits); any subsequent child node is encoded as follows:
S2.1: the child nodes of all branches are represented by one-hot codes of N bits; for the child node of the r-th branch, the r-th bit of the one-hot code counted from the right is 1 and the remaining bits are 0; S2.2: the position code of the parent node is shifted left by N bits and then added to the one-hot code of the branch child node to obtain the final absolute position code of that node; any node of the expression tree is finally represented as $p_{D_n, l_n}$, wherein n is the absolute position code of the node, $D_n$ is the decimal representation of the absolute position code, and $l_n$ is the length of the binary code of $D_n$; the relative position of nodes in the tree is computed as
$PE_T(p_{D_m, l_m}, p_{D_n, l_n}) = D_m - (D_n \ll (l_m - l_n))$,
wherein PE denotes the relative position calculation function, T denotes the tree, $PE_T(\cdot,\cdot)$ is the relative position calculation function of node $p_{D_m,l_m}$ and node $p_{D_n,l_n}$ in the mathematical expression tree, $D_m$ is the decimal value of the absolute position code of node $p_{D_m,l_m}$, $l_m$ is the length of the binary code of $D_m$, and $\ll$ denotes the left shift operator.
5. The mixed word embedding method of natural language text and mathematical language text as claimed in claim 1, wherein the step S3 includes:
for natural language text with linear sequence characteristics, relative position encoding is used, wherein the relative position between two words is defined as the difference of their absolute positions, $PE_S(a, b) = b - a$, a and b being the absolute positions and $PE_S(\cdot,\cdot)$ the relative position calculation function of the sequence; the positions of the linear sequence are encoded with negative integers represented in two's complement, and the length of the linear-sequence position code is $L_S = n_T \times l_T$, wherein $n_T$ denotes the maximum branching factor of the tree structure and $l_T$ the maximum number of layers of the tree structure;
the root node of the tree structure is taken as the head node of the linear sequence so that the two structures share a unified position encoding, and the relative position under the unified encoding is computed as
$PE(p^m, p^n) = \begin{cases} PE_S(p^m, p^n), & p^m, p^n \in \text{linear sequence} \\ PE_T(p^m, p^n), & p^m, p^n \in \text{tree structure} \\ PE_S(p^m, p^{root}) + PE_T(p^{root}, p^n), & p^m \in \text{linear sequence},\; p^n \in \text{tree structure} \\ PE_T(p^m, p^{root}) + PE_S(p^{root}, p^n), & p^m \in \text{tree structure},\; p^n \in \text{linear sequence} \end{cases}$
wherein $PE(p^m, p^n)$ denotes the relative position calculation function between any two nodes $p^m$ and $p^n$, $PE_S(\cdot,\cdot)$ denotes the relative position calculation function between nodes of the linear sequence (S denotes the sequence), $PE_T(\cdot,\cdot)$ denotes the relative position calculation function between nodes of the tree structure (T denotes the tree), and $p^{root}$ denotes the root node joining the linear sequence and the tree structure.
6. The method of mixed word embedding of natural language text and mathematical language text as claimed in claim 1, wherein when the relative position codes of any two nodes in the tree structure and the linear sequence are fed into the attention module of the BERT pre-training model for training, the action expressions of the relative position codes are as follows:
$\tilde{\alpha}^{l}_{A,B} = \dfrac{x^{l}_{A} W^{Q,l} \left( x^{l}_{B} W^{K,l} + r^{l}_{A,B} \right)^{\mathsf{T}}}{\sqrt{d}}$,
wherein $r^{l}_{A,B}$ is the relative position embedding vector of positions A and B in the l-th Transformer layer of the BERT model, $x^{l}_{A}$ is the A-th word embedding vector of the l-th layer, $x^{l}_{B}$ is the B-th word embedding vector of the l-th layer, $W^{Q,l}$ is the Query matrix of the l-th layer, $W^{K,l}$ is the Key matrix of the l-th layer, d is the word vector dimension, and $\tilde{\alpha}^{l}_{A,B}$ is the unnormalized attention weight.
7. The method of claim 6, wherein the final mixed-word embedding expression is calculated by:
$x^{l+1}_{A} = \sum_{B=1}^{n_1} \operatorname{softmax}\!\left(\tilde{\alpha}^{l}_{A,B}\right)\, x^{l}_{B} W^{V,l}$,
wherein $\operatorname{softmax}(\tilde{\alpha}^{l}_{A,B})$ denotes the normalization of $\tilde{\alpha}^{l}_{A,B}$, $x^{l}_{B}$ denotes the B-th word embedding vector of the l-th layer, $W^{V,l}$ is the Value matrix of the l-th layer, $n_1$ is the total number of words, and when layer l is the last layer, $x^{l+1}_{A}$, the embedding of the A-th word output by that layer, is taken as the final mixed word embedding of the expression.
CN202210469691.4A 2022-04-28 2022-04-28 Mixed word embedding method for natural language text and mathematical language text Active CN114818698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210469691.4A CN114818698B (en) 2022-04-28 2022-04-28 Mixed word embedding method for natural language text and mathematical language text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210469691.4A CN114818698B (en) 2022-04-28 2022-04-28 Mixed word embedding method for natural language text and mathematical language text

Publications (2)

Publication Number Publication Date
CN114818698A true CN114818698A (en) 2022-07-29
CN114818698B CN114818698B (en) 2024-04-16

Family

ID=82508716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210469691.4A Active CN114818698B (en) 2022-04-28 2022-04-28 Mixed word embedding method for natural language text and mathematical language text

Country Status (1)

Country Link
CN (1) CN114818698B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200040652A (en) * 2018-10-10 2020-04-20 고려대학교 산학협력단 Natural language processing system and method for word representations in natural language processing
US20210012199A1 (en) * 2019-07-04 2021-01-14 Zhejiang University Address information feature extraction method based on deep neural network model
CN111444709A (en) * 2020-03-09 2020-07-24 腾讯科技(深圳)有限公司 Text classification method, device, storage medium and equipment
CN113239700A (en) * 2021-04-27 2021-08-10 哈尔滨理工大学 Text semantic matching device, system, method and storage medium for improving BERT

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yang Chen; Song Xiaoning; Song Wei: "SentiBERT: A Pre-trained Language Model Combining Sentiment Information", Journal of Frontiers of Computer Science and Technology, No. 09, pp. 1563-1570 *

Also Published As

Publication number Publication date
CN114818698B (en) 2024-04-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant