CN114818698B - Mixed word embedding method for natural language text and mathematical language text - Google Patents

Mixed word embedding method for natural language text and mathematical language text

Info

Publication number
CN114818698B
CN114818698B (application CN202210469691.4A)
Authority
CN
China
Prior art keywords
mathematical
expression
language text
node
relative position
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210469691.4A
Other languages
Chinese (zh)
Other versions
CN114818698A (en)
Inventor
董石
唐家玉
陶雪云
王志锋
田元
陈加
陈迪
左明章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central China Normal University
Original Assignee
Central China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central China Normal University filed Critical Central China Normal University
Priority to CN202210469691.4A priority Critical patent/CN114818698B/en
Publication of CN114818698A publication Critical patent/CN114818698A/en
Application granted granted Critical
Publication of CN114818698B publication Critical patent/CN114818698B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a mixed word embedding method for natural language text and mathematical language text, which comprises the following steps: identifying and preprocessing the mixed text to obtain a mathematical resource data set consisting of text and mathematical expressions; position-coding the mathematical expressions, which have a tree structure, while preserving the translation invariance of relative positions within the tree; applying a unified position coding to the text, which has a linear structure, and the mathematical expressions, which have a tree structure; and feeding the relative position codes into the attention module of a pre-training model, pre-training on the mathematical resources with the two standard pre-training tasks of masked language modeling and next sentence prediction, so that after pre-training each symbol obtains an embedded vector representation rich in contextual information.

Description

Mixed word embedding method for natural language text and mathematical language text
Technical Field
The invention relates to the technical field of natural language processing, in particular to a mixed word embedding method of natural language text and mathematical language text.
Background
Mathematical text refers to natural language text containing mathematical expressions; it exhibits ambiguity and polymorphism and is widely used in STEM subjects and higher education. Natural language text has linear structural features, while mathematical expressions have tree structural features, and word embedding representations of such mixed text play a crucial role in tasks involving mathematical text. Conventional word embedding techniques are suited to text with linear features and have difficulty processing mathematical expressions with tree-structured features.
A mathematical expression can be represented by two important tree structures. One is the Symbol Layout Tree (SLT), which is constructed according to the writing layout of the expression and captures its appearance; the other is the Operator Tree (OPT), which is constructed from the operator hierarchy of the expression and captures its semantics. In 2021, Peng et al. of Peking University proposed MathBERT, a BERT-based pre-training model for mathematical expressions that can obtain word embedding representations of mixed text. The authors feed the LaTeX sequence of the mathematical expression, the traversal sequence of the OPT, and the context text sequence into the BERT model, and extract the structural information of the OPT with an attention masking matrix, so that nodes adjacent in the tree structure are visible to each other in the masking matrix. Finally, a masked structure prediction task is added to the masked language model and context prediction tasks to train the BERT model. However, this method artificially limits the scope of the attention computation and has difficulty capturing long-range dependencies in the word embeddings. In the same year, Shen et al. of the University of Pennsylvania proposed a MathBERT model oriented to mathematics education, innovatively fine-tuning BERT with an automatic scoring task and a knowledge tracing prediction task. However, the authors use a simple linear sequence of the mathematical text as input, ignoring the tree structure of mathematical expressions, so the word embeddings lack mathematical semantic information.
Disclosure of Invention
Mathematical text is widespread but ambiguous, polymorphic, and heavily dependent on context, and existing methods have difficulty extracting long-range semantic relations of mathematical expressions, so their word embedding representations are neither comprehensive nor accurate. To address these technical problems, the invention position-codes the mathematical expression, which has a tree structure, according to the positional representation principle of the mathematical structure and the structural characteristics of mixed natural language and mathematical language text; applies a unified position coding to the text, which has linear sequence characteristics, and the mathematical expression, which has tree structure characteristics; and then obtains the word embedding representation of the mixed natural language and mathematical language text by fine-tuning a pre-training model on mathematical language processing tasks.
In order to achieve the above object, the present invention provides a mixed word embedding method for natural language text and mathematical language text, comprising:
S1: preprocessing learning resources containing natural language texts and mathematical language texts to obtain a mathematical resource data set, wherein the mathematical language texts are mathematical expressions with tree structures, and the natural language texts are contexts with linear sequence characteristics;
S2: absolute position coding is carried out on the mathematical expression with the tree structure by adopting a position coding mode based on branches, and the relative position coding of two nodes in the tree structure is calculated according to the absolute position coding result;
S3: coding the positions of the context, which has linear sequence characteristics, with negative integers represented in two's complement; then taking the root node of the tree structure as the first node of the linear sequence to achieve a unified position coding of the mathematical expression and the context; and calculating the relative position code of any two nodes across the tree structure and the linear sequence from the unified position coding;
S4: inputting the mathematical resource data set obtained in step S1 into a BERT pre-training model equipped with a position coding module and an attention module, feeding the unified position codes obtained in step S3 into the position coding module and the relative position codes of any two nodes in the tree structure and the linear sequence calculated in step S3 into the attention module for training, and pre-training on the mathematical resources with the two standard pre-training tasks of masked language modeling and next sentence prediction, to obtain a trained word embedding model;
S5: and processing the natural language text and the mathematical language text by using the trained word embedding model to obtain the final mixed word embedding expression.
In one embodiment, step S1 pre-processes a learning resource containing natural language text and mathematical language text, including:
processing the learning resources containing natural language text and mathematical language text into symbol sequences, wherein the mathematical expressions are in LaTeX format and the mathematical resource data set is the set of mathematical resources, expressed as L = {L_1, L_2, …, L_i, …, L_{N'}}, where L_i represents the i-th mathematical resource.
In one embodiment, processing a learning resource containing natural language text and mathematical language text into a sequence of symbols includes:
Tokenizing each LaTeX-format mathematical expression with the im2markup tokenizer to obtain the symbol sequence of its LaTeX tokenization result, converting the LaTeX-format mathematical expression into an operator tree (OPT) with the Tangent-S tool, and performing a depth-first traversal of the OPT to obtain the symbol sequence of its tree-structure traversal result, so that the j-th mathematical expression M_{i,j} of the i-th mathematical resource is represented by two symbol sequences, one listing its symbols after LaTeX tokenization and the other listing the operators and operands obtained by the depth-first traversal of its OPT; each mathematical resource consists of natural language text and mathematical expressions, where the natural language text is the context of the mathematical expressions, the context of the mathematical expression M_{i,j} is C_{i,j} = {t_z | t_z ∈ L_i, |z − p_{ij}| ≤ R}, t_z denotes the z-th natural language word, p_{ij} is the position of the mathematical expression M_{i,j}, taken as a whole, in the sequence, and R is at most 64;
The representation of each mathematical resource is obtained from the symbol representations of the natural language and the mathematical expressions, the i-th mathematical resource being the interleaved sequence of its natural language words and mathematical expressions;
N_T is the total length of the natural language text;
When the mathematical expression M_{i,j} consists of several chained equations or inequalities, it is split into sub-expressions using the equality and inequality signs as delimiters, and the resulting mathematical resource data set serves as the pre-training data set, where i is the learning resource number, j is the mathematical expression number, and w is the sub-expression number.
In one embodiment, a shift operation is introduced when S2 performs absolute position coding; the mathematical expression tree is an N-ary tree, the root node is coded as all zeros, and any subsequent child node is coded as follows:
S2.1: the child node of each branch is represented by a one-hot code of N bits; for the child node of the r-th branch, the r-th bit of the one-hot code, counted from the right, is 1 and the remaining bits are 0; S2.2: the position code of the parent node is shifted left by N bits and the one-hot code of the branch child node is added to it, giving the final absolute position code of that node; any node in the expression tree is then written as p_{D_n, l_n}, where n is the absolute position code of the node, D_n is the decimal value of the absolute position code, and l_n is the binary code length of D_n; the relative position of two nodes in the tree is computed as:
PE_T(p_{D_n, l_n}, p_{D_m, l_m}) = D_n − (D_m << (l_n − l_m))
where PE denotes the relative position calculation function, T denotes the tree, PE_T(p_{D_n, l_n}, p_{D_m, l_m}) is the relative position between node p_{D_n, l_n} and node p_{D_m, l_m} in the mathematical expression tree, D_m is the absolute code value of node p_{D_m, l_m}, l_m is the binary code length of D_m, and << denotes the left-shift operator.
In one embodiment, step S3 includes:
For the natural language text with linear sequence characteristics, relative position coding is performed: the relative position between two words with absolute positions a and b is defined as the difference of the absolute positions, PE_S(a, b) = a − b; the adopted relative position coding mode codes the positions of the linear sequence as negative integers represented in two's complement, and the length of the linear-sequence position code is L_S = n_T × l_T, where n_T denotes the maximum branching factor of the tree structure and l_T denotes the maximum number of layers of the tree structure;
The root node of the tree structure is then taken as the first node of the linear sequence to achieve the unified position coding of the two structures, and the relative position under the unified position coding is computed as:
PE(x_a, x_b) = PE_S(x_a, x_b), if x_a and x_b are both in the linear sequence;
PE(x_a, x_b) = PE_T(x_a, x_b), if x_a and x_b are both in the tree structure;
PE(x_a, x_b) = PE_S(x_a, root) + PE_T(root, x_b), if x_a is in the linear sequence and x_b is in the tree structure;
PE(x_a, x_b) = PE_T(x_a, root) + PE_S(root, x_b), if x_a is in the tree structure and x_b is in the linear sequence;
where PE(x_a, x_b) denotes the relative position calculation function between any two symbols x_a and x_b, PE_S denotes the relative position calculation function within the linear sequence (S for sequence), PE_T denotes the relative position calculation function within the tree structure, and the cross-structure cases are composed through the root node shared by the two structures.
In one embodiment, when the attention module of the BERT pre-training model is trained by sending the relative position codes of any two nodes in the tree structure and the linear sequence, the functional expression of the relative position codes is as follows:
e^l_{A,B} = (x^l_A · W_{Q,l}) (x^l_B · W_{K,l} + r^l_{A,B})^T / √d
where r^l_{A,B} is the relative position embedding vector between words A and B at the l-th Transformer layer of the BERT model, x^l_A is the embedding vector of word A at the l-th layer, x^l_B is the embedding vector of word B at the l-th layer, W_{Q,l} is the Query matrix of the l-th layer, W_{K,l} is the Key matrix of the l-th layer, d is the word vector dimension, and e^l_{A,B} is the unnormalized attention weight.
In one embodiment, the final mixed word embedding expression is calculated by:
x^{l+1}_A = Σ_{B=1}^{n_1} ê^l_{A,B} (x^l_B · W_{V,l})
where ê^l_{A,B} denotes the softmax normalization of e^l_{A,B}, x^l_B denotes the embedding vector of the B-th word at the l-th layer, W_{V,l} is the Value matrix of the l-th layer, n_1 denotes the total number of words, and when the l-th layer is the last layer, x^{l+1}_A is taken as the final mixed word embedding representation of the A-th word.
The above technical solutions in the embodiments of the present application at least have one or more of the following technical effects:
The invention provides a mixed word embedding method for natural language text and mathematical language text, which comprises the following steps: identifying and preprocessing the mixed text to obtain a mathematical resource data set consisting of text and mathematical expressions; applying absolute position coding to the mathematical expressions, which have a tree structure, and computing relative position codes from the absolute position codes, so that the relative positions of the tree structure are translation invariant; applying a unified position coding to the text, which has a linear structure, and the mathematical expressions, which have a tree structure; and feeding the relative position codes into the attention module of a pre-training model, pre-training on the mathematical resources with the two standard pre-training tasks of masked language modeling and next sentence prediction, so that after pre-training each symbol obtains an embedded vector representation rich in contextual information, making the final word embedding representation richer in information and more accurate.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for embedding mixed words of natural language text and mathematical language text provided by an embodiment of the present invention;
FIG. 2 is a flow chart of data preprocessing of a word embedding method in an embodiment of the present invention;
FIG. 3 is a schematic representation of a tree of mathematical expressions in an embodiment of the invention;
FIG. 4 is a schematic diagram of tree position coding in an embodiment of the invention;
FIG. 5 is a diagram of unified position coding in an embodiment of the invention;
FIG. 6 is a schematic diagram of a pre-training model in an embodiment of the invention.
Detailed Description
The invention provides a mixed word embedding method for natural language text and mathematical language text, which comprises the following steps: identifying and preprocessing the mixed text to obtain a mathematical resource data set consisting of text and mathematical expressions; position-coding the mathematical expressions, which have a tree structure, while preserving the translation invariance of relative positions within the tree; applying a unified position coding to the text, which has a linear structure, and the mathematical expressions, which have a tree structure; and feeding the relative position codes into the attention module of a pre-training model, pre-training on the mathematical resources with the two standard pre-training tasks of masked language modeling and next sentence prediction, so that after pre-training each symbol obtains an embedded vector representation rich in contextual information.
Compared with the prior art, the method position-codes the tree structure of the mathematical expression while guaranteeing that the tree-structure position coding is translation invariant with respect to relative positions, applies a unified position coding to the text with linear structural characteristics and the mathematical expression with tree structural characteristics, and uses them in a BERT pre-training model from which the word embedding representation is then extracted.
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The embodiment of the invention provides a mixed word embedding method of natural language text and mathematical language text, which comprises the following steps:
S1: preprocessing learning resources containing natural language texts and mathematical language texts to obtain a mathematical resource data set, wherein the mathematical language texts are mathematical expressions with tree structures, and the natural language texts are contexts with linear sequence characteristics;
S2: absolute position coding is carried out on the mathematical expression with the tree structure by adopting a position coding mode based on branches, and the relative position coding of two nodes in the tree structure is calculated according to the absolute position coding result;
S3: coding the positions of the context, which has linear sequence characteristics, with negative integers represented in two's complement; then taking the root node of the tree structure as the first node of the linear sequence to achieve a unified position coding of the mathematical expression and the context; and calculating the relative position code of any two nodes across the tree structure and the linear sequence from the unified position coding;
S4: inputting the mathematical resource data set obtained in step S1 into a BERT pre-training model equipped with a position coding module and an attention module, feeding the unified position codes obtained in step S3 into the position coding module and the relative position codes of any two nodes in the tree structure and the linear sequence calculated in step S3 into the attention module for training, and pre-training on the mathematical resources with the two standard pre-training tasks of masked language modeling and next sentence prediction, to obtain a trained word embedding model;
S5: and processing the natural language text and the mathematical language text by using the trained word embedding model to obtain the final mixed word embedding expression.
Referring to fig. 1, a flowchart of a method for embedding mixed words of natural language text and mathematical language text is provided in an embodiment of the present invention.
Specifically, S1 preprocesses the learning resources, S2 encodes the mathematical expression with its tree structure, S3 position-codes the context and achieves the unified position coding of the mathematical expression and the context, S4 trains the BERT pre-training model using the unified position coding, and S5 applies the trained word embedding model.
In one embodiment, step S1 pre-processes a learning resource containing natural language text and mathematical language text, including:
processing the learning resources containing natural language text and mathematical language text into symbol sequences, wherein the mathematical expressions are in LaTeX format and the mathematical resource data set is the set of mathematical resources, expressed as L = {L_1, L_2, …, L_i, …, L_{N'}}, where L_i represents the i-th mathematical resource.
N' represents the total number of mathematical resources.
In one embodiment, processing a learning resource containing natural language text and mathematical language text into a sequence of symbols includes:
Tokenizing each LaTeX-format mathematical expression with the im2markup tokenizer to obtain the symbol sequence of its LaTeX tokenization result, converting the LaTeX-format mathematical expression into an operator tree (OPT) with the Tangent-S tool, and performing a depth-first traversal of the OPT to obtain the symbol sequence of its tree-structure traversal result, so that the j-th mathematical expression M_{i,j} of the i-th mathematical resource is represented by two symbol sequences, one listing its symbols after LaTeX tokenization and the other listing the operators and operands obtained by the depth-first traversal of its OPT; each mathematical resource consists of natural language text and mathematical expressions, where the natural language text is the context of the mathematical expressions, the context of the mathematical expression M_{i,j} is C_{i,j} = {t_z | t_z ∈ L_i, |z − p_{ij}| ≤ R}, t_z denotes the z-th natural language word, p_{ij} is the position of the mathematical expression M_{i,j}, taken as a whole, in the sequence, and R is at most 64;
The representation of each mathematical resource is obtained from the symbol representations of the natural language and the mathematical expressions, the i-th mathematical resource being the interleaved sequence of its natural language words and mathematical expressions;
N_T is the total length of the natural language text;
When the mathematical expression M_{i,j} consists of several chained equations or inequalities, it is split into sub-expressions using the equality and inequality signs as delimiters, and the resulting mathematical resource data set serves as the pre-training data set, where i is the learning resource number, j is the mathematical expression number, and w is the sub-expression number.
In a specific implementation, as shown in Fig. 2, the data are preprocessed to obtain the mathematical resource data set. Learning resources containing natural language and mathematical language, including their solution processes, are processed into symbol sequences through the Mathpix OCR interface to obtain the set of mathematical resources. The Mathpix OCR interface extracts text and mathematical expressions from the images and converts the mathematical formulas to LaTeX format. The entire set of mathematical resources is denoted L = {L_1, L_2, …, L_i, …, L_{N'}}, where L_i denotes the i-th mathematical resource.
The mathematical expressions of each mathematical resource are tokenized with the im2markup tokenizer to obtain the LaTeX symbol sequence of each expression. Each LaTeX-format mathematical expression is converted into an operator tree (OPT) by the Tangent-S tool; the OPT of a mathematical expression is shown in Fig. 3. A depth-first traversal of the OPT then yields the tree-traversal symbol sequence of the expression. Thus one mathematical expression yields two symbol sequences, and the j-th mathematical expression M_{i,j} of the i-th mathematical resource is represented by this pair of sequences, one listing its symbols after LaTeX tokenization and the other listing the operators and operands obtained by the depth-first traversal of its OPT.
Each mathematical resource consists of text and mathematical expressions, where the text is the context of the mathematical expressions; it may contain descriptive and explanatory information about an expression and is the key to the semantic association between mathematical symbols and natural language. Since the length of the input data is limited, the context of a mathematical expression must be bounded. The context of the mathematical expression M_{i,j} is defined as C_{i,j} = {t_z | t_z ∈ L_i, |z − p_{ij}| ≤ R}, where t_z denotes the z-th natural language word, p_{ij} is the position of the mathematical expression M_{i,j}, taken as a whole, in the sequence, and R is at most 64; that is, the 64 words before and the 64 words after the expression, 128 natural language text symbols in total, are taken as the context;
In summary, the i-th mathematical resource L_i is represented as the interleaved sequence of its natural language words and mathematical expressions, where N_T is the total length of the natural language text.
In learning resources the mathematical expression M_{i,j} often contains multiple derivation steps, i.e., several chained equations or inequalities together form one expression. Such an expression is further split into sub-expressions using the equality and inequality signs as delimiters, each sub-expression containing exactly one equality or inequality sign, i.e., one derivation step, and the sub-expressions of a multi-step expression share the same context.
Finally, all learning resources are processed in this way to form the pre-training data set, whose samples are indexed by the learning resource number i, the mathematical expression number j, and the sub-expression number w; a minimal sketch of this preprocessing is given below.
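Purely as an illustration, the following Python sketch shows the two preprocessing operations described above: the context window of radius R around an expression, and the splitting of a chained derivation into sub-expressions at equality and inequality signs. The whitespace handling and all function names are assumptions of the sketch; the actual pipeline relies on Mathpix OCR, im2markup and Tangent-S.

```python
# A minimal sketch of the preprocessing described above (illustrative only).
import re

R = 64  # context window on each side of the expression

def context_window(tokens, expr_pos, r=R):
    """tokens: the resource as a token list; expr_pos: index of the expression."""
    return tokens[max(0, expr_pos - r):expr_pos] + tokens[expr_pos + 1:expr_pos + 1 + r]

def split_sub_expressions(latex_expr):
    """Split a chained derivation on =, <, >, \\le, \\ge, \\ne so that each
    sub-expression keeps exactly one relation sign (one derivation step)."""
    parts = re.split(r'(=|<|>|\\le|\\ge|\\ne)', latex_expr)
    subs = []
    for i in range(1, len(parts) - 1, 2):
        subs.append(parts[i - 1].strip() + ' ' + parts[i] + ' ' + parts[i + 1].strip())
    return subs

print(split_sub_expressions(r"x^2 - 1 = (x-1)(x+1) \ge 0"))
# ['x^2 - 1 = (x-1)(x+1)', '(x-1)(x+1) \\ge 0']
```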
In one embodiment, a shift operation is introduced when S2 performs absolute position coding; the mathematical expression tree is an N-ary tree, the root node is coded as all zeros, and any subsequent child node is coded as follows:
S2.1: the child node of each branch is represented by a one-hot code of N bits; for the child node of the r-th branch, the r-th bit of the one-hot code, counted from the right, is 1 and the remaining bits are 0;
S2.2: the position code of the parent node is shifted left by N bits and the one-hot code of the branch child node is added to it, giving the final absolute position code of that node; any node in the expression tree is then written as p_{D_n, l_n}, where n is the absolute position code of the node, D_n is the decimal value of the absolute position code, and l_n is the binary code length of D_n; the relative position of two nodes in the tree is computed as:
PE_T(p_{D_n, l_n}, p_{D_m, l_m}) = D_n − (D_m << (l_n − l_m))
where PE denotes the relative position calculation function, T denotes the tree, PE_T(p_{D_n, l_n}, p_{D_m, l_m}) is the relative position between node p_{D_n, l_n} and node p_{D_m, l_m} in the mathematical expression tree, D_m is the absolute code value of node p_{D_m, l_m}, l_m is the binary code length of D_m, and << denotes the left-shift operator.
In a specific implementation, step S2 position-codes the mathematical expression with the tree structure and ensures that the tree-structure position coding is translation invariant with respect to relative positions. The relative position reflects the relationship between words; in a linear sequence it is defined as the difference of absolute positions, and translation invariance means that as long as the relative position is the same, the positional offset is the same regardless of the absolute positions of the words. Because of this invariance, words at a fixed relative offset can be semantically associated no matter where they occur, and the training process can learn that semantic relationship.
To ensure that the tree-structure position coding has this relative-position translation invariance, the invention adopts a branch-based coding mode and introduces a shift operation when computing the relative position.
As shown in Fig. 4, assume the mathematical expression tree is a 3-ary tree and the root node is coded (000). For the child node on the first branch, the root's code is shifted left by 3 bits and the one-hot code (001) is added, so that child is coded (000001); similarly, the child on the 2nd branch is coded (000010) and the child on the 3rd branch is coded (000100).
Any node in the expression tree can be represented as p_{D_n, l_n}, where D_n is the decimal value of the absolute position code n and l_n is the binary code length of D_n. The relative position of two nodes in the tree is computed as PE_T(p_{D_n, l_n}, p_{D_m, l_m}) = D_n − (D_m << (l_n − l_m)), from which it can be seen that the relative position between any two nodes in the tree is independent of their absolute positions. As shown in Fig. 4, the relative position value of node p_{20,9} with respect to node p_{0,3} is 20, and the relative position value of node p_{84,12} with respect to node p_{1,6} is also 20. A small sketch of this coding follows.
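As an illustration, the following Python sketch implements the branch-based absolute coding and the shift-based relative position described above, and reproduces the two example values of 20 from Fig. 4; the function names and the 3-ary setting are assumptions of the sketch, and the relative-position formula is the reconstruction given above.

```python
# A minimal sketch of the branch-based tree position coding (illustrative only).
N_ARY = 3  # maximum branching factor; each tree level adds N_ARY bits

def child_code(parent_code, parent_len, branch):
    """Absolute code of the child on branch `branch` (1-based): shift the parent
    left by N_ARY bits, then add the branch's one-hot code."""
    one_hot = 1 << (branch - 1)          # branch r -> r-th bit from the right
    return (parent_code << N_ARY) | one_hot, parent_len + N_ARY

def relative_position(d_n, l_n, d_m, l_m):
    """Relative position of node (d_n, l_n) w.r.t. node (d_m, l_m);
    independent of where the pair sits in the tree."""
    return d_n - (d_m << (l_n - l_m))

root = (0, N_ARY)                                 # p_{0,3}
b2, l2 = child_code(*root, branch=2)              # second-branch child of the root
p20 = child_code(b2, l2, branch=3)                # p_{20,9}
b1, l1 = child_code(*root, branch=1)              # p_{1,6}
b12, l12 = child_code(b1, l1, branch=2)
p84 = child_code(b12, l12, branch=3)              # p_{84,12}

print(relative_position(*p20, *root))             # 20
print(relative_position(*p84, b1, l1))            # 20 (same relative offset)
```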
In one embodiment, step S3 includes:
For the natural language text with linear sequence characteristics, relative position coding is performed: the relative position between two words with absolute positions a and b is defined as the difference of the absolute positions, PE_S(a, b) = a − b; the adopted relative position coding mode codes the positions of the linear sequence as negative integers represented in two's complement, and the length of the linear-sequence position code is L_S = n_T × l_T, where n_T denotes the maximum branching factor of the tree structure and l_T denotes the maximum number of layers of the tree structure;
The root node of the tree structure is then taken as the first node of the linear sequence to achieve the unified position coding of the two structures, and the relative position under the unified position coding is computed as:
PE(x_a, x_b) = PE_S(x_a, x_b), if x_a and x_b are both in the linear sequence;
PE(x_a, x_b) = PE_T(x_a, x_b), if x_a and x_b are both in the tree structure;
PE(x_a, x_b) = PE_S(x_a, root) + PE_T(root, x_b), if x_a is in the linear sequence and x_b is in the tree structure;
PE(x_a, x_b) = PE_T(x_a, root) + PE_S(root, x_b), if x_a is in the tree structure and x_b is in the linear sequence;
where PE(x_a, x_b) denotes the relative position calculation function between any two symbols x_a and x_b, PE_S denotes the relative position calculation function within the linear sequence (S for sequence), PE_T denotes the relative position calculation function within the tree structure, and the cross-structure cases are composed through the root node shared by the two structures.
Specifically, step S3 fuses the linear sequence of the contextual natural language with the coding scheme of S2. For natural language text, which is a linear sequence, the relative position between two words with absolute positions a and b is defined as the difference of those positions, PE_S(a, b) = a − b, which trivially satisfies translation invariance: PE_S(a + k, b + k) = PE_S(a, b) for any offset k.
To unify the linear sequence and the tree structure, the positions of the linear sequence are coded as negative integers represented in two's complement, so that the most significant bit of every natural language word's position code is 1 while the most significant bit of every tree node's position code is 0. The length of the linear-sequence position code is L_S = n_T × l_T, where n_T is the maximum branching factor of the tree structure and l_T is the maximum number of layers of the tree structure. Finally, the root node of the tree structure serves as the first node of the linear sequence, unifying the representation of the two structures, as shown in Fig. 5. The relative position under the unified position coding is computed by the piecewise formula given above. A small sketch of the unified coding follows.
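The unified coding can be sketched as follows; the bit width, helper names, and in particular the composition of a cross-structure relative position through the root are assumptions based on the description above, not the patent's exact formulas.

```python
# A sketch of the unified position coding (illustrative assumptions only):
# context words take negative positions stored as two's complement over
# L_S = n_T * l_T bits (most significant bit 1), tree nodes keep the
# non-negative branch codes from the previous sketch (most significant bit 0),
# and a cross-structure relative position is composed through the shared root.
N_T, L_T = 3, 4                    # assumed maximum branching factor and depth
L_S = N_T * L_T                    # bit width of the linear-sequence position code

def seq_code(k):
    """Two's-complement code of the k-th context word, which sits at position -k."""
    return (-k) & ((1 << L_S) - 1)

def is_tree_code(code):
    """Tree nodes keep a leading 0; sequence words have a leading 1."""
    return ((code >> (L_S - 1)) & 1) == 0

def rel_seq(a, b):
    return a - b                    # PE_S: difference of (signed) absolute positions

def rel_seq_to_tree(word_pos, tree_rel_to_root):
    """Word at signed position word_pos vs. a tree node: go from the word to the
    root (sequence position 0), then from the root into the tree."""
    return rel_seq(word_pos, 0) + tree_rel_to_root

print(bin(seq_code(1)), bin(seq_code(5)))             # leading bit is 1 in both codes
print(is_tree_code(20), is_tree_code(seq_code(5)))    # True False
print(rel_seq_to_tree(-3, 20))                        # word 3 steps before the root vs. p_{20,9}
```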
In one embodiment, when the attention module of the BERT pre-training model is trained by sending the relative position codes of any two nodes in the tree structure and the linear sequence, the functional expression of the relative position codes is as follows:
e^l_{A,B} = (x^l_A · W_{Q,l}) (x^l_B · W_{K,l} + r^l_{A,B})^T / √d
where r^l_{A,B} is the relative position embedding vector between words A and B at the l-th Transformer layer of the BERT model, x^l_A is the embedding vector of word A at the l-th layer, x^l_B is the embedding vector of word B at the l-th layer, W_{Q,l} is the Query matrix of the l-th layer, W_{K,l} is the Key matrix of the l-th layer, d is the word vector dimension, and e^l_{A,B} is the unnormalized attention weight.
In one embodiment, the final mixed word embedding expression is calculated by:
x^{l+1}_A = Σ_{B=1}^{n_1} ê^l_{A,B} (x^l_B · W_{V,l})
where ê^l_{A,B} denotes the softmax normalization of e^l_{A,B}, x^l_B denotes the embedding vector of the B-th word at the l-th layer, W_{V,l} is the Value matrix of the l-th layer, n_1 denotes the total number of words, and when the l-th layer is the last layer, x^{l+1}_A is taken as the final mixed word embedding representation of the A-th word.
Specifically, through the foregoing steps the learning resources are converted into a data set in which the context sequence C_{i,j} and the sub-expressions of each mathematical expression are tokenized, coded with the unified position coding, and sent to the BERT pre-training model. As shown in Fig. 6, the input of the BERT pre-training model consists of three parts: the context sequence, the LaTeX sequence of the mathematical expression, and the depth-first traversal sequence of the expression's OPT. The relative position codes are then fed into the attention module of the BERT pre-training model for training; within the attention module the relative position codes act through the attention-score expression given above, and the final mixed word embedding representation is obtained through the weighted-sum formula given above. A sketch of this attention computation is given below.
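As an illustration of how the relative position codes could enter the attention computation, the following numpy sketch adds a per-pair relative position embedding to the keys before the dot product, in the spirit of the score formula reconstructed above; all array shapes, names, and the random embeddings are assumptions of the sketch, not the patent's implementation.

```python
# A hedged sketch of relative-position attention for one head (illustrative only).
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def relative_attention(x, W_q, W_k, W_v, rel_emb):
    """x: (n, d) token embeddings of one layer; rel_emb: (n, n, d) relative
    position embeddings r_{A,B}; returns the updated (n, d) representations."""
    n, d = x.shape
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    # score e_{A,B} = q_A . (k_B + r_{A,B}) / sqrt(d)
    scores = np.einsum('ad,abd->ab', q, k[None, :, :] + rel_emb) / np.sqrt(d)
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n, d = 6, 16
x = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
rel = rng.normal(size=(n, n, d))           # would come from the relative position codes
out = relative_attention(x, W_q, W_k, W_v, rel)
print(out.shape)                           # (6, 16)
```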
Once the pre-training model is set up, the mixed word embedding representation of the natural language text and the mathematical language text is obtained by adopting the two standard pre-training tasks of Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).
In the MLM task, 15% of the words are randomly selected from the three input sequences; of the selected words, 80% are replaced by the [MASK] token, 10% are replaced by random other words, and 10% are left unchanged. The MLM task uses cross entropy as its loss function, denoted in this embodiment as:
L_MLM = − Σ_x p(x) log p̂(x)
where p̂(x) is the estimated probability of the masked word x after linear classification and softmax regression, and p(x) is its original distribution, i.e., its one-hot vector. A sketch of this masking scheme is shown below.
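The 80/10/10 replacement scheme can be illustrated with the following sketch; the token ids, the [MASK] id, the vocabulary size and the label convention (-100 for positions that are not predicted) are assumptions of the example.

```python
# A minimal sketch of the MLM masking scheme described above (illustrative only).
import random

MASK_ID = 103          # assumed id of the [MASK] token
VOCAB_SIZE = 30000     # assumed vocabulary size

def mask_tokens(token_ids, mask_prob=0.15, seed=None):
    rng = random.Random(seed)
    inputs, labels = list(token_ids), [-100] * len(token_ids)  # -100 = not predicted
    for i, tok in enumerate(token_ids):
        if rng.random() >= mask_prob:
            continue
        labels[i] = tok                          # the masked position is predicted
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK_ID                  # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.randrange(VOCAB_SIZE)  # 10%: random other token
        # remaining 10%: keep the original token
    return inputs, labels

ids, labels = mask_tokens([5, 42, 7, 99, 12, 64, 8, 77], seed=0)
print(ids, labels)
```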
In the NSP task, 50% of the samples are randomly selected and their context C_{i,j} is replaced by the context of a randomly chosen other formula. The NSP task also employs cross entropy as its loss function, denoted in this embodiment as:
L_NSP = − [ p·log p̂ + (1 − p)·log(1 − p̂) ]
where p = 1 if the context has not been replaced, p = 0 if it has been replaced, and p̂ is the estimated probability that the context matches the formula. A sketch of this pair construction follows.
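The following short sketch illustrates how such context-replaced pairs could be constructed; the data layout and label convention are assumptions of the example.

```python
# A hedged sketch of building NSP pairs: for half of the (context, expression)
# samples the context is swapped with that of another, randomly chosen sample.
import random

def build_nsp_pairs(samples, seed=0):
    """samples: list of (context_tokens, expression_tokens)."""
    rng = random.Random(seed)
    pairs = []
    for context, expr in samples:
        if rng.random() < 0.5:
            other_context, _ = rng.choice(samples)   # may occasionally pick itself
            pairs.append((other_context, expr, 0))   # label 0: context replaced
        else:
            pairs.append((context, expr, 1))         # label 1: matching context
    return pairs
```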
In summary, compared with the prior art, the method position-codes the tree structure of the mathematical expression while guaranteeing that the tree-structure position coding is translation invariant with respect to relative positions, applies a unified position coding to the text with linear structural characteristics and the mathematical expression with tree structural characteristics, and uses them in a BERT pre-training model from which the word embedding representation is extracted, so that the final word embedding representation contains richer information and is more accurate.
The specific embodiments described herein are offered by way of example only to illustrate the spirit of the invention. Various modifications may be made to the particular embodiments described, or equivalents may be substituted, by those skilled in the art without departing from the spirit of the invention or exceeding the scope of the invention as defined by the appended claims.

Claims (7)

1. A method for embedding mixed words of natural language text and mathematical language text, comprising:
S1: preprocessing learning resources containing natural language texts and mathematical language texts to obtain a mathematical resource data set, wherein the mathematical language texts are mathematical expressions with tree structures, and the natural language texts are contexts with linear sequence characteristics;
S2: absolute position coding is carried out on the mathematical expression with the tree structure by adopting a position coding mode based on branches, and the relative position coding of two nodes in the tree structure is calculated according to the absolute position coding result;
S3: coding the positions of the context, which has linear sequence characteristics, with negative integers represented in two's complement; then taking the root node of the tree structure as the first node of the linear sequence to achieve a unified position coding of the mathematical expression and the context; and calculating the relative position code of any two nodes across the tree structure and the linear sequence from the unified position coding;
S4: inputting the mathematical resource data set obtained in step S1 into a BERT pre-training model equipped with a position coding module and an attention module, feeding the unified position codes obtained in step S3 into the position coding module and the relative position codes of any two nodes in the tree structure and the linear sequence calculated in step S3 into the attention module for training, and pre-training on the mathematical resources with the two standard pre-training tasks of masked language modeling and next sentence prediction, to obtain a trained word embedding model;
S5: processing the natural language text and the mathematical language text by using the trained word embedding model to obtain a final mixed word embedding expression;
wherein the masked language model task in S4 uses cross entropy as its loss function, expressed as:
L_MLM = − Σ_x p(x) log p̂(x)
where p̂(x) is the estimated probability of the masked word x after linear classification and softmax regression, and p(x) is its original distribution,
and the next sentence prediction task adopts cross entropy as its loss function, expressed as:
L_NSP = − [ p·log p̂ + (1 − p)·log(1 − p̂) ]
where p = 1 if the context is not replaced, p = 0 if the context is replaced, and p̂ is the estimated probability that the context matches the formula.
2. The mixed word embedding method of natural language text and mathematical language text as claimed in claim 1, wherein the step S1 of preprocessing the learning resource containing the natural language text and the mathematical language text comprises:
processing the learning resources containing natural language text and mathematical language text into symbol sequences, wherein the mathematical expressions are in LaTeX format and the mathematical resource data set is the set of mathematical resources, expressed as L = {L_1, L_2, …, L_i, …, L_{N'}}, where L_i represents the i-th mathematical resource.
3. The method of mixed word embedding of natural language text and mathematical language text as claimed in claim 2, wherein processing a learning resource containing natural language text and mathematical language text as a symbol sequence includes:
Tokenizing each LaTeX-format mathematical expression with the im2markup tokenizer to obtain the symbol sequence of its LaTeX tokenization result, converting the LaTeX-format mathematical expression into an operator tree (OPT) with the Tangent-S tool, and performing a depth-first traversal of the OPT to obtain the symbol sequence of its tree-structure traversal result, so that the j-th mathematical expression M_{i,j} of the i-th mathematical resource is represented by two symbol sequences, one listing its symbols after LaTeX tokenization and the other listing the operators and operands obtained by the depth-first traversal of its OPT; each mathematical resource consists of natural language text and mathematical expressions, where the natural language text is the context of the mathematical expressions, the context of the mathematical expression M_{i,j} is C_{i,j} = {t_z | t_z ∈ L_i, |z − p_{ij}| ≤ R}, t_z denotes the z-th natural language word, p_{ij} is the position of the mathematical expression M_{i,j}, taken as a whole, in the sequence, and R is at most 64;
The representation of each mathematical resource is obtained from the symbol representations of the natural language and the mathematical expressions, the i-th mathematical resource being the interleaved sequence of its natural language words and mathematical expressions;
N_T is the total length of the natural language text;
When the mathematical expression M_{i,j} consists of several chained equations or inequalities, it is split into sub-expressions using the equality and inequality signs as delimiters, and the resulting mathematical resource data set serves as the pre-training data set, where i is the learning resource number, j is the mathematical expression number, and w is the sub-expression number.
4. The method for embedding mixed words of natural language text and mathematical language text according to claim 1, wherein S2 introduces a shift operation when performing absolute position coding, the mathematical expression tree is an N-ary tree, the root node is coded as all zeros, and any subsequent child node is coded as follows:
S2.1: the child node of each branch is represented by a one-hot code of N bits; for the child node of the r-th branch, the r-th bit of the one-hot code, counted from the right, is 1 and the remaining bits are 0; S2.2: the position code of the parent node is shifted left by N bits and the one-hot code of the branch child node is added to it, giving the final absolute position code of that node; any node in the expression tree is then written as p_{D_n, l_n}, where n is the absolute position code of the node, D_n is the decimal value of the absolute position code, and l_n is the binary code length of D_n; the relative position of two nodes in the tree is computed as:
PE_T(p_{D_n, l_n}, p_{D_m, l_m}) = D_n − (D_m << (l_n − l_m))
where PE denotes the relative position calculation function, T denotes the tree, PE_T(p_{D_n, l_n}, p_{D_m, l_m}) is the relative position between node p_{D_n, l_n} and node p_{D_m, l_m} in the mathematical expression tree, D_m is the absolute code value of node p_{D_m, l_m}, l_m is the binary code length of D_m, and << denotes the left-shift operator.
5. The mixed word embedding method of natural language text and mathematical language text as claimed in claim 1, wherein the step S3 includes:
For the natural language text with linear sequence characteristics, relative position coding is performed: the relative position between two words with absolute positions a and b is defined as the difference of the absolute positions, PE_S(a, b) = a − b; the adopted relative position coding mode codes the positions of the linear sequence as negative integers represented in two's complement, and the length of the linear-sequence position code is L_S = n_T × l_T, where n_T denotes the maximum branching factor of the tree structure and l_T denotes the maximum number of layers of the tree structure;
The root node of the tree structure is then taken as the first node of the linear sequence to achieve the unified position coding of the two structures, and the relative position under the unified position coding is computed as:
PE(x_a, x_b) = PE_S(x_a, x_b), if x_a and x_b are both in the linear sequence;
PE(x_a, x_b) = PE_T(x_a, x_b), if x_a and x_b are both in the tree structure;
PE(x_a, x_b) = PE_S(x_a, root) + PE_T(root, x_b), if x_a is in the linear sequence and x_b is in the tree structure;
PE(x_a, x_b) = PE_T(x_a, root) + PE_S(root, x_b), if x_a is in the tree structure and x_b is in the linear sequence;
where PE(x_a, x_b) denotes the relative position calculation function between any two symbols x_a and x_b, PE_S denotes the relative position calculation function within the linear sequence (S for sequence), PE_T denotes the relative position calculation function within the tree structure, and the cross-structure cases are composed through the root node shared by the two structures.
6. The method for embedding mixed words of natural language text and mathematical language text according to claim 1, wherein when the attention module of the BERT pre-training model is trained by feeding the relative position codes of any two nodes in the tree structure and the linear sequence, the functional expression of the relative position codes is as follows:
e^l_{A,B} = (x^l_A · W_{Q,l}) (x^l_B · W_{K,l} + r^l_{A,B})^T / √d
where r^l_{A,B} is the relative position embedding vector between words A and B at the l-th Transformer layer of the BERT model, x^l_A is the embedding vector of word A at the l-th layer, x^l_B is the embedding vector of word B at the l-th layer, W_{Q,l} is the Query matrix of the l-th layer, W_{K,l} is the Key matrix of the l-th layer, d is the word vector dimension, and e^l_{A,B} is the unnormalized attention weight.
7. The method for embedding mixed words of natural language text and mathematical language text according to claim 6, wherein the final mixed word embedding expression is calculated by:
x^{l+1}_A = Σ_{B=1}^{n_1} ê^l_{A,B} (x^l_B · W_{V,l})
where ê^l_{A,B} denotes the softmax normalization of e^l_{A,B}, x^l_B denotes the embedding vector of the B-th word at the l-th layer, W_{V,l} is the Value matrix of the l-th layer, n_1 denotes the total number of words, and when the l-th layer is the last layer, x^{l+1}_A is taken as the final mixed word embedding representation of the A-th word.
CN202210469691.4A 2022-04-28 2022-04-28 Mixed word embedding method for natural language text and mathematical language text Active CN114818698B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210469691.4A CN114818698B (en) 2022-04-28 2022-04-28 Mixed word embedding method for natural language text and mathematical language text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210469691.4A CN114818698B (en) 2022-04-28 2022-04-28 Mixed word embedding method for natural language text and mathematical language text

Publications (2)

Publication Number Publication Date
CN114818698A CN114818698A (en) 2022-07-29
CN114818698B true CN114818698B (en) 2024-04-16

Family

ID=82508716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210469691.4A Active CN114818698B (en) 2022-04-28 2022-04-28 Mixed word embedding method for natural language text and mathematical language text

Country Status (1)

Country Link
CN (1) CN114818698B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200040652A (en) * 2018-10-10 2020-04-20 고려대학교 산학협력단 Natural language processing system and method for word representations in natural language processing
CN111444709A (en) * 2020-03-09 2020-07-24 腾讯科技(深圳)有限公司 Text classification method, device, storage medium and equipment
CN113239700A (en) * 2021-04-27 2021-08-10 哈尔滨理工大学 Text semantic matching device, system, method and storage medium for improving BERT

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021000362A1 (en) * 2019-07-04 2021-01-07 浙江大学 Deep neural network model-based address information feature extraction method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20200040652A (en) * 2018-10-10 2020-04-20 고려대학교 산학협력단 Natural language processing system and method for word representations in natural language processing
CN111444709A (en) * 2020-03-09 2020-07-24 腾讯科技(深圳)有限公司 Text classification method, device, storage medium and equipment
CN113239700A (en) * 2021-04-27 2021-08-10 哈尔滨理工大学 Text semantic matching device, system, method and storage medium for improving BERT

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SentiBERT: A Pre-trained Language Model Combining Sentiment Information; Yang Chen; Song Xiaoning; Song Wei; Journal of Frontiers of Computer Science and Technology (No. 09); pp. 1563-1570 *

Also Published As

Publication number Publication date
CN114818698A (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN109271529B (en) Method for constructing bilingual knowledge graph of Xilier Mongolian and traditional Mongolian
CN113128229B (en) Chinese entity relation joint extraction method
CN113642330B (en) Rail transit standard entity identification method based on catalogue theme classification
CN110298037B (en) Convolutional neural network matching text recognition method based on enhanced attention mechanism
CN107291693B (en) Semantic calculation method for improved word vector model
CN108416058B (en) Bi-LSTM input information enhancement-based relation extraction method
WO2023065544A1 (en) Intention classification method and apparatus, electronic device, and computer-readable storage medium
CN109145190B (en) Local citation recommendation method and system based on neural machine translation technology
CN110196913A (en) Multiple entity relationship joint abstracting method and device based on text generation formula
CN109684642B (en) Abstract extraction method combining page parsing rule and NLP text vectorization
CN110110054A (en) A method of obtaining question and answer pair in the slave non-structured text based on deep learning
CN111079431A (en) Entity relation joint extraction method based on transfer learning
CN111832293B (en) Entity and relation joint extraction method based on head entity prediction
CN111209749A (en) Method for applying deep learning to Chinese word segmentation
CN114169312A (en) Two-stage hybrid automatic summarization method for judicial official documents
CN112364132A (en) Similarity calculation model and system based on dependency syntax and method for building system
Bilgin et al. Sentiment analysis with term weighting and word vectors
CN110222338A (en) A kind of mechanism name entity recognition method
CN114153971A (en) Error-containing Chinese text error correction, identification and classification equipment
CN111222329B (en) Sentence vector training method, sentence vector model, sentence vector prediction method and sentence vector prediction system
CN113012822A (en) Medical question-answering system based on generating type dialogue technology
CN111967267A (en) XLNET-based news text region extraction method and system
CN115935957A (en) Sentence grammar error correction method and system based on syntactic analysis
CN111666374A (en) Method for integrating additional knowledge information into deep language model
CN114757184A (en) Method and system for realizing knowledge question answering in aviation field

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant