CN113515960A - Automatic translation quality evaluation method fusing syntactic information - Google Patents

Automatic translation quality evaluation method fusing syntactic information

Info

Publication number
CN113515960A
CN113515960A
Authority
CN
China
Prior art keywords
graph
bilingual
neural network
model
syntactic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110797021.0A
Other languages
Chinese (zh)
Other versions
CN113515960B (en)
Inventor
陆晓蕾
倪斌
韩潮
张培欣
管新潮
李力
陈晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen University
Original Assignee
Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202110797021.0A priority Critical patent/CN113515960B/en
Publication of CN113515960A publication Critical patent/CN113515960A/en
Application granted granted Critical
Publication of CN113515960B publication Critical patent/CN113515960B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

A method for automatically evaluating translation quality by fusing syntactic information, relating to the technical field of translation. The method comprises the following steps: acquiring bilingual text representation vectors for the input text; constructing a syntactic dependency tree for each side of the bilingual input text to form a syntactic graph; encoding the node-relation features with a graph neural network, splicing the results, and outputting a quality score through a simple sigmoid layer on top; taking the root mean square error between the model output and the data labels as the loss, and updating the quality prediction model parameters through a back-propagation algorithm. By using a graph neural network, the method neatly addresses the lack of syntactic information in automatic translation quality evaluation, an approach not previously seen in this field. Adding graph-neural-network-encoded syntactic information on top of a pre-trained model lets the model express semantic and syntactic information simultaneously, and generally improves the Pearson correlation coefficient by about 19% compared with using the pre-trained model alone.

Description

Automatic translation quality evaluation method fusing syntactic information
Technical Field
The invention relates to the technical field of translation, and in particular to a method for automatically evaluating translation quality by fusing syntactic information.
Background
With the development of neural machine translation and natural language processing, how to quantitatively and automatically assess translation quality (quality estimation, QE) has attracted extensive attention in industry and academia. Big-data-driven automatic translation evaluation can estimate translation quality without a reference translation. Current QE methods fall mainly into three categories: (1) feature engineering, (2) neural networks, and (3) pre-trained models. In the first, manually constructed features are fed into traditional machine-learning algorithms; represented by QuEst and QuEst++, its drawbacks are limited performance and difficulty handling new linguistic phenomena. The second is usually two-stage: a bilingual model is trained on a large parallel corpus to obtain word representations, which are then fed into an upper neural network (such as an LSTM); represented by Predictor-Estimator, its drawbacks are the need for a large parallel corpus and long training time. The third has become more common in the last two years: attaching a simple fully connected layer on top of a multilingual pre-trained model (such as mBERT or XLM-R) already achieves good results. Examining these techniques shows that current methods rely mainly on bilingual semantic features to estimate translation quality. Syntactic features, however, are rarely taken into account in translation quality estimation, which limits model performance.
Therefore, incorporating syntactic information into the model through a graph neural network, on top of a pre-trained model, can improve the effect of translation quality estimation to a certain extent.
Disclosure of Invention
The invention aims to provide a method for automatically evaluating translation quality by fusing syntactic information, which can improve the effect of translation quality estimation.
The invention comprises the following steps:
1) acquiring bilingual text representation vectors for the input text;
2) constructing a syntactic dependency tree for each side of the bilingual input text to form a syntactic graph;
3) encoding the node-relation features with a graph neural network, splicing the results, and outputting a quality score through a simple sigmoid layer on top;
4) taking the root mean square error between the model output and the data labels as the loss, and updating the quality prediction model parameters through a back-propagation algorithm.
In step 1), the specific method for acquiring the bilingual text representation vectors of the input text may be one of:
(1) using a bilingual pre-trained model, such as XLM-R or mBERT, whose parameters may be fine-tuned during model training;
(2) using Word2Vec;
(3) building the word vector representation layer from a trained model obtained with the open-source toolkit Transformers.
In step 2), constructing a syntactic dependency tree for each side of the bilingual input text may proceed as follows: extract the syntactic dependencies of the bilingual input with a self-built dependency-parsing algorithm or an open-source toolkit, such as NLTK or spaCy; represent the dependencies between sentence components as a directed graph; the dependency graph contains nodes and the relation types between them, each represented by a triple, such as (node A, relation r, node B). The syntactic dependencies of the whole sentence are thereby encoded into a triple list [triple 1, triple 2, triple 3, …, triple n]; the triple list is then converted into matrix form using an adjacency matrix, a V × V two-dimensional array where V is the number of nodes in the graph. Let adj[][] be the adjacency matrix; then:
adj[i][j] = 1, if the edge (vi, vj) exists or i = j; adj[i][j] = 0, otherwise
wherein (vi, vj) denotes the edge from node i to node j; if (vi, vj) does not exist, adj[i][j] is assigned 0; if (vi, vj) exists or i = j, adj[i][j] is assigned 1.
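The triple-to-matrix conversion above can be sketched in a few lines of plain Python. This is a minimal illustration with a hypothetical toy sentence, not the patented implementation; a real system would obtain the triples from a dependency parser such as spaCy.

```python
# Minimal sketch: turn a dependency-triple list into the adjacency matrix
# adj[][] described above, where adj[i][j] = 1 if the edge (vi, vj) exists
# or i == j, and 0 otherwise. Node names and triples here are toy values.

def triples_to_adjacency(triples, nodes):
    """triples: list of (head, relation, dependent); nodes: ordered node list."""
    index = {node: i for i, node in enumerate(nodes)}
    V = len(nodes)
    # Self-loops (i == j) are set to 1, matching the formula in the text.
    adj = [[1 if i == j else 0 for j in range(V)] for i in range(V)]
    for head, _relation, dependent in triples:
        adj[index[head]][index[dependent]] = 1  # directed edge head -> dependent
    return adj

# Toy sentence "cats chase mice": the verb governs both nouns.
nodes = ["chase", "cats", "mice"]
triples = [("chase", "nsubj", "cats"), ("chase", "obj", "mice")]
adj = triples_to_adjacency(triples, nodes)
```

The resulting matrix can then be handed to the graph neural network of step 3).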
In step 3), encoding the node-relation features with a graph neural network means applying deep learning to the graph structure: the node relations of the syntactic graph are encoded by the graph neural network, and the two sides of the bilingual input are encoded into implicit vectors Hs and Ht respectively; suitable graph neural networks include the generic GNN, the graph convolutional network GCN, and GAT.
Outputting a quality score through a simple sigmoid layer on top means splicing the graph-neural-network-encoded bilingual representations Hs and Ht into a vector H, then attaching a fully connected layer as the output layer, whose sigmoid activation yields the output OUT, namely:
OUT = Sigmoid(WH + B)
where W is a linear transformation parameter and B is a bias term.
The number of neurons in the output layer (i.e. the number of final outputs) is determined by the task: for sentence-level QE the number of neurons is 1; for word-level QE it is the number of words.
Compared with the prior art, the invention has the following outstanding advantages and technical effects:
the method skillfully solves the problem of the introduction of the lack of syntactic information in the automatic evaluation of the translation quality by using the graph neural network, and the method is not seen in the field of the automatic evaluation of the translation quality. According to the method, the graph neural network coding syntactic information is added on the basis of the pre-training model, so that the model can express the semantic and syntactic information at the same time, and the effect of about 19% on Pearson correlation coefficients can be generally improved compared with the effect of using the pre-training model alone.
Drawings
FIG. 1 is a diagram of a model architecture of the present invention.
Fig. 2 illustrates the generation of a syntactic graph.
Detailed Description
The following examples will further illustrate the present invention with reference to the accompanying drawings.
The embodiment of the invention comprises three steps: obtaining the word vector representations; generating the syntactic graph and applying the graph neural network; and the quality prediction model. These three steps are also shown in the model architecture diagram of Fig. 1 and are described below.
The word vector represents:
first, an input bilingual text representation vector needs to be obtained. The present invention proposes to use bilingual pre-training models, such as XLM-R (Conneau et al, 2019) or mBERT (Devrin et al, 2018), to obtain word vector representations of the input text and to perform parameter tuning during the model training process. Of course, the method using Word2Vec is also possible. The word vector representation layer can be built using a model obtained with the open source toolkit transforms (ref: https:// github. com/hugging face/transforms) trained. As shown in FIG. 1, the Source language Source and the Target language Target respectively obtain a word vector E after passing through XLM-R1,E2,…,EnAnd F1,F2,Fm
Syntax tree generation and graph neural network:
this section is the core part of the present invention. It mainly contains two large pieces of content: syntactic graph generation and graph neural networks.
Generating a syntactic graph
Extract the syntactic dependencies of the bilingual input with a self-built dependency-parsing algorithm or an open-source toolkit, such as NLTK or spaCy. The dependencies between sentence components can be represented with a directed graph (Fig. 2). The dependency graph contains nodes and the relation types between them, which can be represented by triples such as (node A, relation r, node B). This encodes the syntactic dependencies of the entire sentence into a triple list [triple 1, triple 2, triple 3, …, triple n]. The triple list is then converted to matrix form using an adjacency matrix, a V × V two-dimensional array where V is the number of nodes in the graph. Let adj[][] be the adjacency matrix; then:
adj[i][j] = 1, if the edge (vi, vj) exists or i = j; adj[i][j] = 0, otherwise
wherein (vi, vj) denotes the edge from node i to node j; if (vi, vj) does not exist, adj[i][j] is assigned 0; if (vi, vj) exists or i = j, adj[i][j] is assigned 1.
Graph neural network
The graph neural network applies deep learning to graph structures. The node relations of the syntactic graph can be encoded by a graph neural network; commonly used variants include the generic GNN, the graph convolutional network (GCN), and the graph attention network (GAT). This embodiment takes GAT as an example. GAT adopts a multi-head attention mechanism, can assign different weights to different nodes, and its training depends on pairs of adjacent nodes rather than on a specific network structure. Assume the graph contains N nodes and that the feature vector of each node (obtained in 2.1) is h_i. The output of GAT for node i, h'_i, averaged over the attention heads, is then:
h'_i = σ( (1/K) Σ_{k=1}^{K} Σ_{j∈N_i} α_ij^k W^k h_j )
where K is the total number of attention heads, α_ij^k is the k-th attention coefficient between node i and node j, W^k is a linear transformation parameter, N_i is the set of neighbor nodes of i, and σ is the activation function, typically the leaky rectified linear unit (LeakyReLU). Through this step, the two sides of the bilingual input are encoded into implicit vectors Hs and Ht respectively, as shown in Fig. 1.
A quality prediction model:
and splicing the bilingual representation Hs and Ht coded by the neural network of the graph to obtain a vector H ═ Hs: Ht. After that, a full connection layer is connected as an output layer, the activation function is Sigmoid to obtain an output OUT, that is:
OUT=Sigmoid(WH+B)
where W is the linear transformation parameter and B is the bias term. The number of output layer neurons (i.e., the final number of outputs) depends on the specifics of the task. For example: if the QE is sentence-level, the number of the neurons is 1; if the word level QE is obtained, the number of the neurons is the number of the words. The output of the model is used as a quality score, the root mean square error of the label of the data is used as a loss, and the whole model parameter is updated through a back propagation algorithm.
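The training loss described above, root mean square error between predicted quality scores and gold labels, is simple enough to state directly. The sketch below computes only the loss value on toy numbers; in a real system this quantity would drive back-propagation through the whole model.

```python
# Root mean square error (RMSE) between predicted quality scores and the
# gold labels, as used for the loss above. Scores and labels are toy values.
import math

def rmse(predictions, labels):
    assert len(predictions) == len(labels) and predictions
    se = sum((p - y) ** 2 for p, y in zip(predictions, labels))
    return math.sqrt(se / len(predictions))

loss = rmse([0.8, 0.4, 0.6], [1.0, 0.4, 0.5])
```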
The invention provides a syntax-fusing translation quality estimation technique: graph-neural-network-encoded syntactic information is added on top of a pre-trained model so that the model can express semantic and syntactic information simultaneously. Experiments on the QE datasets of the WMT 2020 Conference on Machine Translation show that adding syntactic information generally improves the Pearson correlation coefficient by about 19% compared with a plain pre-trained model. The scheme of the invention can therefore improve the effect of translation quality estimation.

Claims (6)

1. A method for automatically evaluating translation quality by fusing syntactic information, characterized by comprising the following steps:
1) acquiring bilingual text representation vectors for the input text;
2) constructing a syntactic dependency tree for each side of the bilingual input text to form a syntactic graph;
3) encoding the node-relation features with a graph neural network, splicing the results, and outputting a quality score through a simple sigmoid layer on top;
4) taking the root mean square error between the model output and the data labels as the loss, and updating the quality prediction model parameters through a back-propagation algorithm.
2. The method for automatically evaluating translation quality by fusing syntactic information as claimed in claim 1, wherein in step 1) the specific method for acquiring the bilingual text representation vectors of the input text is one of the following:
(1) using a bilingual pre-trained model, such as XLM-R or mBERT, whose parameters may be fine-tuned during model training;
(2) using Word2Vec;
(3) building the word vector representation layer from a trained model obtained with the open-source toolkit Transformers.
3. The method for automatically evaluating translation quality by fusing syntactic information as claimed in claim 1, wherein in step 2) the specific method for constructing a syntactic dependency tree for each side of the bilingual input text to form the syntactic graph is: extract the syntactic dependencies of the bilingual input with a self-built dependency-parsing algorithm or an open-source toolkit; represent the dependencies between sentence components as a directed graph; the dependency graph contains nodes and the relation types between them, each represented by a triple, such as (node A, relation r, node B); the syntactic dependencies of the whole sentence are thereby encoded into a triple list [triple 1, triple 2, triple 3, …, triple n]; the triple list is then converted into matrix form using an adjacency matrix, a V × V two-dimensional array where V is the number of nodes in the graph; let adj[][] be the adjacency matrix, then:
adj[i][j] = 1, if the edge (vi, vj) exists or i = j; adj[i][j] = 0, otherwise
wherein (vi, vj) denotes the edge from node i to node j; if (vi, vj) does not exist, adj[i][j] is assigned 0; if (vi, vj) exists or i = j, adj[i][j] is assigned 1.
4. The method for automatically evaluating translation quality by fusing syntactic information as claimed in claim 1, wherein in step 3) encoding the node-relation features with a graph neural network means applying deep learning to the graph structure: the node relations of the syntactic graph are encoded by the graph neural network, and the two sides of the bilingual input are encoded into implicit vectors Hs and Ht respectively; the graph neural network comprises the generic GNN, the graph convolutional network GCN, and GAT.
5. The method for automatically evaluating translation quality by fusing syntactic information as claimed in claim 1, wherein in step 3) outputting a quality score through a simple sigmoid layer on top means: splicing the graph-neural-network-encoded bilingual representations Hs and Ht to obtain the vector H = [Hs : Ht]; then attaching a fully connected layer as the output layer, whose sigmoid activation yields the output OUT, namely:
OUT = Sigmoid(WH + B)
where W is a linear transformation parameter and B is a bias term.
6. The method for automatically evaluating translation quality by fusing syntactic information as claimed in claim 5, wherein the number of output-layer neurons is determined by the task: for sentence-level QE the number of neurons is 1; for word-level QE it is the number of words.
CN202110797021.0A 2021-07-14 2021-07-14 Automatic translation quality assessment method integrating syntax information Active CN113515960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110797021.0A CN113515960B (en) 2021-07-14 2021-07-14 Automatic translation quality assessment method integrating syntax information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110797021.0A CN113515960B (en) 2021-07-14 2021-07-14 Automatic translation quality assessment method integrating syntax information

Publications (2)

Publication Number Publication Date
CN113515960A true CN113515960A (en) 2021-10-19
CN113515960B CN113515960B (en) 2024-04-02

Family

ID=78066996

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110797021.0A Active CN113515960B (en) 2021-07-14 2021-07-14 Automatic translation quality assessment method integrating syntax information

Country Status (1)

Country Link
CN (1) CN113515960B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116720531A (en) * 2023-06-20 2023-09-08 内蒙古工业大学 Mongolian neural machine translation method based on source language syntax dependency and quantization matrix

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170139905A1 (en) * 2015-11-17 2017-05-18 Samsung Electronics Co., Ltd. Apparatus and method for generating translation model, apparatus and method for automatic translation
CN110688861A (en) * 2019-09-26 2020-01-14 沈阳航空航天大学 Multi-feature fusion sentence-level translation quality estimation method
CN111597830A (en) * 2020-05-20 2020-08-28 腾讯科技(深圳)有限公司 Multi-modal machine learning-based translation method, device, equipment and storage medium
CN112347795A (en) * 2020-10-04 2021-02-09 北京交通大学 Machine translation quality evaluation method, device, equipment and medium
CN112464676A (en) * 2020-12-02 2021-03-09 北京捷通华声科技股份有限公司 Machine translation result scoring method and device
CN112613326A (en) * 2020-12-18 2021-04-06 北京理工大学 Tibetan language neural machine translation method fusing syntactic structure
CN113033218A (en) * 2021-04-16 2021-06-25 沈阳雅译网络技术有限公司 Machine translation quality evaluation method based on neural network structure search

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170139905A1 (en) * 2015-11-17 2017-05-18 Samsung Electronics Co., Ltd. Apparatus and method for generating translation model, apparatus and method for automatic translation
CN110688861A (en) * 2019-09-26 2020-01-14 沈阳航空航天大学 Multi-feature fusion sentence-level translation quality estimation method
CN111597830A (en) * 2020-05-20 2020-08-28 腾讯科技(深圳)有限公司 Multi-modal machine learning-based translation method, device, equipment and storage medium
CN112347795A (en) * 2020-10-04 2021-02-09 北京交通大学 Machine translation quality evaluation method, device, equipment and medium
CN112464676A (en) * 2020-12-02 2021-03-09 北京捷通华声科技股份有限公司 Machine translation result scoring method and device
CN112613326A (en) * 2020-12-18 2021-04-06 北京理工大学 Tibetan language neural machine translation method fusing syntactic structure
CN113033218A (en) * 2021-04-16 2021-06-25 沈阳雅译网络技术有限公司 Machine translation quality evaluation method based on neural network structure search

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
陆晓蕾 et al.: "计算语言学中的重要术语——词向量" ("An important term in computational linguistics: word vectors"), 《中国科技术语》 (China Terminology), vol. 22, no. 3 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116720531A (en) * 2023-06-20 2023-09-08 内蒙古工业大学 Mongolian neural machine translation method based on source language syntax dependency and quantization matrix
CN116720531B (en) * 2023-06-20 2024-05-28 内蒙古工业大学 Mongolian neural machine translation method based on source language syntax dependency and quantization matrix

Also Published As

Publication number Publication date
CN113515960B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
CN110929030B (en) Text abstract and emotion classification combined training method
CN110069790B (en) Machine translation system and method for contrasting original text through translated text retranslation
CN110738057B (en) Text style migration method based on grammar constraint and language model
CN110210032B (en) Text processing method and device
CN110516244B (en) Automatic sentence filling method based on BERT
CN109933808B (en) Neural machine translation method based on dynamic configuration decoding
CN112989796B (en) Text naming entity information identification method based on syntactic guidance
CN113435211B (en) Text implicit emotion analysis method combined with external knowledge
CN114218379B (en) Attribution method for question answering incapacity of intelligent question answering system
CN112926337B (en) End-to-end aspect level emotion analysis method combined with reconstructed syntax information
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
CN114692602A (en) Drawing convolution network relation extraction method guided by syntactic information attention
CN115204143B (en) Method and system for calculating text similarity based on prompt
CN113609284A (en) Method and device for automatically generating text abstract fused with multivariate semantics
CN114218928A (en) Abstract text summarization method based on graph knowledge and theme perception
CN115374270A (en) Legal text abstract generation method based on graph neural network
CN114048301B (en) Satisfaction-based user simulation method and system
CN117033602A (en) Method for constructing multi-mode user mental perception question-answering model
Mandal et al. Futurity of translation algorithms for neural machine translation (NMT) and its vision
CN113515960A (en) Automatic translation quality evaluation method fusing syntactic information
CN114595700A (en) Zero-pronoun and chapter information fused Hanyue neural machine translation method
CN114218936A (en) Automatic generation algorithm for high-quality comments in media field
CN117251562A (en) Text abstract generation method based on fact consistency enhancement
CN111382333B (en) Case element extraction method in news text sentence based on case correlation joint learning and graph convolution
CN115017924B (en) Construction of neural machine translation model for cross-language translation and translation method thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant