CN114386437A - Chinese-Korean translation quality estimation method and system based on cross-language pre-training model - Google Patents


Publication number
CN114386437A
CN114386437A
Authority
CN
China
Prior art keywords
attention
sentence
quality estimation
matrix
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210035223.6A
Other languages
Chinese (zh)
Other versions
CN114386437B (en)
Inventor
赵亚慧
李飞雨
崔荣一
李德
金国哲
张振国
金晶
姜克鑫
刘帆
王苑儒
夏明会
鲁雅鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanbian University
Original Assignee
Yanbian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanbian University filed Critical Yanbian University
Priority to CN202210035223.6A priority Critical patent/CN114386437B/en
Publication of CN114386437A publication Critical patent/CN114386437A/en
Application granted granted Critical
Publication of CN114386437B publication Critical patent/CN114386437B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/51 Translation evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a Chinese-Korean translation quality estimation method and system based on a cross-language pre-training model. The method comprises the following steps: splicing a source language sentence and a machine translation, and obtaining an initial feature matrix of the spliced sentence by using an XLM-R model; performing attention calculation on the initial feature matrix, and performing sentence embedding through a convolutional neural network to obtain a sentence vector; and, based on the sentence vector, calculating a quality estimation score using a fully-connected neural network. The system comprises a cross-language feature extraction module, an attention calculation module and a quality estimation module: the cross-language feature extraction module performs feature extraction on the sentence pair to be evaluated by using an XLM-R model and generates an initial feature matrix; the attention calculation module carries out attention calculation on the initial feature matrix to obtain a sentence vector; and the quality estimation module calculates the translation quality estimation score. The resulting sentence embeddings are of high quality, which facilitates quality estimation and effectively improves the performance of the Chinese-Korean machine translation quality estimation task.

Description

Chinese-Korean translation quality estimation method and system based on cross-language pre-training model
Technical Field
The application belongs to the field of natural language processing within artificial intelligence, and particularly relates to a Chinese-Korean translation quality estimation method and system based on a cross-language pre-training model.
Background
Quality Estimation (QE) of machine translation is a subtask of machine translation that has drawn wide attention from academia and industry because of its important roles in machine translation system evaluation, post-editing, corpus filtering, and the like. Unlike common machine translation evaluation metrics, a QE model can, after supervised training, automatically evaluate the quality of machine-generated translations without relying on any reference translation during prediction. With the rapid development of neural machine translation models, many researchers have introduced deep learning methods into the translation quality estimation task, bringing great progress to the field.
Machine translation quality estimation differs from machine translation evaluation metrics such as BLEU, TER and METEOR: it can automatically predict the quality of a machine-generated translation without relying on any reference translation. The most common quality score is the Human-targeted Translation Edit Rate (HTER). QuEst, proposed by Specia et al. for the quality estimation task, serves as the baseline model of the task and consists of a feature extraction module and a machine learning module. To address machine translation quality estimation, Kim et al. first applied a machine translation model to the quality estimation task and proposed an RNN-based translation quality estimation model. Fan et al. replaced the RNN-based translation feature extraction module with a Transformer model on the basis of the predictor-estimator framework and proposed the Bilingual Expert model, improving both the performance and the interpretability of the quality estimation task.
At present, mainstream quality estimation methods are all built on the predictor-estimator framework, and the deep network used in their feature extraction stage requires massive parallel corpora as data support. Korean corpora for natural language processing tasks are scarce, and massive parallel corpora are particularly lacking for the Chinese-Korean language pair; this small-sample situation aggravates the difficulty of the Chinese-Korean machine translation quality estimation task.
A cross-language pre-training model can generate contextualized word representations with long-distance dependencies, but how to generate high-quality sentence representations remains an open problem. In addition, most sentence embedding methods based on pre-trained models adopt the [CLS] strategy or pooling operations to generate sentence vectors, yet neither approach can retain all the information learned by the model.
Disclosure of Invention
The application provides a Chinese-Korean translation quality estimation method and system based on a cross-language pre-training model; on the basis of a conventional cross-language pre-training model, the hidden vectors of the pre-training model are sentence-embedded using linguistic attention and term attention, so as to solve the above problems.
In order to achieve the above purpose, the present application provides the following solutions:
The Chinese-Korean translation quality estimation method based on the cross-language pre-training model comprises the following steps:
splicing a source language sentence and a machine translation of the target language according to a preset format to obtain a spliced sentence, and obtaining an initial feature matrix X of the spliced sentence by using an XLM-R model;
performing linguistic attention calculation and term attention calculation on the initial feature matrix X, and performing sentence embedding through a convolutional neural network to obtain a sentence vector of the spliced sentence;
and, based on the sentence vector, calculating a quality estimation score by using a fully-connected neural network, completing the translation quality estimation.
Optionally, the method for obtaining the spliced sentence includes:
the source language sentence and the machine translation are concatenated front to back into one sequence, and special embedding tokens are added at the head of the sequence and between the source language sentence and the machine translation, to obtain the spliced sentence.
Optionally, the method for obtaining the initial feature matrix X includes:
obtaining the hidden-layer states h_i of the XLM-R model from each hidden layer and each term position, where h_i is the sequential concatenation, at the i-th layer of the XLM-R model, of the representations of the sentence pair to be evaluated and of the special embedding tokens;
splicing the features of the word-embedding layer e and of all hidden layers of the XLM-R model to obtain the initial feature matrix X of the spliced sentence, where X = Concat(e, h_1, ..., h_24).
Optionally, the method for calculating the linguistic attention includes:
based on the initial feature matrix X, compressing the global information of the hidden layers, then aggregating the position information of the sentence terms by using an average pooling operation and a maximum pooling operation respectively, to generate a first context feature X_avg^L and a second context feature X_max^L;
based on the first context feature X_avg^L and the second context feature X_max^L, generating a first context attention matrix and a second context attention matrix respectively through a shared neural network;
obtaining a linguistic attention matrix A_L from the first context attention matrix and the second context attention matrix by element-wise summation;
performing a tensor product operation on the initial feature matrix X and the linguistic attention matrix A_L to obtain the linguistic attention calculation result X'.
Optionally, the method for calculating the term attention includes:
compressing the linguistic attention calculation result X' along the hidden-layer direction by using an average pooling operation and a maximum pooling operation respectively, and then splicing the results to obtain the feature X_T;
applying a standard convolutional layer to the feature X_T to generate the term attention matrix A_T.
Optionally, the method for obtaining the sentence vector includes:
performing a tensor product operation on the linguistic attention calculation result X' and the term attention matrix A_T to obtain the term attention calculation result X'';
based on the term attention calculation result X'', obtaining the sentence vector using a convolutional neural network, where the sentence vector is S = Flatten(Conv(X'')).
Optionally, the quality estimation score is:
y_score = σ(w^T(tanh(WS)))
The model is trained by minimizing the mean square error, and the objective function of the quality estimation score is:
L = (1/N) Σ_{i=1}^{N} (y_score^(i) − y_HTER^(i))²
In another aspect, to achieve the above object, the present application further provides a Chinese-Korean translation quality estimation system based on a cross-language pre-training model, comprising a cross-language feature extraction module, an attention calculation module and a quality estimation module;
the cross-language feature extraction module performs feature extraction on the sentence pairs to be evaluated by using an XLM-R model and generates an initial feature matrix;
the attention calculation module is used for performing linguistic attention calculation and term attention calculation on the initial feature matrix, and for performing sentence embedding through a convolutional neural network to obtain a sentence vector of the spliced sentence;
the quality estimation module is used for calculating a quality estimation score of the translation quality estimation by using a fully-connected neural network based on the sentence vector.
Optionally, the input of the XLM-R model is formed by splicing a source language sentence with a machine translation of the target language, and special embedding tokens are added at the head of the spliced sequence and between the source language sentence and the machine translation.
Optionally, the attention calculation module includes a linguistic attention calculation unit, a term attention calculation unit and a sentence embedding unit;
the linguistic attention calculation unit is used for compressing the global information of the hidden layers of the XLM-R model based on the initial feature matrix, aggregating the position information of the sentence terms by using an average pooling operation and a maximum pooling operation respectively to generate a first context feature and a second context feature, generating a first context attention matrix and a second context attention matrix respectively through a shared neural network, and then obtaining the linguistic attention matrix by element-wise summation;
the term attention calculation unit is used for compressing the tensor product result of the linguistic attention matrix, i.e. X', along the hidden-layer direction by using an average pooling operation and a maximum pooling operation respectively, splicing the results to obtain the feature X_T, and applying a standard convolutional layer to the feature X_T to generate the term attention matrix;
the sentence embedding unit is used for performing a tensor product operation on X' and the term attention matrix to obtain the attention calculation result X'', and then obtaining the sentence vector by using a convolutional neural network.
The beneficial effect of this application does:
On the basis of a conventional cross-language pre-training model, the hidden vectors of the pre-training model are sentence-embedded using linguistic attention and term attention; the resulting sentence embeddings are of high quality and facilitate quality estimation. Because the attention mechanism comprises both linguistic attention and term attention, its performance is better than that of a single attention module, effectively improving the performance of the Chinese-Korean machine translation quality estimation task.
Drawings
To illustrate the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; a person skilled in the art could obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a Chinese-Korean translation quality estimation method based on a cross-language pre-training model according to the first embodiment of the present application;
FIG. 2 is a flowchart illustrating a sentence embedding method according to a first embodiment of the present application;
FIG. 3 is a flow chart illustrating linguistic attention calculation according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating the term attention calculation according to the first embodiment of the present application;
Fig. 5 is a schematic structural diagram of a Chinese-Korean translation quality estimation system based on a cross-language pre-training model according to the second embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
In all embodiments, the data set used comes from the Chinese-English-Korean corpus constructed from the real data set of the "Chinese-Korean science and technology information processing integrated platform" project. The original corpus comprises more than 30,000 documents and more than 160,000 pairs of parallel sentences, covering 13 fields including biotechnology, marine environment and aerospace. The quality estimation score uses the Human-targeted Translation Edit Rate (HTER). HTER is an edit-distance-based metric that captures the distance between the automatic translation and its post-edited reference as the number of modifications required to convert one into the other; the quality estimation model therefore needs to predict this score accurately. The HTER score is calculated automatically by the TERCOM tool. The XLM-RoBERTa-large version is selected as the cross-language pre-training model for the experiments; it comprises 24 hidden layers and a word-embedding layer, and the dimension of each hidden layer is 1024. During quality estimation, sentences are padded to length 80, the gradient optimization algorithm uses the Adam optimizer, and the learning rate is set to 2e-5.
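As a rough illustration of the HTER label used here, the sketch below computes a simplified word-level variant: edit distance (insertions, deletions, substitutions) divided by the reference length. The real score is produced by the TERCOM tool and also accounts for block shifts, so this is an approximation for illustration only.

```python
# Simplified HTER sketch: word-level edit distance divided by the length of
# the post-edited reference. TERCOM's real TER/HTER also allows block shifts.
def edit_distance(hyp, ref):
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def hter(machine_translation, post_edited):
    hyp = machine_translation.split()
    ref = post_edited.split()
    return edit_distance(hyp, ref) / max(len(ref), 1)

print(hter("the cat sat on mat", "the cat sat on the mat"))  # 1 edit / 6 words
```

A lower score means fewer post-edits were needed, i.e. a better machine translation; the regression head described later is trained to predict exactly this value.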
Example one
In the first embodiment, a cross-language pre-training model is introduced into the translation quality estimation task: first, the source-language and translation sentence pair to be evaluated is sentence-embedded based on the cross-language pre-training model XLM-R (XLM-RoBERTa), and then the generated sentence vector is used directly to implement the translation quality estimation task. Fig. 1 is a schematic flowchart of the Chinese-Korean translation quality estimation method based on a cross-language pre-training model according to the first embodiment; it mainly comprises three parts:
the method comprises the steps of firstly, splicing a source language sentence and a machine translation of a target language into a spliced sentence according to a preset format, and then obtaining an initial characteristic matrix X of the spliced sentence by using an XLM-R model.
In this embodiment, the input of the XLM-R model is formed by splicing the source language sentence with the machine translation of the target language, with special embedding tokens [CLS] and [SEP] added at the head of the sequence and between the languages. Because quality estimation resembles the translation language modeling task in cross-language understanding, alignment and other aspects, this embodiment uses every hidden layer and every term position of the XLM-R model for the translation quality estimation task, and adds an attention mechanism to help the model focus on the linguistic levels and term positions that benefit the quality estimation task. The hidden-layer state h_i = (h_[CLS], h_src1, ..., h_srcm, h_[SEP], h_tar1, ..., h_tarn) is the sequential concatenation, at the i-th layer of the pre-training model, of the representations of the sentence pair to be evaluated and of the special embedding tokens. The features of the 24 hidden layers and of the word-embedding layer e of the XLM-R model are spliced to obtain the initial feature matrix X of the spliced sentence:
X = Concat(e, h_1, ..., h_24)
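The first step can be sketched as follows. Note that XLM-R's actual special tokens are `<s>` and `</s>` (playing the roles written as [CLS] and [SEP] above), random arrays stand in here for the hidden states a real XLM-RoBERTa-large forward pass would produce, and the tensor layout (layers, tokens, hidden) is an illustrative choice.

```python
import numpy as np

# Splice source sentence and machine translation into one XLM-R-style input,
# then stack the word-embedding layer and 24 hidden layers into X.
def splice(src_tokens, mt_tokens):
    return ["<s>"] + src_tokens + ["</s>", "</s>"] + mt_tokens + ["</s>"]

tokens = splice("这 是 一 个 例子".split(), "이것 은 예 이다".split())
N, H, LAYERS = len(tokens), 1024, 24

rng = np.random.default_rng(0)
e = rng.standard_normal((N, H))                                # embedding layer
hidden = [rng.standard_normal((N, H)) for _ in range(LAYERS)]  # h_1 .. h_24

# X = Concat(e, h_1, ..., h_24): one slice per layer
X = np.stack([e] + hidden)
print(X.shape)  # (25, N, 1024)
```

In a real pipeline the 25 slices would be the `hidden_states` tuple of an XLM-RoBERTa-large forward pass with hidden-state output enabled.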
Because the pre-training model has performed the translation language modeling task on large-scale corpora, the initial feature matrix X contains prior knowledge relevant to translation quality estimation.
And secondly, performing linguistic attention calculation and term attention calculation on the initial feature matrix X, and performing sentence embedding through a convolutional neural network to obtain the sentence vector S of the spliced sentence.
In this embodiment, drawing on approaches to high-dimensional data in the computer vision field, a method is proposed that performs sentence embedding by applying attention calculation to the hidden vectors of the pre-training model. The attention comprises a linguistic attention part and a term attention part, which perform attention calculation over the different hidden layers of the pre-trained language model and over the different terms of the sentence, respectively. As shown in FIG. 2, based on the initial feature matrix X ∈ R^(L×H×N) obtained in the first step, a linguistic attention matrix A_L over the different hidden layers of the pre-training model and a term attention matrix A_T over the different positions of the sentence are obtained, where L, H and N denote the number of layers of the pre-training model, the hidden dimension, and the sentence-pair length, respectively. The whole attention module is calculated as follows:
X' = LangAttn(X) ⊗ X
X'' = TokenAttn(X') ⊗ X'
where LangAttn and TokenAttn denote the mapping operations of linguistic attention and term attention, respectively, and X'' is the output after attention calculation. During the tensor product operation, the linguistic attention is propagated along every term position of the sentence; similarly, the term attention acts on every hidden layer of the model.
Finally, sentence embedding is completed through a convolutional neural network:
S = Flatten(Conv(X''))
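A minimal sketch of the final sentence-embedding step S = Flatten(Conv(X'')), assuming the attention-weighted tensor X'' is already available; the kernel is random and untrained, and the filter count and kernel width are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, H = 25, 12, 64                 # small dims for illustration
X2 = rng.standard_normal((L, N, H))  # stands in for X'' after attention

def conv_flatten(x, n_filters=4, k=3):
    # Valid convolution sliding a (layers x k x hidden) kernel over the
    # token dimension, followed by flattening into the sentence vector S.
    L, N, H = x.shape
    W = rng.standard_normal((n_filters, L, k, H)) * 0.01
    out = np.empty((n_filters, N - k + 1))
    for f in range(n_filters):
        for t in range(N - k + 1):
            out[f, t] = np.sum(W[f] * x[:, t:t + k, :])
    return out.ravel()               # Flatten

S = conv_flatten(X2)
print(S.shape)  # (40,)
```

The flattened vector keeps every filter response instead of discarding information via pooling or a single [CLS] slot, which is the point of the embedding strategy described here.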
Specifically, the linguistic attention calculation process is introduced first.
To integrate the diverse linguistic knowledge implied in the pre-training model, the first embodiment uses the feature relationships between different hidden layers to generate the linguistic attention matrix A_L. The linguistic attention calculation process is shown in FIG. 3.
So that each hidden-layer feature contains the context information of the sentence, based on the initial feature matrix X obtained in the first step, the global information of the hidden layers is compressed: the position information of the sentence terms is aggregated using average pooling and maximum pooling respectively, generating two different context features, the first context feature X_avg^L and the second context feature X_max^L:
X_avg^L = AvgPool(X)
X_max^L = MaxPool(X)
The statistical information of these context features can represent the whole sentence and facilitates the subsequent capture of the interdependencies between different hidden layers. To learn the non-linear interactions between hidden layers during attention calculation, the two context features containing global information are passed through a shared neural network to generate two attention matrices, the first context attention matrix and the second context attention matrix, which are finally combined by element-wise summation to output the final linguistic attention matrix A_L. The specific calculation process is:
A_L = σ(W_1δ(W_0X_avg^L) + W_1δ(W_0X_max^L))
where σ is the sigmoid function, δ is the ReLU activation function, and W_0 and W_1 are the shared network parameters.
Meanwhile, this embodiment supplements the above with term attention, generating the term attention matrix A_T from the positional relationships of the sentence pair to be evaluated. The term attention calculation process is shown in FIG. 4.
To effectively highlight the importance of different term positions, average pooling and maximum pooling operations are first applied along the hidden-layer direction to compress the linguistic information, and the pooled results are spliced into the feature X_T:
X_T = Concat(AvgPool(X'), MaxPool(X'))
Then a standard convolutional layer is applied to X_T to generate the term attention matrix A_T:
A_T = TokenAttn(X') = σ(Conv(X_T))
where σ is the sigmoid function.
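A corresponding sketch of the term attention branch. The single-weight channel reduction followed by a width-3 convolution over token positions is a simplified stand-in for the standard convolutional layer, whose exact configuration the patent does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, H = 25, 12, 64
X1 = rng.standard_normal((L, N, H))  # stands in for X' after linguistic attention

def token_attn(X1, w_ch, w_conv):
    # Compress along the hidden-layer direction with average and max pooling,
    # then concatenate as two channels: X_T with shape (2, N, H)
    x_t = np.stack([X1.mean(axis=0), X1.max(axis=0)])
    # Simplified "conv layer": reduce channels/hidden to one value per token,
    # then a width-3 convolution over token positions with same padding
    per_token = np.einsum('cnh,ch->n', x_t, w_ch)      # (N,)
    padded = np.pad(per_token, 1)
    conv = np.array([padded[i:i + 3] @ w_conv for i in range(N)])
    return 1.0 / (1.0 + np.exp(-conv))                 # sigmoid -> (N,)

w_ch = rng.standard_normal((2, H)) * 0.05
w_conv = rng.standard_normal(3)
A_T = token_attn(X1, w_ch, w_conv)
X2 = A_T[None, :, None] * X1         # X'' = A_T (tensor product) X'
print(A_T.shape, X2.shape)
```

Here the (1, N, 1) weights broadcast over every hidden layer, mirroring how the term attention "acts on every hidden layer of the model".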
Thirdly, calculating a quality score y by using a fully-connected neural network based on the sentence vector SscoreAnd finishing the translation quality estimation.
Because the high-complexity neural network is liable to generate a large negative influence on the low-resource language, in the regression task at the top, the embodiment does not depend on a complex output layer, and only uses a simple fully-connected neural network to calculate and obtain the quality estimation score:
yscore=σ(wT(tanh(WS)))
wherein sigma is sigmoid function, tanh is hyperbolic tangent function, and W and W are all-connection network parameters. The quality estimation score is trained by minimizing the mean square error, with an objective function of:
Figure BDA0003468087670000114
calculated scalar value yscoreI.e. the scoring of the translation by the translation quality estimation model.
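The regression head and objective can be sketched directly; the sentence-vector and hidden sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, DH = 40, 16                        # sentence-vector / hidden sizes (assumed)
S = rng.standard_normal(D)            # sentence vector from the previous step
W = rng.standard_normal((DH, D)) * 0.1
w = rng.standard_normal(DH) * 0.1

def quality_score(S, W, w):
    # y_score = sigmoid(w^T tanh(W S)): a single fully-connected head
    return 1.0 / (1.0 + np.exp(-(w @ np.tanh(W @ S))))

def mse_loss(preds, hter_labels):
    # Training objective: mean squared error against the HTER labels
    preds, hter_labels = np.asarray(preds), np.asarray(hter_labels)
    return np.mean((preds - hter_labels) ** 2)

y = quality_score(S, W, w)
print(0.0 < y < 1.0, mse_loss([y], [0.3]) >= 0.0)  # True True
```

The sigmoid keeps the prediction in (0, 1), matching the range of a (clipped) HTER label.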
To verify the evaluation performance of the method, a comparison experiment on the Chinese-Korean machine translation quality estimation task was performed against representative translation quality estimation models under the same conditions and the same corpus scale. QuEst++ is the official baseline system of WMT2013–WMT2019; Bilingual Expert is an advanced result under the predictor-estimator framework; TransQuest is a quality estimation model based on a cross-language pre-training model that performed best in the WMT2020 multilingual direct assessment. The correlation and the error between the predicted value and the true value of each model on the test set were calculated; the results are shown in Table 1.
TABLE 1
(table reproduced as an image in the original publication)
As can be seen from Table 1, the correlation between the prediction score of the first embodiment and the human score exceeds all baseline models, with the Pearson correlation coefficient improved by 0.226, 0.156 and 0.034 respectively. This shows that fusing linguistic knowledge of different levels with term position information can effectively improve the performance of the translation quality estimation task.
Since the sentence pairs to be evaluated are embedded with the sentence embedding method that fuses cross-layer information, to verify the rationality and effectiveness of this strategy, the embodiment carries out translation quality estimation experiments with different sentence embedding modes. Last [CLS] uses only the top-layer [CLS] tag vector of the XLM-R model as the sentence embedding; Last + GRU passes the top-layer matrix of the XLM-R model through a GRU network to obtain the sentence vector; Conv convolves the different hidden layers of the XLM-R model directly, without attention calculation, for sentence embedding; Attention obtains the sentence embedding by performing attention calculation with the different hidden-layer information. The quality estimation performance under each sentence embedding method is calculated; the specific results are shown in Table 2.
TABLE 2
(table reproduced as an image in the original publication)
As can be seen from Table 2, directly convolving all layers of XLM-R and using only the top-level [CLS] tag vector both perform poorly. When all hidden-layer information is used for direct convolution, the original features contain a large amount of redundant information irrelevant to the translation quality estimation task; without screening, excessive noise is introduced into the downstream task, harming evaluation performance. Using only the top-layer [CLS] tag loses too much of the low-level information learned by the pre-training model, which contains word-level linguistic features crucial to quality estimation, so that embedding method also performs badly. The sentence embedding method proposed in this application is therefore more beneficial to the quality estimation task.
To effectively fuse the cross-layer information of the model with term information at different positions, the application introduces an attention mechanism into the sentence embedding method. Since the attention mechanism includes the two parts of linguistic attention and term attention, this embodiment also carries out ablation experiments and combination-order experiments on them, so as to verify the effectiveness of the attention mechanism design of the application. The experimental results are shown in Table 3.
TABLE 3
(table reproduced as an image in the original publication)
From the results in Table 3 it can be found that using both linguistic attention and term attention outperforms using either one alone, indicating that both play a role. Meanwhile, applying linguistic attention in the first stage performs slightly better than applying term attention first, so the application uses this optimal combination order in the sentence embedding process.
The experimental results show that the method can effectively improve the performance of the Chinese-Korean machine translation quality estimation task.
Example two
As shown in fig. 5, the structural diagram of the Chinese-Korean translation quality estimation system based on a cross-language pre-training model according to the second embodiment of the present application mainly includes a cross-language feature extraction module, an attention calculation module and a quality estimation module.
In the second embodiment, the cross-language feature extraction module performs feature extraction on the sentence pair to be evaluated by using an XLM-R model and generates an initial feature matrix; the attention calculation module performs linguistic attention calculation and term attention calculation on the initial feature matrix, and performs sentence embedding through a convolutional neural network to obtain the sentence vector of the spliced sentence; and the quality estimation module calculates the translation quality estimation score using a fully-connected neural network based on the sentence vector.
In the second embodiment, the input of the XLM-R model in the cross-language feature extraction module is formed by splicing the source language sentence and the machine translation of the target language, and meanwhile, special embedding symbols are added to the head of the spliced sequence and between the source language sentence and the machine translation.
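To make the splicing format concrete, the following minimal sketch builds such an input sequence. XLM-R follows RoBERTa-style pair encoding (`<s> A </s></s> B </s>`); the patent only states that special embedding symbols are added at the head and at the boundary, so the exact markers used here are an assumption.

```python
def build_xlmr_input(src_tokens, mt_tokens, bos="<s>", sep="</s>"):
    """Splice a source sentence and its machine translation into one sequence,
    adding special symbols at the head and between the two sentences.
    The <s>/</s> markers follow RoBERTa-style pair encoding (an assumption;
    the patent does not name the exact symbols)."""
    return [bos] + src_tokens + [sep, sep] + mt_tokens + [sep]

# Chinese source sentence and its Korean machine translation (toy tokens)
seq = build_xlmr_input(["今天", "天气", "好"], ["오늘", "날씨가", "좋다"])
```

In a real system these token lists would come from the XLM-R subword tokenizer; the helper only illustrates the splicing order claimed in the embodiment.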
In the second embodiment, the attention calculation module is further divided into a linguistic attention calculation unit, a term attention calculation unit and a sentence embedding unit.
The linguistic attention calculation unit is used for compressing the global information of the hidden layers of the XLM-R model based on the initial feature matrix, then aggregating the position information of the sentence terms by using an average pooling operation and a maximum pooling operation respectively, to generate two different context features, recorded as a first context feature and a second context feature; generating context attention matrices through a shared neural network, recorded as a first context attention matrix and a second context attention matrix; and then obtaining a linguistic attention matrix by element-wise summation. The term attention calculation unit is used for compressing the linguistic attention calculation result, namely the tensor product of the initial feature matrix and the linguistic attention matrix, along different hidden-layer directions by using an average pooling operation and a maximum pooling operation respectively, then splicing the results to obtain the feature X_T, and applying a standard convolution layer to the feature X_T to generate a term attention matrix. The sentence embedding unit is used for performing a tensor product operation on the linguistic attention calculation result and the term attention matrix to obtain the attention calculation result, and then obtaining the sentence vector by using a convolutional neural network.
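Under the assumptions that the "tensor product" is an element-wise broadcast product and that the shared network and convolution are replaced by trivial stand-ins, the wiring of the three units can be sketched in a few lines of NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
# Toy initial feature matrix: (layers, tokens, dims); real sizes would be
# 25 x sequence_length x 1024 for XLM-R large.
X = rng.normal(size=(25, 6, 8))

# Linguistic attention unit: squeeze each layer with average and max pooling,
# combine by element-wise summation, and weight the layer axis.
A_L = sigmoid(X.mean(axis=(1, 2)) + X.max(axis=(1, 2)))[:, None, None]
X1 = X * A_L                      # tensor (element-wise) product

# Term attention unit: squeeze the layer axis, weight token/feature positions.
A_T = sigmoid(X1.mean(axis=0) + X1.max(axis=0))
X2 = X1 * A_T[None]

# Sentence embedding unit: mix the layer channels (1x1-conv stand-in) and flatten.
S = np.tensordot(rng.normal(size=25), X2, axes=(0, 0)).reshape(-1)
```

The shared two-layer network and the standard convolution layer of the embodiment are omitted here for brevity; the sketch only shows the order of operations (linguistic attention, then term attention, then sentence embedding).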
The above-described embodiments are merely illustrative of the preferred embodiments of the present application, and do not limit the scope of the present application, and various modifications and improvements made to the technical solutions of the present application by those skilled in the art without departing from the spirit of the present application should fall within the protection scope defined by the claims of the present application.

Claims (10)

1. A Chinese-Korean translation quality estimation method based on a cross-language pre-training model, characterized by comprising the following steps:
splicing a source language sentence and a machine translation in the target language according to a preset format to obtain a spliced sentence, and obtaining an initial feature matrix X of the spliced sentence by using an XLM-R model;
performing linguistic attention calculation and term attention calculation on the initial feature matrix X, and performing sentence embedding through a convolutional neural network to obtain a sentence vector of the spliced sentence;
and calculating a quality estimation score by using a fully-connected neural network based on the sentence vector, thereby completing the translation quality estimation.
2. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 1, wherein the method for obtaining the spliced sentence comprises:
connecting the source language sentence and the machine translation end to end into one sequence, and adding special embedding symbols at the head of the sequence and between the source language sentence and the machine translation, to obtain the spliced sentence.
3. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 2, wherein the method for obtaining the initial feature matrix X comprises:
obtaining the hidden layer states h_i of the XLM-R model based on the hidden layers of the XLM-R model and the position of each term, wherein h_i is the representation, in the i-th layer of the XLM-R model, of the sentence pair to be evaluated spliced in sequence with the special embedding symbols;
splicing the word embedding layer feature e with the features of all the hidden layers of the XLM-R model to obtain the initial feature matrix X of the spliced sentence, X = Concat(e, h_1, ..., h_24).
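As a toy illustration of this splicing (with random stand-ins for the real XLM-R outputs, and 24 hidden layers as in XLM-R large):

```python
import numpy as np

# Stand-ins for the XLM-R outputs: e is the word embedding layer and
# h_1..h_24 the 24 hidden layers; real tensors would come from the model
# with all hidden states exposed.
rng = np.random.default_rng(0)
seq_len, hidden = 6, 8
e = rng.normal(size=(seq_len, hidden))
h = [rng.normal(size=(seq_len, hidden)) for _ in range(24)]

# X = Concat(e, h_1, ..., h_24): stack the 25 per-layer features into one
# tensor so that the later attention steps can weight the layer axis.
X = np.stack([e] + h, axis=0)
```

Stacking along a new leading axis (rather than concatenating along the feature axis) is an assumption; it matches the later steps, which pool and re-weight the hidden-layer direction separately from the token positions.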
4. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 3, wherein the method for linguistic attention calculation comprises:
based on the initial feature matrix X, after compressing the global information of the hidden layers, aggregating the position information of the sentence terms by using an average pooling operation and a maximum pooling operation respectively, to generate a first context feature and a second context feature;
generating a first context attention matrix and a second context attention matrix respectively through a shared neural network, based on the first context feature and the second context feature;
obtaining a linguistic attention matrix A_L by element-wise summation of the first context attention matrix and the second context attention matrix;
performing a tensor product operation on the initial feature matrix X and the linguistic attention matrix A_L to obtain the linguistic attention calculation result X'.
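A minimal NumPy sketch of this layer-wise attention; the pooling axes and the size of the shared two-layer network are assumptions, since the patent does not publish them:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def linguistic_attention(X, W1, W2):
    """Linguistic attention over the layer axis of X: (layers, seq_len, hidden).
    W1, W2 are the weights of the shared two-layer network (sizes illustrative)."""
    avg = X.mean(axis=(1, 2))   # first context feature: average pooling over positions
    mx = X.max(axis=(1, 2))     # second context feature: max pooling over positions
    # shared network applied to both features, combined by element-wise summation
    att = sigmoid(W2 @ np.tanh(W1 @ avg) + W2 @ np.tanh(W1 @ mx))
    A_L = att[:, None, None]    # linguistic attention matrix A_L, one weight per layer
    return X * A_L              # X' = X (tensor product) A_L

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 6, 8))
W1 = rng.normal(size=(4, 25))
W2 = rng.normal(size=(25, 4))
Xp = linguistic_attention(X, W1, W2)
```

Because the attention weights lie in (0, 1), X' is an element-wise down-weighted copy of X, with less informative layers suppressed.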
5. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 4, wherein the method for term attention calculation comprises:
compressing the linguistic attention calculation result X' along different hidden-layer directions by using an average pooling operation and a maximum pooling operation respectively, and then splicing the results to obtain the feature X_T;
applying a standard convolution layer to the feature X_T to generate a term attention matrix A_T.
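A corresponding sketch of the term attention, assuming a two-channel, zero-padded "same" convolution over the spliced pooled maps (the kernel size 3 is an assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def term_attention(Xp, K):
    """Term attention over token/feature positions.
    Xp: (layers, seq_len, hidden), the linguistic attention result X'.
    K: (2, k, k) kernel of the standard convolution layer."""
    avg = Xp.mean(axis=0)                  # compress along the hidden-layer direction
    mx = Xp.max(axis=0)
    XT = np.stack([avg, mx], axis=0)       # spliced feature X_T, two channels
    k = K.shape[1]
    p = k // 2
    padded = np.pad(XT, ((0, 0), (p, p), (p, p)))  # zero-padded 'same' convolution
    H, W = avg.shape
    conv = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            conv[i, j] = np.sum(padded[:, i:i + k, j:j + k] * K)
    A_T = sigmoid(conv)                    # term attention matrix A_T
    return Xp * A_T[None]                  # X'' = X' (tensor product) A_T

rng = np.random.default_rng(0)
Xp = rng.normal(size=(25, 6, 8))
K = rng.normal(size=(2, 3, 3))
Xpp = term_attention(Xp, K)
```

The explicit loop stands in for the standard convolution layer of the claim; in a real implementation this would be a framework convolution operator.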
6. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 5, wherein the method for obtaining the sentence vector comprises:
performing a tensor product operation on the linguistic attention calculation result X' and the term attention matrix A_T to obtain the term attention calculation result X'';
based on the term attention calculation result X'', obtaining the sentence vector by using a convolutional neural network, the sentence vector being S = scatter(Conv(X'')).
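A sketch of the claimed sentence embedding. The claim writes S = scatter(Conv(X'')), where "scatter" most likely renders a flatten operation; the convolution is modeled here as a simple 1x1 channel mix, which is an assumption since the kernel shape is not published:

```python
import numpy as np

def sentence_vector(Xpp, w_mix):
    """Sentence embedding sketch: mix the layer channels of X'' with a
    1x1 convolution (weights w_mix, one per layer) and flatten the
    resulting map into a single sentence vector S."""
    conv = np.tensordot(w_mix, Xpp, axes=(0, 0))  # (seq_len, hidden)
    return conv.reshape(-1)                        # S = flatten(Conv(X''))

rng = np.random.default_rng(0)
Xpp = rng.normal(size=(25, 6, 8))
S = sentence_vector(Xpp, rng.normal(size=25))
```

With a one-hot channel mix the function simply selects and flattens one layer's map, which makes the role of the mixing weights easy to verify.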
7. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 6, wherein the quality estimation score is:
y_score = σ(w^T tanh(WS))
wherein S is the sentence vector and W and w are trainable weights; the quality estimation score is trained by minimizing the mean square error, the objective function being:
L = (1/N) Σ_{i=1}^{N} (y_score^(i) − y^(i))^2
wherein y^(i) denotes the reference quality score of the i-th sentence pair and N is the number of training samples.
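The score head and the mean-squared-error objective follow directly from the claim and can be sketched as (layer sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quality_score(S, W, w):
    """y_score = sigma(w^T tanh(W S)): a fully-connected layer with tanh,
    a weight vector w, and a sigmoid squashing the score into (0, 1)."""
    return float(sigmoid(w @ np.tanh(W @ S)))

def mse_objective(pred, gold):
    """Mean-squared-error training objective over N sentence pairs."""
    pred, gold = np.asarray(pred, float), np.asarray(gold, float)
    return float(np.mean((pred - gold) ** 2))

rng = np.random.default_rng(0)
S = rng.normal(size=16)
y = quality_score(S, rng.normal(size=(8, 16)), rng.normal(size=8))
```

Because of the sigmoid, the predicted score is always in (0, 1), matching quality estimation labels such as normalized HTER or direct-assessment scores.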
8. A Chinese-Korean translation quality estimation system based on a cross-language pre-training model, characterized by comprising a cross-language feature extraction module, an attention calculation module and a quality estimation module;
the cross-language feature extraction module performs feature extraction on the sentence pairs to be evaluated by using an XLM-R model and generates an initial feature matrix;
the attention calculation module is used for performing linguistic attention calculation and term attention calculation on the initial feature matrix, and performing sentence embedding through a convolutional neural network to obtain a sentence vector of the spliced sentence;
the quality estimation module is used for calculating a quality estimation score of the translation quality estimation by using a fully-connected neural network based on the sentence vector.
9. The system of claim 8, wherein the input of the XLM-R model is formed by splicing a source language sentence and a machine translation of the target language, and special embedding symbols are added at the head of the spliced sequence and between the source language sentence and the machine translation.
10. The Chinese-Korean translation quality estimation system based on the cross-language pre-training model according to claim 9, wherein the attention calculation module comprises a linguistic attention calculation unit, a term attention calculation unit and a sentence embedding unit;
the linguistic attention calculation unit is used for compressing the global information of the hidden layers of the XLM-R model based on the initial feature matrix, then aggregating the position information of the sentence terms by using an average pooling operation and a maximum pooling operation respectively to generate a first context feature and a second context feature, then generating a first context attention matrix and a second context attention matrix respectively through a shared neural network, and then obtaining a linguistic attention matrix by element-wise summation;
the term attention calculation unit is used for compressing the linguistic attention calculation result, namely the tensor product of the initial feature matrix and the linguistic attention matrix, along different hidden-layer directions by using an average pooling operation and a maximum pooling operation respectively, then splicing the results to obtain the feature X_T, and applying a standard convolution layer to the feature X_T to generate a term attention matrix;
the sentence embedding unit is used for performing a tensor product operation on the linguistic attention calculation result and the term attention matrix to obtain the attention calculation result, and then obtaining the sentence vector by using a convolutional neural network.
CN202210035223.6A 2022-01-13 2022-01-13 Mid-orientation translation quality estimation method and system based on cross-language pre-training model Active CN114386437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210035223.6A CN114386437B (en) 2022-01-13 2022-01-13 Mid-orientation translation quality estimation method and system based on cross-language pre-training model


Publications (2)

Publication Number Publication Date
CN114386437A true CN114386437A (en) 2022-04-22
CN114386437B CN114386437B (en) 2022-09-27

Family

ID=81201540


Country Status (1)

Country Link
CN (1) CN114386437B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116431831A (en) * 2023-04-18 2023-07-14 延边大学 Supervised relation extraction method based on label contrast learning
CN117436460A (en) * 2023-12-22 2024-01-23 武汉大学 Translation quality assessment method, device, equipment and storage medium
CN117910482A (en) * 2024-03-19 2024-04-19 江西师范大学 Automatic machine translation evaluation method based on depth difference characteristics

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150286632A1 (en) * 2014-04-03 2015-10-08 Xerox Corporation Predicting the quality of automatic translation of an entire document
CN110245364A (en) * 2019-06-24 2019-09-17 中国科学技术大学 The multi-modal neural machine translation method of zero parallel corpora
CN110390110A (en) * 2019-07-30 2019-10-29 阿里巴巴集团控股有限公司 The method and apparatus that pre-training for semantic matches generates sentence vector
CN110874536A (en) * 2018-08-29 2020-03-10 阿里巴巴集团控股有限公司 Corpus quality evaluation model generation method and bilingual sentence pair inter-translation quality evaluation method
CN111458148A (en) * 2020-04-26 2020-07-28 上海电机学院 CBAM-based convolutional neural network rolling bearing fault diagnosis method
CN112052692A (en) * 2020-08-12 2020-12-08 内蒙古工业大学 Mongolian Chinese neural machine translation method based on grammar supervision and deep reinforcement learning
CN112347795A (en) * 2020-10-04 2021-02-09 北京交通大学 Machine translation quality evaluation method, device, equipment and medium


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116431831A (en) * 2023-04-18 2023-07-14 延边大学 Supervised relation extraction method based on label contrast learning
CN116431831B (en) * 2023-04-18 2023-09-22 延边大学 Supervised relation extraction method based on label contrast learning
CN117436460A (en) * 2023-12-22 2024-01-23 武汉大学 Translation quality assessment method, device, equipment and storage medium
CN117436460B (en) * 2023-12-22 2024-03-12 武汉大学 Translation quality assessment method, device, equipment and storage medium
CN117910482A (en) * 2024-03-19 2024-04-19 江西师范大学 Automatic machine translation evaluation method based on depth difference characteristics
CN117910482B (en) * 2024-03-19 2024-05-28 江西师范大学 Automatic machine translation evaluation method based on depth difference characteristics

Also Published As

Publication number Publication date
CN114386437B (en) 2022-09-27

Similar Documents

Publication Publication Date Title
CN114386437B (en) Mid-orientation translation quality estimation method and system based on cross-language pre-training model
Sharma et al. Efficient Classification for Neural Machines Interpretations based on Mathematical models
CN110502749B (en) Text relation extraction method based on double-layer attention mechanism and bidirectional GRU
Nwankpa et al. Activation functions: Comparison of trends in practice and research for deep learning
CN112667818B (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN112163426A (en) Relationship extraction method based on combination of attention mechanism and graph long-time memory neural network
CN107729311B (en) Chinese text feature extraction method fusing text moods
Wang et al. Text generation based on generative adversarial nets with latent variables
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN108959482A (en) Single-wheel dialogue data classification method, device and electronic equipment based on deep learning
CN110516070B (en) Chinese question classification method based on text error correction and neural network
CN111259768A (en) Image target positioning method based on attention mechanism and combined with natural language
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
US20230237725A1 (en) Data-driven physics-based models with implicit actuations
Siddique et al. English to bangla machine translation using recurrent neural network
KR20190134965A (en) A method and system for training of neural networks
CN113361617A (en) Aspect level emotion analysis modeling method based on multivariate attention correction
CN112686056A (en) Emotion classification method
Ghorbani et al. Auto-labelling of markers in optical motion capture by permutation learning
CN114254645A (en) Artificial intelligence auxiliary writing system
Senthilkumar et al. An AI-based chatbot using deep learning
CN114492459A (en) Comment emotion analysis method and system based on convolution of knowledge graph and interaction graph
Javaid et al. A Novel Action Transformer Network for Hybrid Multimodal Sign Language Recognition.
CN114757310B (en) Emotion recognition model and training method, device, equipment and readable storage medium thereof
CN110543567A (en) Chinese text emotion classification method based on A-GCNN network and ACELM algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant