CN114386437A - Chinese-Korean translation quality estimation method and system based on cross-language pre-training model - Google Patents


Publication number
CN114386437A
CN114386437A
Authority
CN
China
Prior art keywords
attention
sentence
quality estimation
matrix
language
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210035223.6A
Other languages
Chinese (zh)
Other versions
CN114386437B (en)
Inventor
赵亚慧
李飞雨
崔荣一
李德
金国哲
张振国
金晶
姜克鑫
刘帆
王苑儒
夏明会
鲁雅鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanbian University
Original Assignee
Yanbian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanbian University filed Critical Yanbian University
Priority to CN202210035223.6A priority Critical patent/CN114386437B/en
Publication of CN114386437A publication Critical patent/CN114386437A/en
Application granted granted Critical
Publication of CN114386437B publication Critical patent/CN114386437B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/51 Translation evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a Chinese-Korean translation quality estimation method and system based on a cross-language pre-training model. The method comprises the following steps: splicing a source language sentence and a machine translation, and obtaining an initial feature matrix of the spliced sentence by using an XLM-R model; performing attention calculation on the initial feature matrix, and performing sentence embedding through a convolutional neural network to obtain a sentence vector; and, based on the sentence vector, calculating a quality estimation score using a fully-connected neural network. The system comprises a cross-language feature extraction module, an attention calculation module and a quality estimation module: the cross-language feature extraction module performs feature extraction on the sentence pair to be evaluated by using an XLM-R model and generates an initial feature matrix; the attention calculation module carries out attention calculation on the initial feature matrix to obtain a sentence vector; and the quality estimation module calculates the translation quality estimation score. The resulting sentence embeddings are of high quality, which facilitates quality estimation and effectively improves the performance of the Chinese-Korean machine translation quality estimation task.

Description

Chinese-Korean translation quality estimation method and system based on cross-language pre-training model
Technical Field
The application belongs to the field of natural language processing within artificial intelligence, and particularly relates to a Chinese-Korean translation quality estimation method and system based on a cross-language pre-training model.
Background
Quality Estimation (QE) of machine translation is a subtask of machine translation that has drawn wide attention from academia and industry because of its important roles in machine translation system evaluation, post-editing, corpus filtering, and the like. Unlike common machine translation evaluation metrics, a QE model can, after supervised training, automatically evaluate the quality of machine-generated translations without relying on any reference translation during prediction. With the rapid development of neural machine translation models, many researchers have introduced deep learning methods into the translation quality estimation task, bringing great progress to the field.
Machine translation quality estimation differs from machine translation evaluation metrics such as BLEU, TER and METEOR: it can automatically predict the quality of a machine-generated translation without relying on any reference translation. The most common quality score is the Human-targeted Translation Edit Rate (HTER). QuEst, proposed by Specia et al. for the quality estimation task, serves as the baseline model of the task and consists of a feature extraction module and a machine learning module. To address machine translation quality estimation, Kim et al. first applied a machine translation model to the quality estimation task and proposed an RNN-based translation quality estimation model. Fan et al. replaced the RNN-based translation feature extraction module with a Transformer model on the basis of the predictor-estimator framework and proposed the Bilingual Expert model, improving both the performance and the interpretability of the quality estimation task.
At present, mainstream quality estimation methods are all built on the predictor-estimator framework, and the deep network used in their feature extraction stage requires massive parallel corpora as data support. Korean corpora for natural language processing tasks are scarce, and massive parallel corpora are particularly lacking for the Chinese-Korean language pair; this small-sample situation aggravates the difficulty of the Chinese-Korean machine translation quality estimation task.
A cross-language pre-training model can generate contextualized word representations with long-distance dependencies, but how to generate high-quality sentence representations remains an open problem. In addition, most sentence embedding methods based on pre-trained models adopt the [CLS] strategy or pooling operations to generate sentence vectors, yet neither approach can retain all the information learned by the model.
Disclosure of Invention
The application provides a Chinese-Korean translation quality estimation method and system based on a cross-language pre-training model; on the basis of a conventional cross-language pre-training model, the hidden vectors of the pre-training model are sentence-embedded using linguistic attention and term attention, so as to solve the above problems.
In order to achieve the above purpose, the present application provides the following solutions:
The Chinese-Korean translation quality estimation method based on the cross-language pre-training model comprises the following steps:
splicing a source language sentence and a machine translation of the target language according to a preset format to obtain a spliced sentence, and obtaining an initial feature matrix X of the spliced sentence by using an XLM-R model;
performing linguistic attention calculation and term attention calculation on the initial feature matrix X, and performing sentence embedding through a convolutional neural network to obtain a sentence vector of the spliced sentence;
and, based on the sentence vector, calculating a quality estimation score by using a fully-connected neural network, completing the translation quality estimation.
Optionally, the method for obtaining the spliced sentence includes:
the source language sentence and the machine translation are concatenated front to back into one sequence, and special embedding tokens are added at the head of the sequence and between the source language sentence and the machine translation, to obtain the spliced sentence.
Optionally, the method for obtaining the initial feature matrix X includes:
obtaining the hidden-layer states h_i of the XLM-R model from each hidden layer and each term position, where h_i is the sequential concatenation, at the i-th layer of the XLM-R model, of the representations of the sentence pair to be evaluated and of the special embedding tokens;
splicing the features of the word-embedding layer e and of all hidden layers of the XLM-R model to obtain the initial feature matrix X of the spliced sentence, where X = Concat(e, h_1, ..., h_24).
Optionally, the method for calculating the linguistic attention includes:
based on the initial feature matrix X, compressing the global information of the hidden layers, then aggregating the position information of the sentence terms by using an average pooling operation and a maximum pooling operation respectively, to generate a first context feature X_avg^L and a second context feature X_max^L;
based on the first context feature X_avg^L and the second context feature X_max^L, generating a first context attention matrix and a second context attention matrix respectively through a shared neural network;
obtaining a linguistic attention matrix A_L from the first context attention matrix and the second context attention matrix by element-wise summation;
performing a tensor product operation on the initial feature matrix X and the linguistic attention matrix A_L to obtain the linguistic attention calculation result X'.
Optionally, the method for calculating the term attention includes:
compressing the linguistic attention calculation result X' along the hidden-layer direction by using an average pooling operation and a maximum pooling operation respectively, and then splicing the results to obtain the feature X_T;
applying a standard convolutional layer to the feature X_T to generate the term attention matrix A_T.
Optionally, the method for obtaining the sentence vector includes:
performing a tensor product operation on the linguistic attention calculation result X' and the term attention matrix A_T to obtain the term attention calculation result X'';
based on the term attention calculation result X'', obtaining the sentence vector using a convolutional neural network, where the sentence vector is S = Flatten(Conv(X'')).
Optionally, the quality estimation score is:
y_score = σ(w^T(tanh(WS)))
The model is trained by minimizing the mean square error, and the objective function of the quality estimation score is:
L = (1/N) Σ_{i=1}^{N} (y_score^(i) − y_HTER^(i))²
In another aspect, to achieve the above object, the present application further provides a Chinese-Korean translation quality estimation system based on a cross-language pre-training model, comprising a cross-language feature extraction module, an attention calculation module and a quality estimation module;
the cross-language feature extraction module performs feature extraction on the sentence pairs to be evaluated by using an XLM-R model and generates an initial feature matrix;
the attention calculation module is used for performing linguistic attention calculation and term attention calculation on the initial feature matrix, and for performing sentence embedding through a convolutional neural network to obtain a sentence vector of the spliced sentence;
the quality estimation module is used for calculating a quality estimation score of the translation quality estimation by using a fully-connected neural network based on the sentence vector.
Optionally, the input of the XLM-R model is formed by splicing a source language sentence with a machine translation of the target language, and special embedding tokens are added at the head of the spliced sequence and between the source language sentence and the machine translation.
Optionally, the attention calculation module includes a linguistic attention calculation unit, a term attention calculation unit and a sentence embedding unit;
the linguistic attention calculation unit is used for compressing the global information of the hidden layers of the XLM-R model based on the initial feature matrix, aggregating the position information of the sentence terms by using an average pooling operation and a maximum pooling operation respectively to generate a first context feature and a second context feature, generating a first context attention matrix and a second context attention matrix respectively through a shared neural network, and then obtaining the linguistic attention matrix by element-wise summation;
the term attention calculation unit is used for compressing the tensor product result of the linguistic attention matrix, i.e. X', along the hidden-layer direction by using an average pooling operation and a maximum pooling operation respectively, splicing the results to obtain the feature X_T, and applying a standard convolutional layer to the feature X_T to generate the term attention matrix;
the sentence embedding unit is used for performing a tensor product operation on X' and the term attention matrix to obtain the attention calculation result X'', and then obtaining the sentence vector by using a convolutional neural network.
The beneficial effect of this application does:
On the basis of a conventional cross-language pre-training model, the hidden vectors of the pre-training model are sentence-embedded using linguistic attention and term attention; the resulting sentence embeddings are of high quality and facilitate quality estimation. Because the attention mechanism comprises both linguistic attention and term attention, its performance is better than that of a single attention module, effectively improving the performance of the Chinese-Korean machine translation quality estimation task.
Drawings
To illustrate the technical solution of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application; a person skilled in the art could obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flowchart of a Chinese-Korean translation quality estimation method based on a cross-language pre-training model according to the first embodiment of the present application;
FIG. 2 is a flowchart illustrating a sentence embedding method according to a first embodiment of the present application;
FIG. 3 is a flow chart illustrating linguistic attention calculation according to an embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating the term attention calculation according to the first embodiment of the present application;
Fig. 5 is a schematic structural diagram of a Chinese-Korean translation quality estimation system based on a cross-language pre-training model according to the second embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
In all embodiments, the data set used comes from the Chinese-English-Korean corpus constructed from the real data set of the "Chinese-Korean science and technology information processing integrated platform" project. The original corpus comprises more than 30,000 documents and more than 160,000 pairs of parallel sentences, covering 13 fields including biotechnology, marine environment and aerospace. The quality estimation score uses the Human-targeted Translation Edit Rate (HTER). HTER is an edit-distance-based metric that captures the distance between the automatic translation and its post-edited reference as the number of modifications required to convert one into the other; the quality estimation model therefore needs to predict this score accurately. The HTER score is calculated automatically by the TERCOM tool. The XLM-RoBERTa-large version is selected as the cross-language pre-training model for the experiments; it comprises 24 hidden layers and a word-embedding layer, and the dimension of each hidden layer is 1024. During quality estimation, sentences are padded to length 80, the gradient optimization algorithm uses the Adam optimizer, and the learning rate is set to 2e-5.
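As a rough illustration of the HTER label used here, the sketch below computes a simplified word-level variant: edit distance (insertions, deletions, substitutions) divided by the reference length. The real score is produced by the TERCOM tool and also accounts for block shifts, so this is an approximation for illustration only.

```python
# Simplified HTER sketch: word-level edit distance divided by the length of
# the post-edited reference. TERCOM's real TER/HTER also allows block shifts.
def edit_distance(hyp, ref):
    m, n = len(hyp), len(ref)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n]

def hter(machine_translation, post_edited):
    hyp = machine_translation.split()
    ref = post_edited.split()
    return edit_distance(hyp, ref) / max(len(ref), 1)

print(hter("the cat sat on mat", "the cat sat on the mat"))  # 1 edit / 6 words
```

A lower score means fewer post-edits were needed, i.e. a better machine translation; the regression head described later is trained to predict exactly this value.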
Example one
In the first embodiment, a cross-language pre-training model is introduced into the translation quality estimation task: first, the source-language and translation sentence pair to be evaluated is sentence-embedded based on the cross-language pre-training model XLM-R (XLM-RoBERTa), and then the generated sentence vector is used directly to implement the translation quality estimation task. Fig. 1 is a schematic flowchart of the Chinese-Korean translation quality estimation method based on a cross-language pre-training model according to the first embodiment; it mainly comprises three parts:
the method comprises the steps of firstly, splicing a source language sentence and a machine translation of a target language into a spliced sentence according to a preset format, and then obtaining an initial characteristic matrix X of the spliced sentence by using an XLM-R model.
In this embodiment, the input of the XLM-R model is formed by splicing the source language sentence with the machine translation of the target language, with special embedding tokens [CLS] and [SEP] added at the head of the sequence and between the languages. Because quality estimation resembles the translation language modeling task in cross-language understanding, alignment and other aspects, this embodiment uses every hidden layer and every term position of the XLM-R model for the translation quality estimation task, and adds an attention mechanism to help the model focus on the linguistic levels and term positions that benefit the quality estimation task. The hidden-layer state h_i = (h_[CLS], h_src1, ..., h_srcm, h_[SEP], h_tar1, ..., h_tarn) is the sequential concatenation, at the i-th layer of the pre-training model, of the representations of the sentence pair to be evaluated and of the special embedding tokens. The features of the 24 hidden layers and of the word-embedding layer e of the XLM-R model are spliced to obtain the initial feature matrix X of the spliced sentence:
X = Concat(e, h_1, ..., h_24)
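The first step can be sketched as follows. Note that XLM-R's actual special tokens are `<s>` and `</s>` (playing the roles written as [CLS] and [SEP] above), random arrays stand in here for the hidden states a real XLM-RoBERTa-large forward pass would produce, and the tensor layout (layers, tokens, hidden) is an illustrative choice.

```python
import numpy as np

# Splice source sentence and machine translation into one XLM-R-style input,
# then stack the word-embedding layer and 24 hidden layers into X.
def splice(src_tokens, mt_tokens):
    return ["<s>"] + src_tokens + ["</s>", "</s>"] + mt_tokens + ["</s>"]

tokens = splice("这 是 一 个 例子".split(), "이것 은 예 이다".split())
N, H, LAYERS = len(tokens), 1024, 24

rng = np.random.default_rng(0)
e = rng.standard_normal((N, H))                                # embedding layer
hidden = [rng.standard_normal((N, H)) for _ in range(LAYERS)]  # h_1 .. h_24

# X = Concat(e, h_1, ..., h_24): one slice per layer
X = np.stack([e] + hidden)
print(X.shape)  # (25, N, 1024)
```

In a real pipeline the 25 slices would be the `hidden_states` tuple of an XLM-RoBERTa-large forward pass with hidden-state output enabled.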
Because the pre-training model has performed the translation language modeling task on large-scale corpora, the initial feature matrix X contains prior knowledge relevant to translation quality estimation.
And secondly, performing linguistic attention calculation and term attention calculation on the initial feature matrix X, and performing sentence embedding through a convolutional neural network to obtain the sentence vector S of the spliced sentence.
In this embodiment, drawing on approaches to high-dimensional data in the computer vision field, a method is proposed that performs sentence embedding by applying attention calculation to the hidden vectors of the pre-training model. The attention comprises a linguistic attention part and a term attention part, which perform attention calculation over the different hidden layers of the pre-trained language model and over the different terms of the sentence, respectively. As shown in FIG. 2, based on the initial feature matrix X ∈ R^(L×H×N) obtained in the first step, a linguistic attention matrix A_L over the different hidden layers of the pre-training model and a term attention matrix A_T over the different positions of the sentence are obtained, where L, H and N denote the number of layers of the pre-training model, the hidden dimension, and the sentence-pair length, respectively. The whole attention module is calculated as follows:
X' = LangAttn(X) ⊗ X
X'' = TokenAttn(X') ⊗ X'
where LangAttn and TokenAttn denote the mapping operations of linguistic attention and term attention, respectively, and X'' is the output after attention calculation. During the tensor product operation, the linguistic attention is propagated along every term position of the sentence; similarly, the term attention acts on every hidden layer of the model.
Finally, sentence embedding is completed through a convolutional neural network:
S = Flatten(Conv(X''))
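A minimal sketch of the final sentence-embedding step S = Flatten(Conv(X'')), assuming the attention-weighted tensor X'' is already available; the kernel is random and untrained, and the filter count and kernel width are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, H = 25, 12, 64                 # small dims for illustration
X2 = rng.standard_normal((L, N, H))  # stands in for X'' after attention

def conv_flatten(x, n_filters=4, k=3):
    # Valid convolution sliding a (layers x k x hidden) kernel over the
    # token dimension, followed by flattening into the sentence vector S.
    L, N, H = x.shape
    W = rng.standard_normal((n_filters, L, k, H)) * 0.01
    out = np.empty((n_filters, N - k + 1))
    for f in range(n_filters):
        for t in range(N - k + 1):
            out[f, t] = np.sum(W[f] * x[:, t:t + k, :])
    return out.ravel()               # Flatten

S = conv_flatten(X2)
print(S.shape)  # (40,)
```

The flattened vector keeps every filter response instead of discarding information via pooling or a single [CLS] slot, which is the point of the embedding strategy described here.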
Specifically, the linguistic attention calculation process is introduced first.
To integrate the diverse linguistic knowledge implied in the pre-training model, the first embodiment uses the feature relationships between different hidden layers to generate the linguistic attention matrix A_L. The linguistic attention calculation process is shown in FIG. 3.
So that each hidden-layer feature contains the context information of the sentence, based on the initial feature matrix X obtained in the first step, the global information of the hidden layers is compressed: the position information of the sentence terms is aggregated using average pooling and maximum pooling respectively, generating two different context features, the first context feature X_avg^L and the second context feature X_max^L:
X_avg^L = AvgPool(X)
X_max^L = MaxPool(X)
The statistical information of these context features can represent the whole sentence and facilitates the subsequent capture of the interdependencies between different hidden layers. To learn the non-linear interactions between hidden layers during attention calculation, the two context features containing global information are passed through a shared neural network to generate two attention matrices, the first context attention matrix and the second context attention matrix, which are finally combined by element-wise summation to output the final linguistic attention matrix A_L. The specific calculation process is:
A_L = σ(W_1δ(W_0X_avg^L) + W_1δ(W_0X_max^L))
where σ is the sigmoid function, δ is the ReLU activation function, and W_0 and W_1 are the shared network parameters.
Meanwhile, this embodiment supplements the above with term attention, generating the term attention matrix A_T from the positional relationships of the sentence pair to be evaluated. The term attention calculation process is shown in FIG. 4.
To effectively highlight the importance of different term positions, average pooling and maximum pooling operations are first applied along the hidden-layer direction to compress the linguistic information, and the pooled results are spliced into the feature X_T:
X_T = Concat(AvgPool(X'), MaxPool(X'))
Then a standard convolutional layer is applied to X_T to generate the term attention matrix A_T:
A_T = TokenAttn(X') = σ(Conv(X_T))
where σ is the sigmoid function.
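A corresponding sketch of the term attention branch. The single-weight channel reduction followed by a width-3 convolution over token positions is a simplified stand-in for the standard convolutional layer, whose exact configuration the patent does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)
L, N, H = 25, 12, 64
X1 = rng.standard_normal((L, N, H))  # stands in for X' after linguistic attention

def token_attn(X1, w_ch, w_conv):
    # Compress along the hidden-layer direction with average and max pooling,
    # then concatenate as two channels: X_T with shape (2, N, H)
    x_t = np.stack([X1.mean(axis=0), X1.max(axis=0)])
    # Simplified "conv layer": reduce channels/hidden to one value per token,
    # then a width-3 convolution over token positions with same padding
    per_token = np.einsum('cnh,ch->n', x_t, w_ch)      # (N,)
    padded = np.pad(per_token, 1)
    conv = np.array([padded[i:i + 3] @ w_conv for i in range(N)])
    return 1.0 / (1.0 + np.exp(-conv))                 # sigmoid -> (N,)

w_ch = rng.standard_normal((2, H)) * 0.05
w_conv = rng.standard_normal(3)
A_T = token_attn(X1, w_ch, w_conv)
X2 = A_T[None, :, None] * X1         # X'' = A_T (tensor product) X'
print(A_T.shape, X2.shape)
```

Here the (1, N, 1) weights broadcast over every hidden layer, mirroring how the term attention "acts on every hidden layer of the model".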
Thirdly, calculating a quality score y by using a fully-connected neural network based on the sentence vector SscoreAnd finishing the translation quality estimation.
Because the high-complexity neural network is liable to generate a large negative influence on the low-resource language, in the regression task at the top, the embodiment does not depend on a complex output layer, and only uses a simple fully-connected neural network to calculate and obtain the quality estimation score:
yscore=σ(wT(tanh(WS)))
wherein sigma is sigmoid function, tanh is hyperbolic tangent function, and W and W are all-connection network parameters. The quality estimation score is trained by minimizing the mean square error, with an objective function of:
Figure BDA0003468087670000114
calculated scalar value yscoreI.e. the scoring of the translation by the translation quality estimation model.
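The regression head and objective can be sketched directly; the sentence-vector and hidden sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, DH = 40, 16                        # sentence-vector / hidden sizes (assumed)
S = rng.standard_normal(D)            # sentence vector from the previous step
W = rng.standard_normal((DH, D)) * 0.1
w = rng.standard_normal(DH) * 0.1

def quality_score(S, W, w):
    # y_score = sigmoid(w^T tanh(W S)): a single fully-connected head
    return 1.0 / (1.0 + np.exp(-(w @ np.tanh(W @ S))))

def mse_loss(preds, hter_labels):
    # Training objective: mean squared error against the HTER labels
    preds, hter_labels = np.asarray(preds), np.asarray(hter_labels)
    return np.mean((preds - hter_labels) ** 2)

y = quality_score(S, W, w)
print(0.0 < y < 1.0, mse_loss([y], [0.3]) >= 0.0)  # True True
```

The sigmoid keeps the prediction in (0, 1), matching the range of a (clipped) HTER label.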
To verify the evaluation performance of the method, a comparison experiment on the Chinese-Korean machine translation quality estimation task was performed against representative translation quality estimation models under the same conditions and the same corpus scale. QuEst++ is the official baseline system of WMT2013–WMT2019; Bilingual Expert is an advanced result under the predictor-estimator framework; TransQuest is a quality estimation model based on a cross-language pre-training model that performed best in the WMT2020 multilingual direct assessment. The correlation and the error between the predicted value and the true value of each model on the test set were calculated; the results are shown in Table 1.
TABLE 1
(table reproduced as an image in the original publication)
As can be seen from Table 1, the correlation between the prediction score of the first embodiment and the human score exceeds all baseline models, with the Pearson correlation coefficient improved by 0.226, 0.156 and 0.034 respectively. This shows that fusing linguistic knowledge of different levels with term position information can effectively improve the performance of the translation quality estimation task.
Since the sentence pairs to be evaluated are embedded with the sentence embedding method that fuses cross-layer information, to verify the rationality and effectiveness of this strategy, the embodiment carries out translation quality estimation experiments with different sentence embedding modes. Last [CLS] uses only the top-layer [CLS] tag vector of the XLM-R model as the sentence embedding; Last + GRU passes the top-layer matrix of the XLM-R model through a GRU network to obtain the sentence vector; Conv convolves the different hidden layers of the XLM-R model directly, without attention calculation, for sentence embedding; Attention obtains the sentence embedding by performing attention calculation with the different hidden-layer information. The quality estimation performance under each sentence embedding method is calculated; the specific results are shown in Table 2.
TABLE 2
(table reproduced as an image in the original publication)
As can be seen from Table 2, directly convolving all layers of XLM-R and using only the top-level [CLS] tag vector both perform poorly. When all hidden-layer information is used for direct convolution, the original features contain a large amount of redundant information irrelevant to the translation quality estimation task; without screening, excessive noise is introduced into the downstream task, harming evaluation performance. Using only the top-layer [CLS] tag loses too much of the low-level information learned by the pre-training model, which contains word-level linguistic features crucial to quality estimation, so that embedding method also performs badly. The sentence embedding method proposed in this application is therefore more beneficial to the quality estimation task.
To effectively fuse the cross-layer information of the model with term information at different positions, the application introduces an attention mechanism into the sentence embedding method. Since the attention mechanism includes the two parts of linguistic attention and term attention, this embodiment also carries out ablation experiments and combination-order experiments on them, so as to verify the effectiveness of the attention mechanism design of the application. The experimental results are shown in Table 3.
TABLE 3
(table reproduced as an image in the original publication)
From the results in Table 3 it can be found that using both linguistic attention and term attention outperforms using either one alone, indicating that both play a role. Meanwhile, applying linguistic attention in the first stage performs slightly better than applying term attention first, so the application uses this optimal combination order in the sentence embedding process.
The experimental results show that the method can effectively improve the performance of the Chinese-Korean machine translation quality estimation task.
Example two
As shown in fig. 5, the structural diagram of the Chinese-Korean translation quality estimation system based on a cross-language pre-training model according to the second embodiment of the present application mainly includes a cross-language feature extraction module, an attention calculation module and a quality estimation module.
In the second embodiment, the cross-language feature extraction module performs feature extraction on the sentence pair to be evaluated by using an XLM-R model and generates an initial feature matrix; the attention calculation module performs linguistic attention calculation and term attention calculation on the initial feature matrix, and performs sentence embedding through a convolutional neural network to obtain the sentence vector of the spliced sentence; and the quality estimation module calculates the translation quality estimation score using a fully-connected neural network based on the sentence vector.
In the second embodiment, the input of the XLM-R model in the cross-language feature extraction module is formed by splicing the source language sentence and the machine translation of the target language, and meanwhile, special embedding symbols are added to the head of the spliced sequence and between the source language sentence and the machine translation.
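To make the splicing format concrete, the following minimal sketch builds such an input sequence. XLM-R follows RoBERTa-style pair encoding (`<s> A </s></s> B </s>`); the patent only states that special embedding symbols are added at the head and at the boundary, so the exact markers used here are an assumption.

```python
def build_xlmr_input(src_tokens, mt_tokens, bos="<s>", sep="</s>"):
    """Splice a source sentence and its machine translation into one sequence,
    adding special symbols at the head and between the two sentences.
    The <s>/</s> markers follow RoBERTa-style pair encoding (an assumption;
    the patent does not name the exact symbols)."""
    return [bos] + src_tokens + [sep, sep] + mt_tokens + [sep]

# Chinese source sentence and its Korean machine translation (toy tokens)
seq = build_xlmr_input(["今天", "天气", "好"], ["오늘", "날씨가", "좋다"])
```

In a real system these token lists would come from the XLM-R subword tokenizer; the helper only illustrates the splicing order claimed in the embodiment.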
In the second embodiment, the attention calculation module is further divided into a linguistic attention calculation unit, a term attention calculation unit and a sentence embedding unit.
The linguistic attention calculation unit is used for compressing the global information of the hidden layers of the XLM-R model based on the initial feature matrix, then aggregating the position information of the sentence terms by using an average pooling operation and a maximum pooling operation respectively, to generate two different context features, recorded as a first context feature and a second context feature; generating context attention matrices through a shared neural network, recorded as a first context attention matrix and a second context attention matrix; and then obtaining a linguistic attention matrix by element-wise summation. The term attention calculation unit is used for compressing the linguistic attention calculation result, namely the tensor product of the initial feature matrix and the linguistic attention matrix, along different hidden-layer directions by using an average pooling operation and a maximum pooling operation respectively, then splicing the results to obtain the feature X_T, and applying a standard convolution layer to the feature X_T to generate a term attention matrix. The sentence embedding unit is used for performing a tensor product operation on the linguistic attention calculation result and the term attention matrix to obtain the attention calculation result, and then obtaining the sentence vector by using a convolutional neural network.
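Under the assumptions that the "tensor product" is an element-wise broadcast product and that the shared network and convolution are replaced by trivial stand-ins, the wiring of the three units can be sketched in a few lines of NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
# Toy initial feature matrix: (layers, tokens, dims); real sizes would be
# 25 x sequence_length x 1024 for XLM-R large.
X = rng.normal(size=(25, 6, 8))

# Linguistic attention unit: squeeze each layer with average and max pooling,
# combine by element-wise summation, and weight the layer axis.
A_L = sigmoid(X.mean(axis=(1, 2)) + X.max(axis=(1, 2)))[:, None, None]
X1 = X * A_L                      # tensor (element-wise) product

# Term attention unit: squeeze the layer axis, weight token/feature positions.
A_T = sigmoid(X1.mean(axis=0) + X1.max(axis=0))
X2 = X1 * A_T[None]

# Sentence embedding unit: mix the layer channels (1x1-conv stand-in) and flatten.
S = np.tensordot(rng.normal(size=25), X2, axes=(0, 0)).reshape(-1)
```

The shared two-layer network and the standard convolution layer of the embodiment are omitted here for brevity; the sketch only shows the order of operations (linguistic attention, then term attention, then sentence embedding).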
The above-described embodiments are merely illustrative of the preferred embodiments of the present application, and do not limit the scope of the present application, and various modifications and improvements made to the technical solutions of the present application by those skilled in the art without departing from the spirit of the present application should fall within the protection scope defined by the claims of the present application.

Claims (10)

1. A Chinese-Korean translation quality estimation method based on a cross-language pre-training model, characterized by comprising the following steps:
splicing a source language sentence and a machine translation in the target language according to a preset format to obtain a spliced sentence, and obtaining an initial feature matrix X of the spliced sentence by using an XLM-R model;
performing linguistic attention calculation and term attention calculation on the initial feature matrix X, and performing sentence embedding through a convolutional neural network to obtain a sentence vector of the spliced sentence;
and calculating a quality estimation score by using a fully-connected neural network based on the sentence vector, thereby completing the translation quality estimation.
2. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 1, wherein the method for obtaining the spliced sentence comprises:
connecting the source language sentence and the machine translation end to end into one sequence, and adding special embedding symbols at the head of the sequence and between the source language sentence and the machine translation, to obtain the spliced sentence.
3. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 2, wherein the method for obtaining the initial feature matrix X comprises:
obtaining the hidden layer states h_i of the XLM-R model based on the hidden layers of the XLM-R model and the position of each term, wherein h_i is the representation, in the i-th layer of the XLM-R model, of the sentence pair to be evaluated spliced in sequence with the special embedding symbols;
splicing the word embedding layer feature e with the features of all the hidden layers of the XLM-R model to obtain the initial feature matrix X of the spliced sentence, X = Concat(e, h_1, ..., h_24).
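As a toy illustration of this splicing (with random stand-ins for the real XLM-R outputs, and 24 hidden layers as in XLM-R large):

```python
import numpy as np

# Stand-ins for the XLM-R outputs: e is the word embedding layer and
# h_1..h_24 the 24 hidden layers; real tensors would come from the model
# with all hidden states exposed.
rng = np.random.default_rng(0)
seq_len, hidden = 6, 8
e = rng.normal(size=(seq_len, hidden))
h = [rng.normal(size=(seq_len, hidden)) for _ in range(24)]

# X = Concat(e, h_1, ..., h_24): stack the 25 per-layer features into one
# tensor so that the later attention steps can weight the layer axis.
X = np.stack([e] + h, axis=0)
```

Stacking along a new leading axis (rather than concatenating along the feature axis) is an assumption; it matches the later steps, which pool and re-weight the hidden-layer direction separately from the token positions.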
4. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 3, wherein the method for linguistic attention calculation comprises:
based on the initial feature matrix X, after compressing the global information of the hidden layers, aggregating the position information of the sentence terms by using an average pooling operation and a maximum pooling operation respectively, to generate a first context feature and a second context feature;
generating a first context attention matrix and a second context attention matrix respectively through a shared neural network, based on the first context feature and the second context feature;
obtaining a linguistic attention matrix A_L by element-wise summation of the first context attention matrix and the second context attention matrix;
performing a tensor product operation on the initial feature matrix X and the linguistic attention matrix A_L to obtain the linguistic attention calculation result X'.
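A minimal NumPy sketch of this layer-wise attention; the pooling axes and the size of the shared two-layer network are assumptions, since the patent does not publish them:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def linguistic_attention(X, W1, W2):
    """Linguistic attention over the layer axis of X: (layers, seq_len, hidden).
    W1, W2 are the weights of the shared two-layer network (sizes illustrative)."""
    avg = X.mean(axis=(1, 2))   # first context feature: average pooling over positions
    mx = X.max(axis=(1, 2))     # second context feature: max pooling over positions
    # shared network applied to both features, combined by element-wise summation
    att = sigmoid(W2 @ np.tanh(W1 @ avg) + W2 @ np.tanh(W1 @ mx))
    A_L = att[:, None, None]    # linguistic attention matrix A_L, one weight per layer
    return X * A_L              # X' = X (tensor product) A_L

rng = np.random.default_rng(0)
X = rng.normal(size=(25, 6, 8))
W1 = rng.normal(size=(4, 25))
W2 = rng.normal(size=(25, 4))
Xp = linguistic_attention(X, W1, W2)
```

Because the attention weights lie in (0, 1), X' is an element-wise down-weighted copy of X, with less informative layers suppressed.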
5. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 4, wherein the method for term attention calculation comprises:
compressing the linguistic attention calculation result X' along different hidden-layer directions by using an average pooling operation and a maximum pooling operation respectively, and then splicing the results to obtain the feature X_T;
applying a standard convolution layer to the feature X_T to generate a term attention matrix A_T.
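A corresponding sketch of the term attention, assuming a two-channel, zero-padded "same" convolution over the spliced pooled maps (the kernel size 3 is an assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def term_attention(Xp, K):
    """Term attention over token/feature positions.
    Xp: (layers, seq_len, hidden), the linguistic attention result X'.
    K: (2, k, k) kernel of the standard convolution layer."""
    avg = Xp.mean(axis=0)                  # compress along the hidden-layer direction
    mx = Xp.max(axis=0)
    XT = np.stack([avg, mx], axis=0)       # spliced feature X_T, two channels
    k = K.shape[1]
    p = k // 2
    padded = np.pad(XT, ((0, 0), (p, p), (p, p)))  # zero-padded 'same' convolution
    H, W = avg.shape
    conv = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            conv[i, j] = np.sum(padded[:, i:i + k, j:j + k] * K)
    A_T = sigmoid(conv)                    # term attention matrix A_T
    return Xp * A_T[None]                  # X'' = X' (tensor product) A_T

rng = np.random.default_rng(0)
Xp = rng.normal(size=(25, 6, 8))
K = rng.normal(size=(2, 3, 3))
Xpp = term_attention(Xp, K)
```

The explicit loop stands in for the standard convolution layer of the claim; in a real implementation this would be a framework convolution operator.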
6. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 5, wherein the method for obtaining the sentence vector comprises:
performing a tensor product operation on the linguistic attention calculation result X' and the term attention matrix A_T to obtain the term attention calculation result X'';
based on the term attention calculation result X'', obtaining the sentence vector by using a convolutional neural network, the sentence vector being S = scatter(Conv(X'')).
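A sketch of the claimed sentence embedding. The claim writes S = scatter(Conv(X'')), where "scatter" most likely renders a flatten operation; the convolution is modeled here as a simple 1x1 channel mix, which is an assumption since the kernel shape is not published:

```python
import numpy as np

def sentence_vector(Xpp, w_mix):
    """Sentence embedding sketch: mix the layer channels of X'' with a
    1x1 convolution (weights w_mix, one per layer) and flatten the
    resulting map into a single sentence vector S."""
    conv = np.tensordot(w_mix, Xpp, axes=(0, 0))  # (seq_len, hidden)
    return conv.reshape(-1)                        # S = flatten(Conv(X''))

rng = np.random.default_rng(0)
Xpp = rng.normal(size=(25, 6, 8))
S = sentence_vector(Xpp, rng.normal(size=25))
```

With a one-hot channel mix the function simply selects and flattens one layer's map, which makes the role of the mixing weights easy to verify.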
7. The Chinese-Korean translation quality estimation method based on the cross-language pre-training model according to claim 6, wherein the quality estimation score is:
y_score = σ(w^T tanh(WS))
wherein S is the sentence vector and W and w are trainable weights; the quality estimation score is trained by minimizing the mean square error, the objective function being:
L = (1/N) Σ_{i=1}^{N} (y_score^(i) − y^(i))^2
wherein y^(i) denotes the reference quality score of the i-th sentence pair and N is the number of training samples.
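The score head and the mean-squared-error objective follow directly from the claim and can be sketched as (layer sizes are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def quality_score(S, W, w):
    """y_score = sigma(w^T tanh(W S)): a fully-connected layer with tanh,
    a weight vector w, and a sigmoid squashing the score into (0, 1)."""
    return float(sigmoid(w @ np.tanh(W @ S)))

def mse_objective(pred, gold):
    """Mean-squared-error training objective over N sentence pairs."""
    pred, gold = np.asarray(pred, float), np.asarray(gold, float)
    return float(np.mean((pred - gold) ** 2))

rng = np.random.default_rng(0)
S = rng.normal(size=16)
y = quality_score(S, rng.normal(size=(8, 16)), rng.normal(size=8))
```

Because of the sigmoid, the predicted score is always in (0, 1), matching quality estimation labels such as normalized HTER or direct-assessment scores.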
8. A Chinese-Korean translation quality estimation system based on a cross-language pre-training model, characterized by comprising a cross-language feature extraction module, an attention calculation module and a quality estimation module;
the cross-language feature extraction module performs feature extraction on the sentence pairs to be evaluated by using an XLM-R model and generates an initial feature matrix;
the attention calculation module is used for performing linguistic attention calculation and term attention calculation on the initial feature matrix, and performing sentence embedding through a convolutional neural network to obtain a sentence vector of the spliced sentence;
the quality estimation module is used for calculating a quality estimation score of the translation quality estimation by using a fully-connected neural network based on the sentence vector.
9. The system of claim 8, wherein the input of the XLM-R model is formed by splicing a source language sentence and a machine translation of the target language, and special embedding symbols are added at the head of the spliced sequence and between the source language sentence and the machine translation.
10. The Chinese-Korean translation quality estimation system based on the cross-language pre-training model according to claim 9, wherein the attention calculation module comprises a linguistic attention calculation unit, a term attention calculation unit and a sentence embedding unit;
the linguistic attention calculation unit is used for compressing the global information of the hidden layers of the XLM-R model based on the initial feature matrix, then aggregating the position information of the sentence terms by using an average pooling operation and a maximum pooling operation respectively to generate a first context feature and a second context feature, then generating a first context attention matrix and a second context attention matrix respectively through a shared neural network, and then obtaining a linguistic attention matrix by element-wise summation;
the term attention calculation unit is used for compressing the linguistic attention calculation result, namely the tensor product of the initial feature matrix and the linguistic attention matrix, along different hidden-layer directions by using an average pooling operation and a maximum pooling operation respectively, then splicing the results to obtain the feature X_T, and applying a standard convolution layer to the feature X_T to generate a term attention matrix;
the sentence embedding unit is used for performing a tensor product operation on the linguistic attention calculation result and the term attention matrix to obtain the attention calculation result, and then obtaining the sentence vector by using a convolutional neural network.
CN202210035223.6A 2022-01-13 2022-01-13 Mid-orientation translation quality estimation method and system based on cross-language pre-training model Active CN114386437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210035223.6A CN114386437B (en) 2022-01-13 2022-01-13 Mid-orientation translation quality estimation method and system based on cross-language pre-training model


Publications (2)

Publication Number Publication Date
CN114386437A true CN114386437A (en) 2022-04-22
CN114386437B CN114386437B (en) 2022-09-27

Family

ID=81201540


Country Status (1)

Country Link
CN (1) CN114386437B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116431831A (en) * 2023-04-18 2023-07-14 延边大学 Supervised relation extraction method based on label contrast learning
CN117436460A (en) * 2023-12-22 2024-01-23 武汉大学 Translation quality assessment method, device, equipment and storage medium
CN117910482A (en) * 2024-03-19 2024-04-19 江西师范大学 Automatic machine translation evaluation method based on depth difference characteristics

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150286632A1 (en) * 2014-04-03 2015-10-08 Xerox Corporation Predicting the quality of automatic translation of an entire document
CN110245364A (en) * 2019-06-24 2019-09-17 中国科学技术大学 The multi-modal neural machine translation method of zero parallel corpora
CN110390110A (en) * 2019-07-30 2019-10-29 阿里巴巴集团控股有限公司 The method and apparatus that pre-training for semantic matches generates sentence vector
CN110874536A (en) * 2018-08-29 2020-03-10 阿里巴巴集团控股有限公司 Corpus quality evaluation model generation method and bilingual sentence pair inter-translation quality evaluation method
CN111458148A (en) * 2020-04-26 2020-07-28 上海电机学院 CBAM-based convolutional neural network rolling bearing fault diagnosis method
CN112052692A (en) * 2020-08-12 2020-12-08 内蒙古工业大学 Mongolian Chinese neural machine translation method based on grammar supervision and deep reinforcement learning
CN112347795A (en) * 2020-10-04 2021-02-09 北京交通大学 Machine translation quality evaluation method, device, equipment and medium


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116431831A (en) * 2023-04-18 2023-07-14 延边大学 Supervised relation extraction method based on label contrast learning
CN116431831B (en) * 2023-04-18 2023-09-22 延边大学 Supervised relation extraction method based on label contrast learning
CN117436460A (en) * 2023-12-22 2024-01-23 武汉大学 Translation quality assessment method, device, equipment and storage medium
CN117436460B (en) * 2023-12-22 2024-03-12 武汉大学 Translation quality assessment method, device, equipment and storage medium
CN117910482A (en) * 2024-03-19 2024-04-19 江西师范大学 Automatic machine translation evaluation method based on depth difference characteristics
CN117910482B (en) * 2024-03-19 2024-05-28 江西师范大学 Automatic machine translation evaluation method based on depth difference characteristics

Also Published As

Publication number Publication date
CN114386437B (en) 2022-09-27

Similar Documents

Publication Publication Date Title
CN114386437B (en) Mid-orientation translation quality estimation method and system based on cross-language pre-training model
Sharma et al. Efficient Classification for Neural Machines Interpretations based on Mathematical models
CN110502749B (en) Text relation extraction method based on double-layer attention mechanism and bidirectional GRU
Nwankpa et al. Activation functions: Comparison of trends in practice and research for deep learning
CN112667818B (en) GCN and multi-granularity attention fused user comment sentiment analysis method and system
CN112163426A (en) Relationship extraction method based on combination of attention mechanism and graph long-time memory neural network
CN107729311B (en) Chinese text feature extraction method fusing text moods
Wang et al. Text generation based on generative adversarial nets with latent variables
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN108959482A (en) Single-wheel dialogue data classification method, device and electronic equipment based on deep learning
CN110516070B (en) Chinese question classification method based on text error correction and neural network
CN111259768A (en) Image target positioning method based on attention mechanism and combined with natural language
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
US20230237725A1 (en) Data-driven physics-based models with implicit actuations
Siddique et al. English to bangla machine translation using recurrent neural network
KR20190134965A (en) A method and system for training of neural networks
CN113361617A (en) Aspect level emotion analysis modeling method based on multivariate attention correction
CN112686056A (en) Emotion classification method
Ghorbani et al. Auto-labelling of markers in optical motion capture by permutation learning
CN114254645A (en) Artificial intelligence auxiliary writing system
Senthilkumar et al. An AI-based chatbot using deep learning
CN114492459A (en) Comment emotion analysis method and system based on convolution of knowledge graph and interaction graph
Javaid et al. A Novel Action Transformer Network for Hybrid Multimodal Sign Language Recognition.
CN114757310B (en) Emotion recognition model and training method, device, equipment and readable storage medium thereof
CN110543567A (en) Chinese text emotion classification method based on A-GCNN network and ACELM algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant