CN116956946A - Machine translation text fine-granularity error type identification and positioning method


Info

Publication number
CN116956946A
Authority
CN
China
Prior art keywords
vector
word
text
translation
attention
Prior art date
Legal status
Pending
Application number
CN202310863738.XA
Other languages
Chinese (zh)
Inventor
潘丽婷
陈件
张井
Current Assignee
Shanghai Yizhe Information Technology Co ltd
Original Assignee
Shanghai Yizhe Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Yizhe Information Technology Co ltd
Priority to CN202310863738.XA
Publication of CN116956946A

Classifications

    • G06F 40/51 - Translation evaluation (handling natural language data; processing or translation of natural language)
    • G06F 16/353 - Information retrieval of unstructured textual data; clustering or classification into predefined classes
    • G06F 18/22 - Pattern recognition; matching criteria, e.g. proximity measures
    • G06F 18/2415 - Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F 18/253 - Fusion techniques of extracted features
    • G06F 40/242 - Natural language analysis; lexical tools; dictionaries
    • G06F 40/284 - Recognition of textual entities; lexical analysis, e.g. tokenisation or collocates
    • G06F 40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06N 3/045 - Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06N 3/047 - Probabilistic or stochastic networks
    • G06N 3/048 - Activation functions
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a method for identifying and locating fine-grained error types in machine-translated text, and relates to the technical field of machine translation recognition. Fine-grained error type prediction is treated as a text sequence labeling task: the original text and the machine translation are taken as input, a two-tower encoder extracts the textual features of the original text and the translation, a cross encoder extracts the cross features between them, and the word-level error types of the machine translation are output, so that a user can quickly find both the type and the position of each translation error. This solves the problem that conventional coarse-grained evaluation methods only assess the machine translation as a whole and require the translator to search for the error positions himself, which is inconvenient in use.

Description

Machine translation text fine granularity error type identification and positioning method
Technical Field
The invention relates to the technical field of machine translation recognition, and in particular to a method for identifying and locating fine-grained error types in machine-translated text.
Background
With the popularity of machine translation, quality assessment of machine-translated text has become a focus of post-editing work. Evaluation typically takes the form of a translation quality score together with translation error type labels.
These evaluation methods are coarse-grained: they only provide an overall assessment of the machine translation, offer limited help for post-editing, and still require the translator to locate the errors himself. They are therefore inconvenient to use.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a method for identifying and locating fine-grained error types in machine-translated text, which aims to solve the technical problems above.
In order to achieve the above purpose, the present invention provides the following technical solution: a method for identifying fine-grained error types in machine-translated text, the method comprising the following steps:
step one: acquiring original text features and machine translation text features, and segmenting the original text features and the machine translation text features to obtain a set of words or characters segmented by the original text and a set of words or characters segmented by the machine translation text;
step two: generating an original text word vector and an original text position vector from a set of words or characters of the original text after word segmentation, and adding the original text word vector and the original text position vector to obtain a vector representation of the original text word or character; regularizing the original text word vector and the original text position vector to obtain an original text word or character vector representation set;
Step three: generating a translation word vector and a translation position vector from a set of words or characters of the translated text after word segmentation, and adding the translation word vector and the translation position vector to obtain a vector representation of the translation word; regularizing the translation word vector and the translation position vector to obtain a translation word or character vector characterization set;
step four: inputting the vector representation of the original text word or character, and outputting the vector representation reflecting the context feature of the original text;
step five: inputting the vector representation of the translation words or characters, and outputting the vector representation reflecting the context characteristics of the translation;
step six: inputting the vector representation of the original text word or character, taking the vector representation of the translated text word or character as a benchmark, and outputting an interaction vector of the vector representation of the original text word or character and the vector representation of the translated text word or character;
step seven: adding and splicing the vector characterization of the translated word or character and the interaction vector, and outputting a spliced vector;
step eight: and inputting the splice vector, outputting the error type prediction probability distribution of all the translated words or characters, selecting the error type with the highest probability for each translated word or character, and outputting the prediction value corresponding to the error type of the word level according to the assignment of different error types.
As a further scheme of the invention: the method for segmenting the original text and the machine-translated text in step one further comprises: obtaining the text feature text_src of the original text and the text feature text_tgt of the machine-translated text; segmenting text_src and text_tgt using spaces and punctuation marks, and judging whether each segmented word exists in a multilingual dictionary;
if the word exists in the multilingual dictionary, the word is marked as a token;
if the word does not exist in the multilingual dictionary, character-by-character cutting is carried out from the last character of the word towards the first character, splitting the word into two sub-words, marked respectively as a front character string and a rear character string, until the front character string exists in the multilingual dictionary; the front character string is regarded as a qualified sub-word of the word and is marked as a token; the character-cutting operation is repeated on the rear character string until both text_src and text_tgt can be composed entirely of qualified sub-word tokens;
obtaining the set TOKENS_src of words or characters after segmentation of the original text: TOKENS_src = (token_1^src, token_2^src, …, token_n^src);
and the set TOKENS_tgt of words or characters after segmentation of the machine-translated text: TOKENS_tgt = (token_1^tgt, token_2^tgt, …, token_m^tgt);
Wherein n represents n original words or characters after original word segmentation, and m represents m translated words or characters after translation word segmentation.
As a further scheme of the invention: in step two: from the set TOKENS_src of segmented original words or characters, the original word vector is generated according to the following formula:
w_i^src = Embedding_word(token_i^src);
the original position vector is generated according to the following formula:
p_i^src = Embedding_pos(pos_i^src);
the original word vector w_i^src and the original position vector p_i^src are added according to the following formula to obtain the vector representation e_i^src of the original word or character:
e_i^src = w_i^src + p_i^src;
the sum of the original word vector and the original position vector is regularized according to the following formula to output the set E_src of vector representations E_i^src of the original words or characters:
E_i^src = Layernorm(e_i^src), E_src = (E_1^src, …, E_n^src);
wherein token_i^src is the i-th original word or character, and pos_i^src is the position of the i-th original word or character.
As a further scheme of the invention: in step three: from the set TOKENS_tgt of words or characters segmented from the machine-translated text, the translation word vector is generated according to the following formula:
w_i^tgt = Embedding_word(token_i^tgt);
the translation position vector is generated according to the following formula:
p_i^tgt = Embedding_pos(pos_i^tgt);
the translation word vector w_i^tgt and the translation position vector p_i^tgt are added according to the following formula to obtain the vector representation e_i^tgt of the translation word or character:
e_i^tgt = w_i^tgt + p_i^tgt;
the sum is regularized according to the following formula to output the set E_tgt of vector representations E_i^tgt of the translation words or characters:
E_i^tgt = Layernorm(e_i^tgt), E_tgt = (E_1^tgt, …, E_m^tgt);
wherein token_i^tgt is the i-th translation word or character, and pos_i^tgt is the position of the i-th translation word or character.
As a further scheme of the invention: in step four: an original-text encoder of 6 self-attention layers with 12 heads is constructed; the vector representations of the original words or characters are input, and vector representations reflecting the contextual features of the original text are output;
each self-attention layer consists of an input-vector part, a 12-head QKV vector generation part, an attention calculation part and an output original-text-vector part;
in the input-vector part, the input vector of the first attention layer is the vector representation set E_src of the original words or characters, and the input vector of attention layers 2-6 is the hidden vector output by the previous attention layer:
X_k^src = E_src, if k = 1; X_k^src = H_{k-1}^src, if 1 < k ≤ 6;
wherein X_k^src is the original-text input vector of the k-th self-attention layer and H_{k-1}^src is the hidden vector output by layer k-1;
in the 12-head QKV vector generation part, the original query vector, key vector and value vector are generated from the original input vector X_k^src, and the query, key and value vectors are each mapped into 12 different subspaces, each subspace representing different text features;
in the attention calculation part, the similarity between the query vectors and key vectors of the different attention heads is calculated, attention weight scores between the original words or characters are generated, and the attention weight scores are multiplied with the value vectors to obtain the original context vectors of the different attention heads; the original context vectors of the different attention heads are spliced and mapped, the residual block I_k is added, and a new context vector is generated after regularization;
in the output-vector part, the original context vector is processed and the hidden vector of the attention layer is output.
As a further scheme of the invention: in step five: a translation encoder of 6 self-attention layers with 12 heads is constructed; the vector representations of the translation words or characters are input, and vector representations reflecting the contextual features of the translation are output;
each self-attention layer consists of an input-vector part, a 12-head QKV vector generation part, an attention calculation part and an output translation-vector part;
in the input-vector part, the input vector of the first attention layer is the vector representation set E_tgt of the translation words or characters, and the input vector of attention layers 2-6 is the hidden vector output by the previous attention layer:
X_k^tgt = E_tgt, if k = 1; X_k^tgt = H_{k-1}^tgt, if 1 < k ≤ 6;
wherein X_k^tgt is the translation input vector of the k-th self-attention layer and H_{k-1}^tgt is the hidden vector output by layer k-1;
in the 12-head QKV vector generation part, the translation query vector, key vector and value vector are generated from the translation input vector X_k^tgt, and the query, key and value vectors are each mapped into 12 different subspaces, each subspace representing different text features;
in the attention calculation part, the similarity between the query vectors and key vectors of the different attention heads is calculated, attention weight scores between the translation words or characters are generated, and the attention weight scores are multiplied with the value vectors to obtain the translation context vectors of the different attention heads; the translation context vectors of the different attention heads are spliced and mapped, the residual block I_k is added, and a new translation context vector is generated after regularization;
in the output-vector part, the translation context vector is processed and the hidden vector of the attention layer is output.
As a further scheme of the invention: in step six: a cross encoder of 6 attention layers with 12 heads is constructed; the vector representations of the original words or characters and the vector representations of the translation words or characters are input;
each cross-attention layer consists of an input-vector part, a 12-head QKV vector generation part, an attention calculation part and an output-vector part; the query vector of the cross encoder is derived from the translation query vector or from the interaction vector of the previous cross-attention layer, while the key vector and value vector of the cross encoder are derived from the key vector and value vector of the original text;
in the input-vector part, each cross-attention layer is composed of a translation-based word vector representation set X_k^cross and an original-text-based word vector representation set H_6^src;
wherein X_k^cross is the translation-based word vector representation set: in the first cross-attention layer, X_1^cross is the last-layer word hidden vector H_6^tgt from the translation encoder; in cross-attention layers two to six, X_k^cross is the hidden vector H_{k-1}^cross output by the previous cross-attention layer; H_6^src, the last-layer hidden vector of the original-text encoder, is the original-text-based word vector representation set;
in the 12-head QKV vector generation part, the query vector is generated from the translation-based input vector, while the key vector and value vector are generated from the original-text-based input vector; 12 attention heads are set, and the translation query vector, original key vector and original value vector are each mapped into 12 different subspaces;
in the attention calculation part, the interaction information between the original text and the translation is learned: the similarity between the translation query vectors and original key vectors of the different attention heads is calculated, attention weight scores between each translation word and the original words are generated, and the attention weight scores are multiplied with the original value vectors to obtain the interaction vectors between translation and original text for the different attention heads; the interaction vectors of the different attention heads are spliced and mapped, the residual block I_k^cross is added, and a new interaction vector representation is generated after regularization;
in the output-vector part, the interaction vector representation is processed, and the interaction vector between the vector representations of the original words or characters and of the translation words or characters is output for this attention layer.
As a further scheme of the invention: in step eight, the splice vector H_o is used as the input vector of the error type prediction module, and after processing, the error type prediction probability distribution ERROR of all translation words or characters is output:
ERROR = softmax(Layernorm(H_o · W_error1) · W_error2);
wherein ERROR is the error type prediction probability distribution of all translation words or characters, W_error1 is the first fully connected layer, and W_error2 is the second fully connected layer.
the invention also provides a method for positioning the error type of the fine granularity of the machine translation text, which realizes the positioning of the error word after the error type of the word is identified by the method for identifying the error type of the fine granularity of the machine translation text.
The invention also provides a training method for the machine-translated-text fine-grained error type identification method, used to train the identification method to predict the values corresponding to the error types; the training method comprises:
step one: a translator corrects the machine-translated text and labels the error types: word-level sentence segments of the machine translation are labeled with the corresponding true error types defined by the identification method, and the true values corresponding to the true error types are labeled according to the assignment of the different error types; translation segments without manual annotation default to the error type "no error";
step two: the mean cross entropy between the true values and predicted values over all word error types of a translation is calculated by the following formula:
loss = -(1/m) · Σ_{i=1..m} Σ_{j=1..l} y_{i,j} · log(ŷ_{i,j});
wherein y_{i,j} is the true value of the probability that the i-th translation word or character belongs to the j-th error type: if the word belongs to a given error type, the corresponding y is 1 and the y of the other error types is 0; ŷ_{i,j} is the predicted value of the probability that the i-th translation word or character belongs to the j-th error type, taking values from 0 to 1; m is the number of translation words, and l is the number of error types.
Compared with the prior art, the invention has the following beneficial effects:
the present invention can help a translator better locate and modify a translation error using such fine-grained evaluation. Therefore, the fine-granularity error type prediction is regarded as a text sequence labeling task, the original text and the machine translation are input, the text characteristics of the original text and the translation are obtained through the double-tower encoder, the cross characteristics between the original text and the translation are obtained through the cross encoder, and finally the word level error type of the machine translation can be output, so that a user can conveniently and quickly find out the error type and the error position of the machine translation. The method solves the problems that the traditional evaluation modes are coarse-grained evaluation, only the whole evaluation of the translation of the machine is provided, but the assistance of the translated editing is limited, and the translator is required to search the error position by himself. Thus, there is a problem in that it is inconvenient in use.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, it being obvious that the drawings described below are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow diagram of a method for identifying fine granularity error types of machine translation;
FIG. 2 is an interface diagram of a machine translation fine-grained error type identification process.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is evident that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like indicate orientations or positional relationships based on the orientation or positional relationships shown in the drawings, merely to facilitate describing the present invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and therefore should not be construed as limiting the present invention.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Embodiment one:
referring to fig. 1 to 2, a method for identifying fine granularity error types of machine translation text, the method comprises the following steps:
step one: acquiring original text features and machine translation text features through a double-tower encoder, and segmenting the original text features and the machine translation text features to obtain a set of words or characters after segmentation of the original text and a set of words or characters after segmentation of the machine translation text; preferably, the maximum limit length of the original text and the machine translated text is 512 words (characters).
Step two: generating an original text word vector and an original text position vector from a set of words or characters of the original text after word segmentation, and adding the original text word vector and the original text position vector to obtain a vector representation of the original text word or character; regularizing the original text word vector and the original text position vector to obtain an original text word or character vector representation set;
Step three: generating a translation word vector and a translation position vector from a set of words or characters of the translated text after word segmentation, and adding the translation word vector and the translation position vector to obtain a vector representation of the translation word; regularizing the translation word vector and the translation position vector to obtain a translation word or character vector characterization set;
step four: inputting the vector representation of the original text word or character, and outputting the vector representation reflecting the context feature of the original text;
step five: inputting the vector representation of the translation words or characters, and outputting the vector representation reflecting the context characteristics of the translation;
step six: inputting the vector representation of the original text word or character, taking the vector representation of the translated text word or character as a benchmark, and outputting an interaction vector of the vector representation of the original text word or character and the vector representation of the translated text word or character;
step seven: adding and splicing the vector characterization of the translated word or character and the interaction vector, and outputting a spliced vector;
step eight: inputting splice vectors, outputting error type prediction probability distribution of all translated words or characters, selecting the error type with the highest probability for each translated word or character, and outputting a predicted value corresponding to the error type of the word level according to assignment of different error types;
Step nine: by manually labeling error type data, a cross entropy loss function is designed, and model weights are trained by using a random gradient descent method and an error back propagation method.
The invention adopts fine-grained evaluation as its quality evaluation mode, which can point to the specific word or character position in a machine translation. As shown in fig. 2, for example, if the machine translation "The major projects of Trinity Group" contains a term translation error, the "mistranslation" label is attached to the specific word "Trinity", while the other words are labeled "no error" by default. Such fine-grained evaluation can help a translator better locate and modify translation errors. Fine-grained error type prediction is therefore treated as a text sequence labeling task: the original text and the machine translation are input, a two-tower encoder extracts the textual features of the original text and the translation, a cross encoder extracts the cross features between them, and the word-level error types of the machine translation are output, so that a user can quickly find both the type and the position of each error.
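To make the flow of this embodiment concrete, the following is a minimal structural sketch in PyTorch of the two-tower-plus-cross-encoder pipeline described above. It is a sketch under stated assumptions, not the patented implementation: nn.TransformerEncoderLayer stands in for the 6-layer, 12-head encoders of embodiments five and six, the bare nn.MultiheadAttention cross layers omit the feed-forward and residual sub-blocks of embodiment seven, and all names and sizes (e.g. the vocabulary size) are illustrative.

```python
import torch
import torch.nn as nn

class FineGrainedQEModel(nn.Module):
    """Two-tower encoders + cross encoder + word-level error-type head."""

    def __init__(self, vocab_size, num_error_types=20, d_model=768, max_len=512):
        super().__init__()
        self.src_embed = nn.Embedding(vocab_size, d_model)  # source word vectors
        self.tgt_embed = nn.Embedding(vocab_size, d_model)  # translation word vectors
        self.pos_embed = nn.Embedding(max_len, d_model)     # position vectors
        self.norm = nn.LayerNorm(d_model)                   # "regularization"
        make_tower = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True),
            num_layers=6)
        self.src_encoder = make_tower()   # one tower per text
        self.tgt_encoder = make_tower()
        self.cross_layers = nn.ModuleList(
            nn.MultiheadAttention(d_model, 12, batch_first=True) for _ in range(6))
        self.head = nn.Sequential(        # two FC layers with Layernorm in between
            nn.Linear(2 * d_model, 2 * d_model),
            nn.LayerNorm(2 * d_model),
            nn.Linear(2 * d_model, num_error_types))

    def forward(self, src_ids, tgt_ids):
        pos_s = torch.arange(src_ids.size(1), device=src_ids.device)
        pos_t = torch.arange(tgt_ids.size(1), device=tgt_ids.device)
        e_src = self.norm(self.src_embed(src_ids) + self.pos_embed(pos_s))
        e_tgt = self.norm(self.tgt_embed(tgt_ids) + self.pos_embed(pos_t))
        h_src = self.src_encoder(e_src)   # source context features
        h_tgt = self.tgt_encoder(e_tgt)   # translation context features
        h_cross = h_tgt                   # queries start from the translation side
        for layer in self.cross_layers:   # keys/values come from the source side
            h_cross, _ = layer(h_cross, h_src, h_src)
        h_o = torch.cat([h_cross, h_tgt], dim=-1)  # splice vector H_o
        return self.head(h_o)             # per-token error-type logits
```

The head returns logits rather than probabilities; a softmax (as in the ERROR formula of embodiment nine) or a cross-entropy loss can be applied on top.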
Embodiment two:
On the basis of embodiment one, the method for segmenting the original text and the machine-translated text in step one further comprises: obtaining the text feature text_src of the original text and the text feature text_tgt of the machine-translated text; segmenting text_src and text_tgt using spaces and punctuation marks, and judging whether each segmented word exists in a multilingual dictionary;
if the word exists in the multilingual dictionary, the word is marked as a token;
if the word does not exist in the multilingual dictionary, the word is segmented with a greedy longest-string matching algorithm: specifically, character-by-character cutting is carried out from the last character of the word towards the first character, splitting the word into two sub-words, marked respectively as a front character string and a rear character string, until the front character string exists in the multilingual dictionary; the front character string is regarded as a qualified sub-word of the word and is marked as a token; the character-cutting operation is repeated on the rear character string until both text_src and text_tgt can be composed entirely of qualified sub-word tokens;
obtaining the set TOKENS_src of words or characters after segmentation of the original text: TOKENS_src = (token_1^src, token_2^src, …, token_n^src);
and the set TOKENS_tgt of words or characters after segmentation of the machine-translated text: TOKENS_tgt = (token_1^tgt, token_2^tgt, …, token_m^tgt);
Wherein n represents n original words or characters after original word segmentation, and m represents m translated words or characters after translation word segmentation.
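A small Python sketch of the greedy longest-prefix segmentation just described. The single-character fallback for words whose first character is not in the dictionary is an assumption, and `vocab` stands for the multilingual dictionary.

```python
import re

def subword_tokenize(text, vocab):
    """Split on spaces/punctuation, then greedily cut unknown words from the
    rear, keeping the longest front string found in the dictionary."""
    tokens = []
    for word in re.findall(r"\w+|[^\w\s]", text):   # space/punctuation split
        while word:
            end = len(word)
            # shrink the front string character by character from the end
            while end > 1 and word[:end] not in vocab:
                end -= 1
            tokens.append(word[:end])   # qualified sub-word token
            word = word[end:]           # repeat the cut on the rear string
    return tokens

# e.g. with vocab = {"trans", "lation", "works"}:
# subword_tokenize("translation works", vocab) -> ["trans", "lation", "works"]
```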
Embodiment three:
On the basis of embodiment one, in step two: word embedding is carried out on the words or characters of the original text; each original word or character is mapped into a 768-dimensional word vector, and the position of each word or character is mapped into a 768-dimensional position vector. From the set TOKENS_src of segmented original words or characters, the original word vector is generated according to the following formula:
w_i^src = Embedding_word(token_i^src);
the original position vector is generated according to the following formula:
p_i^src = Embedding_pos(pos_i^src);
the original word vector w_i^src and the original position vector p_i^src are added according to the following formula to obtain the vector representation e_i^src of the original word or character:
e_i^src = w_i^src + p_i^src;
the sum of the original word vector and the original position vector is regularized according to the following formula to output the set E_src of vector representations E_i^src of the original words or characters:
E_i^src = Layernorm(e_i^src), E_src = (E_1^src, …, E_n^src);
wherein token_i^src is the i-th original word or character, and pos_i^src is the position of the i-th original word or character.
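In PyTorch terms, the 768-dimensional word and position embeddings and the regularization of their sum can be sketched as follows; the vocabulary size and token ids are hypothetical.

```python
import torch
import torch.nn as nn

d_model, max_len, vocab_size = 768, 512, 30000        # sizes used here

embedding_word = nn.Embedding(vocab_size, d_model)    # token -> word vector
embedding_pos = nn.Embedding(max_len, d_model)        # index -> position vector
layernorm = nn.LayerNorm(d_model)

token_ids = torch.tensor([[101, 2054, 7592]])         # hypothetical TOKENS_src ids
positions = torch.arange(token_ids.size(1)).unsqueeze(0)
E_src = layernorm(embedding_word(token_ids) + embedding_pos(positions))
print(E_src.shape)                                    # torch.Size([1, 3, 768])
```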
Embodiment four:
On the basis of embodiment one, in step three: word embedding is carried out on the words or characters of the translation; each translation word or character is mapped into a 768-dimensional word vector, and the position of each word or character is mapped into a 768-dimensional position vector. From the set TOKENS_tgt of words or characters segmented from the machine-translated text, the translation word vector is generated according to the following formula:
w_i^tgt = Embedding_word(token_i^tgt);
the translation position vector is generated according to the following formula:
p_i^tgt = Embedding_pos(pos_i^tgt);
the translation word vector w_i^tgt and the translation position vector p_i^tgt are added according to the following formula to obtain the vector representation e_i^tgt of the translation word or character:
e_i^tgt = w_i^tgt + p_i^tgt;
the sum is regularized according to the following formula to output the set E_tgt of vector representations E_i^tgt of the translation words or characters:
E_i^tgt = Layernorm(e_i^tgt), E_tgt = (E_1^tgt, …, E_m^tgt);
wherein token_i^tgt is the i-th translation word or character, and pos_i^tgt is the position of the i-th translation word or character.
Embodiment five:
On the basis of embodiment one, in step four: an original-text encoder of 6 self-attention layers with 12 heads is constructed; the vector representations of the original words or characters are input, and vector representations reflecting the contextual features of the original text are output;
each self-attention layer consists of an input-vector part, a 12-head QKV vector generation part, an attention calculation part and an output original-text-vector part;
in the input-vector part, the input vector of the first attention layer is the vector representation set E_src of the original words or characters, and the input vector of attention layers 2-6 is the hidden vector output by the previous attention layer:
X_k^src = E_src, if k = 1; X_k^src = H_{k-1}^src, if 1 < k ≤ 6;
wherein X_k^src is the original-text input vector of the k-th self-attention layer and H_{k-1}^src is the hidden vector output by layer k-1. All W symbols below are trainable weights.
In the 12-head QKV vector generation part, the original query vector, key vector and value vector are generated from the original input vector X_k^src. The query vector of an original word is used to calculate the degree of association of this word with the key vectors of other words; the key vector of an original word is used to calculate the degree of association of the query vectors of other words with this word; the value vector of an original word is used to construct new vector representations of other words according to the attention weights. In order to learn different original-text features, 12 attention heads are set, and the query, key and value vectors are mapped into 12 different subspaces, each subspace representing different text features:
Q_src = X_k^src · W_src_query; K_src = X_k^src · W_src_key; V_src = X_k^src · W_src_value;
Q_src^(j) = Q_src · W_src_query^(j); K_src^(j) = K_src · W_src_key^(j); V_src^(j) = V_src · W_src_value^(j), j = 1, …, 12;
wherein Q_src, K_src and V_src are the original query, key and value vectors; W_src_query, W_src_key and W_src_value are the weights mapping the original input variable into the query, key and value vectors; Q_src^(j), K_src^(j) and V_src^(j) are the original query, key and value vectors of the j-th attention head; W_src_query^(j), W_src_key^(j) and W_src_value^(j) are the weights mapping the original query, key and value vectors to the j-th attention head.
In order to learn the contextual information of the original text, the similarity between the query vectors and key vectors of the different attention heads is calculated, attention weight scores between the original words or characters are generated through a softmax function, and the attention weight scores are multiplied with the value vectors to obtain the original context vectors of the different attention heads. Finally, the context vectors of the different attention heads are spliced and mapped, the residual block I_k is added, and a new context vector is generated after regularization:
Z_src^(j) = softmax(Q_src^(j) · (K_src^(j))^T / sqrt(d_src)) · V_src^(j);
C_k^src = Layernorm(Concat(Z_src^(1), …, Z_src^(12)) · W_src_concat + I_k);
wherein Z_src^(j) is the original context vector representation of the j-th attention head; C_k^src is the original context vector representation formed by splicing and integration; W_src_concat is the weight of the original-vector splice; sqrt(d_src) is used to avoid excessive variance of Q_src^(j) · (K_src^(j))^T, and d_src defaults to 64.
In the output-vector part, the original context vector representation passes through a series of calculation operations, namely two fully connected layers, an activation function, a residual block and regularization, and the hidden vector of the attention layer is output:
H_k^src = Layernorm(C_k^src + σ(C_k^src · W_src_fc1) · W_src_fc2);
wherein W_src_fc1 is the first fully connected layer, W_src_fc2 is the second fully connected layer, σ is the activation function, and H_k^src is the hidden vector output by the k-th self-attention layer.
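A minimal PyTorch sketch of one such self-attention layer. Assumptions are flagged: the activation function σ is taken to be GELU (the text says only "activation function"), the feed-forward layers keep the model width of 768, and the per-head dimension is 768 / 12 = 64, matching d_src.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttentionLayer(nn.Module):
    """One of the 6 self-attention layers: 12 heads, d_head = 768/12 = 64,
    splice + residual block + Layernorm, then two FC layers with activation."""

    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.W_query = nn.Linear(d_model, d_model)   # all 12 heads at once
        self.W_key = nn.Linear(d_model, d_model)
        self.W_value = nn.Linear(d_model, d_model)
        self.W_concat = nn.Linear(d_model, d_model)  # splice-and-map weight
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.fc1, self.fc2 = nn.Linear(d_model, d_model), nn.Linear(d_model, d_model)

    def forward(self, x):                            # x: (batch, n, 768) = X_k
        b, n, _ = x.shape
        heads = lambda t: t.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = heads(self.W_query(x)), heads(self.W_key(x)), heads(self.W_value(x))
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)  # / sqrt(d_src)
        z = torch.softmax(scores, dim=-1) @ v        # per-head context vectors
        z = z.transpose(1, 2).reshape(b, n, -1)      # splice the 12 heads
        c = self.norm1(x + self.W_concat(z))         # residual block I_k + Layernorm
        return self.norm2(c + self.fc2(F.gelu(self.fc1(c))))  # hidden vector H_k
```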
Embodiment six:
On the basis of embodiment one, in step five: a translation encoder of 6 self-attention layers with 12 heads is constructed; the vector representations of the translation words or characters are input, and vector representations reflecting the contextual features of the translation are output;
each self-attention layer consists of an input-vector part, a 12-head QKV vector generation part, an attention calculation part and an output translation-vector part;
in the input-vector part, the input vector of the first attention layer is the vector representation set E_tgt of the translation words or characters, and the input vector of attention layers 2-6 is the hidden vector output by the previous attention layer:
X_k^tgt = E_tgt, if k = 1; X_k^tgt = H_{k-1}^tgt, if 1 < k ≤ 6;
wherein X_k^tgt is the translation input vector of the k-th self-attention layer and H_{k-1}^tgt is the hidden vector output by layer k-1. All W symbols below are trainable weights.
In the 12-head QKV vector generation part, the translation query vector, key vector and value vector are generated from the translation input vector X_k^tgt, and in order to learn different translation text features, the query, key and value vectors are mapped into 12 different subspaces, each subspace representing different text features:
Q_tgt = X_k^tgt · W_tgt_query; K_tgt = X_k^tgt · W_tgt_key; V_tgt = X_k^tgt · W_tgt_value;
Q_tgt^(j) = Q_tgt · W_tgt_query^(j); K_tgt^(j) = K_tgt · W_tgt_key^(j); V_tgt^(j) = V_tgt · W_tgt_value^(j), j = 1, …, 12;
wherein Q_tgt, K_tgt and V_tgt are the translation query, key and value vectors; W_tgt_query, W_tgt_key and W_tgt_value are the weights mapping the translation input variable into the query, key and value vectors; Q_tgt^(j), K_tgt^(j) and V_tgt^(j) are the translation query, key and value vectors of the j-th attention head; W_tgt_query^(j), W_tgt_key^(j) and W_tgt_value^(j) are the weights mapping the translation query, key and value vectors to the j-th attention head.
In order to learn the contextual information, the attention calculation part calculates the similarity between the query vectors and key vectors of the different attention heads, generates attention weight scores between the translation words or characters through a softmax function, and multiplies the attention weight scores with the value vectors to obtain the translation context vectors of the different attention heads; the translation context vectors of the different attention heads are spliced and mapped, the residual block I_k is added, and a new translation context vector is generated after regularization:
Z_tgt^(j) = softmax(Q_tgt^(j) · (K_tgt^(j))^T / sqrt(d_tgt)) · V_tgt^(j);
C_k^tgt = Layernorm(Concat(Z_tgt^(1), …, Z_tgt^(12)) · W_tgt_concat + I_k);
wherein Z_tgt^(j) is the translation context vector representation of the j-th attention head; C_k^tgt is the translation context vector representation formed by splicing and integration; W_tgt_concat is the weight of the translation-vector splice; sqrt(d_tgt) is used to avoid excessive variance of Q_tgt^(j) · (K_tgt^(j))^T, and d_tgt defaults to 64.
In the output-vector part, the translation context vector representation passes through a series of calculation operations, namely two fully connected layers, an activation function, a residual block and regularization, and the hidden vector of the attention layer is output:
H_k^tgt = Layernorm(C_k^tgt + σ(C_k^tgt · W_tgt_fc1) · W_tgt_fc2);
wherein W_tgt_fc1 is the first fully connected layer, W_tgt_fc2 is the second fully connected layer, σ is the activation function, and H_k^tgt is the hidden vector output by the k-th self-attention layer.
Embodiment seven:
On the basis of embodiments five and six, in step six: a cross encoder of 6 attention layers with 12 heads is constructed; the vector representations of the original words or characters and the vector representations of the translation words or characters are input, and the translation-based interaction vectors with the original text are output. The structure of the cross encoder is similar to that of the original-text encoder and the translation encoder; the notable difference is that the query vector of the cross encoder is derived from the translation vector or from the interaction vector of the previous cross-attention layer, while the key vector and value vector are derived from the original-text vector, so as to learn the relation between original text and translation. By contrast, the original-text encoder and the translation encoder are both self-attention layers whose QKV vectors (query, key and value vectors) are derived only from the original text or only from the translation, so as to learn features internal to each text.
Each cross-attention layer consists of an input-vector part, a 12-head QKV vector generation part, an attention calculation part and an output-vector part;
in the input-vector part, each cross-attention layer is composed of a translation-based word vector representation set and an original-text-based word vector representation set:
X_1^cross = H_6^tgt; X_k^cross = H_{k-1}^cross, if 1 < k ≤ 6;
wherein X_k^cross is the translation-based word vector representation set: in the first cross-attention layer it is the last-layer word hidden vector H_6^tgt from the translation encoder, and in cross-attention layers two to six it is the hidden vector H_{k-1}^cross output by the previous cross-attention layer; the original-text-based word vector representation set is H_6^src, the last-layer hidden vector of the original-text encoder. All W symbols below are trainable weights.
In the 12-head QKV vector generation part, the query vector is generated from the translation-based input vector, while the key vector and value vector are generated from the original-text-based input vector; 12 attention heads are set, and the translation query vector, original key vector and original value vector are mapped into 12 different subspaces:
Q_cross = X_k^cross · W_cross_query; K_cross = H_6^src · W_cross_key; V_cross = H_6^src · W_cross_value;
Q_cross^(j) = Q_cross · W_cross_query^(j); K_cross^(j) = K_cross · W_cross_key^(j); V_cross^(j) = V_cross · W_cross_value^(j), j = 1, …, 12;
wherein Q_cross is the cross-attention-layer query vector and W_cross_query is the weight mapping the cross-attention-layer input variable to the query vector; K_cross is the cross-attention-layer key vector and W_cross_key is the weight mapping the cross-attention-layer input variable to the key vector; V_cross is the cross-attention-layer value vector and W_cross_value is the weight mapping the cross-attention-layer input variable to the value vector; Q_cross^(j), K_cross^(j) and V_cross^(j) are the cross-attention-layer query, key and value vectors of the j-th attention head, and W_cross_query^(j), W_cross_key^(j) and W_cross_value^(j) are the weights mapping them to the j-th attention head.
In the attention calculation part, the interaction information between the original text and the translation is learned: the similarity between the translation query vectors and original key vectors of the different attention heads is calculated, attention weight scores between each translation word and the original words are generated through a softmax function, and the attention weight scores are multiplied with the original value vectors to obtain the interaction vectors between translation and original text for the different attention heads; the interaction vectors of the different attention heads are spliced and mapped, the residual block I_k^cross is added, and a new interaction vector representation is generated after regularization:
Z_cross^(j) = softmax(Q_cross^(j) · (K_cross^(j))^T / sqrt(d_cross)) · V_cross^(j);
C_k^cross = Layernorm(Concat(Z_cross^(1), …, Z_cross^(12)) · W_cross_concat + I_k^cross);
wherein Z_cross^(j) is the cross vector representation of the j-th attention head; C_k^cross is the cross vector representation formed by splicing and integration; W_cross_concat is the weight of the interaction-vector splice; d_cross defaults to 64.
In the output-vector part, after a series of calculation operations, namely two fully connected layers, an activation function, a residual block and regularization, the interaction vector between the vector representations of the original words or characters and of the translation words or characters is output for this attention layer:
H_k^cross = Layernorm(C_k^cross + σ(C_k^cross · W_cross_fc1) · W_cross_fc2);
wherein W_cross_fc1 is the first fully connected layer, W_cross_fc2 is the second fully connected layer, and H_k^cross is the hidden vector output by the k-th cross-attention layer.
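A matching PyTorch sketch of one cross-attention layer, differing from the self-attention sketch of embodiment five only in that queries come from the translation side and keys/values from the source side; the same GELU assumption applies, and all names are illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttentionLayer(nn.Module):
    """One of the 6 cross-attention layers: queries from the translation side,
    keys/values from the source side; otherwise mirrors SelfAttentionLayer."""

    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.W_query = nn.Linear(d_model, d_model)   # W_cross_query (translation)
        self.W_key = nn.Linear(d_model, d_model)     # W_cross_key   (source)
        self.W_value = nn.Linear(d_model, d_model)   # W_cross_value (source)
        self.W_concat = nn.Linear(d_model, d_model)
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)
        self.fc1, self.fc2 = nn.Linear(d_model, d_model), nn.Linear(d_model, d_model)

    def forward(self, x_tgt, h_src):   # x_tgt: (b, m, 768); h_src: (b, n, 768)
        b, m, _ = x_tgt.shape
        n = h_src.size(1)
        heads = lambda t, s: t.view(b, s, self.n_heads, self.d_head).transpose(1, 2)
        q = heads(self.W_query(x_tgt), m)
        k, v = heads(self.W_key(h_src), n), heads(self.W_value(h_src), n)
        w = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_head), dim=-1)
        z = (w @ v).transpose(1, 2).reshape(b, m, -1)   # interaction vectors
        c = self.norm1(x_tgt + self.W_concat(z))        # residual + Layernorm
        return self.norm2(c + self.fc2(F.gelu(self.fc1(c))))
```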
Embodiment eight:
On the basis of embodiment seven: the quality of a machine translation is influenced by both translation accuracy and translation fluency; the interaction vector between original text and translation reflects translation accuracy, while the translation vector reflects translation fluency. To consider both factors together, the vector representations of the translation words or characters and the interaction vectors are spliced together, and the splice vector is output:
H_o = Concat(H_6^cross, H_6^tgt);
wherein H_o is the splice vector, H_6^cross is the last-layer vector representation of the cross encoder, and H_6^tgt is the last-layer vector representation of the translation encoder.
Embodiment nine:
On the basis of implementation eight, in step eight, to spellVector H o As the input vector of the ERROR type prediction module, after the input vector is processed, the ERROR type prediction probability distribution ERROR of all translated words or characters is output:
ERROR=softmax(Layernorm(H o W error1 )W error2 );
ERROR is the ERROR type predictive probability distribution for all translated words or characters,W error1 is the first full connection layer, +.>W error2 Is the second full-connection layer, which is the first full-connection layer,
and outputting a predicted value corresponding to the error type of the word level according to the assignment of the different error types. In the present invention, l is the number of error types, defaulting to 20, the error types are specifically as follows:
/>
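A small sketch of this prediction head: the splice vector is assumed to be the concatenation of two 768-dimensional representations (width 1536), since the patent does not state the width of W_error1; tensor contents are dummy values.

```python
import torch
import torch.nn as nn

d_model, l = 768, 20                       # l = number of error types
W_error1 = nn.Linear(2 * d_model, 2 * d_model, bias=False)  # first FC layer
layernorm = nn.LayerNorm(2 * d_model)
W_error2 = nn.Linear(2 * d_model, l, bias=False)            # second FC layer

h_cross = torch.randn(1, 7, d_model)       # last cross-encoder layer (7 tokens)
h_tgt = torch.randn(1, 7, d_model)         # last translation-encoder layer
H_o = torch.cat([h_cross, h_tgt], dim=-1)  # splice vector H_o
ERROR = torch.softmax(W_error2(layernorm(W_error1(H_o))), dim=-1)  # (1, 7, 20)
predicted_types = ERROR.argmax(dim=-1)     # highest-probability type per word
```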
example ten:
Referring to fig. 2, this embodiment further provides a method for locating fine-grained error types in machine-translated text: after the word-level error types have been identified by the identification method above, the locating method locates the erroneous words.
Embodiment eleven:
In order to train the Embedding tables and the weights W of the steps above, this patent uses 120,000 Chinese-English translation data samples. Referring to fig. 1, this embodiment further provides a training method for the machine-translation fine-grained error type identification method, used to train the identification method to predict the values corresponding to the error types; the training method comprises:
Step one: the translation person performs correction and labeling of error types on the machine translation text, labels a word-level specific machine translation text sentence segment, labels a corresponding real error type based on the error type of the machine translation text fine-granularity error type identification method, and labels a real value corresponding to the real error type according to assignment of different error types; the default error type of the translation sentence segment without manual annotation is error-free; when there is an error in person in the translation, the error word is marked by bold and underline as shown in the table below:
step two: using the cross-entropy function (CE) as the loss function, the mean cross entropy loss between the true values and predicted values over all word error types of a translation is computed by the following formula:
loss = -(1/m) · Σ_{i=1..m} Σ_{j=1..l} y_{i,j} · log(ŷ_{i,j});
wherein y_{i,j} is the true value of the probability that the i-th translation word or character belongs to the j-th error type: if the word belongs to a given error type, the corresponding y is 1 and the y of the other error types is 0; ŷ_{i,j} is the predicted value of the probability that the i-th translation word or character belongs to the j-th error type, taking values from 0 to 1; m is the number of translation words, and l is the number of error types.
The Embedding tables and the weights W are trained using the stochastic gradient descent algorithm and the error back-propagation algorithm (prior art); training runs for 20 epochs, with a batch size of 64, a learning rate of 0.00005 and a weight decay rate of 0.01.
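The training setup of this embodiment can be sketched as follows, reusing the FineGrainedQEModel sketch from embodiment one. The dataset here is a dummy stand-in for the 120,000 annotated Chinese-English pairs, and nn.CrossEntropyLoss computes the mean cross entropy of the loss formula above directly on logits (it applies the softmax internally).

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

EPOCHS, BATCH_SIZE, LR, WEIGHT_DECAY = 20, 64, 5e-5, 0.01
NUM_TYPES = 20

model = FineGrainedQEModel(vocab_size=30000)  # sketch from embodiment one
optimizer = torch.optim.SGD(model.parameters(), lr=LR, weight_decay=WEIGHT_DECAY)
criterion = nn.CrossEntropyLoss()             # mean CE over all translation words

# Dummy stand-in for the 120,000 annotated zh-en pairs (real data not shown).
src = torch.randint(0, 30000, (256, 24))
tgt = torch.randint(0, 30000, (256, 20))
labels = torch.randint(0, NUM_TYPES, (256, 20))   # word-level error-type ids
loader = DataLoader(TensorDataset(src, tgt, labels), batch_size=BATCH_SIZE)

for epoch in range(EPOCHS):
    for src_ids, tgt_ids, y in loader:
        logits = model(src_ids, tgt_ids)          # (batch, m, l)
        loss = criterion(logits.reshape(-1, NUM_TYPES), y.reshape(-1))
        optimizer.zero_grad()
        loss.backward()                            # error back-propagation
        optimizer.step()                           # stochastic gradient descent
```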
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.

Claims (10)

1. A method for identifying a fine granularity error type of a machine translation text, which is characterized by comprising the following steps:
step one: acquiring original text features and machine translation text features, and segmenting the original text features and the machine translation text features to obtain a set of words or characters segmented by the original text and a set of words or characters segmented by the machine translation text;
step two: generating an original text word vector and an original text position vector from a set of words or characters of the original text after word segmentation, and adding the original text word vector and the original text position vector to obtain a vector representation of the original text word or character; regularizing the original text word vector and the original text position vector to obtain an original text word or character vector representation set;
Step three: generating a translation word vector and a translation position vector from a set of words or characters of the translated text after word segmentation, and adding the translation word vector and the translation position vector to obtain a vector representation of the translation word; regularizing the translation word vector and the translation position vector to obtain a translation word or character vector characterization set;
step four: inputting the vector representation of the original text word or character, and outputting the vector representation reflecting the context feature of the original text;
step five: inputting the vector representation of the translation words or characters, and outputting the vector representation reflecting the context characteristics of the translation;
step six: inputting the vector representation of the original text word or character, taking the vector representation of the translated text word or character as a benchmark, and outputting an interaction vector of the vector representation of the original text word or character and the vector representation of the translated text word or character;
step seven: adding and splicing the vector characterization of the translated word or character and the interaction vector, and outputting a spliced vector;
step eight: and inputting the splice vector, outputting the error type prediction probability distribution of all the translated words or characters, selecting the error type with the highest probability for each translated word or character, and outputting the prediction value corresponding to the error type of the word level according to the assignment of different error types.
2. The method for identifying fine-grained error types in machine-translated text according to claim 1, wherein the method for segmenting the original text and the machine-translated text in step one further comprises: obtaining the text feature text_src of the original text and the text feature text_tgt of the machine-translated text; segmenting text_src and text_tgt using spaces and punctuation marks, and judging whether each segmented word exists in a multilingual dictionary;
if the word exists in the multilingual dictionary, the word is marked as a token;
if the word does not exist in the multilingual dictionary, character-by-character cutting is carried out from the last character of the word towards the first character, splitting the word into two sub-words, marked respectively as a front character string and a rear character string, until the front character string exists in the multilingual dictionary; the front character string is regarded as a qualified sub-word of the word and is marked as a token; the character-cutting operation is repeated on the rear character string until both text_src and text_tgt can be composed entirely of qualified sub-word tokens;
obtaining the set TOKENS_src of words or characters after segmentation of the original text: TOKENS_src = (token_1^src, token_2^src, …, token_n^src);
and the set TOKENS_tgt of words or characters after segmentation of the machine-translated text: TOKENS_tgt = (token_1^tgt, token_2^tgt, …, token_m^tgt);
Wherein n represents n original words or characters after original word segmentation, and m represents m translated words or characters after translation word segmentation.
3. The machine translation text fine-granularity error type identification method according to claim 2, wherein in step two: from the segmented original-text word or character set $TOKENS_{src}$, the original-text word vector is generated according to the following formula:
$$w_i^{src} = \mathrm{WordEmbedding}(token_i^{src})$$
the original-text position vector is generated according to the following formula:
$$p_i^{src} = \mathrm{PosEmbedding}(pos_i^{src})$$
the original-text word vector $w_i^{src}$ and the original-text position vector $p_i^{src}$ are added according to the following formula to obtain the vector representation $e_i^{src}$ of the original word or character:
$$e_i^{src} = w_i^{src} + p_i^{src}$$
and the sum of the original-text word vector and position vector is regularized according to the following formula to output the set $E_{src}$ of vector representations of the original words or characters:
$$E_{src} = \{\mathrm{LayerNorm}(e_i^{src})\}_{i=1}^{n}$$
where $token_i^{src}$ is the i-th original word or character and $pos_i^{src}$ is the position of the i-th original word or character.
4. The machine translation text fine-granularity error type identification method according to claim 3, wherein in step three: from the segmented machine-translation word or character set $TOKENS_{tgt}$, the translation word vector is generated according to the following formula:
$$w_i^{tgt} = \mathrm{WordEmbedding}(token_i^{tgt})$$
the translation position vector is generated according to the following formula:
$$p_i^{tgt} = \mathrm{PosEmbedding}(pos_i^{tgt})$$
the translation word vector $w_i^{tgt}$ and the translation position vector $p_i^{tgt}$ are added according to the following formula to obtain the vector representation $e_i^{tgt}$ of the translation word or character:
$$e_i^{tgt} = w_i^{tgt} + p_i^{tgt}$$
and the sum of the translation word vector and position vector is regularized according to the following formula to output the set $E_{tgt}$ of vector representations of the translation words or characters:
$$E_{tgt} = \{\mathrm{LayerNorm}(e_i^{tgt})\}_{i=1}^{m}$$
where $token_i^{tgt}$ is the i-th translation word or character and $pos_i^{tgt}$ is the position of the i-th translation word or character.
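A minimal numpy sketch of the embedding step in claims 3 and 4 follows; the vocabulary size, maximum sequence length, hidden size D, and random initialization scale are illustrative assumptions, and WordEmbedding/PosEmbedding are realized as plain lookup tables.

```python
import numpy as np

# A minimal numpy sketch of claims 3 and 4: a word-embedding lookup plus a
# position-embedding lookup, summed and then regularized with LayerNorm.
# VOCAB, MAX_LEN, D and the 0.02 scale are assumptions; the claims fix none.
rng = np.random.default_rng(0)
VOCAB, MAX_LEN, D = 30000, 512, 768
word_emb = rng.normal(size=(VOCAB, D)) * 0.02   # WordEmbedding table
pos_emb = rng.normal(size=(MAX_LEN, D)) * 0.02  # PosEmbedding table

def layer_norm(x, eps=1e-6):
    """Regularize each vector to zero mean and unit variance."""
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def embed(token_ids):
    """E_i = LayerNorm(w_i + p_i) for each token position i."""
    positions = np.arange(len(token_ids))
    return layer_norm(word_emb[token_ids] + pos_emb[positions])

E_src = embed(np.array([11, 42, 7]))  # representations for 3 source tokens
print(E_src.shape)  # (3, 768)
```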
5. The machine translation text fine-granularity error type identification method according to claim 4, wherein in step four: an original-text encoder of 6 self-attention layers with 12 heads is constructed; the vector representations of the original words or characters are input, and vector representations reflecting the original-text context features are output;
each self-attention layer consists of an input-vector part, a 12-head QKV vector generation part, an attention calculation part, and an original-text vector output part;
in the input-vector part, the input vector of the first attention layer is the set of original-text word or character vector representations, and the input vector of attention layers 2-6 is the hidden vector output by the attention layer above:
$$I_{src}^{k} = \begin{cases} E_{src}, & k = 1 \\ H_{src}^{k-1}, & 1 < k \le 6 \end{cases}$$
where $I_{src}^{k}$ is the original-text input vector of the k-th self-attention layer: when k = 1, the original-text input vector $I_{src}^{1}$ is the set $E_{src}$ of original-text word or character vector representations, and when 1 < k ≤ 6 the original-text input vector $I_{src}^{k}$ is the hidden vector $H_{src}^{k-1}$ output by layer k-1;
in the 12-head QKV vector generation part, an original-text query vector, key vector, and value vector are generated from the original-text input vector $I_{src}^{k}$, and the query, key, and value vectors are each mapped into 12 different subspaces, each subspace representing different text features;
in the attention calculation part, the similarity between the query vectors and the key vectors of the different attention heads is calculated to generate attention weight scores between the original words or characters, and the attention weight scores are multiplied with the value vectors to obtain the original-text context vectors of the different attention heads; the context vectors of the different attention heads are spliced and mapped, the residual block $I_{src}^{k}$ is added, and a new context vector is generated after regularization;
in the output-vector part, the original-text context vector is processed and the hidden vector of the attention layer is output.
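One such layer can be sketched in numpy as below; the translation encoder of claim 6 is structurally identical. The scaled dot-product similarity, weight shapes, and random initialization are assumptions: the claim names the parts but does not fix these details.

```python
import numpy as np

# A minimal numpy sketch of one 12-head self-attention layer from claim 5.
rng = np.random.default_rng(0)
D, HEADS = 768, 12
DH = D // HEADS  # per-head subspace size (64)
Wq, Wk, Wv, Wo = (rng.normal(size=(D, D)) * 0.02 for _ in range(4))

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def self_attention_layer(I):
    """Map layer input I of shape (n, D) to this layer's hidden vectors."""
    n = I.shape[0]
    # QKV generation: project queries, keys, values into 12 subspaces
    q = (I @ Wq).reshape(n, HEADS, DH).transpose(1, 0, 2)  # (12, n, 64)
    k = (I @ Wk).reshape(n, HEADS, DH).transpose(1, 0, 2)
    v = (I @ Wv).reshape(n, HEADS, DH).transpose(1, 0, 2)
    # attention: query-key similarity -> weight scores -> weighted values
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(DH))  # (12, n, n)
    ctx = (scores @ v).transpose(1, 0, 2).reshape(n, D)  # splice the heads
    # map, add the residual block I, regularize
    return layer_norm(ctx @ Wo + I)

H = self_attention_layer(rng.normal(size=(5, D)))  # 5 source tokens
print(H.shape)  # (5, 768)
```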
6. The machine translation text fine-granularity error type identification method according to claim 5, wherein in step five: a translation encoder of 6 self-attention layers with 12 heads is constructed; the vector representations of the translation words or characters are input, and vector representations reflecting the translation context features are output;
each self-attention layer consists of an input-vector part, a 12-head QKV vector generation part, an attention calculation part, and a translation vector output part;
in the input-vector part, the input vector of the first attention layer is the set of translation word or character vector representations, and the input vector of attention layers 2-6 is the hidden vector output by the attention layer above:
$$I_{tgt}^{k} = \begin{cases} E_{tgt}, & k = 1 \\ H_{tgt}^{k-1}, & 1 < k \le 6 \end{cases}$$
where $I_{tgt}^{k}$ is the translation input vector of the k-th self-attention layer: when k = 1, the translation input vector $I_{tgt}^{1}$ is the set $E_{tgt}$ of translation word or character vector representations, and when 1 < k ≤ 6 the translation input vector $I_{tgt}^{k}$ is the hidden vector $H_{tgt}^{k-1}$ output by layer k-1;
in the 12-head QKV vector generation part, a translation query vector, key vector, and value vector are generated from the translation input vector $I_{tgt}^{k}$, and the query, key, and value vectors are each mapped into 12 different subspaces, each subspace representing different text features;
in the attention calculation part, the similarity between the query vectors and the key vectors of the different attention heads is calculated to generate attention weight scores between the translation words or characters, and the attention weight scores are multiplied with the value vectors to obtain the translation context vectors of the different attention heads; the translation context vectors of the different attention heads are spliced and mapped, the residual block $I_{tgt}^{k}$ is added, and a new translation context vector is generated after regularization;
in the output-vector part, the translation context vector is processed and the hidden vector of the attention layer is output.
7. The machine translation text fine-granularity error type identification method according to claim 6, wherein in step six:
a cross encoder of 6 cross-attention layers with 12 heads is constructed, and the vector representations of the original words or characters and the vector representations of the translation words or characters are input;
each cross-attention layer consists of an input-vector part, a 12-head QKV vector generation part, an attention calculation part, and an output part; the query vector of the cross encoder is derived from the translation query vector or from the interaction vector of the cross-attention layer above, while the key vector and value vector of the cross encoder are derived from the original-text key vector and value vector;
in the input-vector part, the input of each cross-attention layer consists of a translation-based word vector representation set $T^{k}$ and an original-text-based word vector representation set $S$:
$$T^{k} = \begin{cases} H_{tgt}^{6}, & k = 1 \\ C^{k-1}, & 1 < k \le 6 \end{cases}$$
where $T^{k}$ is the translation-based word vector representation set: in the first cross-attention layer it comes from the last-layer word hidden vector $H_{tgt}^{6}$ of the translation encoder, and in cross-attention layers two to six it is the interaction vector $C^{k-1}$ output by the cross-attention layer above on the translation side; $S$ is the original-text-based word vector representation set;
in the 12-head QKV vector generation part, a query vector is generated from the translation-based input vector, and a key vector and a value vector are generated from the original-text-based input vector; 12 attention heads are set, and the translation query vector, the original-text key vector, and the original-text value vector are each mapped into 12 different subspaces;
in the attention calculation part, the interaction information between the original text and the translation is learned: the similarity between the translation query vectors and the original-text key vectors of the different attention heads is calculated to generate attention weight scores between each translation word and the original words, and the attention weight scores are multiplied with the original-text value vectors to obtain the translation-original interaction vectors of the different attention heads; the interaction vectors of the different attention heads are spliced and mapped, the residual block $T^{k}$ is added, and a new interaction vector representation is generated after regularization;
in the output-vector part, the interaction vector representation is processed, and the attention layer outputs the interaction vectors between the vector representations of the original words or characters and the vector representations of the translation words or characters.
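A corresponding numpy sketch of one cross-attention layer follows, with queries from the translation side and keys/values from the original-text side; as above, the scaled dot-product similarity and the initialization are assumptions beyond what the claim states.

```python
import numpy as np

# A minimal numpy sketch of one 12-head cross-attention layer from claim 7.
rng = np.random.default_rng(1)
D, HEADS = 768, 12
DH = D // HEADS
Wq, Wk, Wv, Wo = (rng.normal(size=(D, D)) * 0.02 for _ in range(4))

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def cross_attention_layer(T, S):
    """T: (m, D) translation-side input; S: (n, D) original-text-side input."""
    m, n = T.shape[0], S.shape[0]
    q = (T @ Wq).reshape(m, HEADS, DH).transpose(1, 0, 2)  # translation queries
    k = (S @ Wk).reshape(n, HEADS, DH).transpose(1, 0, 2)  # original-text keys
    v = (S @ Wv).reshape(n, HEADS, DH).transpose(1, 0, 2)  # original-text values
    # weight scores between each translation word and the original words
    scores = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(DH))  # (12, m, n)
    inter = (scores @ v).transpose(1, 0, 2).reshape(m, D)  # splice the heads
    # map, add the residual block T, regularize
    return layer_norm(inter @ Wo + T)

C = cross_attention_layer(rng.normal(size=(4, D)), rng.normal(size=(6, D)))
print(C.shape)  # (4, 768): one interaction vector per translation token
```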
8. The method according to claim 2, wherein in step eight, the spliced vector $H_o$ is taken as the input vector of the error-type prediction module, and after the input vector is processed, the error-type prediction probability distribution ERROR of all translation words or characters is output:
$$\mathrm{ERROR} = \mathrm{softmax}(\mathrm{LayerNorm}(H_o W_{error1}) W_{error2})$$
where ERROR is the error-type prediction probability distribution of all translation words or characters, $W_{error1}$ is the first fully connected layer, and $W_{error2}$ is the second fully connected layer.
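A minimal numpy sketch of this prediction head, together with the argmax selection from step eight, is given below. The spliced width 2*D (translation vector concatenated with interaction vector) and the number of error types L are assumptions; the claim fixes neither value.

```python
import numpy as np

# A minimal numpy sketch of the claim 8 prediction head.
rng = np.random.default_rng(2)
D, L, m = 768, 8, 4  # hidden size, error types, translation tokens
W_error1 = rng.normal(size=(2 * D, D)) * 0.02  # first fully connected layer
W_error2 = rng.normal(size=(D, L)) * 0.02      # second fully connected layer

def layer_norm(x, eps=1e-6):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

H_o = rng.normal(size=(m, 2 * D))  # spliced vector, one row per token
ERROR = softmax(layer_norm(H_o @ W_error1) @ W_error2)  # (m, L) distributions
predicted_types = ERROR.argmax(axis=-1)  # highest-probability type per token
print(ERROR.shape, predicted_types)
```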
9. A machine translation text fine-granularity error type locating method, characterized in that erroneous words are located after their word-level error types have been identified according to the machine translation text fine-granularity error type identification method of any one of claims 1-8.
10. A training method for machine translation text fine-granularity error type identification, characterized in that it is used to train the machine translation text fine-granularity error type identification method according to any one of claims 1-8 to output the prediction values corresponding to the error types, the training method comprising:
step one: a translator proofreads the machine-translated text and annotates error types, marking the machine-translated sentence segments at the specific word level; the corresponding true error type is marked based on the error types of the machine translation fine-granularity error type identification method; the true value corresponding to the true error type is marked according to the assignment of the different error types; translation sentence segments without manual annotation default to the error-free type;
step two: the mean cross-entropy loss between the true and predicted values of all word error types of the translation is calculated by the following formula:
$$Loss = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{l} y_{i,j} \log \hat{y}_{i,j}$$
where $y_{i,j}$ is the true value of the probability that the i-th translation word or character belongs to the j-th error type: if the word belongs to a given error type, the corresponding $y$ is 1 and the $y$ of the other error types is 0; $\hat{y}_{i,j}$ is the predicted value of the probability that the i-th translation word or character belongs to the j-th error type, with $\hat{y}_{i,j}$ taking values between 0 and 1; m is the number of translation words and l is the number of error types.
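This loss can be checked numerically with a short numpy sketch; the one-hot targets and Dirichlet-sampled predictions below are synthetic stand-ins for real annotations and model outputs.

```python
import numpy as np

# A minimal numpy check of the claim 10 loss: the mean over all m translation
# tokens of the cross entropy between the one-hot true error type y and the
# predicted distribution y_hat.
rng = np.random.default_rng(3)
m, L = 4, 8  # translation tokens, error types
y = np.eye(L)[rng.integers(0, L, size=m)]  # one-hot true types, shape (m, L)
y_hat = rng.dirichlet(np.ones(L), size=m)  # rows sum to 1, shape (m, L)

# Loss = -(1/m) * sum_i sum_j y_ij * log(y_hat_ij)
loss = -(y * np.log(y_hat + 1e-12)).sum(axis=1).mean()
print(float(loss))
```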
CN202310863738.XA 2023-07-14 2023-07-14 Machine translation text fine granularity error type identification and positioning method Pending CN116956946A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310863738.XA CN116956946A (en) 2023-07-14 2023-07-14 Machine translation text fine granularity error type identification and positioning method


Publications (1)

Publication Number Publication Date
CN116956946A true CN116956946A (en) 2023-10-27

Family

ID=88452258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310863738.XA Pending CN116956946A (en) 2023-07-14 2023-07-14 Machine translation text fine granularity error type identification and positioning method

Country Status (1)

Country Link
CN (1) CN116956946A (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006178682A (en) * 2004-12-21 2006-07-06 Brother Ind Ltd Machine translation device, machine translation program, and computer readable recording medium recording this program
CN111783478A (en) * 2020-08-18 2020-10-16 Oppo广东移动通信有限公司 Machine translation quality estimation method, device, equipment and storage medium
WO2022088570A1 (en) * 2020-10-29 2022-05-05 语联网(武汉)信息技术有限公司 Method and apparatus for post-editing of translation, electronic device, and storage medium
CN114091480A (en) * 2021-11-18 2022-02-25 北京理工大学 Machine translation quality estimation method integrating and utilizing multiple pre-training models

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邓涵铖 (Deng Hancheng): "Research on automatic quality evaluation methods for sentence-level machine translation", China Dissertations Full-text Database, 19 October 2022 (2022-10-19) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117273027A (en) * 2023-11-22 2023-12-22 四川语言桥信息技术有限公司 Automatic machine translation post-verification method based on translation error correction
CN117273027B (en) * 2023-11-22 2024-04-30 四川语言桥信息技术有限公司 Automatic machine translation post-verification method based on translation error correction

Similar Documents

Publication Publication Date Title
CN110489760B (en) Text automatic correction method and device based on deep neural network
CN110046350B (en) Grammar error recognition method, device, computer equipment and storage medium
US8977536B2 (en) Method and system for translating information with a higher probability of a correct translation
CN110866399B (en) Chinese short text entity recognition and disambiguation method based on enhanced character vector
WO2020143163A1 (en) Named entity recognition method and apparatus based on attention mechanism, and computer device
CN112560478B (en) Chinese address Roberta-BiLSTM-CRF coupling analysis method using semantic annotation
Edwards et al. Making latin manuscripts searchable using gHMM's
CN112580373B (en) High-quality Mongolian non-supervision neural machine translation method
CN105955955B (en) A kind of unsupervised part-of-speech tagging method without disambiguation based on error correcting output codes
CN110717341B (en) Method and device for constructing old-Chinese bilingual corpus with Thai as pivot
CN114818668B (en) Name correction method and device for voice transcription text and computer equipment
CN110110334B (en) Remote consultation record text error correction method based on natural language processing
CN110276069A (en) A kind of Chinese braille mistake automatic testing method, system and storage medium
CN116956946A (en) Machine translation text fine granularity error type identification and positioning method
CN112417823B (en) Chinese text word order adjustment and word completion method and system
KR20230009564A (en) Learning data correction method and apparatus thereof using ensemble score
Hassan et al. Arabic spelling correction using supervised learning
CN110502759B (en) Method for processing Chinese-Yue hybrid network neural machine translation out-of-set words fused into classification dictionary
CN113657122A (en) Mongolian Chinese machine translation method of pseudo-parallel corpus fused with transfer learning
Labat et al. A classification-based approach to cognate detection combining orthographic and semantic similarity information
CN114970554B (en) Document checking method based on natural language processing
CN116484842A (en) Statement error correction method and device, electronic equipment and storage medium
CN113989811A (en) Deep learning-based extraction method for project companies and suppliers in trade contract
Yadav et al. Different Models of Transliteration-A Comprehensive Review
CN112528003A (en) Multi-item selection question-answering method based on semantic sorting and knowledge correction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination