CN116011470A - Translation, adversarial sample generation and model robustness enhancement method and related device - Google Patents


Info

Publication number
CN116011470A
Authority
CN
China
Prior art keywords: word, text, training, replaced, candidate
Legal status: Pending
Application number: CN202211619814.4A
Other languages: Chinese (zh)
Inventors: 韩雯, 方明, 陈霆, 刘鹏
Current Assignee: Shandong Kexun Information Technology Co., Ltd.
Original Assignee: Shandong Kexun Information Technology Co., Ltd.
Application filed by Shandong Kexun Information Technology Co., Ltd.
Priority to CN202211619814.4A
Publication of CN116011470A


Landscapes

  • Machine Translation (AREA)

Abstract

A text to be translated is translated with a pre-trained robustness-enhanced machine translation model to obtain a text translation result. The robustness-enhanced machine translation model is adversarially trained with training adversarial samples labeled with reference translation text as training data. A training adversarial sample is obtained by replacing a word to be replaced in a training source input text with a candidate synonym. The candidate synonym is screened out of the candidate word set of the word to be replaced according to the information divergence between the text before and after regularized perturbation information corresponding to a word in the candidate word set is added to the training source input text, and the semantic similarity between the text before and after that word replaces the word to be replaced. The method obtains training adversarial samples under as small a perturbation as possible and trains the model with them, improving the translation accuracy of the model; the model training stage requires no additional network, so training efficiency is higher.

Description

Translation, adversarial sample generation and model robustness enhancement method and related device
Technical Field
The present disclosure relates to the field of translation technologies, and in particular, to methods and apparatuses for translation, adversarial sample generation, and model robustness enhancement.
Background
Neural machine translation (Neural Machine Translation, NMT) is an important research direction of the machine translation task, and the security of neural machine translation models is a problem that currently needs serious consideration. Research shows that applying a slight perturbation to a model's original input to generate an adversarial sample can make the model err; accordingly, adversarially training the model can improve its robustness.
In existing work, one adversarial training process for a machine translation model is to input an original sample into a generative adversarial network, map the original sample into a distribution space in the network, find a text embedding representation there that obeys the same distribution as the input to obtain an adversarial sample, and then adversarially train the machine translation model based on that adversarial sample. When this process generates adversarial samples, an adversarial network needs to be constructed in advance, the network parameter processing is complex, the workload is heavy, the generation efficiency of adversarial samples is low, and the training efficiency of the model is therefore low.
In other existing work, the textual adversarial attack task is formulated as a constrained optimization problem and solved with existing optimization techniques such as gradient-based optimization and genetic algorithms, yielding a synonym-substitution adversarial sample, on which the machine translation model is then adversarially trained. However, this method only considers words similar to the word to be replaced and ignores the semantic association between the text before and after replacement, so the generated adversarial samples are weakly aggressive toward the model; a model trained on such adversarial samples has poor robustness, and the trained model therefore translates the text to be translated with low accuracy.
Disclosure of Invention
In view of the foregoing, the present application provides methods and related apparatuses for translation, adversarial sample generation, and model robustness enhancement, which are used to solve the problems of low text translation accuracy and low model training efficiency in the prior art. The specific scheme is as follows:
in a first aspect, a translation method is provided, including:
acquiring a text to be translated in a first language;
translating the text to be translated by utilizing a pre-trained robust machine translation model to obtain a text translation result of a second language corresponding to the text to be translated;
The training process of the machine translation model with enhanced robustness comprises the following steps:
acquiring a training source input text, and determining a candidate word set of a word to be replaced in the training source input text, wherein the word to be replaced is a segmented word in the training source input text whose semantic association with the context is lower than a preset association threshold, and the semantic similarity between each word in the candidate word set and the word to be replaced is higher than a preset similarity threshold;
calculating the semantic similarity between the text before and after a word in the candidate word set replaces the word to be replaced;
calculating the information divergence between the text before and after regularized perturbation information corresponding to the word in the candidate word set is added to the training source input text;
screening a candidate synonym corresponding to the word to be replaced from the candidate word set according to the semantic similarity and the information divergence, and taking the text in which the candidate synonym has replaced the word to be replaced as a training adversarial sample;
and adversarially training the pre-trained machine translation model with the training adversarial sample labeled with the reference translation text label as training data, to obtain the robustness-enhanced machine translation model.
In a second aspect, an adversarial sample generation method is provided, comprising:
acquiring a training source input text, and determining a candidate word set of a word to be replaced in the training source input text, wherein the word to be replaced is a segmented word in the training source input text whose semantic association with the context is lower than a preset association threshold, and the semantic similarity between each word in the candidate word set and the word to be replaced is higher than a preset similarity threshold;
calculating the semantic similarity between the text before and after a word in the candidate word set replaces the word to be replaced;
calculating the information divergence between the text before and after regularized perturbation information corresponding to the word in the candidate word set is added to the training source input text;
and screening a candidate synonym corresponding to the word to be replaced from the candidate word set according to the semantic similarity and the information divergence, and taking the text in which the candidate synonym has replaced the word to be replaced as a training adversarial sample.
In a third aspect, a method for enhancing robustness of a machine translation model is provided, including:
generating a training adversarial sample by the adversarial sample generation method described above;
and adversarially training a pre-trained machine translation model with the training adversarial sample labeled with the reference translation text label as training data to obtain a robustness-enhanced machine translation model, wherein the pre-trained machine translation model is obtained by training with the training source input text labeled with the reference translation text label as training data.
In a fourth aspect, there is provided a translation apparatus comprising:
a text-to-be-translated acquiring unit, configured to acquire the text to be translated in the first language;
a model translation unit, configured to translate the text to be translated by using a pre-trained robustness-enhanced machine translation model to obtain a text translation result in a second language corresponding to the text to be translated;
the training process of the machine translation model with enhanced robustness comprises the following steps:
acquiring a training source input text, and determining a candidate word set of a word to be replaced in the training source input text, wherein the word to be replaced is a segmented word in the training source input text whose semantic association with the context is lower than a preset association threshold, and the semantic similarity between each word in the candidate word set and the word to be replaced is higher than a preset similarity threshold;
calculating the semantic similarity between the text before and after a word in the candidate word set replaces the word to be replaced;
calculating the information divergence between the text before and after regularized perturbation information corresponding to the word in the candidate word set is added to the training source input text;
screening a candidate synonym corresponding to the word to be replaced from the candidate word set according to the semantic similarity and the information divergence, and taking the text in which the candidate synonym has replaced the word to be replaced as a training adversarial sample;
and adversarially training the pre-trained machine translation model with the training adversarial sample labeled with the reference translation text label as training data, to obtain the robustness-enhanced machine translation model.
In a fifth aspect, there is provided an electronic device comprising: a memory and a processor;
the memory is used for storing programs;
the processor is configured to execute the program to implement the steps of the translation method described above, or the steps of the adversarial sample generation method described above, or the steps of the machine translation model robustness enhancement method described above.
By means of the above technical scheme, the text to be translated is acquired and translated with the pre-trained robustness-enhanced machine translation model, obtaining a text translation result corresponding to the text to be translated. Because the generation process of the training adversarial samples used to train the robustness-enhanced machine translation model considers both the synonymy with the word to be replaced and the semantic similarity between the text before and after replacement, and because regularized perturbation information, rather than raw perturbation information, is added to the training source input text, strongly aggressive training adversarial samples are generated while adding as small a perturbation as possible to the training source input text. Training the model on these adversarial samples improves the robustness of the robustness-enhanced machine translation model, so translating the text to be translated with it yields a more accurate text translation result. Meanwhile, no additional network needs to be constructed during training of the robustness-enhanced machine translation model, so the training adversarial samples are generated more efficiently and the model is trained more efficiently.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic flow chart of a translation method according to an embodiment of the present disclosure;
FIG. 2 illustrates a schematic structural diagram of a robustness-enhanced machine translation model;
FIG. 3 illustrates a schematic diagram of the training process of a robustness-enhanced machine translation model;
FIG. 4 illustrates a process diagram for determining the candidate word set of a word to be replaced in a source input text;
FIG. 5 illustrates a process diagram for determining the saliency score of a masked word;
FIG. 6 illustrates a schematic diagram of adding regularized perturbation information;
FIG. 7 is a flow chart of an adversarial sample generation method according to an embodiment of the present disclosure;
FIG. 8 is a flow chart of a method for enhancing robustness of a machine translation model according to an embodiment of the present disclosure;
FIG. 9 is a schematic structural diagram of a translation device according to an embodiment of the present application;
FIG. 10 is a schematic structural diagram of an adversarial sample generating device according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a machine translation model robustness enhancement device according to an embodiment of the present application;
FIG. 12 is a block diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The translation method of the present application is applicable to machine translation scenarios, such as English-to-Chinese, Chinese-to-English, and the like.
The translation method can be applied to the machine device in the machine translation scenario; it can also be applied to another device in communication with the machine device, such as a server, a cloud, or another terminal; alternatively, it can be applied to both the machine device and another device in communication with it.
Next, as shown in FIG. 1, the translation method of the present application may include the following steps:
and S100, acquiring a text to be translated.
Here, the text to be translated is a text in a first language that is to be translated; through the following steps, it can be input into the robustness-enhanced machine translation model for translation, yielding a text translation result in a second language.
Step S110, translating the text to be translated by using the pre-trained robustness-enhanced machine translation model to obtain the text translation result corresponding to the text to be translated.
Optionally, the robustness-enhanced machine translation model may be an Encoder-Decoder model. FIG. 2 shows a schematic structural diagram of such a model, a widely used Seq2Seq (sequence-to-sequence) model; in FIG. 2, "abc" represents an input sentence (i.e., the text to be translated), "ABCD" represents the generated output sentence (i.e., the text translation result), and "<EOS>" represents the end-of-sequence token.
Alternatively, referring to FIG. 3, the training process of the robust machine translation model may include:
Step S200, acquiring a training source input text, and determining a candidate word set of a word to be replaced in the training source input text.
The word to be replaced is a segmented word in the training source input text whose semantic association with the context is lower than a preset association threshold, and the semantic similarity between each word in the candidate word set and the word to be replaced is higher than a preset similarity threshold.
Specifically, the training source input text is input into the machine translation model, and the reference translation text of the training source input text can be obtained.
Optionally, the training source input text may be text from the WMT19 dataset, which contains Chinese-English sentence pairs; the contents of the dataset are shown in Table 1 below.

Table 1. WMT19 dataset content

Dataset    Training set    Test set
WMT19      7,270,695       2,983

As shown in Table 1, the WMT19 dataset contains a training set of 7,270,695 training samples and a test set of 2,983 test samples.
Considering that when the number of words contained in a training source input text exceeds a preset number threshold, the attack becomes both more difficult and less meaningful, this embodiment may use texts in the WMT19 dataset containing no more than the preset number threshold of words as training source input texts. Optionally, the number threshold is 50. A minimal sketch of this length-based filtering is given below.
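The following Python sketch illustrates the filtering step; the (source, reference) pair format, whitespace tokenization, and the helper name are illustrative assumptions, with only the threshold of 50 taken from the description above.

```python
# Illustrative sketch (assumed helper, not the patent's code): keep only
# sentence pairs whose source side contains at most `max_words` tokens.
def filter_by_length(sentence_pairs, max_words=50):
    """`sentence_pairs` is assumed to be an iterable of (source, reference)
    strings; whitespace tokenization is a simplifying assumption."""
    return [
        (source, reference)
        for source, reference in sentence_pairs
        if len(source.split()) <= max_words
    ]
```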
For example, a training source input text may be: S = [ today is not a good day ], and the corresponding reference translation text may be: Y = [ 今天 不是 好 日子 ].
In a machine translation scenario, a training adversarial sample can be obtained by replacing one or more segmented words in the training source input text. Therefore, to generate a training adversarial sample, the training source input text is first segmented into words, and the word to be replaced is then selected from the segmented words it contains.
Unlike the prior art, in which the word to be replaced is selected randomly or preset, the inventors found, after studying the working principle of machine translation models in depth, that when a word whose semantic association with the context is high is replaced, the machine translation model can often still output a correct translation result from the contextual semantics; whereas when a word whose semantic association with the context is low is replaced, the machine translation model cannot draw on the contextual semantics, or the contextual semantic information it can draw on is scarce, so it easily outputs an incorrect translation result. Therefore, the present application takes as the word to be replaced a word in the training source input text whose semantic association with the context is lower than the preset association threshold.
Further, it is necessary to determine which segmented words can be used to replace the word to be replaced; for convenience of subsequent description, the segmented words used to replace the word to be replaced in the training source input text are defined as candidate words.
In this step, the candidate word set may be determined by semantic similarity with the word to be replaced: if the semantic similarity between a segmented word and the word to be replaced is higher than the preset similarity threshold, that word is taken as a candidate word of the word to be replaced, and the candidate words together form the candidate word set. The semantic similarity condition accurately screens out synonyms of the word to be replaced, which improves the aggressiveness of the training adversarial samples.
Step S210, calculating the semantic similarity between the text before and after a word in the candidate word set replaces the word to be replaced.
Specifically, the semantic similarity condition of the previous step ensures, as far as possible, that a candidate word has the same meaning as the word to be replaced; however, after a candidate word replaces the word to be replaced in the training source input text, the resulting replacement text may not read fluently. Therefore, this step calculates the semantic similarity between the replacement text, in which the word in the candidate word set has replaced the word to be replaced, and the training source input text; the semantic similarity between the text before and after replacement safeguards the fluency of the replacement text.
Step S220, calculating the information divergence between the text before and after regularized perturbation information corresponding to the word in the candidate word set is added to the training source input text.
Specifically, the regularized perturbation information corresponding to each candidate word in the candidate word set can be added to the training source input text in turn; the perturbed input is fed into the machine translation model, and the model outputs the text translation result of the replacement text in which that candidate word has replaced the word to be replaced.
Regularized perturbation information, rather than raw perturbation information, is added to the training source input text so that the machine translation model outputs as large an output error as possible under the premise of adding as small a perturbation as possible.
For example, with the training source input text S = [ today is not a good day ], the reference translation text Y = [ 今天 不是 好 日子 ], the word to be replaced "good", and "beer" as one candidate word of the word to be replaced, this step adds the regularized perturbation information corresponding to "beer" to the training source input text, which is equivalent to inputting the replacement text X = [ today is not a beer day ] into the machine translation model; the translation result output by the model is then Y′ = [ 今天 是 最好的 日子 ], i.e., the model's output is wrong.
In this step, the information divergence between the text before and after the perturbation is calculated. The information divergence measures the similarity between the perturbed text and the training source input text: the smaller the calculated information divergence, the higher that similarity.
Step S230, screening a candidate synonym corresponding to the word to be replaced from the candidate word set according to the semantic similarity and the information divergence, and taking the text in which the candidate synonym has replaced the word to be replaced as a training adversarial sample.
Specifically, the information divergence obtained under the perturbation regularization processing can serve as a hard constraint, with the semantic similarity superimposed as a soft constraint; under as small an added perturbation as possible, this yields a training adversarial sample that is semantically closer to the training source input text, more grammatical, and more fluent.
Step S240, adversarially training the pre-trained machine translation model with the training adversarial sample labeled with the reference translation text label as training data, to obtain the robustness-enhanced machine translation model.
Here, the pre-trained machine translation model refers to a model obtained by training with the training source input text labeled with the reference translation text label as training data.
By adversarially training the pre-trained machine translation model on the training adversarial samples, the present application improves the robustness of the robustness-enhanced machine translation model, so that it can produce text translation results of higher accuracy.
In summary, the text to be translated is acquired and translated with the pre-trained robustness-enhanced machine translation model, obtaining a text translation result corresponding to the text to be translated. Because the generation process of the training adversarial samples considers both the synonymy with the word to be replaced and the semantic similarity between the text before and after replacement, and because regularized perturbation information, rather than raw perturbation information, is added to the training source input text, strongly aggressive training adversarial samples are generated while adding as small a perturbation as possible. Training the model on these samples improves the robustness of the robustness-enhanced machine translation model, so translating the text to be translated with it yields a more accurate text translation result. Meanwhile, no additional network needs to be constructed during training, so the training adversarial samples are generated more efficiently and the model is trained more efficiently.
In some embodiments of the present application, the process of determining the candidate word set of the word to be replaced in the training source input text in the foregoing step S200 is described.
As shown in fig. 4, the process of determining the candidate word set of the word to be replaced in the training source input text may include:
and step S01, masking each word included in the training source input text in turn, and determining the saliency score of the masked word based on the text after masking the word after masking one word each time.
Here, the higher a word's saliency score, the lower its semantic association with the context.
Specifically, to determine the semantic association between a word contained in the training source input text and its context, the word may be masked, for example with <UNK>, and the saliency score of the masked word in the source input text is then determined based on the text with that word masked.
In one possible implementation, the saliency score of the masked word may be obtained with a pre-trained masked language model (Masked Language Model, MLM): after each word is masked, the text with that word masked is input into the pre-trained MLM, which outputs the saliency score of the masked word. Here, the MLM is obtained by training with masked training texts labeled with the saliency scores of their masked words as training data. Referring to FIG. 5, the training source input text may be expressed as S = {s_1, ..., s_i, ..., s_n}, where s_i denotes the word at position i in the training source input text, i = 1, 2, ..., n.
To determine the saliency score of s_i, the present application masks s_i with <UNK> (shown as "mask" in FIG. 5) and inputs the masked text into the MLM; the saliency score of the masked word is then calculated by equation (1).
$$L_{ws} = M(\hat{S}) \quad (1)$$

where $L_{ws}$ denotes the saliency score of the masked word $s_i$, $\hat{S} = \{s_1, \ldots, \langle\mathrm{UNK}\rangle, \ldots, s_n\}$ denotes the text after the word is masked, and $M$ denotes the MLM masked language model.
As shown in FIG. 5, if the masked word is s_i, the MLM model outputs the saliency score w of s_i; if the masked word is s_1, it outputs the saliency score q of s_1; if the masked word is s_n, it outputs the saliency score e of s_n; and so on.
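For illustration, the Python sketch below approximates such a saliency score with an off-the-shelf masked language model: each word is masked in turn, and the word is scored by how poorly the context predicts it. This is an assumed stand-in for the patent's MLM, which is trained directly on saliency labels; the model name and the scoring rule 1 − p(original word | context) are assumptions.

```python
# Illustrative sketch: approximate a saliency score with a generic masked
# language model. The patent's MLM is trained on saliency labels; scoring
# each word by 1 - p(original word | context) is an assumed stand-in.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
mlm.eval()

def saliency_scores(words):
    scores = []
    for i, word in enumerate(words):
        masked = words[:i] + [tokenizer.mask_token] + words[i + 1:]
        inputs = tokenizer(" ".join(masked), return_tensors="pt")
        mask_pos = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero()[0].item()
        with torch.no_grad():
            logits = mlm(**inputs).logits[0, mask_pos]
        prob = torch.softmax(logits, dim=-1)[tokenizer.convert_tokens_to_ids(word)]
        scores.append(1.0 - prob.item())  # higher = weaker tie to the context
    return scores

print(saliency_scores("today is not a good day".split()))
```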
Step S02, respectively determining at least one word with the highest saliency score in the training source input text as the word to be replaced.
In the present application, the higher a word's saliency score, the lower its semantic association with the context, and attacking such a word makes the machine translation model more likely to output an incorrect translation result. For this purpose, the present application may respectively determine one or more words with the highest saliency scores in the training source input text as the words to be replaced.
For example, through MLM calculation on the training source input text S = [ today is not a good day ], the saliency score of the word "good" is high, so it can serve as a word to be replaced in the present application.
Step S03, screening, from a preset dictionary set, candidate words whose Euclidean distance from the word to be replaced is smaller than a preset distance threshold, and forming the candidate word set from the screened candidate words.
In the prior art, the candidate word set is obtained with a K-nearest-neighbor algorithm; although this can, to a certain extent, find candidate words semantically similar to the word to be replaced, the found candidate words may still be far from it.
In view of this, the present application determines the candidate word set of the word to be replaced with a Euclidean distance constraint. Specifically, the process may include: determining the word vectors, in a space of a preset dimension, of the word to be replaced and of each word contained in the dictionary set; for each word contained in the dictionary set, calculating the Euclidean distance between its word vector and that of the word to be replaced in the space of the preset dimension; and taking the word as a candidate word if the calculated Euclidean distance is smaller than the distance threshold.
Optionally, the present application may employ a Universal Sentence Encoder (USE) to obtain the word vectors, in the space of the preset dimension, of the word to be replaced and of each word contained in the dictionary set.
Taking a k-dimensional space as the space of the preset dimension as an example, let the word vector of the word to be replaced in the k-dimensional space be $v = (v_1, v_2, \ldots, v_k)$, and the word vector of a word contained in the dictionary set be $u = (u_1, u_2, \ldots, u_k)$. The Euclidean distance between the two word vectors is calculated as:

$$E = \sqrt{\sum_{t=1}^{k} (v_t - u_t)^2} \quad (2)$$

where $E$ denotes the Euclidean distance between the two word vectors, which can represent their similarity: the smaller the value of $E$, the more similar the two word vectors are.
After calculating the Euclidean distance between the word vector of each word contained in the dictionary set and that of the word to be replaced, the present application can screen out the candidate word set by applying the preset distance threshold.
Alternatively, the dictionary set may be the dictionary space D.
It should be noted that the preset distance threshold may be set according to the actual situation; for example, a distance threshold of 0.8 may give good screening results. Of course, the distance threshold may take other values, which is not specifically limited in this application. A sketch of this screening follows.
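A minimal Python sketch of the Euclidean-distance screening of equation (2), assuming word vectors already obtained from a sentence encoder such as USE; the function and variable names are illustrative.

```python
import numpy as np

def build_candidate_set(target_vec, dictionary_vecs, threshold=0.8):
    """Equation (2): keep dictionary words whose Euclidean distance from the
    word to be replaced is below the preset threshold (0.8 per the example).
    `dictionary_vecs` is assumed to map each word to its k-dim vector."""
    target = np.asarray(target_vec, dtype=float)
    candidates = []
    for word, vec in dictionary_vecs.items():
        e = np.sqrt(np.sum((target - np.asarray(vec, dtype=float)) ** 2))
        if e < threshold:
            candidates.append((word, e))
    # Most similar (smallest E) first.
    return [word for word, _ in sorted(candidates, key=lambda item: item[1])]
```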
In the present application, the word to be replaced is determined based on the semantic association between each segmented word and the context, and a candidate word set as similar as possible to the word to be replaced is constructed under the Euclidean distance constraint; the words in the candidate word set are synonyms of the word to be replaced, which guarantees the similarity between the training adversarial samples to be generated and the training source input text.
In some embodiments of the present application, the process in step S210 above of calculating the semantic similarity between the text before and after a word in the candidate word set replaces the word to be replaced is described.
In this embodiment, the semantic similarity between the text before and after replacement may be evaluated with a cosine distance calculation, where the text before replacement refers to the training source input text, and the text after replacement refers to the text in which a word in the candidate word set has replaced the word to be replaced in the training source input text.
Optionally, the process of calculating the semantic similarity between the text before and after a word in the candidate word set replaces the word to be replaced may include: determining a first vector representation of the training source input text before replacement, and a second vector representation of the text after the word in the candidate word set replaces the word to be replaced; and calculating the cosine similarity between the first vector representation and the second vector representation as the semantic similarity.
Taking one candidate word in the candidate word set as an example, the text in which this candidate word has replaced the word to be replaced (defined in this application as the replacement text for convenience of subsequent description) is X = {x_1, ..., x_i, ..., x_n}, and the training source input text is S = {s_1, ..., s_i, ..., s_n}. The first vector representation determined in this embodiment is denoted $\vec{S}$, and the second vector representation of the replacement text after the candidate word has replaced the word to be replaced is denoted $\vec{X}$. The cosine similarity between the two vector representations can be calculated according to equation (3):

$$L_{cosine}(S, X) = \frac{\vec{S} \cdot \vec{X}}{\|\vec{S}\| \, \|\vec{X}\|} \quad (3)$$

where $L_{cosine}(S, X)$ denotes the calculated cosine similarity.
The cosine similarity can serve as a soft constraint for determining the adversarial sample; this soft constraint safeguards the fluency of the sentence after synonym replacement.
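A one-function sketch of the cosine similarity of equation (3), assuming the two sentence-level vector representations have already been computed:

```python
import numpy as np

def cosine_similarity(s_vec, x_vec):
    """Equation (3): cosine similarity between the vector representation of
    the training source input text S and that of the replacement text X."""
    s = np.asarray(s_vec, dtype=float)
    x = np.asarray(x_vec, dtype=float)
    return float(np.dot(s, x) / (np.linalg.norm(s) * np.linalg.norm(x)))
```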
In some embodiments of the present application, the process in step S220 above of calculating the information divergence between the text before and after the regularized perturbation information corresponding to a word in the candidate word set is added to the training source input text is introduced.
This embodiment first describes the "regularized perturbation information corresponding to a word in the candidate word set", for the convenience of those skilled in the art.
Optionally, the regularized perturbation information corresponding to a word in the candidate word set is determined as follows: initial perturbation information is generated based on the word to be replaced and the word in the candidate word set, and the initial perturbation information is regularized to obtain the regularized perturbation information corresponding to that word.
It can be appreciated that although a word in the candidate word set is a synonym with very high semantic similarity to the word to be replaced, a difference between the two still exists. The present application uses this difference to generate the initial perturbation information corresponding to the word in the candidate word set: specifically, the difference between the word vectors of the word to be replaced and of the word in the candidate word set is calculated, and the initial perturbation information corresponding to the candidate word is generated from this difference. Here, the word vectors of the word to be replaced and of the word in the candidate word set may be obtained with the Universal Sentence Encoder (USE); of course, they may also be obtained in other ways, which is not specifically limited here.
Taking the i-th segmented word s_i contained in the training source input text as the word to be replaced as an example, the present application may regularize the initial perturbation information corresponding to the j-th candidate word using the following equation (4):

$$r_{ij} = \epsilon \cdot \frac{g_{ij}}{\|g_{ij}\|} \quad (4)$$

where $g_{ij}$ denotes the initial perturbation information corresponding to the j-th candidate word of the word to be replaced s_i, $\epsilon$ denotes a preset perturbation magnitude, and $r_{ij}$ denotes the regularized perturbation information corresponding to the j-th candidate word of s_i.
Furthermore, the regularized perturbation information corresponding to each candidate word in the candidate word set can be added to the training source input text in turn, so that the machine translation model outputs the text translation result of the replacement text in which each candidate word has respectively replaced the word to be replaced.
For example, referring to the schematic diagram of adding regularized perturbation information shown in FIG. 6, Encoder and Decoder in FIG. 6 denote the encoder and decoder of the machine translation model, {p_1, p_2, ..., p_m} denotes the first vector representation of the training source input text, {r_1, r_2, ..., r_m} denotes the regularized perturbation information added to the training source input text, and {y_1, y_2, ..., y_o} denotes the text translation result, output by the machine translation model, of the replacement text obtained after the candidate word replaces the word to be replaced.
Optionally, among r_1 to r_m, only the regularized perturbation information at the position corresponding to the word to be replaced is non-zero, and its specific value depends on which candidate word's regularized perturbation information is currently being added. For example, if the training source input text contains only one word to be replaced and its word vector is p_i, then among r_1 to r_m only r_i is non-zero; and if the regularized perturbation information corresponding to the j-th candidate word is currently added, r_i is specifically r_ij.
In FIG. 6, adding the regularized perturbation information to the training source input text is equivalent to inputting into the machine translation model the replacement text in which the candidate word has replaced the word to be replaced, so the machine translation model can output the text translation result of the replacement text.
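The following sketch illustrates equation (4) and FIG. 6 as read here: the raw perturbation is rescaled to magnitude ε, and it is added only at the position of the word to be replaced; ε = 0.1 and the array interfaces are assumptions.

```python
import numpy as np

def regularize_perturbation(g_ij, epsilon=0.1):
    """Equation (4) as reconstructed above: scale the initial perturbation
    g_ij to the preset magnitude epsilon (epsilon = 0.1 is a placeholder)."""
    g = np.asarray(g_ij, dtype=float)
    return epsilon * g / np.linalg.norm(g)

def perturb_source_embeddings(embeddings, position, r_ij):
    """Per FIG. 6: the perturbation is zero everywhere except at the
    position of the word to be replaced, where r_ij is added."""
    r = np.zeros_like(embeddings)
    r[position] = r_ij
    return embeddings + r
```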
Furthermore, the information divergence between the text before and after the regularized perturbation information corresponding to the word in the candidate word set is added to the training source input text can be calculated, where the text before the perturbation is added is the training source input text, and the text after the perturbation is added is defined as the perturbed text.
Optionally, the process of calculating this information divergence may include: performing distribution estimation on the training source input text to obtain a first probability distribution function corresponding to the training source input text before the perturbation; performing distribution estimation on the training source input text together with the regularized perturbation information corresponding to the word in the candidate word set to obtain a second probability distribution function corresponding to the perturbed text; and calculating the KL divergence between the second probability distribution function and the first probability distribution function.
Specifically, the present application defines the loss function of the regularized perturbation processing as:

$$L_{AR}(S, r_i, \cdot, \pi) = \mathrm{KL}\big(p(\cdot \mid S, \pi) \,\|\, p(\cdot \mid S, r_i, \pi)\big) \quad (5)$$

where $L_{AR}(S, r_i, \cdot, \pi)$ denotes the information divergence (specifically, the KL divergence in this embodiment) between the text before and after the perturbation, $\pi$ denotes a group of trainable parameters of the machine translation model, and $\mathrm{KL}(\cdot \| \cdot)$ denotes the KL divergence, calculated as in equation (6), which measures the distance between two probability distribution functions:

$$\mathrm{KL}\big(P(Z) \,\|\, Q(Z)\big) = \sum_{z \in Z} P(z) \log \frac{P(z)}{Q(z)} = \mathbb{E}_{z \sim P(z)}\Big[\log \frac{P(z)}{Q(z)}\Big] \quad (6)$$

According to the present application, distribution estimation is performed on the training source input text to obtain the first probability distribution function $p(\cdot \mid S, \pi)$ corresponding to the text before the perturbation; distribution estimation is performed on the training source input text together with the regularized perturbation information corresponding to the word in the candidate word set to obtain the second probability distribution function $p(\cdot \mid S, r_i, \pi)$ corresponding to the perturbed text. Then, taking the first probability distribution function as $P(Z)$ in equation (6) and the second probability distribution function as $Q(Z)$ in equation (6), equation (6) is used to calculate the KL divergence between the second and first probability distribution functions, which is the information divergence between the text before and after the regularized perturbation information corresponding to the word in the candidate word set is added.
In some embodiments of the present application, the process in step S230 above of screening the candidate synonym corresponding to the word to be replaced from the candidate word set according to the semantic similarity and the information divergence is introduced.
Here, this screening process may include: for each word contained in the candidate word set, computing the weighted sum of the semantic similarity between the text before and after that word replaces the word to be replaced, and the information divergence between the text before and after the regularized perturbation information corresponding to that word is added to the training source input text, with the weighted sum serving as the comprehensive loss value corresponding to that word; and taking at least one word with the smallest comprehensive loss value in the candidate word set as the candidate synonym.
Specifically, based on the semantic similarity between the text before and after the word to be replaced is replaced and the information divergence between the text before and after the regularized perturbation information corresponding to the word in the candidate word set is added, an overall optimization function is defined as follows:

$$L_{All} = \mu_1 \cdot L_{AR}(S, r_i, \cdot, \pi) + \mu_2 \cdot L_{cosine}(S, X) \quad (7)$$

where $L_{All}$ denotes the calculated comprehensive loss value, $\mu_1$ denotes the weighting factor of the information divergence, measuring the importance of the information divergence in the attack process, and $\mu_2$ denotes the weighting factor of the semantic similarity, measuring the importance of the semantic similarity in the attack process.
For each candidate word contained in the candidate word set, the present application substitutes into equation (7) the semantic similarity between the replacement text, in which that candidate word has replaced the word to be replaced, and the training source input text, together with the information divergence between the text before and after the regularized perturbation information corresponding to that candidate word is added, and computes the weighted sum as the comprehensive loss value corresponding to that candidate word. This value reflects the comprehensive loss incurred on the training source input text by replacing the word to be replaced with that candidate word.
According to the comprehensive loss value corresponding to each candidate word in the candidate word set, this embodiment screens out from the candidate word set at least one candidate word with the smallest comprehensive loss value; these candidate words are the candidate synonyms corresponding to the word to be replaced, and the replacement text in which they have replaced the word to be replaced is a training adversarial sample in the present application.
In this embodiment, the semantic similarity and the information divergence can thus serve as constraints for screening candidate synonyms: candidate synonyms are screened out of the candidate word set of the word to be replaced under these constraints, and training adversarial samples are obtained based on the screened candidate synonyms.
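A sketch of the selection rule of equation (7) follows; the candidate scores are assumed to be precomputed, and μ1 = μ2 = 1 and top_k = 1 are placeholder choices.

```python
def select_candidate_synonyms(candidate_losses, mu1=1.0, mu2=1.0, top_k=1):
    """Equation (7): L_All = mu1 * L_AR + mu2 * L_cosine for each candidate,
    then keep the candidate(s) with the smallest comprehensive loss.
    `candidate_losses` is assumed to map each word to (l_ar, l_cosine)."""
    scored = sorted(
        ((word, mu1 * l_ar + mu2 * l_cos)
         for word, (l_ar, l_cos) in candidate_losses.items()),
        key=lambda item: item[1],
    )
    return [word for word, _ in scored[:top_k]]
```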
As described in step S240 above, the training adversarial samples may be used to adversarially train the pre-trained machine translation model.
It should be considered that although the training adversarial samples generated by steps S200-S230 are relatively aggressive, the existence of some invalid adversarial samples still cannot be excluded. Here, an invalid adversarial sample refers to a training adversarial sample for which the pre-trained machine translation model outputs the correct text translation result (i.e., outputs the reference translation text).
For example, the training source input text is S = [ today is not a good day ], the corresponding reference translation text is Y = [ 今天 不是 好 日子 ], and the generated training adversarial sample is X = [ today is not a date day ]. If this training adversarial sample is input into the machine translation model and the model outputs the reference translation text Y, the training adversarial sample is an invalid adversarial sample. If the model instead outputs a text translation result different from the reference translation text, e.g., Y′ = [ 今天 是 最好的 日子 ] (the word "good" in the training source input text has been replaced with a synonym and the translation result changes drastically; this is a phenomenon easily produced by non-native English speakers when using translation software, and the training adversarial samples generated by the present application can mimic it well), then the training adversarial sample is a valid adversarial sample.
Invalid adversarial samples have no training effect on the model's internal parameters, and they can be filtered out before step S240 to improve the training speed of the robustness-enhanced machine translation model. Based on this, in some embodiments of the present application, the process of filtering out invalid adversarial samples may include: inputting the training adversarial samples into the pre-trained machine translation model to obtain the text translation result of each training adversarial sample output by the model; filtering out those adversarial samples whose text translation result is identical to the reference translation text; and taking the remaining adversarial samples as the training adversarial samples for the second-stage training.
Here, the pre-trained machine translation model is trained with the training source input text labeled with the reference translation text label as training data.
In this embodiment, when the text translation result output by the machine translation model for a training adversarial sample differs from the reference translation text, the attack is considered successful, and that training adversarial sample is a valid adversarial sample.
The valid training adversarial samples obtained in this embodiment can be used to adversarially train the pre-trained machine translation model to improve its robustness. Because invalid training adversarial samples have no training effect on the machine translation model, filtering them out improves the efficiency of training the machine translation model on the remaining valid training adversarial samples.
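A minimal sketch of this validity filter, assuming a `translate` callable that wraps the pre-trained machine translation model:

```python
def keep_valid_adversarial_samples(samples, translate):
    """`samples` is assumed to be a list of (adversarial_text, reference)
    pairs. Samples the pre-trained model still translates into the
    reference are invalid and dropped; the rest are kept for training."""
    return [(adv, ref) for adv, ref in samples if translate(adv) != ref]
```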
The present application also provides an adversarial sample generation method. Referring to FIG. 7, the adversarial sample generation method provided by the present application may include the following steps:
step S300, acquiring a training source input text, and determining a candidate word set of a word to be replaced in the training source input text.
The word to be replaced is a segmented word in the training source input text whose semantic association with the context is lower than a preset association threshold, and the semantic similarity between each word in the candidate word set and the word to be replaced is higher than a preset similarity threshold.
Step S310, calculating the semantic similarity between the text before and after a word in the candidate word set replaces the word to be replaced.
Step S320, calculating the information divergence between the text before and after regularized perturbation information corresponding to the word in the candidate word set is added to the training source input text.
Step S330, screening a candidate synonym corresponding to the word to be replaced from the candidate word set according to the semantic similarity and the information divergence, and taking the text in which the candidate synonym has replaced the word to be replaced as a training adversarial sample.
The above steps correspond one-to-one to steps S200 to S230; for details, refer to the descriptions in the foregoing embodiments, which are not repeated here.
The present application provides a word-level adversarial sample generation method. A training source input text is acquired; considering that a segmented word whose semantic association with the context is low is, once attacked, more likely to cause the neural machine translation model to mistranslate, such words are taken as the words to be replaced and their candidate word sets are determined. Then, the semantic similarity between the replacement text, in which each candidate word in the candidate word set has replaced the word to be replaced, and the training source input text is calculated; the regularized perturbation information corresponding to each candidate word is added to the training source input text, and the information divergence between the text before and after the perturbation is calculated; and candidate synonyms are screened from the candidate word set according to the semantic similarity and the information divergence, generating a training adversarial sample accordingly.
Thus, when the candidate word set of the word to be replaced is determined, both the semantic association with the context and the semantic similarity between the candidate words and the word to be replaced are considered, making the subsequently generated training adversarial samples more aggressive. Further, regularized perturbation information, rather than raw perturbation information, is added to the training source input text, so that the machine translation model outputs the translation result corresponding to the replacement text with as small a perturbation as possible added to the training source input text; screening the training adversarial samples with the information divergence under this small perturbation and the semantic similarity as constraints yields the training adversarial sample most semantically similar to the training source input text under the smallest perturbation, further improving the aggressiveness of the training adversarial samples. Meanwhile, no network model needs to be constructed when generating the training adversarial samples, which improves their generation efficiency.
The whole process of generating training adversarial samples does not require access to the model's internal parameters, so the method is applicable to black-box attack scenarios. Perturbation regularization is added for the neural machine translation model, and training adversarial samples are generated with as small a perturbation as possible, which to a certain extent guarantees both the completion of the text attack task and the speed of generating training adversarial samples. A Euclidean distance constraint is added for the candidate word set of the word to be replaced: the Universal Sentence Encoder (USE) is first employed to obtain vector representations of words in the embedding space, and a candidate word set as similar as possible to the word to be replaced is constructed, guaranteeing the similarity between the training adversarial samples and the training source input text. In the adversarial sample generation stage, the hard constraint of the perturbation regularization processing and the soft constraint of semantic similarity are added, which ensures both that the attack proceeds smoothly and that the training adversarial samples generated after the attack are fluent.
In some embodiments of the present application, as described above, a machine translation model robustness enhancement method is further provided; as shown in FIG. 8, the method may include:
step S400, generating the training challenge sample by adopting a challenge sample generation method.
Specifically, the present application may employ the adversarial sample generation method provided in any of the foregoing embodiments to generate the training adversarial samples.
Step S410, adversarially training the pre-trained machine translation model with the training adversarial samples labeled with the reference translation text labels as training data, to obtain the robustness-enhanced machine translation model.
The pre-trained machine translation model is obtained by training with the training source input text labeled with the reference translation text label as training data.
In this embodiment, the generated training adversarial samples are fed as input into the pre-trained Seq2Seq model architecture again, and correct translation of the training adversarial samples is learned through the self-attention-based encoder-decoder model, thereby improving the robustness of the machine translation model.
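A schematic PyTorch-style fine-tuning loop for this stage follows; the batch layout, teacher forcing, and the model/loss interfaces are assumptions for illustration, not the patent's implementation.

```python
import torch

def adversarial_fine_tune(model, optimizer, loss_fn, adv_batches, epochs=1):
    """Continue training the pre-trained Seq2Seq model on batches of
    (adversarial source ids, reference target ids) so it learns to
    translate the perturbed inputs into the reference translations."""
    model.train()
    for _ in range(epochs):
        for src_ids, tgt_ids in adv_batches:
            optimizer.zero_grad()
            # Teacher forcing: predict each target token from its prefix.
            logits = model(src_ids, tgt_ids[:, :-1])
            loss = loss_fn(
                logits.reshape(-1, logits.size(-1)),
                tgt_ids[:, 1:].reshape(-1),
            )
            loss.backward()
            optimizer.step()
    return model
```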
The translation device provided in the embodiments of the present application is described below; the translation device described below and the translation method described above may be referred to correspondingly.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a translation device according to an embodiment of the present application.
As shown in fig. 9, the apparatus may include:
a text to be translated obtaining unit 11, configured to obtain a text to be translated;
The model translation unit 12 is configured to translate the text to be translated by using a pre-trained robust machine translation model, so as to obtain a text translation result corresponding to the text to be translated;
the training process of the machine translation model with enhanced robustness comprises the following steps:
acquiring a training source input text, and determining a candidate word set of a word to be replaced in the training source input text, wherein the word to be replaced is a word segment in the training source input text whose semantic association degree with the context is lower than a preset association degree threshold, and the semantic similarity between a word in the candidate word set and the word to be replaced is higher than a preset similarity threshold;
calculating the semantic similarity of the text before and after a word in the candidate word set replaces the word to be replaced;
calculating information divergence of the text before and after regularized disturbance information corresponding to the words in the candidate word set is added to the training source input text;
screening candidate synonyms corresponding to the words to be replaced from the candidate word set according to the semantic similarity and the information divergence, and taking the text after the candidate synonyms replace the words to be replaced as a training countermeasure sample;
And performing countermeasure training on the pre-trained machine translation model by taking the training countermeasure sample marked with the reference translation text label as training data to obtain the machine translation model with enhanced robustness.
Optionally, the countermeasure sample generation device provided in the embodiments of the present application may further include:
a training countermeasure sample input unit, configured to input the training countermeasure sample into a machine translation model after first-stage training to obtain a text translation result of the training countermeasure sample output by the model, wherein the machine translation model after first-stage training is obtained by training with the training source input text marked with a reference translation text label as training data;
and a countermeasure sample screening unit, configured to screen out those countermeasure samples whose text translation results are identical to the reference translation text, with the remaining countermeasure samples used as the training countermeasure samples for the second stage of training.
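For illustration, a minimal sketch of this screening step follows; the translate callable standing in for the first-stage model is an assumed placeholder:

def screen_countermeasure_samples(samples, references, translate):
    # Keep only the countermeasure samples that actually change the model
    # output: samples whose translation already equals the reference are
    # screened out, and the rest feed the second training stage.
    return [src for src, ref in zip(samples, references)
            if translate(src) != ref]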
Optionally, the process of determining the candidate word set of the word to be replaced in the training source input text by the model translation unit may include:
masking each word included in the training source input text in turn, and, after each word is masked, inputting the masked text into a pre-trained mask language model to obtain a saliency score of the masked word output by the model, wherein a higher saliency score of a word represents a lower semantic association degree between the word and its context, and the mask language model is obtained by training with masked training text labeled with the saliency score of the masked word as training data;
respectively determining at least one word segment with the highest saliency score in the training source input text as the word to be replaced;
screening, from a preset dictionary set, candidate words whose Euclidean distance to the word to be replaced is smaller than a preset distance threshold, and forming the candidate word set from the screened candidate words.
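The following is a minimal Python sketch of these two steps, under stated assumptions: the patent obtains saliency scores directly from a trained mask language model, whereas here saliency is approximated by the change in a generic sentence score under masking, and embed is an assumed lookup table of word vectors in the embedding space (for example, representations from the universal sentence encoder mentioned earlier):

import numpy as np

def saliency_scores(tokens, score_fn, mask_token="[MASK]"):
    # Mask each position in turn and record how much the sentence score
    # moves; a larger move is read as a higher saliency, i.e. a weaker
    # semantic tie between that word and its context.
    base = score_fn(tokens)
    return [abs(base - score_fn(tokens[:i] + [mask_token] + tokens[i + 1:]))
            for i in range(len(tokens))]

def euclidean_candidates(word, dictionary, embed, dist_threshold):
    # Euclidean-distance constraint: keep dictionary words whose embedding
    # lies within dist_threshold of the word to be replaced.
    w = embed[word]
    return [c for c in dictionary
            if c != word and np.linalg.norm(embed[c] - w) < dist_threshold]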
Optionally, the process of the model translation unit calculating the semantic similarity of the text before and after a word in the candidate word set replaces the word to be replaced may include:
determining a first vector representation of the training source input text before replacement, and determining a second vector representation of the text after the word in the candidate word set replaces the word to be replaced;
and calculating the cosine similarity between the first vector representation and the second vector representation, the cosine similarity being used as the semantic similarity.
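A minimal sketch of this similarity term follows, assuming the two sentence vectors have already been produced by a sentence encoder such as the USE mentioned earlier:

import numpy as np

def semantic_similarity(first_vec, second_vec):
    # Cosine similarity between the sentence vector before replacement and
    # the sentence vector after the candidate word is substituted in.
    denom = np.linalg.norm(first_vec) * np.linalg.norm(second_vec)
    return float(np.dot(first_vec, second_vec) / denom)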
Optionally, the process of calculating, by the model translation unit, the information divergence of the text before and after regularized disturbance information corresponding to the word in the candidate word set is added to the training source input text may include:
performing distribution estimation processing according to the training source input text to obtain a first probability distribution function corresponding to the training source input text before disturbance;
And carrying out distribution estimation processing according to regularized disturbance information corresponding to the words in the candidate word set and the training source input text to obtain a second probability distribution function corresponding to the disturbed text, and calculating KL divergence of the second probability distribution function and the first probability distribution function.
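For illustration, a minimal sketch of the divergence computation; the inputs are assumed to be discrete probability vectors obtained from the two distribution estimation steps, and the argument order (second distribution against the first) follows the wording above:

import numpy as np

def information_divergence(p_before, q_after, eps=1e-12):
    # KL divergence KL(q_after || p_before) between the distribution after
    # the regularized disturbance information is added and the distribution
    # before it; eps guards against zero probabilities.
    p = np.asarray(p_before, dtype=float) + eps
    q = np.asarray(q_after, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(q * np.log(q / p)))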
Optionally, the process of the model translation unit screening the candidate synonyms corresponding to the word to be replaced from the candidate word set according to the semantic similarity and the information divergence may include:
for each word contained in the candidate word set, carrying out weighted summation on the semantic similarity of the text before and after the word to be replaced is replaced by the word and the information divergence of the text before and after regularized disturbance information corresponding to the word is added to the training source input text, wherein the weighted summation value is used as a comprehensive loss value corresponding to the word;
and taking at least one word with the minimum comprehensive loss value in the candidate word set as the candidate synonym.
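A minimal sketch of this selection step follows. The weight alpha and the sign convention are assumptions: the text only states that the two terms are weight-summed and the minimum is taken, and the similarity term is folded in here as (1 - similarity) so that more semantically similar candidates yield a smaller comprehensive loss, consistent with the stated goal of staying close to the training source input text:

def pick_candidate_synonyms(candidates, similarity, divergence, alpha=0.5, k=1):
    # Weighted sum of the semantic-similarity and information-divergence
    # terms per candidate word; the k words with the smallest combined
    # loss are kept as the candidate synonyms.
    loss = {c: alpha * (1.0 - similarity[c]) + (1.0 - alpha) * divergence[c]
            for c in candidates}
    return sorted(candidates, key=loss.get)[:k]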
Also provided in the embodiments of the present application is a countermeasure sample generation device; referring to fig. 10, the device may include:
a candidate word set determining unit 21, configured to obtain a training source input text, and determine a candidate word set of a word to be replaced in the training source input text, where the word to be replaced is a word segment in the training source input text whose semantic association degree with the context is lower than a preset association degree threshold, and the semantic similarity between a word in the candidate word set and the word to be replaced is higher than a preset similarity threshold;
A semantic similarity calculating unit 22, configured to calculate semantic similarity of text before and after the word in the candidate word set replaces the word to be replaced;
an information divergence calculating unit 23, configured to calculate information divergences of the text before and after regularized disturbance information corresponding to the word in the candidate word set is added to the training source input text;
and the countermeasure sample determining unit 24, configured to screen candidate synonyms corresponding to the word to be replaced from the candidate word set according to the semantic similarity and the information divergence, and take the text after the candidate synonyms replace the word to be replaced as a training countermeasure sample.
The embodiments of the present application further provide a machine translation model robustness enhancement device; referring to fig. 11, the device may include:
a countermeasure sample generation unit 31, for generating the training countermeasure sample using the countermeasure sample generation method of any one of the above;
the model training unit 32, configured to perform countermeasure training on the machine translation model trained in the first stage by using the training countermeasure sample labeled with the reference translation text label as training data, so as to obtain a machine translation model with enhanced robustness, where the machine translation model trained in the first stage is obtained by training with the training source input text labeled with the reference translation text label as training data.
The translation device, the countermeasure sample generation device, or the machine translation model robustness enhancement device provided in the embodiments of the present application can be applied to an electronic device, such as a terminal: a mobile phone, a computer, or the like. Alternatively, fig. 12 shows a block diagram of a hardware structure of the electronic device; referring to fig. 12, the hardware structure of the electronic device may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete communication with each other through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention, etc.;
the memory 3 may comprise a high-speed RAM memory, and may further comprise a non-volatile memory, such as at least one magnetic disk memory;
wherein the memory stores a program, and the processor may invoke the program stored in the memory to implement the steps of the foregoing translation method, the steps of the countermeasure sample generation method, or the steps of the machine translation model robustness enhancement method.
Optionally, for the refinement and extension functions of the program, reference may be made to the description above.
The embodiments of the present application also provide a storage medium, which may store a program adapted to be executed by a processor, the program being used to implement the steps of the foregoing translation method, the steps of the countermeasure sample generation method, or the steps of the machine translation model robustness enhancement method.
Optionally, for the refinement and extension functions of the program, reference may be made to the description above.
Finally, it is further noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the other embodiments; the embodiments may be combined as needed, and the same or similar parts may be referred to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of translation, comprising:
acquiring a text to be translated;
translating the text to be translated by using a pre-trained robust machine translation model to obtain a text translation result corresponding to the text to be translated;
the training process of the machine translation model with enhanced robustness comprises the following steps:
acquiring a training source input text, and determining a candidate word set of a word to be replaced in the training source input text, wherein the word to be replaced is a word segment in the training source input text whose semantic association degree with the context is lower than a preset association degree threshold, and the semantic similarity between a word in the candidate word set and the word to be replaced is higher than a preset similarity threshold;
calculating the semantic similarity of the text before and after a word in the candidate word set replaces the word to be replaced;
calculating information divergence of the text before and after regularized disturbance information corresponding to the words in the candidate word set is added to the training source input text;
screening candidate synonyms corresponding to the words to be replaced from the candidate word set according to the semantic similarity and the information divergence, and taking the text after the candidate synonyms replace the words to be replaced as a training countermeasure sample;
and performing countermeasure training on the pre-trained machine translation model by taking the training countermeasure sample marked with the reference translation text label as training data to obtain the machine translation model with enhanced robustness.
2. The method of claim 1, wherein the determining the candidate word set for the word to be replaced in the training source input text comprises:
masking each word included in the training source input text in turn, and, after each word is masked, inputting the masked text into a pre-trained mask language model to obtain a saliency score of the masked word output by the model, wherein a higher saliency score of a word represents a lower semantic association degree between the word and its context, and the mask language model is obtained by training with masked training text labeled with the saliency score of the masked word as training data;
respectively determining at least one word segment with the highest saliency score in the training source input text as the word to be replaced;
screening, from a preset dictionary set, candidate words whose Euclidean distance to the word to be replaced is smaller than a preset distance threshold, and forming the candidate word set from the screened candidate words.
3. The method of claim 1, wherein the calculating semantic similarity of text before and after the word in the candidate word set replaces the word to be replaced comprises:
determining a first vector representation of the training source input text before replacement, and determining a second vector representation of the text after the word in the candidate word set replaces the word to be replaced;
and calculating the cosine similarity between the first vector representation and the second vector representation, wherein the cosine similarity is used as the semantic similarity.
4. The method of claim 1, wherein the calculating the information divergence of the text before and after the regularized perturbation information corresponding to the word in the candidate word set is added to the training source input text comprises:
performing distribution estimation processing according to the training source input text to obtain a first probability distribution function corresponding to the training source input text before disturbance;
And carrying out distribution estimation processing according to regularized disturbance information corresponding to the words in the candidate word set and the training source input text to obtain a second probability distribution function corresponding to the disturbed text, and calculating KL divergence of the second probability distribution function and the first probability distribution function.
5. The method according to claim 1, wherein the screening the candidate synonyms corresponding to the word to be replaced from the candidate word set according to the semantic similarity and the information divergence comprises:
for each word contained in the candidate word set, carrying out weighted summation on the semantic similarity of the text before and after the word to be replaced is replaced by the word and the information divergence of the text before and after regularized disturbance information corresponding to the word is added to the training source input text, wherein the weighted summation value is used as a comprehensive loss value corresponding to the word;
and taking at least one word with the minimum comprehensive loss value in the candidate word set as the candidate synonym.
6. The method of claim 1, further comprising, before the performing countermeasure training on the pre-trained machine translation model by taking the training countermeasure sample marked with the reference translation text label as training data:
inputting the training countermeasure sample into the pre-trained machine translation model to obtain a text translation result of the training countermeasure sample output by the model, wherein the pre-trained machine translation model is obtained by training with the training source input text marked with the reference translation text label as training data;
screening out, from the training countermeasure samples, the countermeasure samples whose text translation results are identical to the reference translation text, and taking the remaining countermeasure samples as the training countermeasure samples for the second stage of training.
7. A countermeasure sample generation method, comprising:
acquiring a training source input text, and determining a candidate word set of a word to be replaced in the training source input text, wherein the word to be replaced is a word segment in the training source input text whose semantic association degree with the context is lower than a preset association degree threshold, and the semantic similarity between a word in the candidate word set and the word to be replaced is higher than a preset similarity threshold;
calculating the semantic similarity of the text before and after a word in the candidate word set replaces the word to be replaced;
calculating information divergence of the text before and after regularized disturbance information corresponding to the words in the candidate word set is added to the training source input text;
And screening candidate synonyms corresponding to the word to be replaced from the candidate word set according to the semantic similarity and the information divergence, and taking the text after the candidate synonyms replace the word to be replaced as a training countermeasure sample.
8. A method for enhancing robustness of a machine translation model, comprising:
generating the training countermeasure sample using the method of claim 7;
and performing countermeasure training on the pre-trained machine translation model by taking the training countermeasure sample marked with the reference translation text label as training data, to obtain a machine translation model with enhanced robustness, wherein the pre-trained machine translation model is obtained by training with the training source input text marked with the reference translation text label as training data.
9. A translation apparatus, comprising:
the text to be translated obtaining unit is used for obtaining the text to be translated;
the model translation unit is used for translating the text to be translated by utilizing a pre-trained robust machine translation model to obtain a text translation result corresponding to the text to be translated;
the training process of the machine translation model with enhanced robustness comprises the following steps:
acquiring a training source input text, and determining a candidate word set of a word to be replaced in the training source input text, wherein the word to be replaced is a word segment in the training source input text whose semantic association degree with the context is lower than a preset association degree threshold, and the semantic similarity between a word in the candidate word set and the word to be replaced is higher than a preset similarity threshold;
calculating the semantic similarity of the text before and after a word in the candidate word set replaces the word to be replaced;
calculating information divergence of the text before and after regularized disturbance information corresponding to the words in the candidate word set is added to the training source input text;
screening candidate synonyms corresponding to the words to be replaced from the candidate word set according to the semantic similarity and the information divergence, and taking the text after the candidate synonyms replace the words to be replaced as a training countermeasure sample;
and performing countermeasure training on the pre-trained machine translation model by taking the training countermeasure sample marked with the reference translation text label as training data to obtain the machine translation model with enhanced robustness.
10. An electronic device, comprising: a memory and a processor;
The memory is used for storing programs;
the processor is configured to execute the program to implement the respective steps of the translation method according to any one of claims 1 to 6, or the respective steps of the countermeasure sample generation method according to claim 7, or the respective steps of the machine translation model robustness enhancement method according to claim 8.
CN202211619814.4A 2022-12-15 2022-12-15 Translation, countermeasure sample generation and model robustness enhancement method and related device Pending CN116011470A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211619814.4A CN116011470A (en) 2022-12-15 2022-12-15 Translation, countermeasure sample generation and model robustness enhancement method and related device

Publications (1)

Publication Number Publication Date
CN116011470A true CN116011470A (en) 2023-04-25

Family

ID=86020176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211619814.4A Pending CN116011470A (en) 2022-12-15 2022-12-15 Translation, countermeasure sample generation and model robustness enhancement method and related device

Country Status (1)

Country Link
CN (1) CN116011470A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116757223A (en) * 2023-08-16 2023-09-15 中国石油大学(华东) Method for generating high-semantic-similarity antagonism sample of machine translation model
CN116776887A (en) * 2023-08-18 2023-09-19 昆明理工大学 Negative sampling remote supervision entity identification method based on sample similarity calculation
CN116776887B (en) * 2023-08-18 2023-10-31 昆明理工大学 Negative sampling remote supervision entity identification method based on sample similarity calculation
CN117787296A (en) * 2024-02-26 2024-03-29 中国标准化研究院 English standard content automatic translation method and system based on machine learning
CN117787296B (en) * 2024-02-26 2024-05-07 中国标准化研究院 English standard content automatic translation method and system based on machine learning


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination