CN112329481B - Training method for a multilingual machine translation model that mitigates derivative conflicts between language pairs - Google Patents

Training method for a multilingual machine translation model that mitigates derivative conflicts between language pairs

Info

Publication number
CN112329481B
CN112329481B (application CN202011167339.2A)
Authority
CN
China
Prior art keywords
language
pairs
similarity
derivatives
machine translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011167339.2A
Other languages
Chinese (zh)
Other versions
CN112329481A (en)
Inventor
苏劲松
周楚伦
刘鑫
王鸿吉
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202011167339.2A priority Critical patent/CN112329481B/en
Publication of CN112329481A publication Critical patent/CN112329481A/en
Application granted granted Critical
Publication of CN112329481B publication Critical patent/CN112329481B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a training method for a multilingual machine translation model that mitigates derivative conflicts between language pairs. The method comprises the following steps: obtaining a training corpus, wherein the training corpus comprises a plurality of language pairs; establishing a multilingual machine translation model and training it on each language pair of the training corpus; during training, calculating the derivatives corresponding to all language pairs in the training corpus and performing conflict adjustment on the derivatives of any two language pairs to obtain the adjusted derivatives of all language pairs; and updating the parameters of the multilingual machine translation model according to the adjusted derivatives of all language pairs to obtain a trained multilingual machine translation model. In this way, by performing conflict adjustment on the derivatives of any two language pairs, the derivative conflicts that training examples from different language pairs cause in model parameter updates are alleviated, and the overall performance of the multilingual machine translation model across multiple language pairs is improved.

Description

Training method for a multilingual machine translation model that mitigates derivative conflicts between language pairs
Technical Field
The invention relates to the technical field of machine translation, and in particular to a training method for a multilingual machine translation model that mitigates derivative conflicts between language pairs, and to a computer-readable storage medium.
Background
In the related art, multilingual machine translation aims at constructing a single model capable of translating between multiple language pairs. Compared with bilingual machine translation, its main advantage is that it greatly reduces the number of bilingual translation models that must be deployed and maintained online when many different language pairs exist in a practical scenario. In addition, a multilingual machine translation model enables transfer learning between language pairs, so the translation quality on low-resource and even zero-resource language pairs can greatly exceed that of an ordinary bilingual model trained on a single language pair. However, because a multilingual translation model must translate between multiple language pairs, and the data distributions of different language pairs each have their own characteristics, during training the parallel corpora from different language pairs produce conflicting derivatives when updating the shared parameters of the model, and these derivative conflicts greatly reduce the overall performance of the multilingual machine translation model across the language pairs.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art. Therefore, one objective of the present invention is to provide a training method for a multilingual machine translation model that alleviates the derivative conflicts that training examples from different language pairs cause in model parameter updates by performing conflict adjustment on the derivatives of any two language pairs, thereby improving the overall performance of the model across multiple language pairs.
A second object of the invention is to propose a computer-readable storage medium.
In order to achieve the above object, an embodiment of the first aspect of the present invention provides a method for training a multilingual machine translation model, comprising the following steps: obtaining a training corpus, wherein the training corpus comprises a plurality of language pairs; establishing a multilingual machine translation model and training it on each language pair of the training corpus; during training, calculating the derivatives corresponding to all language pairs in the training corpus and performing conflict adjustment on the derivatives of any two language pairs to obtain the adjusted derivatives of all language pairs; and updating the parameters of the multilingual machine translation model according to the adjusted derivatives of all language pairs to obtain a trained multilingual machine translation model.
According to the training method of the multilingual machine translation model of the embodiment of the present invention, a training corpus containing a plurality of language pairs is first obtained; a multilingual machine translation model is then established and trained on each language pair of the corpus; during training, the derivatives of all language pairs in the corpus are calculated and conflict adjustment is performed on the derivatives of any two language pairs to obtain the adjusted derivatives of all language pairs; and the parameters of the model are updated according to the adjusted derivatives to obtain a trained multilingual machine translation model. In this way, by performing conflict adjustment on the derivatives of any two language pairs, the derivative conflicts that training examples from different language pairs cause in model parameter updates are alleviated, and the overall performance of the multilingual machine translation model across multiple language pairs is improved.
In addition, the training method of the multi-language machine translation model proposed by the above embodiment of the present invention may also have the following additional technical features:
optionally, performing conflict adjustment on the derivatives of any two language pairs to obtain the adjusted derivatives of all language pairs comprises: calculating the cosine similarity between the derivatives of the two language pairs; judging, according to the cosine similarity, whether a conflict relationship exists between the two derivatives; if a conflict relationship exists, projecting one of the two derivatives onto the orthogonal plane of the other to obtain its projected derivative, and replacing the original derivative with the projected derivative to complete the conflict adjustment of the derivatives of the two language pairs; and if no conflict relationship exists, performing no conflict adjustment.
Optionally, judging, according to the cosine similarity, whether a conflict relationship exists between the derivatives of the two language pairs comprises: if the cosine similarity is negative, judging that a conflict relationship exists between the two derivatives; and if the cosine similarity is non-negative, judging that no conflict relationship exists between them.
Optionally, the projected derivative is obtained by the following formula:

g'_l = g_l − (⟨g_l, g_{l'}⟩ / ||g_{l'}||_2^2) · g_{l'}

where g'_l is the projected derivative of the l-th language pair, g_l is the derivative of the l-th language pair, and g_{l'} is the derivative of the l'-th language pair.
Optionally, during training, the learning rate of the parameters of the multilingual machine translation model is adaptively adjusted according to the direction similarity and the magnitude similarity between the derivatives of any two language pairs, so that the parameters are updated according to the adjusted learning rate and the adjusted derivatives of all language pairs, thereby obtaining a trained multilingual machine translation model.
Optionally, adaptively adjusting the learning rate of the multilingual machine translation model parameters according to the direction similarity and the magnitude similarity between the derivatives of any two language pairs comprises: obtaining the derivatives of any two language pairs among all language pairs to calculate the direction similarity and the magnitude similarity between them; calculating the final similarity between the two derivatives from the direction similarity and the magnitude similarity, and calculating the average similarity of all language pairs from the final similarities of all pairs of derivatives; and adaptively adjusting the learning rate of the parameters of the multilingual machine translation model according to the average similarity of all language pairs.
Optionally, the direction similarity and the magnitude similarity between the derivatives of any two language pairs are calculated according to the following formulas:

ds_{ll'} = (cos_sim(g_l, g_{l'}) + 1) / 2

ms_{ll'} = 2 · ||g_l||_2 · ||g_{l'}||_2 / (||g_l||_2^2 + ||g_{l'}||_2^2)

where ds_{ll'} is the direction similarity, ms_{ll'} is the magnitude similarity, g_l is the derivative of the l-th language pair, g_{l'} is the derivative of the l'-th language pair, cos_sim(g_l, g_{l'}) is the cosine similarity between the two derivatives, and ||·||_2 denotes the L2 norm.
Optionally, the final similarity between the derivatives of any two language pairs and the average similarity of all language pairs are calculated according to the following formulas:

s_{ll'} = ds_{ll'} · ms_{ll'}

s_iter = (2 / (L(L−1))) · Σ_{l=1}^{L} Σ_{l'=l+1}^{L} s_{ll'}

where s_{ll'} is the final similarity between the derivative g_l of the l-th language pair and the derivative g_{l'} of the l'-th language pair, s_iter is the average similarity of all language pairs at the iter-th training step, and there are L language pairs in total.
Optionally, the learning rate of the parameters of the multilingual machine translation model is adaptively adjusted according to the following formula:

lr'_iter = s_iter × lr_iter

where lr'_iter is the adjusted learning rate of the current model parameters, lr_iter is the learning rate of the model parameters before adjustment, and iter denotes the iter-th training step.
In order to achieve the above object, an embodiment of the second aspect of the present invention provides a computer-readable storage medium on which a training program for a multilingual machine translation model is stored; when the training program is executed by a processor, the above training method for a multilingual machine translation model is implemented.
The computer-readable storage medium according to the embodiment of the present invention stores a training program for the multilingual machine translation model, so that when the program is executed by a processor the above training method is carried out. Thus, by performing conflict adjustment on the derivatives of any two language pairs, the derivative conflicts that training examples from different language pairs cause in model parameter updates are alleviated, and the overall performance of the multilingual machine translation model across multiple language pairs is improved.
Drawings
FIG. 1 is a flow diagram illustrating a method for training a multi-lingual machine translation model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a PCGrad derivative projection process according to an embodiment of the invention;
FIG. 3 is a schematic diagram of typical relationships between any two derivatives according to an embodiment of the present invention, where subgraphs (a) and (b) depict extreme cases of large theoretical differences in direction or magnitude between the two derivatives, and subgraphs (c) and (d) represent the two general cases in which the derivatives do not conflict and conflict, respectively.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
FIG. 1 is a flowchart illustrating a method for training a multi-language machine translation model according to an embodiment of the present invention, as shown in FIG. 1, the method for training a multi-language machine translation model according to an embodiment of the present invention includes the following steps:
step 101, obtaining a corpus, wherein the corpus includes a plurality of language pairs.
It is noted that each language pair consists of a plurality of parallel sentence pairs, and each parallel sentence pair comprises a source-language sentence and the corresponding target-language sentence.
That is, the parallel sentence pairs of all language pairs constitute the total training corpus D = D_1 ∪ D_2 ∪ … ∪ D_L, where D_l = {(x_l, y_l)} denotes all parallel sentence pairs of the l-th language pair, x_l denotes a source-language sentence of the l-th language pair, and y_l denotes the corresponding target-language sentence.
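For concreteness, the corpus layout described above can be sketched as follows. This is a minimal illustration only: the language-pair names and sentences are hypothetical, not taken from the patent.

```python
# Hypothetical corpus: each language pair l maps to its set D_l of
# (source sentence, target sentence) parallel pairs.
corpus = {
    "en-de": [("hello world", "hallo welt")],
    "en-fr": [("hello world", "bonjour le monde")],
    "en-zh": [("hello world", "你好，世界")],
}

# The full training corpus D is the union D_1 ∪ D_2 ∪ … ∪ D_L,
# here flattened while remembering which language pair each example came from.
D = [(pair, src, tgt) for pair, sents in corpus.items() for src, tgt in sents]
assert len(D) == sum(len(v) for v in corpus.values())
```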
Step 102, establishing a multilingual machine translation model, and training the multilingual machine translation model on each language pair of the training corpus.
It should be noted that the above multilingual machine translation model may use the Transformer [Vaswani et al., NeurIPS 2017] as its basic model structure, which is not specifically limited by the present invention.
As an embodiment, to distinguish parallel sentence pairs of different language pairs, a language tag indicating the target language is prepended to the source-language sentence of each parallel sentence pair. The multilingual translation model updates its parameters θ with mini-batch training, and the overall optimization objective is the negative log-likelihood function:

L(θ) = − Σ_{(x,y)} Σ_{t=1}^{|y|} log P(y_t | y_{<t}, x; θ)

log P(y_t | y_{<t}, x; θ) = Σ_{k=1}^{|V|} 1{y_t = k} · log P(y_t = k | y_{<t}, x; θ)

where |V| denotes the dictionary size, |y| denotes the length of the target-language sentence, 1{·} is an indicator function that takes the value 1 when the internal condition holds and 0 otherwise, and y_t denotes the t-th word of the output target-language sentence.
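The negative log-likelihood objective above can be illustrated with a minimal pure-Python sketch using toy probabilities; the function name and example values are illustrative, not part of the patent.

```python
import math

def sentence_nll(probs, target_ids):
    """Negative log-likelihood of one target sentence.

    probs: nested list of shape (|y|, |V|) -- the model's predicted
           distribution over the vocabulary at each target position.
    target_ids: the gold token indices y_1 ... y_|y|.
    Implements -sum_t sum_k 1{y_t = k} * log P(y_t = k | y_<t, x; theta):
    the indicator simply picks out the gold token's probability at step t.
    """
    return -sum(math.log(probs[t][k]) for t, k in enumerate(target_ids))

# Toy example with |V| = 3 and |y| = 2; the gold tokens are ids 0 and 1.
p = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1]]
loss = sentence_nll(p, [0, 1])  # -(log 0.7 + log 0.8)
```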
Step 103, in the training process, calculating the derivatives corresponding to all language pairs in the training corpus, and performing conflict adjustment on the derivatives corresponding to any two language pairs to obtain the derivatives corresponding to all the adjusted language pairs.
It should be noted that, the derivatives corresponding to all the language pairs in the corpus can be calculated through the objective function.
As an example, the cosine similarity between the derivatives of any two language pairs is calculated; if the cosine similarity is negative, it is judged that a conflict relationship exists between the two derivatives; if it is non-negative, it is judged that no conflict relationship exists. If a conflict relationship exists, one of the two derivatives is projected onto the orthogonal plane of the other to obtain its projected derivative, and the original derivative is replaced with the projected derivative to complete the conflict adjustment of the derivatives of the two language pairs; if no conflict relationship exists, no conflict adjustment is performed.
As a specific embodiment, when training the multilingual machine translation model with mini-batches, a fixed total number of parallel sentence pairs is randomly sampled from the whole training corpus D each time to form a mini-batch B = B_1 ∪ B_2 ∪ … ∪ B_L, where B_l contains all parallel sentence pairs in the current sample that belong to the l-th language pair. Let g_l and g_{l'} denote the derivatives of the objective function computed on the training examples of any two different language pairs in B. During training of the multilingual model, the PCGrad algorithm processes the derivatives from different language pairs with the following specific steps:
first, g is calculatedlAnd gl′Cosine similarity between them cos _ sim (g),gl′) (ii) a As shown in FIG. 2, e.g.Fruit cos _ sim (g)l,gl′) If the value of (b) is negative, g is judgedlAnd gl′If there is a conflict relationship between them, the derivative g from the l language pair is usedlProjection onto derivative gl′The formula is as follows:
Figure GDA0003593767130000053
wherein, g'lRepresents glDerivative after projection.
Note that the derivative g 'after projection'lNo longer has a negative conflicting impact on the l' th language pair when the model parameters are updated; if cos _ sim (g)l,gl′) If not negative, the original g is maintainedl
Then, the above steps are repeated for every two different language pairs, finally yielding, for each language pair, a derivative whose conflicts with all other language pairs have been resolved.
It should be noted that through the above steps, the negative influence of the derivative from each language pair on the derivatives of other language pairs is mitigated, so that the multilingual translation model is better optimized jointly, thereby achieving better overall performance over multiple language pairs.
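The pairwise conflict-adjustment procedure above can be sketched as follows. This is a simplified illustration, not the patented implementation: here each derivative is adjusted against the other derivatives' original values, whereas PCGrad as published projects sequentially in random task order.

```python
import numpy as np

def pcgrad_adjust(grads):
    """PCGrad-style conflict adjustment over per-language-pair derivatives.

    grads: list of flattened derivative vectors g_1 ... g_L. For each pair
    (l, l'), if cos_sim(g_l, g_l') < 0 the two derivatives conflict, and g_l
    is projected onto the orthogonal plane of g_l':
        g_l <- g_l - (<g_l, g_l'> / ||g_l'||^2) * g_l'
    Non-conflicting pairs are left unchanged.
    """
    adjusted = [np.asarray(g, dtype=float).copy() for g in grads]
    for l, g in enumerate(adjusted):
        for lp, g_other in enumerate(grads):
            if lp == l:
                continue
            dot = float(np.dot(g, g_other))
            norm_sq = float(np.dot(g_other, g_other))
            if norm_sq > 0 and dot < 0:  # negative cosine => conflict
                g -= (dot / norm_sq) * np.asarray(g_other, dtype=float)
        adjusted[l] = g
    return adjusted

# Two conflicting derivatives: after adjustment each is orthogonal
# to the other's original direction, removing the negative component.
g1, g2 = np.array([1.0, 0.0]), np.array([-1.0, 1.0])
a1, a2 = pcgrad_adjust([g1, g2])  # a1 = [0.5, 0.5], a2 = [0.0, 1.0]
```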
Step 104, updating the parameters of the multilingual machine translation model according to the adjusted derivatives of all language pairs to obtain the trained multilingual machine translation model.
In addition, during training, the learning rate of the parameters of the multilingual machine translation model is adaptively adjusted according to the direction similarity and the magnitude similarity between the derivatives of any two language pairs, so that the parameters are updated according to the adjusted learning rate and the adjusted derivatives of all language pairs, thereby obtaining the trained multilingual machine translation model.
As an example, the derivatives of any two language pairs among all language pairs are first obtained to calculate the direction similarity and the magnitude similarity between them:

ds_{ll'} = (cos_sim(g_l, g_{l'}) + 1) / 2

ms_{ll'} = 2 · ||g_l||_2 · ||g_{l'}||_2 / (||g_l||_2^2 + ||g_{l'}||_2^2)

where ds_{ll'} is the direction similarity, ms_{ll'} is the magnitude similarity, g_l is the derivative of the l-th language pair, g_{l'} is the derivative of the l'-th language pair, cos_sim(g_l, g_{l'}) is the cosine similarity between the two derivatives, and ||·||_2 denotes the L2 norm.
It is noted that the cosine similarity cos_sim(g_l, g_{l'}), whose value lies in [−1, 1], is thereby mapped to the interval [0, 1]; a larger value of ds_{ll'} means that the directions of the two derivatives are more similar. ms_{ll'} is likewise a real number between 0 and 1.
Then, the final similarity between the derivatives of any two language pairs is calculated from their direction similarity and magnitude similarity, and the average similarity of all language pairs is calculated from the final similarities of all pairs of derivatives:

s_{ll'} = ds_{ll'} · ms_{ll'}

s_iter = (2 / (L(L−1))) · Σ_{l=1}^{L} Σ_{l'=l+1}^{L} s_{ll'}

where s_{ll'} is the final similarity between the derivative g_l of the l-th language pair and the derivative g_{l'} of the l'-th language pair, s_iter is the average similarity of all language pairs at the iter-th training step, and there are L language pairs in total.
Note that s_{ll'} also lies in the range [0, 1]; the closer its value is to 1, the more similar g_l and g_{l'} are in both direction and magnitude. Since there are derivatives from L different language pairs, there are in total L(L−1)/2 pairs of derivatives of different sources, so at the iter-th training step the average similarity over all such pairs is s_iter; a larger s_iter indicates a higher degree of similarity among the derivative pairs at the current training step.
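The similarity computations above can be sketched in a few lines of numpy; the function names are illustrative, and the magnitude-similarity form follows the formula given in the text.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two derivative vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pair_similarity(g_l, g_lp):
    """Final similarity s_{ll'} = ds_{ll'} * ms_{ll'} between two derivatives."""
    ds = (cos_sim(g_l, g_lp) + 1.0) / 2.0            # maps [-1, 1] -> [0, 1]
    nl, nlp = np.linalg.norm(g_l), np.linalg.norm(g_lp)
    ms = 2.0 * nl * nlp / (nl ** 2 + nlp ** 2)        # in (0, 1], 1 iff equal norms
    return ds * ms

def average_similarity(grads):
    """s_iter: mean of s_{ll'} over all L*(L-1)/2 unordered derivative pairs."""
    L = len(grads)
    pairs = [(l, lp) for l in range(L) for lp in range(l + 1, L)]
    return sum(pair_similarity(grads[l], grads[lp]) for l, lp in pairs) / len(pairs)

# Identical derivatives give similarity 1; opposite ones give 0.
g = np.array([3.0, 4.0])
s_same = pair_similarity(g, g)    # ds = 1, ms = 1 -> 1.0
s_opp = pair_similarity(g, -g)    # ds = 0, ms = 1 -> 0.0
```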
As a specific example, FIG. 3 shows four representative cases that may exist between two derivatives, together with the corresponding similarity values ds_{ll'}, ms_{ll'} and s_{ll'}: subgraphs (a) and (b) depict extreme cases of large theoretical differences in direction or magnitude between the two derivatives, while subgraphs (c) and (d) represent the two general cases in which the derivatives do not conflict and conflict, respectively.
Finally, the learning rate of the parameters of the multilingual machine translation model is adaptively adjusted according to the average similarity of all language pairs:

lr'_iter = s_iter × lr_iter

where lr'_iter is the adjusted learning rate of the current model parameters, lr_iter is the learning rate of the model parameters before adjustment, and iter denotes the iter-th training step.
It should be noted that, in the joint optimization of the model's shared parameters, the more similar the derivatives from different sources are, the larger the learning rate with which the model can update its parameters along the current direction; conversely, when they are dissimilar, the learning rate of the current step should be reduced accordingly. Consider the extreme cases of conflict between two derivatives: when the two derivatives have the same norm but opposite directions, they cancel each other out; when they have the same direction but very different norms, the large difference in update magnitude makes the joint optimization of the model uncoordinated. Therefore, a learning-rate adaptive adjustment mechanism is provided on the basis of the original PCGrad algorithm to further alleviate this problem.
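The adjustment itself is a single scaling of the step size; a minimal sketch (function name and values are illustrative):

```python
def adjusted_learning_rate(base_lr, avg_similarity):
    """lr'_iter = s_iter * lr_iter: scale the step size by how well the
    per-language-pair derivatives agree at the current training step."""
    return avg_similarity * base_lr

# Highly similar derivatives keep most of the base learning rate;
# dissimilar (conflicting) ones shrink the update.
lr_similar = adjusted_learning_rate(0.001, 0.9)
lr_conflict = adjusted_learning_rate(0.001, 0.1)
```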
Finally, according to the training method of the multilingual machine translation model of the embodiment of the present invention, a training corpus containing a plurality of language pairs is first obtained; a multilingual machine translation model is then established and trained on each language pair of the corpus; during training, the derivatives of all language pairs in the corpus are calculated and conflict adjustment is performed on the derivatives of any two language pairs to obtain the adjusted derivatives of all language pairs; and the parameters of the model are updated according to the adjusted derivatives to obtain a trained multilingual machine translation model. In this way, by performing conflict adjustment on the derivatives of any two language pairs, the derivative conflicts that training examples from different language pairs cause in model parameter updates are alleviated, and the overall performance of the multilingual machine translation model across multiple language pairs is improved.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, on which a training program of a multi-language machine translation model is stored, and when the training program of the multi-language machine translation model is executed by a processor, the method for training the multi-language machine translation model as described above is implemented.
The computer-readable storage medium according to the embodiment of the present invention stores a training program for the multilingual machine translation model, so that when the program is executed by a processor the above training method is carried out. Thus, by performing conflict adjustment on the derivatives of any two language pairs, the derivative conflicts that training examples from different language pairs cause in model parameter updates are alleviated, and the overall performance of the multilingual machine translation model across multiple language pairs is improved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, denote a fixed connection, a detachable connection, or an integral formation; a mechanical or an electrical connection; a direct connection, or an indirect connection through an intermediate medium, or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Also, a first feature being "on," "over," or "above" a second feature may mean that the first feature is directly or obliquely above the second feature, or may simply mean that the first feature is at a higher level than the second feature. A first feature being "under," "beneath," or "below" a second feature may mean that the first feature is directly or obliquely below the second feature, or may simply mean that the first feature is at a lower level than the second feature.
In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above should not be understood to necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (6)

1. A method for training a multi-language machine translation model for mitigating language-pair difference conflicts, comprising the steps of:
obtaining a training corpus, wherein the training corpus comprises a plurality of language pairs;
establishing a multi-language machine translation model, and training the multi-language machine translation model according to each language pair in the training corpus;
in the training process, calculating the corresponding derivatives of all language pairs in the training corpus, and performing conflict adjustment on the corresponding derivatives of any two language pairs to obtain the corresponding derivatives of all adjusted language pairs;
updating the parameters of the multi-language machine translation model according to the corresponding derivatives of all the adjusted language pairs to obtain a trained multi-language machine translation model;
wherein, performing conflict adjustment on the corresponding derivatives of any two language pairs to obtain the corresponding derivatives of all the adjusted language pairs includes:
calculating cosine similarity between derivatives corresponding to any two language pairs;
judging whether a conflict relation exists between derivatives corresponding to any two language pairs according to the cosine similarity;
if a conflict relationship exists, projecting any one derivative of the derivatives corresponding to any two language pairs onto an orthogonal plane of the other derivative to obtain a projected derivative of the any one derivative, and replacing the projected derivative with the any one derivative to complete conflict adjustment of the derivatives corresponding to any two language pairs;
if no conflict relation exists, no conflict adjustment is carried out;
wherein, judging whether a conflict exists between the derivatives corresponding to the two language pairs according to the cosine similarity comprises:
if the cosine similarity value is a negative number, judging that a conflict relation exists between derivatives corresponding to any two language pairs;
if the cosine similarity value is a non-negative number, judging that no conflict relation exists between derivatives corresponding to any two language pairs;
wherein the post-projection derivative is obtained by the following formula:

$$g'_l = g_l - \frac{g_l \cdot g_{l'}}{\|g_{l'}\|_2^2}\, g_{l'}$$

wherein g'_l is the derivative of the l-th language pair after projection, g_l is the derivative of the l-th language pair, and g_{l'} is the derivative of the l'-th language pair;
the method is characterized in that the learning rate of the parameters of the multi-language machine translation model is adaptively adjusted according to the direction similarity and the amplitude similarity between corresponding derivatives of any two language pairs in the training process, so that the parameters of the multi-language machine translation model are updated according to the adjusted learning rate and the adjusted derivatives corresponding to all language pairs, and a trained multi-language machine translation model is obtained.
2. The method for training the multi-language machine translation model for mitigating language-pair difference conflicts of claim 1, wherein adaptively adjusting the learning rate of the parameters of the multi-language machine translation model according to the direction similarity and the magnitude similarity between the corresponding derivatives of any two language pairs comprises:
acquiring corresponding derivatives of any two language pairs in all the language pairs to calculate direction similarity and amplitude similarity between the corresponding derivatives of any two language pairs;
calculating final similarity between the derivatives corresponding to any two language pairs according to the direction similarity and the amplitude similarity between the derivatives corresponding to any two language pairs, and calculating average similarity of all language pairs according to the final similarity of the derivatives corresponding to any two language pairs in all language pairs;
and adaptively adjusting the learning rate of the parameters of the multi-language machine translation model according to the average similarity of all the language pairs.
3. The method of claim 2, wherein the direction similarity and the amplitude similarity between the corresponding derivatives of any two language pairs are calculated according to the following formulas:

$$ds_{ll'} = \mathrm{cos\_sim}(g_l, g_{l'})$$

$$ms_{ll'} = \frac{2\,\|g_l\|_2\,\|g_{l'}\|_2}{\|g_l\|_2^2 + \|g_{l'}\|_2^2}$$

wherein ds_{ll'} is the direction similarity, ms_{ll'} is the amplitude similarity, g_l is the derivative of the l-th language pair, g_{l'} is the derivative of the l'-th language pair, cos_sim(g_l, g_{l'}) is the cosine similarity between the derivative of the l-th language pair and the derivative of the l'-th language pair, and ||·||_2 denotes the L2 norm.
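A direction/amplitude similarity pair of this kind is typically computed as the cosine similarity plus a normalized ratio of L2 norms. The exact formulas in the filing are reproduced only as images, so the following Python/NumPy sketch is an assumption along those lines, with illustrative names:

```python
import numpy as np

def direction_similarity(g_l, g_lp):
    # ds_{ll'}: cosine similarity between the two derivatives, in [-1, 1]
    return float(g_l @ g_lp / (np.linalg.norm(g_l) * np.linalg.norm(g_lp)))

def amplitude_similarity(g_l, g_lp):
    # ms_{ll'}: assumed norm-ratio measure; equals 1 when the L2 norms match
    n_l, n_lp = np.linalg.norm(g_l), np.linalg.norm(g_lp)
    return float(2 * n_l * n_lp / (n_l ** 2 + n_lp ** 2))
```

Both measures peak at 1 when the two derivatives agree in direction or in magnitude, which is what lets them be combined into a single similarity score per pair.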
4. The method of claim 3, wherein the final similarity between the corresponding derivatives of any two language pairs and the average similarity of all language pairs are calculated according to the following formulas:

$$s_{ll'} = \frac{ds_{ll'} + ms_{ll'}}{2}$$

$$s_{iter} = \frac{2}{L(L-1)} \sum_{l=1}^{L-1} \sum_{l'=l+1}^{L} s_{ll'}$$

wherein s_{ll'} is the final similarity between the derivative g_l of the l-th language pair and the derivative g_{l'} of the l'-th language pair, s_{iter} is the average similarity of all language pairs at the iter-th training step, and L is the total number of language pairs.
5. The method for training the multi-language machine translation model for mitigating language-pair difference conflicts of claim 4, wherein the learning rate of the parameters of the multi-language machine translation model is adaptively adjusted according to the following formula:

$$lr'_{iter} = s_{iter} \times lr_{iter}$$

wherein lr'_{iter} is the adjusted learning rate of the current model parameters, lr_{iter} is the learning rate of the model parameters before adjustment, and iter denotes the iter-th training step.
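Claims 2 through 5 together describe scaling the base learning rate by an average pairwise similarity. A short sketch, assuming the final similarity is the average of direction and amplitude similarity (the filing's exact combination is shown only as an image; names are ours):

```python
import numpy as np
from itertools import combinations

def adapted_learning_rate(grads, base_lr):
    """Scale base_lr by the mean pairwise similarity over all language pairs."""
    sims = []
    for g_l, g_lp in combinations(grads, 2):
        n_l, n_lp = np.linalg.norm(g_l), np.linalg.norm(g_lp)
        ds = g_l @ g_lp / (n_l * n_lp)            # direction similarity (cosine)
        ms = 2 * n_l * n_lp / (n_l**2 + n_lp**2)  # amplitude similarity (assumed form)
        sims.append((ds + ms) / 2)                # final similarity (assumed average)
    s_iter = float(np.mean(sims))                 # average over all pairs at this step
    return s_iter * base_lr                       # lr'_iter = s_iter * lr_iter
```

When the per-language-pair derivatives agree in direction and magnitude, s_iter is close to 1 and the learning rate is essentially preserved; the more they diverge, the smaller the step taken on the shared parameters.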
6. A computer-readable storage medium having stored thereon a training program of a multi-language machine translation model for mitigating language-pair difference conflicts, wherein the program, when executed by a processor, implements the method for training a multi-language machine translation model for mitigating language-pair difference conflicts according to any one of claims 1 to 5.
CN202011167339.2A 2020-10-27 2020-10-27 Training method of multi-language machine translation model for relieving language-to-difference conflict Active CN112329481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011167339.2A CN112329481B (en) 2020-10-27 2020-10-27 Training method of multi-language machine translation model for relieving language-to-difference conflict

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011167339.2A CN112329481B (en) 2020-10-27 2020-10-27 Training method of multi-language machine translation model for relieving language-to-difference conflict

Publications (2)

Publication Number Publication Date
CN112329481A (en) 2021-02-05
CN112329481B (en) 2022-07-19

Family

ID=74296895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011167339.2A Active CN112329481B (en) 2020-10-27 2020-10-27 Training method of multi-language machine translation model for relieving language-to-difference conflict

Country Status (1)

Country Link
CN (1) CN112329481B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108519971A (en) * 2018-03-23 2018-09-11 Communication University of China Cross-lingual news topic similarity comparison method based on parallel corpora
CN108874785A (en) * 2018-06-01 2018-11-23 Tsinghua University Translation processing method and system
CN108960317A (en) * 2018-06-27 2018-12-07 Harbin Institute of Technology Cross-lingual text classification method based on joint training of cross-lingual word vector representations and classifiers
CN110543640A (en) * 2019-08-09 2019-12-06 Shenyang Yayi Network Technology Co., Ltd. Attention mechanism-based neural machine translation inference acceleration method
CN110941964A (en) * 2019-12-11 2020-03-31 Beijing Xiaomi Mobile Software Co., Ltd. Bilingual corpus screening method and device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014235599A (en) * 2013-06-03 2014-12-15 National Institute of Information and Communications Technology Translation device, learning device, translation method, and program


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Question Similarity Computation Based on Deep Learning and Topic Models; Zhou Qiang; China Master's Theses Full-text Database; 20161115; I138-477 *

Also Published As

Publication number Publication date
CN112329481A (en) 2021-02-05

Similar Documents

Publication Publication Date Title
JP7382964B2 (en) Detection model training method and device, terminal equipment and program
WO2023071743A1 (en) Network model training method and apparatus, and computer-readable storage medium
WO2020177432A1 (en) Multi-tag object detection method and system based on target detection network, and apparatuses
JP7291183B2 (en) Methods, apparatus, devices, media, and program products for training models
WO2016095068A1 (en) Pedestrian detection apparatus and method
WO2022089143A1 (en) Method for generating analog image, and electronic device and storage medium
CN111310464A (en) Word vector acquisition model generation method and device and word vector acquisition method and device
CN110874590A (en) Training and visible light infrared visual tracking method based on adapter mutual learning model
CN112070777B (en) Method and device for organ-at-risk segmentation under multiple scenes based on incremental learning
CN112329481B (en) Training method of multi-language machine translation model for relieving language-to-difference conflict
CN111950579A (en) Training method and training device for classification model
CN116595130B (en) Corpus expansion method and device under multiple tasks based on small language model
KR20220094967A (en) Method and system for federated learning of artificial intelligence for diagnosis of depression
CN110287999B (en) Story generation method and device based on hidden variable model
US20200065657A1 (en) Machine learning system and boltzmann machine calculation method
CN112733873A (en) Chromosome karyotype graph classification method and device based on deep learning
CN112132841A (en) Medical image cutting method and device
CN117253071A (en) Semi-supervised target detection method and system based on multistage pseudo tag enhancement
CN111898465B (en) Method and device for acquiring face recognition model
CN113408482B (en) Training sample generation method and generation device
JP2020135438A (en) Basis presentation device, basis presentation method and basis presentation program
CN111612021B (en) Error sample identification method, device and terminal
CN114818859A (en) Method and device for diagnosing condition of heat distribution pipe network, terminal equipment and storage medium
US11599783B1 (en) Function creation for database execution of deep learning model
JP7024262B2 (en) Learning methods, how to use learning results, learning programs and learning devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant