CN112329481B - Training method for a multilingual machine translation model that mitigates derivative conflicts between language pairs - Google Patents

Training method for a multilingual machine translation model that mitigates derivative conflicts between language pairs

Info

Publication number
CN112329481B
CN112329481B (application CN202011167339.2A)
Authority
CN
China
Prior art keywords
language
pairs
similarity
derivatives
machine translation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011167339.2A
Other languages
Chinese (zh)
Other versions
CN112329481A (en)
Inventor
苏劲松
周楚伦
刘鑫
王鸿吉
Current Assignee
Xiamen University
Original Assignee
Xiamen University
Priority date
Filing date
Publication date
Application filed by Xiamen University filed Critical Xiamen University
Priority to CN202011167339.2A priority Critical patent/CN112329481B/en
Publication of CN112329481A publication Critical patent/CN112329481A/en
Application granted granted Critical
Publication of CN112329481B publication Critical patent/CN112329481B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a training method for a multilingual machine translation model that mitigates derivative conflicts between language pairs. The method comprises the following steps: obtaining a training corpus, wherein the training corpus comprises a plurality of language pairs; establishing a multilingual machine translation model and training it on each language pair of the training corpus; during training, calculating the derivatives corresponding to all language pairs in the training corpus and performing conflict adjustment on the derivatives of any two language pairs to obtain the adjusted derivatives of all language pairs; and updating the parameters of the multilingual machine translation model according to the adjusted derivatives of all language pairs to obtain a trained multilingual machine translation model. In this way, by performing conflict adjustment on the derivatives of any two language pairs, the derivative conflicts that training examples from different language pairs cause in model parameter updates are alleviated, and the overall performance of the multilingual machine translation model across multiple language pairs is improved.

Description

Training method for a multilingual machine translation model that mitigates derivative conflicts between language pairs
Technical Field
The invention relates to the technical field of machine translation, and in particular to a training method for a multilingual machine translation model that mitigates derivative conflicts between language pairs, and to a computer-readable storage medium.
Background
In the related art, multilingual machine translation aims at constructing a single model capable of translating between multiple language pairs. Compared with bilingual machine translation, its main advantage is that it greatly reduces the number of bilingual translation models that must be deployed and maintained online when many different language pairs exist in a practical scenario. In addition, a multilingual machine translation model enables transfer learning between language pairs, so the translation quality on low-resource and even zero-resource language pairs can greatly exceed that of an ordinary bilingual model trained on a single language pair. However, because a multilingual translation model must translate between multiple language pairs, and the data distributions of different language pairs each have their own characteristics, during training the parallel corpora from different language pairs produce conflicting derivatives when updating the shared parameters of the model, and these derivative conflicts greatly reduce the overall performance of the multilingual machine translation model across the language pairs.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art. Therefore, one objective of the present invention is to provide a training method for a multilingual machine translation model that alleviates the derivative conflicts that training examples from different language pairs cause in model parameter updates by performing conflict adjustment on the derivatives of any two language pairs, thereby improving the overall performance of the model across multiple language pairs.
A second object of the invention is to propose a computer-readable storage medium.
In order to achieve the above object, an embodiment of the first aspect of the present invention provides a method for training a multilingual machine translation model, comprising the following steps: obtaining a training corpus, wherein the training corpus comprises a plurality of language pairs; establishing a multilingual machine translation model and training it on each language pair of the training corpus; during training, calculating the derivatives corresponding to all language pairs in the training corpus and performing conflict adjustment on the derivatives of any two language pairs to obtain the adjusted derivatives of all language pairs; and updating the parameters of the multilingual machine translation model according to the adjusted derivatives of all language pairs to obtain a trained multilingual machine translation model.
According to the training method of the multilingual machine translation model of the embodiment of the present invention, a training corpus containing a plurality of language pairs is first obtained; a multilingual machine translation model is then established and trained on each language pair of the corpus; during training, the derivatives of all language pairs in the corpus are calculated and conflict adjustment is performed on the derivatives of any two language pairs to obtain the adjusted derivatives of all language pairs; and the parameters of the model are updated according to the adjusted derivatives to obtain a trained multilingual machine translation model. In this way, by performing conflict adjustment on the derivatives of any two language pairs, the derivative conflicts that training examples from different language pairs cause in model parameter updates are alleviated, and the overall performance of the multilingual machine translation model across multiple language pairs is improved.
In addition, the training method of the multi-language machine translation model proposed by the above embodiment of the present invention may also have the following additional technical features:
optionally, performing conflict adjustment on the derivatives of any two language pairs to obtain the adjusted derivatives of all language pairs comprises: calculating the cosine similarity between the derivatives of the two language pairs; judging, according to the cosine similarity, whether a conflict relationship exists between the two derivatives; if a conflict relationship exists, projecting one of the two derivatives onto the orthogonal plane of the other to obtain its projected derivative, and replacing the original derivative with the projected derivative to complete the conflict adjustment of the derivatives of the two language pairs; and if no conflict relationship exists, performing no conflict adjustment.
Optionally, judging, according to the cosine similarity, whether a conflict relationship exists between the derivatives of the two language pairs comprises: if the cosine similarity is negative, judging that a conflict relationship exists between the two derivatives; and if the cosine similarity is non-negative, judging that no conflict relationship exists between them.
Optionally, the projected derivative is obtained by the following formula:

g'_l = g_l − (⟨g_l, g_{l'}⟩ / ||g_{l'}||_2^2) · g_{l'}

where g'_l is the projected derivative of the l-th language pair, g_l is the derivative of the l-th language pair, and g_{l'} is the derivative of the l'-th language pair.
Optionally, during training, the learning rate of the parameters of the multilingual machine translation model is adaptively adjusted according to the direction similarity and the magnitude similarity between the derivatives of any two language pairs, so that the parameters are updated according to the adjusted learning rate and the adjusted derivatives of all language pairs, thereby obtaining a trained multilingual machine translation model.
Optionally, adaptively adjusting the learning rate of the multilingual machine translation model parameters according to the direction similarity and the magnitude similarity between the derivatives of any two language pairs comprises: obtaining the derivatives of any two language pairs among all language pairs to calculate the direction similarity and the magnitude similarity between them; calculating the final similarity between the two derivatives from the direction similarity and the magnitude similarity, and calculating the average similarity of all language pairs from the final similarities of all pairs of derivatives; and adaptively adjusting the learning rate of the parameters of the multilingual machine translation model according to the average similarity of all language pairs.
Optionally, the direction similarity and the magnitude similarity between the derivatives of any two language pairs are calculated according to the following formulas:

ds_{ll'} = (cos_sim(g_l, g_{l'}) + 1) / 2

ms_{ll'} = 2 · ||g_l||_2 · ||g_{l'}||_2 / (||g_l||_2^2 + ||g_{l'}||_2^2)

where ds_{ll'} is the direction similarity, ms_{ll'} is the magnitude similarity, g_l is the derivative of the l-th language pair, g_{l'} is the derivative of the l'-th language pair, cos_sim(g_l, g_{l'}) is the cosine similarity between the two derivatives, and ||·||_2 denotes the L2 norm.
Optionally, the final similarity between the derivatives of any two language pairs and the average similarity of all language pairs are calculated according to the following formulas:

s_{ll'} = ds_{ll'} · ms_{ll'}

s_iter = (2 / (L(L−1))) · Σ_{l=1}^{L} Σ_{l'=l+1}^{L} s_{ll'}

where s_{ll'} is the final similarity between the derivative g_l of the l-th language pair and the derivative g_{l'} of the l'-th language pair, s_iter is the average similarity of all language pairs at the iter-th training step, and there are L language pairs in total.
Optionally, the learning rate of the parameters of the multilingual machine translation model is adaptively adjusted according to the following formula:

lr'_iter = s_iter × lr_iter

where lr'_iter is the adjusted learning rate of the current model parameters, lr_iter is the learning rate of the model parameters before adjustment, and iter denotes the iter-th training step.
In order to achieve the above object, an embodiment of the second aspect of the present invention provides a computer-readable storage medium on which a training program for a multilingual machine translation model is stored; when the training program is executed by a processor, the above training method for a multilingual machine translation model is implemented.
The computer-readable storage medium according to the embodiment of the present invention stores a training program for the multilingual machine translation model, so that when the program is executed by a processor the above training method is carried out. Thus, by performing conflict adjustment on the derivatives of any two language pairs, the derivative conflicts that training examples from different language pairs cause in model parameter updates are alleviated, and the overall performance of the multilingual machine translation model across multiple language pairs is improved.
Drawings
FIG. 1 is a flow diagram illustrating a method for training a multi-lingual machine translation model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a PCGrad derivative projection process according to an embodiment of the invention;
FIG. 3 is a schematic diagram of typical relationships between any two derivatives according to an embodiment of the present invention, where subgraphs (a) and (b) depict extreme cases of large theoretical differences in direction or magnitude between the two derivatives, and subgraphs (c) and (d) represent the two general cases in which the derivatives do not conflict and conflict, respectively.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.
In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
FIG. 1 is a flowchart illustrating a method for training a multi-language machine translation model according to an embodiment of the present invention, as shown in FIG. 1, the method for training a multi-language machine translation model according to an embodiment of the present invention includes the following steps:
step 101, obtaining a corpus, wherein the corpus includes a plurality of language pairs.
It is noted that each language pair consists of a plurality of parallel sentence pairs, and each parallel sentence pair comprises a source-language sentence and the corresponding target-language sentence.
That is, the parallel sentence pairs of all language pairs constitute the total training corpus D = D_1 ∪ D_2 ∪ … ∪ D_L, where D_l = {(x_l, y_l)} denotes all parallel sentence pairs of the l-th language pair, x_l denotes a source-language sentence of the l-th language pair, and y_l denotes the corresponding target-language sentence.
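For concreteness, the corpus layout described above can be sketched as follows. This is a minimal illustration only: the language-pair names and sentences are hypothetical, not taken from the patent.

```python
# Hypothetical corpus: each language pair l maps to its set D_l of
# (source sentence, target sentence) parallel pairs.
corpus = {
    "en-de": [("hello world", "hallo welt")],
    "en-fr": [("hello world", "bonjour le monde")],
    "en-zh": [("hello world", "你好，世界")],
}

# The full training corpus D is the union D_1 ∪ D_2 ∪ … ∪ D_L,
# here flattened while remembering which language pair each example came from.
D = [(pair, src, tgt) for pair, sents in corpus.items() for src, tgt in sents]
assert len(D) == sum(len(v) for v in corpus.values())
```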
Step 102, establishing a multilingual machine translation model, and training the multilingual machine translation model on each language pair of the training corpus.
It should be noted that the above multilingual machine translation model may use the Transformer [Vaswani et al., NeurIPS 2017] as its basic model structure, which is not specifically limited by the present invention.
As an embodiment, to distinguish parallel sentence pairs of different language pairs, a language tag indicating the target language is prepended to the source-language sentence of each parallel sentence pair. The multilingual translation model updates its parameters θ with mini-batch training, and the overall optimization objective is the negative log-likelihood function:

L(θ) = − Σ_{(x,y)} Σ_{t=1}^{|y|} log P(y_t | y_{<t}, x; θ)

log P(y_t | y_{<t}, x; θ) = Σ_{k=1}^{|V|} 1{y_t = k} · log P(y_t = k | y_{<t}, x; θ)

where |V| denotes the dictionary size, |y| denotes the length of the target-language sentence, 1{·} is an indicator function that takes the value 1 when the internal condition holds and 0 otherwise, and y_t denotes the t-th word of the output target-language sentence.
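The negative log-likelihood objective above can be illustrated with a minimal pure-Python sketch using toy probabilities; the function name and example values are illustrative, not part of the patent.

```python
import math

def sentence_nll(probs, target_ids):
    """Negative log-likelihood of one target sentence.

    probs: nested list of shape (|y|, |V|) -- the model's predicted
           distribution over the vocabulary at each target position.
    target_ids: the gold token indices y_1 ... y_|y|.
    Implements -sum_t sum_k 1{y_t = k} * log P(y_t = k | y_<t, x; theta):
    the indicator simply picks out the gold token's probability at step t.
    """
    return -sum(math.log(probs[t][k]) for t, k in enumerate(target_ids))

# Toy example with |V| = 3 and |y| = 2; the gold tokens are ids 0 and 1.
p = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1]]
loss = sentence_nll(p, [0, 1])  # -(log 0.7 + log 0.8)
```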
Step 103, in the training process, calculating the derivatives corresponding to all language pairs in the training corpus, and performing conflict adjustment on the derivatives corresponding to any two language pairs to obtain the derivatives corresponding to all the adjusted language pairs.
It should be noted that, the derivatives corresponding to all the language pairs in the corpus can be calculated through the objective function.
As an example, the cosine similarity between the derivatives of any two language pairs is calculated; if the cosine similarity is negative, it is judged that a conflict relationship exists between the two derivatives; if it is non-negative, it is judged that no conflict relationship exists. If a conflict relationship exists, one of the two derivatives is projected onto the orthogonal plane of the other to obtain its projected derivative, and the original derivative is replaced with the projected derivative to complete the conflict adjustment of the derivatives of the two language pairs; if no conflict relationship exists, no conflict adjustment is performed.
As a specific embodiment, when training the multilingual machine translation model with mini-batches, a fixed total number of parallel sentence pairs is randomly sampled from the whole training corpus D each time to form a mini-batch B = B_1 ∪ B_2 ∪ … ∪ B_L, where B_l contains all parallel sentence pairs in the current sample that belong to the l-th language pair. Let g_l and g_{l'} denote the derivatives of the objective function computed on the training examples of any two different language pairs in B. During training of the multilingual model, the PCGrad algorithm processes the derivatives from different language pairs with the following specific steps:
first, g is calculatedlAnd gl′Cosine similarity between them cos _ sim (g),gl′) (ii) a As shown in FIG. 2, e.g.Fruit cos _ sim (g)l,gl′) If the value of (b) is negative, g is judgedlAnd gl′If there is a conflict relationship between them, the derivative g from the l language pair is usedlProjection onto derivative gl′The formula is as follows:
Figure GDA0003593767130000053
wherein, g'lRepresents glDerivative after projection.
Note that the derivative g 'after projection'lNo longer has a negative conflicting impact on the l' th language pair when the model parameters are updated; if cos _ sim (g)l,gl′) If not negative, the original g is maintainedl
Then, the above steps are repeated for every two different language pairs, finally yielding, for each language pair, a derivative whose conflicts with all other language pairs have been resolved.
It should be noted that through the above steps, the negative influence of the derivative from each language pair on the derivatives of other language pairs is mitigated, so that the multilingual translation model is better optimized jointly, thereby achieving better overall performance over multiple language pairs.
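The pairwise conflict-adjustment procedure above can be sketched as follows. This is a simplified illustration, not the patented implementation: here each derivative is adjusted against the other derivatives' original values, whereas PCGrad as published projects sequentially in random task order.

```python
import numpy as np

def pcgrad_adjust(grads):
    """PCGrad-style conflict adjustment over per-language-pair derivatives.

    grads: list of flattened derivative vectors g_1 ... g_L. For each pair
    (l, l'), if cos_sim(g_l, g_l') < 0 the two derivatives conflict, and g_l
    is projected onto the orthogonal plane of g_l':
        g_l <- g_l - (<g_l, g_l'> / ||g_l'||^2) * g_l'
    Non-conflicting pairs are left unchanged.
    """
    adjusted = [np.asarray(g, dtype=float).copy() for g in grads]
    for l, g in enumerate(adjusted):
        for lp, g_other in enumerate(grads):
            if lp == l:
                continue
            dot = float(np.dot(g, g_other))
            norm_sq = float(np.dot(g_other, g_other))
            if norm_sq > 0 and dot < 0:  # negative cosine => conflict
                g -= (dot / norm_sq) * np.asarray(g_other, dtype=float)
        adjusted[l] = g
    return adjusted

# Two conflicting derivatives: after adjustment each is orthogonal
# to the other's original direction, removing the negative component.
g1, g2 = np.array([1.0, 0.0]), np.array([-1.0, 1.0])
a1, a2 = pcgrad_adjust([g1, g2])  # a1 = [0.5, 0.5], a2 = [0.0, 1.0]
```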
Step 104, updating the parameters of the multilingual machine translation model according to the adjusted derivatives of all language pairs to obtain the trained multilingual machine translation model.
In addition, during training, the learning rate of the parameters of the multilingual machine translation model is adaptively adjusted according to the direction similarity and the magnitude similarity between the derivatives of any two language pairs, so that the parameters are updated according to the adjusted learning rate and the adjusted derivatives of all language pairs, thereby obtaining the trained multilingual machine translation model.
As an example, the derivatives of any two language pairs among all language pairs are first obtained to calculate the direction similarity and the magnitude similarity between them:

ds_{ll'} = (cos_sim(g_l, g_{l'}) + 1) / 2

ms_{ll'} = 2 · ||g_l||_2 · ||g_{l'}||_2 / (||g_l||_2^2 + ||g_{l'}||_2^2)

where ds_{ll'} is the direction similarity, ms_{ll'} is the magnitude similarity, g_l is the derivative of the l-th language pair, g_{l'} is the derivative of the l'-th language pair, cos_sim(g_l, g_{l'}) is the cosine similarity between the two derivatives, and ||·||_2 denotes the L2 norm.
It is noted that the cosine similarity cos_sim(g_l, g_{l'}), whose value lies in [−1, 1], is thereby mapped to the interval [0, 1]; a larger value of ds_{ll'} means that the directions of the two derivatives are more similar. ms_{ll'} is likewise a real number between 0 and 1.
Then, the final similarity between the derivatives of any two language pairs is calculated from their direction similarity and magnitude similarity, and the average similarity of all language pairs is calculated from the final similarities of all pairs of derivatives:

s_{ll'} = ds_{ll'} · ms_{ll'}

s_iter = (2 / (L(L−1))) · Σ_{l=1}^{L} Σ_{l'=l+1}^{L} s_{ll'}

where s_{ll'} is the final similarity between the derivative g_l of the l-th language pair and the derivative g_{l'} of the l'-th language pair, s_iter is the average similarity of all language pairs at the iter-th training step, and there are L language pairs in total.
Note that s_{ll'} also lies in the range [0, 1]; the closer its value is to 1, the more similar g_l and g_{l'} are in both direction and magnitude. Since there are derivatives from L different language pairs, there are in total L(L−1)/2 pairs of derivatives of different sources, so at the iter-th training step the average similarity over all such pairs is s_iter; a larger s_iter indicates a higher degree of similarity among the derivative pairs at the current training step.
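The similarity computations above can be sketched in a few lines of numpy; the function names are illustrative, and the magnitude-similarity form follows the formula given in the text.

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two derivative vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pair_similarity(g_l, g_lp):
    """Final similarity s_{ll'} = ds_{ll'} * ms_{ll'} between two derivatives."""
    ds = (cos_sim(g_l, g_lp) + 1.0) / 2.0            # maps [-1, 1] -> [0, 1]
    nl, nlp = np.linalg.norm(g_l), np.linalg.norm(g_lp)
    ms = 2.0 * nl * nlp / (nl ** 2 + nlp ** 2)        # in (0, 1], 1 iff equal norms
    return ds * ms

def average_similarity(grads):
    """s_iter: mean of s_{ll'} over all L*(L-1)/2 unordered derivative pairs."""
    L = len(grads)
    pairs = [(l, lp) for l in range(L) for lp in range(l + 1, L)]
    return sum(pair_similarity(grads[l], grads[lp]) for l, lp in pairs) / len(pairs)

# Identical derivatives give similarity 1; opposite ones give 0.
g = np.array([3.0, 4.0])
s_same = pair_similarity(g, g)    # ds = 1, ms = 1 -> 1.0
s_opp = pair_similarity(g, -g)    # ds = 0, ms = 1 -> 0.0
```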
As a specific example, FIG. 3 shows four representative cases that may exist between two derivatives, together with the corresponding similarity values ds_{ll'}, ms_{ll'} and s_{ll'}: subgraphs (a) and (b) depict extreme cases of large theoretical differences in direction or magnitude between the two derivatives, while subgraphs (c) and (d) represent the two general cases in which the derivatives do not conflict and conflict, respectively.
Finally, the learning rate of the parameters of the multilingual machine translation model is adaptively adjusted according to the average similarity of all language pairs:

lr'_iter = s_iter × lr_iter

where lr'_iter is the adjusted learning rate of the current model parameters, lr_iter is the learning rate of the model parameters before adjustment, and iter denotes the iter-th training step.
It should be noted that, in the joint optimization of the model's shared parameters, the more similar the derivatives from different sources are, the larger the learning rate with which the model can update its parameters along the current direction; conversely, when they are dissimilar, the learning rate of the current step should be reduced accordingly. Consider the extreme cases of conflict between two derivatives: when the two derivatives have the same norm but opposite directions, they cancel each other out; when they have the same direction but very different norms, the large difference in update magnitude makes the joint optimization of the model uncoordinated. Therefore, a learning-rate adaptive adjustment mechanism is provided on the basis of the original PCGrad algorithm to further alleviate this problem.
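The adjustment itself is a single scaling of the step size; a minimal sketch (function name and values are illustrative):

```python
def adjusted_learning_rate(base_lr, avg_similarity):
    """lr'_iter = s_iter * lr_iter: scale the step size by how well the
    per-language-pair derivatives agree at the current training step."""
    return avg_similarity * base_lr

# Highly similar derivatives keep most of the base learning rate;
# dissimilar (conflicting) ones shrink the update.
lr_similar = adjusted_learning_rate(0.001, 0.9)
lr_conflict = adjusted_learning_rate(0.001, 0.1)
```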
Finally, according to the training method of the multilingual machine translation model of the embodiment of the present invention, a training corpus containing a plurality of language pairs is first obtained; a multilingual machine translation model is then established and trained on each language pair of the corpus; during training, the derivatives of all language pairs in the corpus are calculated and conflict adjustment is performed on the derivatives of any two language pairs to obtain the adjusted derivatives of all language pairs; and the parameters of the model are updated according to the adjusted derivatives to obtain a trained multilingual machine translation model. In this way, by performing conflict adjustment on the derivatives of any two language pairs, the derivative conflicts that training examples from different language pairs cause in model parameter updates are alleviated, and the overall performance of the multilingual machine translation model across multiple language pairs is improved.
In addition, an embodiment of the present invention further provides a computer-readable storage medium, on which a training program of a multi-language machine translation model is stored, and when the training program of the multi-language machine translation model is executed by a processor, the method for training the multi-language machine translation model as described above is implemented.
The computer-readable storage medium according to the embodiment of the present invention stores a training program for the multilingual machine translation model, so that when the program is executed by a processor the above training method is carried out. Thus, by performing conflict adjustment on the derivatives of any two language pairs, the derivative conflicts that training examples from different language pairs cause in model parameter updates are alleviated, and the overall performance of the multilingual machine translation model across multiple language pairs is improved.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implying any number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, denote a fixed connection, a detachable connection, or an integral formation; a mechanical or an electrical connection; a direct connection, or an indirect connection through an intermediate medium, or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.
In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Also, a first feature being "on," "over," or "above" a second feature may mean that the first feature is directly or obliquely above the second feature, or may simply mean that the first feature is at a higher level than the second feature. A first feature being "under," "beneath," or "below" a second feature may mean that the first feature is directly or obliquely below the second feature, or may simply mean that the first feature is at a lower level than the second feature.
In the description of the specification, reference to the description of "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above should not be understood to necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (6)

1. A method for training a multi-language machine translation model for mitigating language-pair difference conflicts, comprising the steps of:
obtaining a training corpus, wherein the training corpus comprises a plurality of language pairs;
establishing a multi-language machine translation model, and training the multi-language machine translation model according to each language pair in the training corpus;
in the training process, calculating the corresponding derivatives of all language pairs in the training corpus, and performing conflict adjustment on the corresponding derivatives of any two language pairs to obtain the corresponding derivatives of all adjusted language pairs;
updating the parameters of the multi-language machine translation model according to the corresponding derivatives of all the adjusted language pairs to obtain a trained multi-language machine translation model;
wherein, performing conflict adjustment on the corresponding derivatives of any two language pairs to obtain the corresponding derivatives of all the adjusted language pairs includes:
calculating cosine similarity between derivatives corresponding to any two language pairs;
judging whether a conflict relation exists between derivatives corresponding to any two language pairs according to the cosine similarity;
if a conflict relationship exists, projecting any one derivative of the derivatives corresponding to any two language pairs onto an orthogonal plane of the other derivative to obtain a projected derivative of the any one derivative, and replacing the projected derivative with the any one derivative to complete conflict adjustment of the derivatives corresponding to any two language pairs;
if no conflict relation exists, no conflict adjustment is carried out;
wherein, judging whether a conflict exists between the derivatives corresponding to the two language pairs according to the cosine similarity comprises:
if the cosine similarity value is a negative number, judging that a conflict relation exists between derivatives corresponding to any two language pairs;
if the cosine similarity value is a non-negative number, judging that no conflict relation exists between derivatives corresponding to any two language pairs;
wherein the post-projection derivative is obtained by the following formula:

$$g'_l = g_l - \frac{g_l \cdot g_{l'}}{\|g_{l'}\|_2^2}\, g_{l'}$$

wherein g'_l is the derivative of the l-th language pair after projection, g_l is the derivative of the l-th language pair, and g_{l'} is the derivative of the l'-th language pair;
the method is characterized in that the learning rate of the parameters of the multi-language machine translation model is adaptively adjusted according to the direction similarity and the amplitude similarity between corresponding derivatives of any two language pairs in the training process, so that the parameters of the multi-language machine translation model are updated according to the adjusted learning rate and the adjusted derivatives corresponding to all language pairs, and a trained multi-language machine translation model is obtained.
2. The method for training the multi-language machine translation model for mitigating language-pair difference conflicts of claim 1, wherein adaptively adjusting the learning rate of the parameters of the multi-language machine translation model according to the direction similarity and the magnitude similarity between the corresponding derivatives of any two language pairs comprises:
acquiring corresponding derivatives of any two language pairs in all the language pairs to calculate direction similarity and amplitude similarity between the corresponding derivatives of any two language pairs;
calculating final similarity between the derivatives corresponding to any two language pairs according to the direction similarity and the amplitude similarity between the derivatives corresponding to any two language pairs, and calculating average similarity of all language pairs according to the final similarity of the derivatives corresponding to any two language pairs in all language pairs;
and adaptively adjusting the learning rate of the parameters of the multi-language machine translation model according to the average similarity of all the language pairs.
3. The method of claim 2, wherein the direction similarity and the amplitude similarity between the corresponding derivatives of any two language pairs are calculated according to the following formulas:

$$ds_{ll'} = \mathrm{cos\_sim}(g_l, g_{l'})$$

$$ms_{ll'} = \frac{2\,\|g_l\|_2\,\|g_{l'}\|_2}{\|g_l\|_2^2 + \|g_{l'}\|_2^2}$$

wherein ds_{ll'} is the direction similarity, ms_{ll'} is the amplitude similarity, g_l is the derivative of the l-th language pair, g_{l'} is the derivative of the l'-th language pair, cos_sim(g_l, g_{l'}) is the cosine similarity between the derivative of the l-th language pair and the derivative of the l'-th language pair, and ||·||_2 denotes the L2 norm.
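A direction/amplitude similarity pair of this kind is typically computed as the cosine similarity plus a normalized ratio of L2 norms. The exact formulas in the filing are reproduced only as images, so the following Python/NumPy sketch is an assumption along those lines, with illustrative names:

```python
import numpy as np

def direction_similarity(g_l, g_lp):
    # ds_{ll'}: cosine similarity between the two derivatives, in [-1, 1]
    return float(g_l @ g_lp / (np.linalg.norm(g_l) * np.linalg.norm(g_lp)))

def amplitude_similarity(g_l, g_lp):
    # ms_{ll'}: assumed norm-ratio measure; equals 1 when the L2 norms match
    n_l, n_lp = np.linalg.norm(g_l), np.linalg.norm(g_lp)
    return float(2 * n_l * n_lp / (n_l ** 2 + n_lp ** 2))
```

Both measures peak at 1 when the two derivatives agree in direction or in magnitude, which is what lets them be combined into a single similarity score per pair.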
4. The method of claim 3, wherein the final similarity between the corresponding derivatives of any two language pairs and the average similarity of all language pairs are calculated according to the following formulas:

$$s_{ll'} = \frac{ds_{ll'} + ms_{ll'}}{2}$$

$$s_{iter} = \frac{2}{L(L-1)} \sum_{l=1}^{L-1} \sum_{l'=l+1}^{L} s_{ll'}$$

wherein s_{ll'} is the final similarity between the derivative g_l of the l-th language pair and the derivative g_{l'} of the l'-th language pair, s_{iter} is the average similarity of all language pairs at the iter-th training step, and L is the total number of language pairs.
5. The method for training the multi-language machine translation model for mitigating language-pair difference conflicts of claim 4, wherein the learning rate of the parameters of the multi-language machine translation model is adaptively adjusted according to the following formula:

$$lr'_{iter} = s_{iter} \times lr_{iter}$$

wherein lr'_{iter} is the adjusted learning rate of the current model parameters, lr_{iter} is the learning rate of the model parameters before adjustment, and iter denotes the iter-th training step.
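Claims 2 through 5 together describe scaling the base learning rate by an average pairwise similarity. A short sketch, assuming the final similarity is the average of direction and amplitude similarity (the filing's exact combination is shown only as an image; names are ours):

```python
import numpy as np
from itertools import combinations

def adapted_learning_rate(grads, base_lr):
    """Scale base_lr by the mean pairwise similarity over all language pairs."""
    sims = []
    for g_l, g_lp in combinations(grads, 2):
        n_l, n_lp = np.linalg.norm(g_l), np.linalg.norm(g_lp)
        ds = g_l @ g_lp / (n_l * n_lp)            # direction similarity (cosine)
        ms = 2 * n_l * n_lp / (n_l**2 + n_lp**2)  # amplitude similarity (assumed form)
        sims.append((ds + ms) / 2)                # final similarity (assumed average)
    s_iter = float(np.mean(sims))                 # average over all pairs at this step
    return s_iter * base_lr                       # lr'_iter = s_iter * lr_iter
```

When the per-language-pair derivatives agree in direction and magnitude, s_iter is close to 1 and the learning rate is essentially preserved; the more they diverge, the smaller the step taken on the shared parameters.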
6. A computer-readable storage medium having stored thereon a training program of a multi-language machine translation model for mitigating language-pair difference conflicts, wherein the program, when executed by a processor, implements the method for training a multi-language machine translation model for mitigating language-pair difference conflicts according to any one of claims 1 to 5.
CN202011167339.2A 2020-10-27 2020-10-27 Training method of multi-language machine translation model for relieving language-to-difference conflict Active CN112329481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011167339.2A CN112329481B (en) 2020-10-27 2020-10-27 Training method of multi-language machine translation model for relieving language-to-difference conflict

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011167339.2A CN112329481B (en) 2020-10-27 2020-10-27 Training method of multi-language machine translation model for relieving language-to-difference conflict

Publications (2)

Publication Number Publication Date
CN112329481A (en) 2021-02-05
CN112329481B (en) 2022-07-19

Family

ID=74296895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011167339.2A Active CN112329481B (en) 2020-10-27 2020-10-27 Training method of multi-language machine translation model for relieving language-to-difference conflict

Country Status (1)

Country Link
CN (1) CN112329481B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108519971A (en) * 2018-03-23 2018-09-11 Communication University of China Cross-lingual news topic similarity comparison method based on parallel corpora
CN108874785A (en) * 2018-06-01 2018-11-23 Tsinghua University Translation processing method and system
CN108960317A (en) * 2018-06-27 2018-12-07 Harbin Institute of Technology Cross-lingual text classification method based on joint training of cross-lingual word vector representations and classifiers
CN110543640A (en) * 2019-08-09 2019-12-06 Shenyang Yayi Network Technology Co., Ltd. Attention mechanism-based neural machine translation inference acceleration method
CN110941964A (en) * 2019-12-11 2020-03-31 Beijing Xiaomi Mobile Software Co., Ltd. Bilingual corpus screening method and device and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014235599A (en) * 2013-06-03 2014-12-15 National Institute of Information and Communications Technology Translation device, learning device, translation method, and program


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Question Similarity Computation Based on Deep Learning and Topic Models; Zhou Qiang; China Master's Theses Full-text Database; 20161115; I138-477 *

Also Published As

Publication number Publication date
CN112329481A (en) 2021-02-05

Similar Documents

Publication Publication Date Title
JP7382964B2 (en) Detection model training method and device, terminal equipment and program
WO2023071743A1 (en) Network model training method and apparatus, and computer-readable storage medium
WO2020177432A1 (en) Multi-tag object detection method and system based on target detection network, and apparatuses
JP7291183B2 (en) Methods, apparatus, devices, media, and program products for training models
WO2016095068A1 (en) Pedestrian detection apparatus and method
WO2022089143A1 (en) Method for generating analog image, and electronic device and storage medium
CN111310464A (en) Word vector acquisition model generation method and device and word vector acquisition method and device
CN110874590A (en) Training and visible light infrared visual tracking method based on adapter mutual learning model
CN112070777B (en) Method and device for organ-at-risk segmentation under multiple scenes based on incremental learning
CN112329481B (en) Training method of multi-language machine translation model for relieving language-to-difference conflict
CN111950579A (en) Training method and training device for classification model
CN116595130B (en) Corpus expansion method and device under multiple tasks based on small language model
KR20220094967A (en) Method and system for federated learning of artificial intelligence for diagnosis of depression
CN110287999B (en) Story generation method and device based on hidden variable model
US20200065657A1 (en) Machine learning system and boltzmann machine calculation method
CN112733873A (en) Chromosome karyotype graph classification method and device based on deep learning
CN112132841A (en) Medical image cutting method and device
CN117253071A (en) Semi-supervised target detection method and system based on multistage pseudo tag enhancement
CN111898465B (en) Method and device for acquiring face recognition model
CN113408482B (en) Training sample generation method and generation device
JP2020135438A (en) Basis presentation device, basis presentation method and basis presentation program
CN111612021B (en) Error sample identification method, device and terminal
CN114818859A (en) Method and device for diagnosing condition of heat distribution pipe network, terminal equipment and storage medium
US11599783B1 (en) Function creation for database execution of deep learning model
JP7024262B2 (en) Learning methods, how to use learning results, learning programs and learning devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant