CN116579352A - Translation model training method and device, mobile terminal and storage medium - Google Patents


Info

Publication number
CN116579352A
Authority
CN
China
Prior art keywords
translation model
text content
complete text
translation
destroyed
Prior art date
Legal status
Pending
Application number
CN202310458178.XA
Other languages
Chinese (zh)
Inventor
于鹏
邢启洲
李健
陈明
武卫东
Current Assignee
Wuxi Jietong Digital Intelligence Technology Co., Ltd.
Original Assignee
Wuxi Jietong Digital Intelligence Technology Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Wuxi Jietong Digital Intelligence Technology Co., Ltd.
Priority to CN202310458178.XA
Publication of CN116579352A
Legal status: Pending


Classifications

    • G: PHYSICS
      • G06: COMPUTING; CALCULATING OR COUNTING
        • G06F: ELECTRIC DIGITAL DATA PROCESSING
          • G06F 40/00: Handling natural language data
            • G06F 40/10: Text processing
              • G06F 40/166: Editing, e.g. inserting or deleting
            • G06F 40/20: Natural language analysis
              • G06F 40/279: Recognition of textual entities
                • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
            • G06F 40/40: Processing or translation of natural language
              • G06F 40/55: Rule-based translation
              • G06F 40/56: Natural language generation
              • G06F 40/58: Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
        • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00: Computing arrangements based on biological models
            • G06N 3/02: Neural networks
              • G06N 3/08: Learning methods
                • G06N 3/084: Backpropagation, e.g. using gradient descent
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
          • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The application provides a translation model training method and device, a mobile terminal and a storage medium. The translation model training method comprises: performing destruction processing on multilingual corpus carrying parallel relations to obtain destroyed first complete text content, wherein the parallel relations comprise parallel relations of source languages and target languages; inputting the destroyed first complete text content into a preset translation model and executing a text reconstruction action to obtain reconstructed first complete text content; and adjusting the translation model according to the reconstructed first complete text content and the multilingual corpus to obtain a first translation model. The text generation capability of the first translation model obtained after training is higher than that of the original translation model, which reduces the perplexity of the translation output when the translation model is used for machine translation and improves the translation quality.

Description

Translation model training method and device, mobile terminal and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for training a translation model, a mobile terminal, and a storage medium.
Background
An automatic machine translation system performs translation work without manual intervention: it translates the source language fed into the system into the target language while ensuring the accuracy and fluency of the target-language output. Such a system depends on translation models generated from data and algorithms, and these models require massive input corpora and pre-training to obtain language representations independent of specific tasks. However, small-scale corpora carry a large amount of noise, and pre-training that cannot eliminate this noise slows the convergence of the translation model and degrades the effectiveness of the automatic machine translation system that applies it.
To address this, the prior art applies segment masking to monolingual corpora, inputs them into a model to predict the masked segments, and adjusts the model according to the difference between the predicted and actual results. However, this approach only lets the model learn representations of partial language features during pre-training and does not train the model's text generation capability, so automatic machine translation with the trained model cannot coherently generate complete sentence expressions.
Disclosure of Invention
The embodiment of the application provides a translation model training method and device, a mobile terminal and a storage medium. According to the translation model training method provided by the application, text reconstruction is performed by the translation model on the masked and destroyed first complete text content according to multilingual corpus carrying parallel relations, and the translation model is adjusted to obtain a first translation model. The application prompts the translation model to output complete sentences from destroyed sentences and adjusts the model according to the difference between those complete sentences and the multilingual corpus carrying parallel relations; the text generation capability of the adjusted first translation model is higher than that of the original translation model, which reduces the perplexity of the translation output when the translation model is used for machine translation and improves the translation quality.
In order to solve the technical problems, the application provides a translation model training method, which comprises the following steps:
performing destruction processing on multilingual corpus carrying parallel relations to obtain destroyed first complete text content, wherein the parallel relations comprise parallel relations of source languages and target languages;
inputting the destroyed first complete text content into a preset translation model, and executing a text reconstruction action to obtain reconstructed first complete text content;
and adjusting the translation model according to the reconstructed first complete text content and the multilingual corpus to obtain a first translation model.
Optionally, the translation model training method provided by the application further comprises the following steps:
splicing the multilingual corpus to obtain the first complete text content;
and carrying out noise addition on the region positioned in the splicing result in the first complete text content to obtain the destroyed first complete text content and the first masked fragment.
Optionally, the translation model training method provided by the application further comprises the following steps:
and masking the region in the splicing result in the first complete text content according to a preset span parameter and Poisson factor to obtain the destroyed first complete text content.
Optionally, the translation model training method provided by the application further comprises the following steps:
inputting the destroyed first complete text content into the first translation model, and executing a prediction action to obtain a predicted first masked segment;
and adjusting the first translation model according to the predicted first masked fragment and the first complete text content to obtain a second translation model.
Optionally, the translation model training method provided by the application further comprises the following steps:
masking monolingual corpora of the source language and the target language to obtain destroyed second complete text content;
inputting the destroyed second complete text content into the second translation model, and executing a text reconstruction action to obtain reconstructed second complete text content;
and adjusting the second translation model according to the reconstructed second complete text content and the monolingual corpora of the source language and the target language to obtain a third translation model.
Optionally, the translation model training method provided by the application further comprises the following steps:
inputting the destroyed second complete text content into the third translation model to execute a prediction action to obtain a predicted second masked segment;
and adjusting the third translation model according to the predicted second masked fragment and the second complete text content to obtain a fourth translation model.
Optionally, the translation model training method provided by the application further comprises the following steps:
and obtaining the destroyed first complete text content according to multilingual corpus carrying pseudo-parallel relations.
The application also provides a translation model training device, which comprises:
the first text destruction module is used for performing destruction processing on multilingual corpus carrying parallel relations to obtain destroyed first complete text content, wherein the parallel relations comprise parallel relations of source languages and target languages;
the first text reconstruction module is used for inputting the destroyed first complete text content into a preset translation model to execute text reconstruction action to obtain reconstructed first complete text content;
and the first model generation module is used for adjusting the translation model according to the reconstructed first complete text content and the multilingual corpus to obtain a first translation model.
The application also provides a mobile terminal, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to implement the translation model training method described above.
The application also provides a computer readable storage medium storing a computer program which when executed by a processor is capable of implementing the translation model training method described above.
According to the translation model training method provided by the application, text reconstruction is performed by the translation model on the masked and destroyed first complete text content according to multilingual corpus carrying parallel relations, and the translation model is adjusted to obtain a first translation model. The application prompts the translation model to output complete sentences from destroyed sentences and adjusts the model according to the difference between those complete sentences and the multilingual corpus carrying parallel relations, improving the text generation capability of the translation model, reducing the perplexity of the translation output when the translation model is used for machine translation, and improving the translation quality.
The foregoing is only an overview of the technical solutions of the present application. To make the technical means of the present application more clearly understood and implementable in accordance with the contents of the specification, and to make the above and other objects, features and advantages of the present application more apparent, specific embodiments of the present application are set forth below.
Drawings
One or more embodiments are illustrated by way of example, and not limitation, in the figures of the accompanying drawings, in which like references indicate similar elements; the figures are not to be taken as limiting unless otherwise indicated.
FIG. 1 is a first schematic diagram of a translation model training method according to an embodiment of the present application;
FIG. 2 is a second schematic diagram of a translation model training method according to an embodiment of the present application;
FIG. 3 is a third schematic diagram of a translation model training method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of text reconstruction by a translation model according to an embodiment of the present application;
FIG. 5 is a fourth schematic diagram of a translation model training method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of mask word prediction by a translation model according to an embodiment of the present application;
FIG. 7 is a fifth schematic diagram of a translation model training method according to an embodiment of the present application;
FIG. 8 is a sixth schematic diagram of a translation model training method according to an embodiment of the present application;
FIG. 9 is a seventh schematic diagram of a translation model training method according to an embodiment of the present application;
FIG. 10 is a first schematic diagram of a translation model training device according to an embodiment of the present application;
FIG. 11 is a second schematic diagram of a translation model training device according to an embodiment of the present application;
FIG. 12 is a third schematic diagram of a translation model training device according to an embodiment of the present application;
FIG. 13 is a fourth schematic diagram of a translation model training device according to an embodiment of the present application;
FIG. 14 is a fifth schematic diagram of a translation model training device according to an embodiment of the present application;
FIG. 15 is a sixth schematic diagram of a translation model training device according to an embodiment of the present application;
FIG. 16 is a seventh schematic diagram of a translation model training device according to an embodiment of the present application;
FIG. 17 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.
Related concepts of the translation model training method provided by the application are explained and illustrated below:
the training of the translation model provided by the application belongs to the pre-training of the translation model in a machine translation system, and a large amount of training corpus is used, and the translation model obtains general language features independent of specific tasks through enough training steps. The pre-training can effectively reduce the difficulty and uncertainty of the translation model when fitting the downstream task, quicken the convergence rate of the model, and further improve the performance of the translation model when processing the downstream task.
The training method provided by the application includes a training method for text reconstruction. A text reconstruction task is a training task through which a neural network model obtains representations of textual relations. Applied to the training of the translation model, the structure of the training corpus is randomly destroyed, for example by masking, replacement or sequence shuffling; the translation model reconstructs the text from the destroyed training corpus, and the parameters of the translation model are adjusted according to the reconstruction result and the original training corpus, so that the translation model carries textual-relation features and its text generation capability improves.
The training method provided by the application further includes a training method for mask word prediction. The mask prediction training provided by the application partially masks the training corpus; the translation model predicts the masked content, the prediction result is compared with the training corpus, and the translation model parameters are adjusted, so that the translation model carries language features and its context association capability improves.
A first embodiment of the present application relates to a translation model training method, as shown in fig. 1, including:
Step 101: performing destruction processing on multilingual corpus carrying parallel relations to obtain destroyed first complete text content, wherein the parallel relations comprise parallel relations of source languages and target languages;
Step 102: inputting the destroyed first complete text content into a preset translation model, and executing a text reconstruction action to obtain reconstructed first complete text content;
Step 103: adjusting the translation model according to the reconstructed first complete text content and the multilingual corpus to obtain a first translation model.
Specifically, in the translation model training method provided by the application, the multilingual corpus carrying parallel relations used for training is first combined and subjected to destruction processing. The destruction mainly targets the sentence-pair parts corresponding to the target language in the multilingual corpus, and the destroyed first complete text content is obtained through masking, replacement, sequence shuffling and the like. The destroyed first complete text content is then input into a translation model to perform a text reconstruction action, and the translation model outputs reconstructed first complete text content covering both destroyed and undestroyed segments. Finally, the reconstructed first complete text content is compared with the original multilingual corpus carrying parallel relations, and the translation model is adjusted according to the comparison result, thereby obtaining a first translation model whose text generation capability is stronger than that of the original translation model.
It should be emphasized that the application does not improve the method of comparing the reconstructed first complete text content with the original multilingual corpus carrying parallel relations; the specific comparison may be, but is not limited to, a cross-entropy loss function between the translation model's result and the correct result. When the cross entropy calculated by the loss function is smaller than a manually set cross-entropy threshold, the translation result is sufficiently close to the correct result and no parameter adjustment is needed; when the cross entropy is larger than the threshold, the error of the translation result is large, and the relevant parameters of the translation model are adjusted until the cross entropy between the trained translation result and the correct result falls within the threshold, thereby improving the text generation capability of the translation model.
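For illustration, a minimal sketch of this threshold check follows, assuming a PyTorch-style setup; the function name, tensor shapes and ce_threshold are illustrative assumptions rather than values fixed by the application.

```python
import torch
import torch.nn.functional as F

def exceeds_threshold(logits: torch.Tensor, target_ids: torch.Tensor,
                      ce_threshold: float) -> bool:
    """Compare the model's reconstruction against the reference via cross entropy.

    logits: (seq_len, vocab_size) decoder outputs for the reconstruction.
    target_ids: (seq_len,) token ids of the original, undestroyed text.
    Returns True while the parameters still need adjusting.
    """
    loss = F.cross_entropy(logits, target_ids)
    return loss.item() > ce_threshold  # adjust only while above the threshold
```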
The translation model training method provided by the application addresses the limited improvement that existing training methods bring to multilingual generation-type downstream tasks, such as machine translation. Specifically, text reconstruction is performed by the translation model on the masked and destroyed first complete text content according to the multilingual corpus carrying parallel relations, and the translation model is adjusted to obtain a first translation model. The application prompts the translation model to output complete sentences from destroyed sentences and adjusts the model according to the complete sentences and the multilingual corpus carrying parallel relations, improving the text generation capability of the translation model, reducing the perplexity of the translation output when the translation model is used for machine translation, and improving the translation quality.
In addition, the translation model training method provided by the application can effectively and fully utilize parallel corpus resources so that the translation model converges in the multilingual feature space. In the prior art, many translation models are trained with monolingual corpora, or with multilingual monolingual corpora having no parallel relation. However, monolingual corpora can only improve the model's monolingual representation capability. Multilingual monolingual corpora without parallel relation can give the translation model multilingual representation capability, but because no alignment information serves as a hint during training, the translation model has difficulty establishing effective connections between related sentences across languages. For example, take the Chinese sentence "早上好!" and its English counterpart "Good morning!": if they are input into the translation model merely as unrelated multilingual monolingual corpora, the trained translation model only learns contextual word representations within each sentence and cannot effectively align the Chinese word for "morning" with "morning", or the Chinese word for "good" with "good". This leads to a lack of word-alignment knowledge in the translation model, which limits its improvement on multilingual generation-type downstream tasks such as machine translation. The translation model training method provided by the application inputs multilingual corpus carrying parallel relations into the translation model for training, exploits the rich cross-language semantic alignment information contained in the corpus, and designs suitable training targets so that the translation model obtains cross-language representations more beneficial to translation tasks, further improving translation quality.
On the basis of the above embodiment, as shown in fig. 2, the translation model training method provided by the present application further includes:
Step 111: splicing the multilingual corpus to obtain the first complete text content;
Step 112: adding noise to the region of the first complete text content located in the splicing result, to obtain the destroyed first complete text content and the first masked fragment.
The translation model training method provided by the application requires word segmentation of the multilingual corpus. The tokenizer may be, but is not limited to, one based on the BPE algorithm; it may be trained with algorithms such as Unigram, or the word segmentation module of an open-source model may be used. Unigram here refers to the unigram language model among N-gram language models: based on statistical language modeling, the text content is processed byte-wise with a sliding window of size N to form byte fragment sequences of length N. In addition, the tokenizer needs to reserve a range of identifier IDs for the special tokens used in the application.
In the translation model training method provided by the application, special tokens are added to the multilingual corpus for text reconstruction and mask word prediction. These special tokens carry no semantic information and serve to prompt the translation model to perform translation actions. For example, the application provides two kinds of special tokens. The first kind are basic symbols, such as "<span>" representing filler (the "pad" commonly used for padding), "<sep>" for separating the source language and the target language, "<eos>" for prompting the end of prediction in mask word prediction, and "<mask>" for mask word replacement. The second kind are symbols representing languages, such as "<zh>" expressing Chinese, which are determined according to the source language and the target language.
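For illustration, the following sketch collects these tokens in one place; the dictionary layout and the Tibetan language code "<bo>" are assumptions made for the later examples, not details fixed by the application.

```python
# Special tokens described above; they carry no semantic content.
SPECIAL_TOKENS = {
    "pad": "<span>",   # filler for positions the model need not generate
    "sep": "<sep>",    # separates source-language and target-language text
    "eos": "<eos>",    # signals prediction end in mask word prediction
    "mask": "<mask>",  # replaces masked words or segments
}

# Language tokens appended at sentence end to mark the target language;
# "<zh>" is from the description, "<bo>" for Tibetan is an assumed code.
LANG_TOKENS = {"zh": "<zh>", "bo": "<bo>"}
```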
The destruction methods used in text reconstruction in the translation model training method provided by the application include, but are not limited to, masking, replacement, sequence shuffling and the like; this embodiment takes masking as an example.
Specifically, the destroyed first complete text content in the translation model training method provided by the application is obtained by splicing and noise addition. Multilingual corpus carrying parallel relations is spliced into a long sentence, with "<sep>" separating the two languages, and a token representing the target language is appended at the end of the sentence.
Then, in the region of the long sentence where the target language is located, the tokenizer divides the text into a number of word or text segments; segments are randomly extracted, and the extracted segments are replaced with "<mask>".
The noise-added, destroyed first complete text content is input into the translation model for text reconstruction, and the translation model is adjusted according to the difference between the reconstruction result and the actual text content.
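A minimal sketch of this splice-then-corrupt step follows, under the assumption that sentences arrive as token lists; the function names and the placeholder mask rate are illustrative (the span/Poisson procedure of the detailed description is sketched further below).

```python
import random

def corrupt(tokens, mask_rate=0.25, mask_token="<mask>"):
    # Placeholder corruption: independent random token masking.
    # The pointer-and-jump span/Poisson procedure is sketched later.
    return [mask_token if random.random() < mask_rate else t for t in tokens]

def build_corrupted_input(src_tokens, tgt_tokens, tgt_lang_token="<zh>"):
    """Splice source and target with '<sep>', corrupt only the
    target-language region, and append the target-language token."""
    return src_tokens + ["<sep>"] + corrupt(tgt_tokens) + [tgt_lang_token]
```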
In the translation model training method provided by the application, noise is added to the multilingual corpus carrying parallel relations by random masking, so the same multilingual corpus carrying parallel relations can be used for training multiple times. This increases the amount of training and further improves the text generation capability of the translation model.
On the basis of the above embodiment, as shown in fig. 3, the translation model training method provided by the present application further includes:
Step 113: masking the region of the first complete text content located in the splicing result according to a preset span parameter and Poisson factor, to obtain the destroyed first complete text content.
Specifically, in the translation model training method provided by the application, the masked proportion of the monolingual corpora and of the multilingual corpus carrying parallel relations within the training corpus is controlled by the span parameter and the Poisson factor, which are preset according to the user's actual masking requirements.
For example, the noise addition may proceed as follows. A pointer into the target-language sentence of the long sentence starts at the sentence head, its subscript denoted "i". The distance from "i" to the end of the sentence is checked: when the distance is smaller than a preset value, for example 4, masking stops with 50% probability and continues with 50% probability; when the distance from "i" to the end of the sentence is larger than the preset value, masking continues. The application defines an integer as the span parameter and randomly draws a number r between 0 and the span parameter as a jump span, where (i+r) must remain within the region of the long sentence occupied by the target-language corpus. The contents from (i+r) to (i+r+m-1) are then masked, where m takes, with 70% probability, an integer p generated from a Poisson distribution with mean λ, and with 30% probability takes 1. After masking, the subscript points to (i+r+m), and the masking action continues until the sentence length is exceeded. The training objective is similar to that of the pre-trained language model BERT: a contiguous segment is masked when m takes p, and an individual word is masked when m takes 1.
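A runnable sketch of this pointer-and-jump procedure follows, assuming the target-language region is already tokenized; the parameter defaults follow the first-stage values quoted in the three-stage schedule below, and stop_window stands in for the preset distance value of 4.

```python
import random
import numpy as np

def mask_target_region(tokens, span_param=8, poisson_lambda=3.5,
                       stop_window=4, mask_token="<mask>"):
    """Mask the target-language region by jump span r and Poisson length m."""
    out = list(tokens)
    n = len(out)
    i = 0
    while i < n:
        # Near the sentence end, stop masking with 50% probability.
        if n - i < stop_window and random.random() < 0.5:
            break
        r = random.randint(0, span_param)  # jump span between 0 and the span parameter
        if i + r >= n:
            break
        # m: Poisson-distributed segment length 70% of the time, else a single word.
        if random.random() < 0.7:
            m = max(1, int(np.random.poisson(poisson_lambda)))
        else:
            m = 1
        end = min(i + r + m, n)
        for j in range(i + r, end):        # mask positions (i+r) .. (i+r+m-1)
            out[j] = mask_token
        i = end                            # pointer advances to (i+r+m)
    return out
```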
For example, as shown in fig. 4, the multilingual corpus for training is a sentence pair whose source language is Tibetan and whose target language is Chinese. The Tibetan sentence and the Chinese sentence are first spliced, separated in the middle by "<sep>", and noise addition is then performed. A word of the Chinese sentence is marked with subscript "i"; the segment from (i+1) to (i+1+5-1), a five-token span, is selected and masked. The subscript then points to (i+1+5) and masking continues: the single-token segment from ((i+1+5)+2) to ((i+1+5)+2+1-1) is selected and masked. After the sentence corresponding to the target language has been partially masked, "<zh>" is appended at the end to mark the target-language information, so that the translation model can judge that Tibetan needs to be converted into Chinese. The destroyed complete text content is input into the translation model, the reconstructed target-language sentence is obtained, and the reconstructed sentence is compared with the original target-language sentence; when the difference is larger than a preset loss-function threshold, the relevant parameters of the translation model are adjusted until the difference falls below the threshold, thereby training the translation model's text reconstruction and complete-sentence generation capability.
In order to improve the generalization capability of the translation model on downstream tasks and let it converge more effectively, three-stage training can be preset when masking the multilingual corpus, with the masked-segment proportion controlled by different span parameters and Poisson factors in each stage.
The first stage is the word masking stage, aimed mainly at the initial training period of the translation model. Because the translation model at this point possesses none of the grammatical or semantic features of the target language, mainly words and small fragments of the training corpus are masked. The span parameter is set to 8 and the Poisson factor to 3.5, so the proportion of masked fragments in the destroyed multilingual and monolingual corpora is about 25%; the masking amplitude is small and the masked spans are short, and the translation model can judge the masked fragments with reference to sufficient context information, allowing it to acquire language features such as basic lexical information.
The second stage is the segment masking stage, aimed mainly at a translation model that already has elementary language capability; by increasing the noise intensity, it strengthens the translation model's learning of language features such as syntactic information. The span parameter is set to 6 and the Poisson factor to 4.5, and the translation model must interpret the masked segments from less context information, so that it acquires language features such as high-order syntactic information.
The third stage is the high-order generation stage, aimed mainly at a translation model that already has a certain level of language understanding and word generation. The span parameter is adjusted down to 4 and the Poisson factor up to 5, so that less of the training corpus is exposed and the model must predict more masked fragments. After this training, the translation model strengthens its ability to associate context in the training corpus, obtains higher-level language representation capability, and adapts better to downstream tasks.
The training steps of the three text reconstruction stages can be controlled at 80,000 steps, 50,000 to 80,000 steps, and 60,000 steps respectively.
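Collected as a configuration sketch for reference; the dictionary keys are illustrative, while the values are exactly those quoted above.

```python
# Three-stage text reconstruction schedule from the description.
TEXT_RECONSTRUCTION_STAGES = [
    {"stage": "word masking",         "span_param": 8, "poisson_lambda": 3.5,
     "mask_ratio": 0.25, "steps": 80_000},
    {"stage": "segment masking",      "span_param": 6, "poisson_lambda": 4.5,
     "steps": (50_000, 80_000)},  # range as given in the text
    {"stage": "high-order generation","span_param": 4, "poisson_lambda": 5.0,
     "steps": 60_000},
]
```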
It should be emphasized that the application does not specifically limit the values of the span parameter, the Poisson factor, or the proportion of masked segments; when the text lengths of the multilingual and monolingual corpora change, these parameters should fluctuate up or down accordingly. In addition, the span parameter and Poisson factor provided by the application can be used both for text reconstruction training and for mask word prediction training.
The translation model training method controls the proportion of masked segments through the span parameter and the Poisson factor, providing several training methods with different masking proportions; training the translation model to different degrees effectively improves its context association capability while ensuring its efficient convergence.
On the basis of the above embodiment, as shown in fig. 5, the translation model training method provided by the present application further includes:
Step 104: inputting the destroyed first complete text content into the first translation model, and executing a prediction action to obtain a predicted first masked segment;
Step 105: adjusting the first translation model according to the predicted first masked segment and the first complete text content to obtain a second translation model.
Specifically, the translation model training method provided by the application performs not only text reconstruction training on the multilingual corpus, but also mask word prediction training. The multilingual corpora corresponding to the source language and the target language are spliced to obtain a long sentence, with "<sep>" separating the two languages, and a token representing the target language is appended at the end of the sentence.
Then, in the region of the long sentence where the target language is located, the tokenizer divides the text into a number of word or text segments; segments are randomly extracted, and the extracted segments are replaced with "<mask>". The random masking in mask word prediction training is similar to that in text reconstruction training and is not repeated here. Because the mask word prediction method in the translation model training method does not need to train the translation model's text generation capability, only its context association capability, the parts of the target-language region that are not randomly masked can be replaced with the "<span>" token, and the translation model only needs to output the predicted first masked segment. The first translation model is then adjusted according to the difference between the predicted first masked segment and the first complete text content, obtaining a second translation model with context association capability.
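One plausible construction of the input and the prediction target under this description is sketched below; the token-by-token alignment and the placement of "<eos>" are assumptions, not details fixed by the application.

```python
def build_mask_prediction_pair(corrupted_tgt, original_tgt):
    """Hide unmasked target content behind '<span>' and collect the
    masked tokens as the prediction target, terminated by '<eos>'."""
    model_input, target = [], []
    for corrupted_tok, original_tok in zip(corrupted_tgt, original_tgt):
        if corrupted_tok == "<mask>":
            model_input.append("<mask>")
            target.append(original_tok)   # token the model must predict
        else:
            model_input.append("<span>")  # unmasked content, hidden by filler
    return model_input, target + ["<eos>"]
```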
For example, as shown in fig. 6, the multilingual corpus for training is again a sentence pair whose source language is Tibetan and whose target language is Chinese. The Tibetan sentence and the Chinese sentence are first spliced, separated in the middle by "<sep>", and noise addition is then performed with the same jump-and-mask procedure as in the text reconstruction example: a five-token segment starting at (i+1) is masked, the subscript advances, a further single-token segment is masked, and "<zh>" is appended at the end to mark the target-language information, so that the translation model can judge that Tibetan needs to be converted into Chinese. The translation model outputs the predicted masked segments, and the prediction is compared with the original segments; when the difference is larger than a preset loss-function threshold, the relevant parameters of the translation model are adjusted until the difference falls below the threshold, thereby training the translation model's context association capability.
On the basis of the three-stage training of the embodiment, the application adds a mask word prediction target in a segment masking stage and a high-order generation stage, thereby training the mask word prediction capability of the translation model.
The first stage is the same as the word masking stage in the above embodiment and is not repeated here.
The second stage is the segment masking stage, aimed mainly at a translation model that already has elementary language capability; by increasing the noise intensity, it strengthens the translation model's learning of language features such as syntactic information. The span parameter is set to 6 and the Poisson factor to 4.5, and a mask prediction target is added, so that the proportion of masked segments in the destroyed multilingual and monolingual corpora is about 33%; the translation model must interpret the masked segments from less context information, so that it acquires language features such as high-order syntactic information.
The third stage is the high-order prediction stage, aimed mainly at a translation model that already has a certain level of language understanding and word generation. The span parameter is adjusted down to 4, the Poisson factor up to 5, and the mask prediction target proportion is raised to 50%, so that less of the training corpus content is exposed and the model must predict more masked fragments. After this training, the translation model strengthens its ability to associate context in the training corpus, obtains higher-level language representation capability, and adapts better to downstream tasks.
Similar to the above embodiment, the number of training steps in the three stages can be controlled at 80,000 steps, 50,000 to 80,000 steps, and 60,000 steps respectively.
In the translation model training method provided by the application, text reconstruction training takes generated complete sentences as the model's output; a loss function over the difference between the output and the actual text focuses more on whole sentences, concentrating training on the translation model's text generation capability while paying less attention to the masked words, so the training of the model's mask word prediction capability receives insufficient attention. Because this embodiment performs mask word prediction training on the multilingual corpus corresponding to the target language, the weight of the masked part's cross entropy in the loss function can be increased, so that the model focuses more, as it converges, on effectively predicting the masked words. This raises the translation model's attention to masked words and further improves its context association capability.
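A minimal sketch of that re-weighting follows, assuming PyTorch; the weight value 2.0 and the boolean mask layout are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def weighted_masked_ce(logits, target_ids, is_masked, mask_weight=2.0):
    """Cross entropy with extra weight on masked positions.

    logits: (seq_len, vocab_size); target_ids: (seq_len,)
    is_masked: (seq_len,) bool tensor, True where the input was '<mask>'.
    """
    per_token = F.cross_entropy(logits, target_ids, reduction="none")
    weights = torch.where(is_masked,
                          torch.full_like(per_token, mask_weight),
                          torch.ones_like(per_token))
    return (per_token * weights).mean()
```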
On the basis of the above embodiment, as shown in fig. 7, the translation model training method provided by the present application further includes:
Step 106: masking monolingual corpora of the source language and the target language to obtain destroyed second complete text content;
Step 107: inputting the destroyed second complete text content into the second translation model, and executing a text reconstruction action to obtain reconstructed second complete text content;
Step 108: adjusting the second translation model according to the reconstructed second complete text content and the monolingual corpora of the source language and the target language to obtain a third translation model.
Specifically, the translation model training method provided by the application also performs text reconstruction training on monolingual corpora. Because a monolingual corpus of the target language has no opposite monolingual corpus in the source language, no splicing is needed, and noise is added directly to the monolingual text. The text reconstruction training on monolingual corpora is similar to that on multilingual corpora in the translation model training method provided by the application and is not repeated here.
Because monolingual corpora are easier to obtain than multilingual corpora, the translation model training method provided by the application performs text reconstruction training on the translation model with large-scale monolingual corpora, further improving the text generation capability of the translation model.
On the basis of the above embodiment, as shown in fig. 8, the translation model training method provided by the present application further includes:
Step 109: inputting the destroyed second complete text content into the third translation model to execute a prediction action, obtaining a predicted second masked segment;
Step 110: adjusting the third translation model according to the predicted second masked segment and the second complete text content to obtain a fourth translation model.
Specifically, the translation model training method provided by the application also performs mask word prediction training on monolingual corpora. The monolingual corpora may use published monolingual datasets of the target language, or may be collected from the web by crawling and similar means. Because a monolingual corpus of the target language has no opposite monolingual corpus in the source language, no splicing is needed, and partial segments of the monolingual text are masked directly. The mask word prediction training on monolingual corpora is similar to that on multilingual corpora in the translation model training method provided by the application and is not repeated here.
Because monolingual corpora are easier to obtain than multilingual corpora, the translation model training method provided by the application performs mask word prediction training on the translation model with large-scale monolingual corpora, further improving the context association capability of the translation model.
On the basis of the above embodiment, as shown in fig. 9, the translation model training method provided by the present application further includes:
Step 114: obtaining the destroyed first complete text content according to multilingual corpus carrying pseudo-parallel relations.
In the translation model training method provided by the application, multilingual corpus with pseudo-parallel relations can be chosen as a substitute for the text reconstruction, mask word prediction and other training methods of the translation model. Multilingual corpus with pseudo-parallel relations can be obtained from existing public translation systems: for example, monolingual corpora of the target language are fed into an existing target-to-source translation system to obtain machine translations in the source language in batches, and these are paired with the target-language monolingual corpora to serve as multilingual corpus between the source language and the target language in the translation model training method.
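A sketch of this construction follows, under the assumption of a callable reverse_translate wrapping an existing target-to-source translation system; the name and the per-sentence batching are illustrative.

```python
from typing import Callable, Iterable

def build_pseudo_parallel(tgt_sentences: Iterable[str],
                          reverse_translate: Callable[[str], str]):
    """Pair target-language monolingual text with machine-translated
    source text to form pseudo-parallel sentence pairs."""
    pairs = []
    for tgt in tgt_sentences:
        src = reverse_translate(tgt)  # target-to-source machine translation
        pairs.append((src, tgt))      # pseudo-parallel (source, target) pair
    return pairs
```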
Multilingual corpus carrying true parallel relations is precious and scarce, and training the translation model with it alone incurs high training cost on a small training volume. The translation model training method instead trains the translation model with multilingual corpus carrying pseudo-parallel relations, which is easier to obtain and cheaper, reducing the training cost while the large scale of the pseudo-parallel corpus preserves the accuracy of the translation model.
It should be emphasized that the training object of the application may be, but is not limited to, a neural network translation model. For example, the translation model in the translation model training method provided by the application is an attention-based Transformer model comprising two sub-parts, an encoder and a decoder. The encoder sub-part consists of several encoding layers, each containing a Multi-Head Attention module and a Feed Forward module configured with residual links. The decoder sub-part contains a Masked Multi-Head Attention module, a cross Multi-Head Attention module and a Feed Forward module, likewise configured with residual links; the specific encoder and decoder structures are disclosed in the prior art and are not repeated here. In the translation model training method provided by the application, text reconstruction, mask word prediction and the subsequent fine-tuning all use a cross-entropy loss function with label smoothing, parameters are iterated with an Adam optimizer, the translation model is adjusted with a dynamic learning-rate strategy, and model decoding and prediction use the Beam Search algorithm, with the Beam Size settable to 2.
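A compact sketch of that setup follows, assuming PyTorch; the label-smoothing value 0.1, the Adam betas, and the warmup-style schedule shape are assumptions, since the text only names the components.

```python
import torch

# Stand-in encoder-decoder for the Transformer described above.
model = torch.nn.Transformer(d_model=512, nhead=8)

# Label-smoothed cross entropy and Adam, as named in the description.
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.Adam(model.parameters(), betas=(0.9, 0.98), eps=1e-9)

def dynamic_lr(step, d_model=512, warmup=4000):
    """One common 'dynamic learning rate' shape (inverse square root)."""
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

# Decoding would then use beam search with beam size 2, e.g. a generate()
# call with num_beams=2 in a Hugging Face-style API (assumed).
```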
A second embodiment of the present application relates to a translation model training device, as shown in fig. 10, including:
a first text destruction module 121, configured to perform destruction processing on multilingual corpus carrying parallel relations to obtain destroyed first complete text content, where the parallel relations include parallel relations of a source language and a target language;
a first text reconstruction module 122, configured to input the corrupted first complete text content into a preset translation model, and perform a text reconstruction action to obtain reconstructed first complete text content;
the first model generating module 123 is configured to adjust the translation model according to the reconstructed first complete text content and the multilingual corpus, and obtain a first translation model.
On the basis of the above embodiment, as shown in fig. 11, the first text destruction module 121 includes:
a splicing sub-module 124, configured to splice the multilingual corpus to obtain the first complete text content;
and the noise adding sub-module 125 is configured to perform noise addition on an area located in the splicing result in the first complete text content, so as to obtain the corrupted first complete text content and the first masked fragment.
On the basis of the above embodiment, as shown in fig. 12, the noise adding sub-module 125 includes:
and a masking unit 126, configured to mask, according to a preset span parameter and Poisson factor, the region of the first complete text content located in the splicing result, to obtain the destroyed first complete text content.
On the basis of the above embodiment, as shown in fig. 13, the translation model training device provided by the present application further includes:
a first prediction module 127, configured to input the corrupted first complete text content into the first translation model to perform a prediction action to obtain a predicted first masked segment;
and a second model generating module 128, configured to adjust the first translation model according to the predicted first masked segment and the first complete text content, and obtain a second translation model.
On the basis of the above embodiment, as shown in fig. 14, the translation model training device provided by the present application further includes:
a second text destruction module 129, configured to mask monolingual corpora of the source language and the target language to obtain destroyed second complete text content;
a second text reconstruction module 130, configured to input the corrupted second complete text content into the second translation model, and perform a text reconstruction action to obtain reconstructed second complete text content;
and a third model generating module 131, configured to adjust the second translation model according to the reconstructed second complete text content and the monolingual corpora of the source language and the target language to obtain a third translation model.
On the basis of the above embodiment, as shown in fig. 15, the translation model training device provided by the present application further includes:
a second prediction module 132, configured to input the corrupted second complete text content into the third translation model to perform a prediction action to obtain a predicted second masked segment;
and a fourth model generating module 133, configured to adjust the third translation model according to the predicted second masked segment and the second complete text content, and obtain a fourth translation model.
On the basis of the above embodiment, as shown in fig. 16, the first text destruction module 121 further includes:
the pseudo parallel corpus destruction sub-module 134 is configured to obtain the destroyed first complete text content according to a multilingual corpus carrying pseudo parallel relations.
A third embodiment of the present application relates to a mobile terminal, as shown in fig. 17, including:
at least one processor 161; and
a memory 162 communicatively coupled to the at least one processor 161; wherein
The memory 162 stores instructions executable by the at least one processor 161 to enable the at least one processor 161 to implement the translation model training method according to the first embodiment of the present application.
Where the memory and the processor are connected by a bus, the bus may comprise any number of interconnected buses and bridges, the buses connecting the various circuits of the one or more processors and the memory together. The bus may also connect various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further herein. The bus interface provides an interface between the bus and the transceiver. The transceiver may be one element or may be a plurality of elements, such as a plurality of receivers and transmitters, providing a means for communicating with various other apparatus over a transmission medium. The data processed by the processor is transmitted over the wireless medium via the antenna, which further receives the data and transmits the data to the processor.
The processor is responsible for managing the bus and general processing and may also provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions. And memory may be used to store data used by the processor in performing operations.
A fourth embodiment of the present application relates to a computer-readable storage medium storing a computer program. The computer program, when executed by a processor, implements the translation model training method according to the first embodiment of the present application.
That is, it will be understood by those skilled in the art that all or part of the steps in implementing the methods of the embodiments described above may be implemented by a program stored in a storage medium, where the program includes several instructions for causing a device (which may be a single-chip microcomputer, a chip or the like) or a processor (processor) to perform all or part of the steps in the methods of the embodiments of the application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method of training a translation model, the method comprising:
performing destruction processing on a multilingual corpus carrying parallel relations to obtain destroyed first complete text content, wherein the parallel relations comprise a parallel relation between a source language and a target language;
inputting the destroyed first complete text content into a preset translation model, and executing a text reconstruction action to obtain reconstructed first complete text content;
and adjusting the translation model according to the reconstructed first complete text content and the multilingual corpus to obtain a first translation model.
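Illustrative sketch only (not part of the claims): the corrupt-and-reconstruct adjustment of claim 1 can be read as one denoising sequence-to-sequence training step. The Python sketch below assumes a PyTorch encoder-decoder model that maps corrupted token ids of shape (batch, seq_len) to logits of shape (batch, seq_len, vocab); the function name, shapes, and framework are assumptions of this sketch, not prescriptions of the patent.

```python
import torch.nn.functional as F

def denoising_step(model, optimizer, corrupted_ids, original_ids, pad_id=0):
    """One corrupt-and-reconstruct adjustment step (sketch of claim 1).

    The model rebuilds the original spliced text from its destroyed
    version; cross-entropy against the uncorrupted ids drives a
    backpropagation update of the model parameters.
    """
    logits = model(corrupted_ids)              # (batch, seq_len, vocab)
    loss = F.cross_entropy(
        logits.transpose(1, 2),                # (batch, vocab, seq_len)
        original_ids,
        ignore_index=pad_id,                   # skip padding positions
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating denoising_step over batches of destroyed parallel text and keeping the adjusted parameters would then play the role of the "first translation model" in the sense of claim 1.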
2. The method of claim 1, wherein the destruction processing comprises partial masking processing, and wherein the performing destruction processing on the multilingual corpus carrying parallel relations to obtain the destroyed first complete text content comprises:
splicing the multilingual corpus to obtain the first complete text content;
and performing partial masking processing on a region of the first complete text content located in the splicing result to obtain the destroyed first complete text content and a first masked segment.
3. The method of claim 2, wherein the performing partial masking processing on the region of the first complete text content located in the splicing result to obtain the destroyed first complete text content and the first masked segment comprises:
masking the region of the first complete text content located in the splicing result according to a preset span parameter and a Poisson factor to obtain the destroyed first complete text content.
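Illustrative sketch only (not part of the claims): the "preset span parameter and Poisson factor" of claim 3 admits a BART-style text-infilling reading, in which span lengths are drawn from a Poisson distribution and each destroyed span is collapsed to a single mask token. The function below is a minimal sketch under that assumption; all names and default values are invented here, and overlaps between sampled spans are not specially handled.

```python
import numpy as np

def span_mask(tokens, mask_token="<mask>", mask_ratio=0.35, poisson_lambda=3.5):
    """Destroy a token sequence by replacing whole spans with one mask token.

    Span lengths are drawn from a Poisson distribution (one reading of
    claim 3's span parameter and Poisson factor); the returned segments
    play the role of the "first masked segment".
    """
    tokens = list(tokens)
    budget = int(len(tokens) * mask_ratio)   # total number of tokens to destroy
    masked_segments = []
    while budget > 0 and len(tokens) > 1:
        span = max(1, int(np.random.poisson(poisson_lambda)))
        start = int(np.random.randint(0, len(tokens)))
        span = min(span, len(tokens) - start, budget)
        masked_segments.append(tokens[start:start + span])
        tokens[start:start + span] = [mask_token]   # whole span -> one token
        budget -= span
    return tokens, masked_segments

# e.g. span_mask("le chat dort <sep> the cat sleeps".split()) destroys the
# spliced source-target text and also returns the removed segments.
```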
4. The method of claim 2, wherein after the adjusting the translation model according to the reconstructed first complete text content and the multilingual corpus to obtain the first translation model, the method further comprises:
inputting the destroyed first complete text content into the first translation model, and executing a prediction action to obtain a predicted first masked segment;
and adjusting the first translation model according to the predicted first masked segment and the first complete text content to obtain a second translation model.
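Illustrative sketch only (not part of the claims): in the prediction stage of claims 4 and 6, the adjustment is driven by how well the masked segments are predicted rather than by the rest of the text. Assuming PyTorch tensors and the common ignore-index convention, the loss can be restricted to the destroyed positions as follows; the names and the boolean segment_mask are invented for this sketch.

```python
import torch.nn.functional as F

def masked_segment_loss(logits, original_ids, segment_mask, ignore_id=-100):
    """Cross-entropy computed only over the destroyed spans.

    segment_mask is a boolean tensor marking the positions of the masked
    segments; every other position is excluded from the loss, so the model
    is scored purely on predicting the masked segments.
    """
    labels = original_ids.clone()
    labels[~segment_mask] = ignore_id          # exclude non-segment positions
    return F.cross_entropy(
        logits.transpose(1, 2), labels, ignore_index=ignore_id)
```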
5. The method of claim 4, wherein after the adjusting the first translation model according to the predicted first masked segment and the first complete text content to obtain the second translation model, the method further comprises:
performing masking processing according to monolingual corpora of the source language and the target language to obtain destroyed second complete text content;
inputting the destroyed second complete text content into the second translation model, and executing a text reconstruction action to obtain reconstructed second complete text content;
and adjusting the second translation model according to the reconstructed second complete text content and the monolingual corpora of the source language and the target language to obtain a third translation model.
6. The method of claim 5, wherein after the adjusting the second translation model according to the reconstructed second complete text content and the monolingual corpora of the source language and the target language to obtain the third translation model, the method further comprises:
inputting the destroyed second complete text content into the third translation model, and executing a prediction action to obtain a predicted second masked segment;
and adjusting the third translation model according to the predicted second masked segment and the second complete text content to obtain a fourth translation model.
7. The method of claim 1, wherein the multilingual corpus carrying parallel relations further comprises a multilingual corpus carrying pseudo-parallel relations, and wherein the performing destruction processing on the multilingual corpus carrying parallel relations to obtain the destroyed first complete text content comprises:
obtaining the destroyed first complete text content according to the multilingual corpus carrying pseudo-parallel relations.
8. A machine translation training device, comprising:
a first text destruction module, configured to perform destruction processing on a multilingual corpus carrying parallel relations to obtain destroyed first complete text content, wherein the parallel relations comprise a parallel relation between a source language and a target language;
a first text reconstruction module, configured to input the destroyed first complete text content into a preset translation model and execute a text reconstruction action to obtain reconstructed first complete text content;
and a first model generation module, configured to adjust the translation model according to the reconstructed first complete text content and the multilingual corpus to obtain a first translation model.
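Illustrative sketch only (not part of the claims): the three modules of claim 8 compose as destroy, then reconstruct, then adjust. The dataclass below is an invented stand-in that makes this composition explicit; the patent names the modules but does not prescribe any concrete interface.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MachineTranslationTrainingDevice:
    """Invented stand-in for the module decomposition of claim 8."""
    destroy: Callable      # first text destruction module
    reconstruct: Callable  # first text reconstruction module
    adjust: Callable       # first model generation module

    def train(self, model, multilingual_corpus):
        corrupted = self.destroy(multilingual_corpus)        # destruction processing
        rebuilt = self.reconstruct(model, corrupted)         # text reconstruction action
        return self.adjust(model, rebuilt, multilingual_corpus)  # first translation model
```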
9. A mobile terminal, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to implement the translation model training method of any of claims 1-7.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the translation model training method according to any of claims 1-7.
CN202310458178.XA 2023-04-25 2023-04-25 Translation model training method and device, mobile terminal and storage medium Pending CN116579352A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310458178.XA CN116579352A (en) 2023-04-25 2023-04-25 Translation model training method and device, mobile terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310458178.XA CN116579352A (en) 2023-04-25 2023-04-25 Translation model training method and device, mobile terminal and storage medium

Publications (1)

Publication Number Publication Date
CN116579352A true CN116579352A (en) 2023-08-11

Family

ID=87533246

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310458178.XA Pending CN116579352A (en) 2023-04-25 2023-04-25 Translation model training method and device, mobile terminal and storage medium

Country Status (1)

Country Link
CN (1) CN116579352A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110263349A (en) * 2019-03-08 2019-09-20 腾讯科技(深圳)有限公司 Corpus assessment models training method, device, storage medium and computer equipment
US20200045289A1 (en) * 2018-07-31 2020-02-06 Intel Corporation Neural network based patch blending for immersive video
CN113673261A (en) * 2021-09-07 2021-11-19 北京小米移动软件有限公司 Data generation method and device and readable storage medium
CN114861628A (en) * 2022-04-22 2022-08-05 四川语言桥信息技术有限公司 System, method, electronic device and storage medium for training machine translation model
CN115114940A (en) * 2022-06-29 2022-09-27 中译语通科技股份有限公司 Machine translation style migration method and system based on curriculum pre-training

Similar Documents

Publication Publication Date Title
CN110489555B (en) Language model pre-training method combined with similar word information
Harrat et al. Machine translation for Arabic dialects (survey)
Liu et al. Machine translation: general
EP1489523B1 (en) Adaptive machine translation
US7295963B2 (en) Adaptive machine translation
US10789431B2 (en) Method and system of translating a source sentence in a first language into a target sentence in a second language
US20090150139A1 (en) Method and apparatus for translating a speech
JP2004171575A (en) Statistical method and device for learning translation relationships among phrases
JP2004355625A (en) Method and system for training machine translator
US20170286376A1 (en) Checking Grammar Using an Encoder and Decoder
CN104462072A (en) Input method and device oriented at computer-assisting translation
WO2002039318A1 (en) User alterable weighting of translations
Lavie et al. Experiments with a Hindi-to-English transfer-based MT system under a miserly data scenario
Collins et al. Adaptation-guided retrieval in EBMT: A case-based approach to machine translation
CN112926344A (en) Word vector replacement data enhancement-based machine translation model training method and device, electronic equipment and storage medium
Zhang et al. Mind the gap: Machine translation by minimizing the semantic gap in embedding space
CN116579352A (en) Translation model training method and device, mobile terminal and storage medium
Geer Statistical machine translation gains respect
CN112380882B (en) Mongolian Chinese neural machine translation method with error correction function
CN114861628A (en) System, method, electronic device and storage medium for training machine translation model
Pu et al. Passing parser uncertainty to the transformer: Labeled dependency distributions for neural machine translation
Chinea-Rios et al. Vector sentences representation for data selection in statistical machine translation
Rana et al. Example based machine translation using fuzzy logic from English to Hindi
CN110147556A (en) A kind of construction method of multidirectional neural network translation system
Collins et al. An example-based approach to machine translation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination