CN114757210A - Translation model training method, sentence translation method, device, equipment and program - Google Patents

Translation model training method, sentence translation method, device, equipment and program

Info

Publication number
CN114757210A
Authority
CN
China
Prior art keywords
translation
sentence
translation model
training
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210220466.7A
Other languages
Chinese (zh)
Inventor
程信
严睿
刘乐茂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Renmin University of China
Original Assignee
Tencent Technology Shenzhen Co Ltd
Renmin University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd and Renmin University of China

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/42 - Data-driven translation
    • G06F40/47 - Machine-assisted translation, e.g. using translation memory
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/40 - Processing or translation of natural language
    • G06F40/58 - Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The invention provides a method for training a translation model, which comprises the following steps: obtaining, from a translation memory library, the source-end sentence corresponding to the target translation memory sentence; combining the source-end sentence with each target translation memory sentence into a training sample, and combining different training samples into a training sample set; processing different training samples in the training sample set through the translation model to determine update parameters of the translation model; and iteratively updating the encoder parameters and decoder parameters of the translation model through different training samples in the training sample set according to the update parameters of the translation model. The invention also provides an apparatus, a device, a software program, and a storage medium. The method and apparatus make the trained translation model more accurate and achieve a better translation effect, and the embodiments of the invention can also be applied to various scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving.

Description

Translation model training method, sentence translation method, device, equipment and program
Technical Field
The present invention relates to Machine Translation (MT) technology, and more particularly to a translation model training method, a sentence translation method, an apparatus, a device, a software program, and a storage medium.
Background
At present, people often need to translate text or speech in their work and daily life, typically by using dedicated translation applications or Machine Translation (MT) through translation web pages. However, machine translation sometimes produces errors, so when machine translation technology is used in industry it is widely combined with Computer-Aided Translation (CAT). With the advancement and improvement of MT systems, various efficient CAT interaction modes are emerging.
With the development of machine translation, Neural Machine Translation (NMT) has become widely used as the new generation of translation technology. A neural machine translation system is built on an encoder-decoder framework; during translation, however, the decoder has to handle multiple tasks, such as keeping track of what has already been translated and what still needs to be translated, and maintaining information related to the fluency of the translation. A Translation Memory (TM) is a database that stores pairs of source-language and target-language fragments. A translator can consult this database during translation to improve the efficiency and consistency of translation. In the machine translation community, early work focused primarily on integrating translation memory into statistical machine translation models. In recent years, as neural machine translation models have achieved excellent results on various translation tasks, more and more research has aimed to integrate translation memory into NMT models; however, complex model structures and redundant translation memory affect the training accuracy and training speed of the translation model, which is not conducive to its wide adoption.
Disclosure of Invention
In view of this, embodiments of the present invention provide a translation model training method, a sentence translation method, an apparatus, a device, a software program, and a storage medium, which can reduce the model complexity of the translation model and select translation memory sentences similar to the sentence to be translated through contrastive retrieval, thereby alleviating the problems caused in the related art by an additional memory network: a complex network structure, reduced training speed, and excessively long translation time in use.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a training method of a translation model, which comprises the following steps:
acquiring a target translation memory statement;
obtaining, from a translation memory library, a source-end sentence corresponding to the target translation memory sentence;
combining the source-end sentence with each target translation memory sentence into a training sample, and combining different training samples into a training sample set;
acquiring initial parameters of a translation model;
responding to initial parameters of the translation model, processing different training samples in the training sample set through the translation model, and determining updating parameters of the translation model;
and according to the updating parameters of the translation model, iteratively updating the encoder parameters and the decoder parameters of the translation model through different training samples in the training sample set.
The embodiment of the invention further provides a sentence translation method, which comprises the following steps:
determining at least one word-level hidden variable corresponding to a sentence to be translated through an encoder of a translation model;
generating, by a decoder of the translation model and according to the at least one word-level hidden variable, translated words corresponding to the at least one word-level hidden variable and the selection probability of each translated word;
selecting, according to the selection probabilities, at least one translated word to form the translation result corresponding to the sentence to be translated;
and outputting the translation result.
The embodiment of the invention also provides a training device of the translation model, which comprises:
the data transmission module is used for acquiring a target translation memory statement;
the translation model training module is used for acquiring a source end sentence corresponding to the target translation memory sentence from a translation memory library;
the translation model training module is used for combining the source-end sentence with each target translation memory sentence into a training sample, and combining different training samples into a training sample set;
the translation model training module is used for acquiring initial parameters of a translation model;
the translation model training module is used for responding to initial parameters of the translation model, processing different training samples in the training sample set through the translation model, and determining updating parameters of the translation model;
and the translation model training module is used for carrying out iterative updating on the encoder parameter and the decoder parameter of the translation model through different training samples in the training sample set according to the updating parameter of the translation model.
In the above scheme,
the translation model training module is used for acquiring the maximum length of the sentence to be translated and the maximum length of any translation memory sentence;
the translation model training module is used for acquiring the lemma distance between the sentence to be translated and any translation memory sentence,
the translation model training module is used for determining the similarity between the sentence to be translated and any translation memory sentence based on the word element distance, the maximum length of the sentence to be translated and the maximum length of any translation memory sentence;
and the translation model training module is used for determining that any translation memory statement is an original translation memory statement corresponding to the statement to be translated when the similarity is greater than or equal to a similarity threshold.
In the above scheme,
the translation model training module is used for calculating an attention value corresponding to each translation memory statement through an attention function;
the translation model training module is used for fusing the translation memory sentences with the same attention value into the same translation memory sentence; or
And the translation model training module is used for fusing the translation memory sentences with the same attention value into different training samples in the training sample subset.
In the above scheme,
the translation model training module is used for determining a dynamic noise threshold value matched with the use environment of the translation model;
the translation model training module is used for denoising the training sample set according to the dynamic noise threshold so as to form a denoised training sample set matched with the dynamic noise threshold; or
the translation model training module is used for determining a fixed noise threshold corresponding to the translation model and carrying out denoising processing on the training sample set according to the fixed noise threshold so as to form a denoising training sample set matched with the fixed noise threshold.
In the above scheme,
and the translation model training module is used for carrying out negative example processing on the training sample set to form a negative example sample set corresponding to the training sample set, wherein the negative example sample set is used for adjusting the encoder parameters and the decoder parameters of the translation model.
In the above scheme,
the translation model training module is used for determining the training sample set;
the translation model training module is used for determining a supervision function corresponding to the translation model;
the translation model training module is used for adjusting the temperature coefficient of the supervision function;
the translation model training module is used for carrying out negative example processing on the training sample set through the supervision function based on the vector similarity and different temperature coefficients of any two translation memory sentences in the training sample set to form a negative example sample set corresponding to the training sample set.
In the above scheme,
the translation model training module is used for randomly recombining sentences to be output in a decoder of the translation model to form a negative example sample set corresponding to the training sample set; or
and the translation model training module is used for carrying out random deletion processing or replacement processing on the sentences to be output in a decoder of the translation model to form a negative example sample set corresponding to the training sample set.
In the above scheme,
the translation model training module is used for substituting different training samples in the training sample set into a loss function corresponding to a self-coding network formed by an encoder and a decoder of the translation model;
and the translation model training module is used for determining parameters corresponding to an encoder and corresponding decoder parameters in the translation model as updating parameters of the translation model when the loss function meets the convergence condition.
An embodiment of the present invention further provides a sentence translation apparatus, where the apparatus includes:
the encoder module is used for determining at least one word-level hidden variable corresponding to the sentence to be translated through an encoder of the translation model;
a decoder module, configured to generate, by a decoder of the translation model and according to the at least one word-level hidden variable, translated words corresponding to the word-level hidden variables and the selection probability of each translated word;
the decoder module is configured to select, according to the selection probabilities, at least one translated word to form the translation result corresponding to the sentence to be translated;
and the decoder module is used for outputting the translation result.
An embodiment of the present invention further provides an electronic device, where the electronic device includes:
a memory for storing executable instructions;
and a processor, configured to implement the aforementioned translation model training method when running the executable instructions stored in the memory.
The embodiment of the present invention further provides a computer-readable storage medium storing executable instructions, wherein the executable instructions, when executed by a processor, implement the aforementioned translation model training method or the aforementioned sentence translation method.
The embodiment of the invention has the following beneficial effects:
the technical scheme provided by the invention comprises the steps of obtaining a sentence to be translated, and obtaining at least two original translation memory sentences through contrast retrieval based on the sentence to be translated; performing translation memory fusion processing on the obtained at least two original translation memory sentences to obtain a target translation memory sentence; acquiring a corresponding source terminal statement from a translation memory library based on each target translation memory statement; forming training samples by the source-end sentences and each target translation memory sentences, and forming training sample sets by different training samples; acquiring initial parameters of a translation model; responding to initial parameters of the translation model, processing different training samples in the training sample set through the translation model, and determining updating parameters of the translation model; according to the updating parameters of the translation model, the encoder parameters and the decoder parameters of the translation model are updated iteratively through different training samples in the training sample set, so that translation memory sentences similar to the sentences to be translated are selected through comparison and retrieval, the problems of complex network structure, influence on training speed and overlong translation time in use, which are caused by extra memory networks in the related technology, can be reduced, meanwhile, aiming at the redundancy of the translation memory sentences, the similarity of different translation mechanisms can be captured through translation memory fusion by utilizing an attention mechanism, the diversity of the translation memory (namely the diversity of the training samples) is ensured, the accuracy of the trained translation model is higher, the translation effect is better, and the use experience of a user is improved; meanwhile, the gain of the existing translation memory sentences to model training can be effectively and fully utilized, so that the translation model can adapt to different use scenes.
Drawings
FIG. 1 is a schematic view of a use scenario of a translation model training method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a structure of a training apparatus for translation models according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating the generation of translation results in a conventional scheme;
FIG. 4 is an optional flowchart of the translation model training method according to an embodiment of the present invention;
FIG. 5 is an optional flowchart of the translation model training method according to an embodiment of the present invention;
FIG. 6 is an optional flowchart of the translation model training method according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating the results of the accuracy test of the translation model;
FIG. 8 is a diagram illustrating the results of the training efficiency test of the translation model;
FIG. 9 is a schematic diagram showing test results of different translation memory statements in the translation model training method of the present application;
fig. 10 is a schematic diagram of a front-end display interface of a translation model according to an embodiment of the present invention.
FIG. 11 is an alternative architectural diagram of a translation model in an embodiment of the invention;
FIG. 12 is a diagram illustrating an alternative translation process for the translation model in an embodiment of the present invention;
FIG. 13 is a schematic diagram of an alternative structure of an encoder in the translation model in an embodiment of the present invention;
FIG. 14 is a schematic diagram of vector concatenation of an encoder in a translation model according to an embodiment of the present invention;
FIG. 15 is a diagram illustrating an encoding process of an encoder in the translation model according to an embodiment of the present invention;
FIG. 16 is a diagram illustrating a decoding process of a decoder in a translation model according to an embodiment of the present invention;
FIG. 17 is a diagram illustrating an output effect of a translation model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Before the embodiments of the present invention are described in further detail, the terms and expressions used in the embodiments are explained; the following explanations apply to these terms and expressions.
1) Sentence to be translated: the input of the translation model, i.e., a sentence in a certain natural language before language conversion.
2) Translation result: the sentence in another natural language output by the translation model after language conversion of the source sentence.
3) Reference sentence: a preset reference standard in the target natural language for the source sentence after language conversion.
4) Fidelity: a parameter between 0 and 1 representing how close the content of the target sentence is to the content of the source sentence, used as a criterion for evaluating the accuracy of translating the source sentence; the larger the value, the closer the content of the target sentence is to the content of the source sentence, i.e., the higher the translation accuracy.
5) Translation: converting a sentence in one natural language into a sentence in another natural language.
6) Neural Network (NN): an Artificial Neural Network (ANN), a mathematical or computational model used in machine learning and cognitive science that imitates the structure and function of biological neural networks (the central nervous system of animals, especially the brain) and is used for estimating or approximating functions.
7) Machine Translation (MT): within computational linguistics, the study of translating text or speech from one natural language to another by computer programs. Neural Machine Translation (NMT) is the technique of performing machine translation with neural networks.
8) Speech Translation: also known as automatic speech translation, a technology that translates speech in one natural language into text or speech in another natural language by computer, generally comprising the two stages of speech recognition and machine translation.
9) Encoder-decoder architecture: a network architecture commonly used in machine translation. The encoder maps the input sentence (source language) to a semantic representation, and the decoder receives the output of the encoder as input and outputs the corresponding text sequence in another language.
10) Translation memory library (translation memory): a database storing natural language sentences (or sentence fragments) and their translations. As a core component of a computer-assisted translation system, it is built up and expanded as users work with it, in order to eliminate repeated translation and improve work efficiency. For example, bilingual inter-translation sentence pairs (sample pairs) can be stored in the translation memory library; these pairs can be translated manually or collected through other channels (for example, translation information in web pages collected by a web crawler program). If the sentence to be translated provided by the user is "I want to eat hamburgers", the corresponding original translation memory sentences "I like to eat hamburgers" and "I want to eat potato chips", together with their source-end sentences "I like to eat hamburgers" and "I want to eat French fries", can be retrieved from the translation memory library, and the sample pairs {I like to eat hamburgers, I like to eat hamburgers} and {I want to eat potato chips, I want to eat French fries} can be encoded into the translation model to guide its decoding, so as to obtain the translation result "I want to eat a hamburger" output by the translation model for the sentence to be translated. In some embodiments, the source-end sentences and the target translation memory sentences may also be combined into training samples to train the translation model, so as to improve its translation accuracy.
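To make the data structure above concrete, the following Python sketch (not part of the patent; the sample pairs and function names are illustrative, and the memory-side sentences that were Chinese in the patent's example are shown here as English glosses) stores bilingual sample pairs and returns the pairs whose memory-side sentence overlaps most with a query:

    # Illustrative translation memory library: (memory sentence, source-end sentence) sample pairs.
    translation_memory_bank = [
        ("I like to eat hamburgers", "I like to eat hamburgers"),
        ("I want to eat potato chips", "I want to eat French fries"),
    ]

    def retrieve_sample_pairs(query, bank, top_k=2):
        # Rank stored pairs by token overlap between the query and the memory-side sentence.
        query_tokens = set(query.lower().split())
        def overlap(pair):
            return len(query_tokens & set(pair[0].lower().split()))
        return sorted(bank, key=overlap, reverse=True)[:top_k]

    # Pairs retrieved for the sentence to be translated "I want to eat hamburgers".
    pairs = retrieve_sample_pairs("I want to eat hamburgers", translation_memory_bank)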
FIG. 1 is a schematic view of a usage scenario of the translation model training method according to an embodiment of the present invention. Referring to FIG. 1, a client of translation software is deployed on a terminal (including the terminal 10-1 and the terminal 10-2); a user can input a sentence to be translated through the deployed translation client, and the client can also receive the corresponding translation result and display it to the user. The terminal is connected to the server 200 through a network 300; the network 300 may be a wide area network, a local area network, or a combination of the two, and uses wireless links for data transmission.
As an example, the server 200 is configured to deploy the translation model and train it so as to update the parameters of the encoder network and the decoder network in the translation model, generate a translation result for a target sentence to be translated through the encoder network and the decoder network, and display the translation result generated by the translation model through the terminal (the terminal 10-1 and/or the terminal 10-2). In order to better understand the method provided by the embodiments of the present application, artificial intelligence, its branches, and the application fields, cloud technology, and artificial intelligence cloud services related to the method are explained first.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline involving a wide range of fields, covering both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, mechatronics, and the like. Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning. Each direction is described below.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include text processing, semantic understanding, machine translation, question answering, knowledge graphs, and the like.
Machine Learning (ML) is a multi-disciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers can simulate or realize human learning behavior in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied in all fields of artificial intelligence. Machine learning and deep learning generally include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
Cloud technology refers to a hosting technology that unifies hardware, software, network, and other resources in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. It is a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like applied in the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. Background services of technical network systems require a large amount of computing and storage resources, such as video websites, picture websites, and web portals. With the rapid development and application of the internet industry, each item may come to have its own identification mark that needs to be transmitted to a background system for logical processing; data at different levels are processed separately, and all kinds of industry data require strong system background support, which can only be realized through cloud computing.
The so-called artificial intelligence cloud service, generally also called AI as a Service (AIaaS), is the mainstream service mode of current artificial intelligence platforms. Specifically, an AIaaS platform splits several common types of AI services and provides independent or packaged services in the cloud. This service model is similar to opening an AI-themed mall: all developers can access one or more of the artificial intelligence services provided by the platform through API interfaces, and some senior developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate, and maintain their own dedicated cloud artificial intelligence services.
The scheme provided by the embodiment of the application relates to technologies such as natural language processing, machine learning and artificial intelligence cloud service of artificial intelligence, and is specifically explained by the following embodiment.
The translation model training method provided by the embodiment of the present application will be described with reference to exemplary applications and implementations of the terminal provided by the embodiment of the present application.
Of course, before the target sentence to be translated is processed by the translation model to generate a corresponding translation result, the translation model needs to be trained, which specifically includes: obtaining a sentence to be translated, and obtaining at least two original translation memory sentences through contrastive retrieval based on the sentence to be translated; performing translation memory fusion processing on the obtained at least two original translation memory sentences to obtain a target translation memory sentence; obtaining the corresponding source-end sentence from a translation memory library based on each target translation memory sentence; combining the source-end sentence with each target translation memory sentence into a training sample, and combining different training samples into a training sample set; obtaining initial parameters of the translation model; in response to the initial parameters of the translation model, processing different training samples in the training sample set through the translation model to determine update parameters of the translation model; and iteratively updating the encoder parameters and decoder parameters of the translation model through different training samples in the training sample set according to the update parameters.
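The following high-level Python sketch outlines that training flow under the assumption of a PyTorch-style model whose loss supports backward(); the callables retrieve, fuse and source_of are placeholders for the contrastive retrieval, translation memory fusion and memory-library lookup steps detailed later, and none of the names are taken from the patent itself.

    def train_translation_model(model, optimizer, sentences, retrieve, fuse, source_of, epochs=1):
        # Sketch of the training flow described above; retrieve/fuse/source_of are placeholder callables.
        for _ in range(epochs):
            # Build the training sample set of (source-end sentence, target translation memory sentence) pairs.
            training_set = []
            for x in sentences:
                originals = retrieve(x)          # at least two original translation memory sentences
                target_tm = fuse(originals)      # fused target translation memory sentence
                training_set.append((source_of(target_tm), target_tm))
            # Iteratively update encoder and decoder parameters over the sample set.
            for source, target_tm in training_set:
                loss = model.loss(source, target_tm)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()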
As described in detail below, the structure of the training apparatus for translation models according to the embodiment of the present invention may be implemented in various forms, for example, the electronic device in the embodiment of the present invention may be a dedicated terminal with a translation model training function, or may be a server provided with a translation model training function, such as the server 200 in fig. 1. Fig. 2 is a schematic structural diagram of a component of a training apparatus for a translation model according to an embodiment of the present invention, and it can be understood that fig. 2 only shows an exemplary structure of the training apparatus for a translation model, and not a whole structure, and a part of or the whole structure shown in fig. 2 may be implemented as needed.
The training device of the translation model provided by the embodiment of the invention comprises: at least one processor 201, memory 202, user interface 203, and at least one network interface 204. The various components in the training apparatus of the translation model are coupled together by a bus system 205. It will be appreciated that the bus system 205 is used to enable communications among the components. The bus system 205 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 205 in fig. 2.
The user interface 203 may include, among other things, a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad, or a touch screen.
It will be appreciated that the memory 202 can be either volatile memory or nonvolatile memory, and can include both volatile and nonvolatile memory. The memory 202 in embodiments of the present invention is capable of storing data to support operation of the terminal (e.g., 10-1). Examples of such data include: any computer program, such as an operating system and application programs, for operation on a terminal, such as 10-1. The operating system includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application program may include various application programs.
In some embodiments, the translation model training apparatus provided in the embodiments of the present invention may be implemented by a combination of hardware and software, and for example, the translation model training apparatus provided in the embodiments of the present invention may be a processor in the form of a hardware decoding processor, which is programmed to execute the translation model training method provided in the embodiments of the present invention. For example, a processor in the form of a hardware decode processor may employ one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
As an example of the implementation of the training device of the translation model provided by the embodiment of the present invention by using a combination of software and hardware, the training device of the translation model provided by the embodiment of the present invention may be directly embodied as a combination of software modules executed by the processor 201, the software modules may be located in a storage medium, the storage medium is located in the memory 202, the processor 201 reads executable instructions included in the software modules in the memory 202, and the translation model training method provided by the embodiment of the present invention is completed in combination with necessary hardware (for example, including the processor 201 and other components connected to the bus 205).
By way of example, the Processor 201 may be an integrated circuit chip having Signal processing capabilities, such as a general purpose Processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, wherein the general purpose Processor may be a microprocessor or any conventional Processor or the like.
As an example of the translation model training Device provided in the embodiment of the present invention implemented by hardware, the translation model training Device provided in the embodiment of the present invention may be implemented by directly using a processor 201 in the form of a hardware decoding processor, for example, by being executed by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components, to implement the translation model training method provided in the embodiment of the present invention.
The memory 202 in the embodiments of the present invention is used to store various types of data to support the operation of the translation model training apparatus. Examples of such data include: any executable instructions for operating on the translation model training apparatus; a program implementing the translation model training method according to the embodiments of the present invention may be contained in the executable instructions.
In other embodiments, the translation model training apparatus provided by the embodiments of the present invention may be implemented in software. FIG. 2 illustrates the translation model training apparatus stored in the memory 202, which may be software in the form of programs, plug-ins, and the like, and comprises a series of modules; as an example of a program stored in the memory 202, the translation model training apparatus includes the following software modules: a data transmission module 2081 and a translation model training module 2082. When the software modules in the translation model training apparatus are read into RAM by the processor 201 and executed, the translation model training method provided by the embodiments of the present invention is implemented. The functions of the software modules in the translation model training apparatus in the embodiments of the present invention are described below, wherein:
and the data transmission module 2081 is used for acquiring the target translation memory statement.
And the translation model training module 2082 is used for acquiring the source end sentence corresponding to the target translation memory sentence from the translation memory library.
The translation model training module 2082 is configured to obtain a corresponding source-end sentence from the translation memory library based on each target translation memory sentence.
The translation model training module 2082 is configured to combine the source-side sentences and each target translation memory sentence into training samples, and combine different training samples into a training sample set.
The translation model training module 2082 is configured to obtain initial parameters of a translation model.
The translation model training module 2082 is configured to, in response to the initial parameters of the translation model, process different training samples in the training sample set through the translation model, and determine updated parameters of the translation model.
The translation model training module 2082 is configured to iteratively update encoder parameters and decoder parameters of the translation model through different training samples in the training sample set according to the update parameters of the translation model.
In some embodiments of the present invention, when the trained translation model is deployed, the electronic device in an embodiment may further include a sentence translation apparatus, and specifically, the sentence translation apparatus includes:
an encoder module, configured to determine, through the encoder of the translation model, at least one word-level hidden variable corresponding to the sentence to be translated; a decoder module, configured to generate, through the decoder of the translation model and according to the at least one word-level hidden variable, translated words corresponding to the word-level hidden variables and the selection probability of each translated word; the decoder module is configured to select, according to the selection probabilities, at least one translated word to form the translation result corresponding to the sentence to be translated; and the decoder module is configured to output the translation result.
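A minimal greedy-decoding sketch of this sentence translation flow, assuming PyTorch-style encoder and decoder callables and illustrative token ids (the signatures below are assumptions, not interfaces specified by the patent):

    import torch

    def translate(encoder, decoder, source_ids, bos_id=1, eos_id=2, max_len=64):
        # The encoder produces word-level hidden variables for the sentence to be translated.
        hidden = encoder(source_ids)
        output = [bos_id]
        for _ in range(max_len):
            # The decoder proposes candidate translated words with their selection probabilities.
            logits = decoder(torch.tensor([output]), hidden)     # (1, len, vocab)
            probs = torch.softmax(logits[0, -1], dim=-1)
            next_id = int(probs.argmax())                        # keep the most probable word
            if next_id == eos_id:
                break
            output.append(next_id)
        return output[1:]                                        # translation result as token ids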
Before introducing the translation model training method provided by the embodiments of the present invention, the process by which a translation model generates a translation result from a sentence to be translated in a conventional scheme is first described; FIG. 3 is a schematic diagram of translation result generation in the conventional scheme. Taking the Transformer framework as an example of the translation model structure, it mainly includes functional components such as the encoder embedding layer, the encoder/decoder layers (layers 1 to 6), the decoder embedding layer (decoder word-vector layer), and the decoder softmax layer (decoder output layer). Each encoder/decoder layer is in turn composed of other basic units. All components are organically combined into one layer of the network and then stacked layer by layer to form the whole network. The encoder layers convert the input sentence (source language) into semantic vectors, and the decoder layers convert the semantic vectors into the output sentence (target language). When translation memory is used to assist training, this process begins with TM retrieval: the database is searched, according to some similarity measure, for translation fragments similar to the current input. The retrieved TM and the input X are then combined, either through an additional dual-encoder structure or directly with a unified vocabulary, and fed into the translation model for training. The disadvantage of this approach is that the sentences most similar to the sentence currently being translated are selected from the database one by one, while the relationships among the retrieved sentences are ignored, so the retrieved sentences largely repeat one another; the redundant content brings little information gain and affects model training efficiency. Meanwhile, translation models in the related art process the multiple retrieved TMs independently rather than treating them as a unified whole, which affects the translation effect of the translation model.
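For reference, the conventional layout described above can be expressed in a few lines of PyTorch; the vocabulary and model sizes below are illustrative assumptions rather than values from the patent.

    import torch.nn as nn

    vocab_size, d_model = 32000, 512                        # illustrative sizes
    src_embedding = nn.Embedding(vocab_size, d_model)       # encoder embedding layer
    tgt_embedding = nn.Embedding(vocab_size, d_model)       # decoder word-vector layer
    transformer = nn.Transformer(d_model=d_model, nhead=8,
                                 num_encoder_layers=6, num_decoder_layers=6)
    output_projection = nn.Linear(d_model, vocab_size)      # decoder softmax (output) layer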
To solve the drawbacks of the related art, referring to fig. 4, fig. 4 is an optional flowchart of a method for training a translation model according to an embodiment of the present invention, and it can be understood that the steps shown in fig. 4 can be executed by various electronic devices operating a translation model training apparatus, such as a dedicated terminal with a model training function, a server with a translation model training function, or a server cluster. The following is a description of the steps shown in fig. 4.
Step 401: the translation model training device obtains a target translation memory statement and obtains a source statement corresponding to the target translation memory statement from a translation memory library.
In some embodiments of the present invention, when obtaining a target translation memory sentence, a sentence to be translated is first obtained; retrieval is then performed based on the sentence similarity between the sentence to be translated and original translation memory sentences to obtain at least two original translation memory sentences matching the sentence to be translated; finally, translation memory fusion processing is performed on the obtained at least two original translation memory sentences to obtain the target translation memory sentence. The translation memory library can store different original translation memory sentences. For example, it can store translation information provided by historical users: each time a user initiates a translation request, the translation memory library responds to the request and stores the translation information provided by the user, and each stored item of translation information can serve as an original translation memory sentence. Alternatively, web pages in different languages collected by a crawler program can be translated automatically by a trained translation model, and the translation results can be stored in the translation memory library as original translation memory sentences; the present application is not limited in this respect.
In some embodiments of the present invention, for example, for a Japanese sentence presented on the display interface of game A and its translated Chinese sentence, the sentence translation in the game display interface can be used as a target translation memory sentence together with the corresponding source-end sentence and stored in the translation memory library. When a Japanese sentence presented on the display interface of game B needs memory-based translation, a target translation memory sentence matching the sentence to be translated in game B and the corresponding source-end sentence can be obtained from the translation memory library to complete the training of the translation model.
In some embodiments of the present invention, the retrieval is performed based on the sentence similarity between the to-be-translated sentence and the original translation memory sentence to obtain at least two original translation memory sentences matched with the to-be-translated sentence, and the retrieval may be implemented in the following manner:
obtaining the maximum length of the sentence to be translated and the maximum length of any translation memory sentence; obtaining the token distance between the sentence to be translated and the translation memory sentence, and determining the similarity between the sentence to be translated and the translation memory sentence based on the token distance, the maximum length of the sentence to be translated, and the maximum length of the translation memory sentence; and, when the similarity is greater than or equal to a similarity threshold, determining that the translation memory sentence is an original translation memory sentence corresponding to the sentence to be translated. The inputs are the sentence to be translated and any translation memory sentence; contrastive retrieval is performed, according to Formula 1, over the sentence similarity between the sentence to be translated and the original translation memory sentences, so as to obtain at least two original translation memory sentences matching the sentence to be translated; for example, k (k being an integer greater than or equal to 2) translation memory sentences can be found in the translation memory library:
(Formula 1)
where (x, x_i) and (x_i, x_j) are pairs consisting of the sentence to be translated and original translation memory sentences. For example, the input sentence to be translated may be "I want to eat apples", and the original translation memory sentences may be "I want to eat strawberries" and "I intend to eat instant noodles". Through translation memory fusion processing, the obtained target translation memory sentence can be "I want to eat strawberries and want to eat instant noodles".
When similarity retrieval is carried out, a specific calculation mode refers to formula 2, wherein Sim (x, y) is a similarity measurement function between a sentence to be translated and any translation memory sentence:
Sim(x, y) = 1 - D_edit(x, y) / max(|x|, |y|)    (Formula 2)
Here, the numerator term D_edit(x, y) in Formula 2 is the token-level edit distance between the two sentences. The denominator term max(|x|, |y|) is the maximum of the lengths of the sentence to be translated and the translation memory sentence, and dividing by it normalizes the similarity. By comparing the sentence similarity between the sentence to be translated and different original translation memory sentences, original translation memory sentences that match the sentence to be translated more closely can be obtained, which ensures the accuracy of the translation model and reduces the impact of mistranslation on the user.
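A plain-Python sketch of this retrieval step; the similarity threshold and k below are illustrative assumptions, not values given in the patent.

    def edit_distance(a, b):
        # Token-level Levenshtein distance between two token lists.
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dp[i][0] = i
        for j in range(n + 1):
            dp[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
        return dp[m][n]

    def similarity(x_tokens, y_tokens):
        # Sim(x, y) = 1 - D_edit(x, y) / max(|x|, |y|), as in Formula 2 (the extra 1 guards empty inputs).
        return 1.0 - edit_distance(x_tokens, y_tokens) / max(len(x_tokens), len(y_tokens), 1)

    def retrieve_memories(to_translate, memory_bank, threshold=0.3, k=2):
        # Keep the k most similar memory sentences whose similarity clears the threshold.
        scored = [(similarity(to_translate, tm), tm) for tm in memory_bank]
        scored = [(s, tm) for s, tm in scored if s >= threshold]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [tm for _, tm in scored[:k]]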
Step 402: and the translation model training device performs translation memory fusion processing on the obtained at least two original translation memory sentences to obtain a target translation memory sentence.
In some embodiments of the present invention, performing translation memory fusion processing on the obtained at least two original translation memory statements to obtain a target translation memory statement, may be implemented in the following manner:
calculating an attention value corresponding to each translation memory statement through an attention function; fusing the translation memory statements with the same attention value into the same translation memory statement; or fusing the translation memory statements of the same attention value into different training samples in the training sample subset. Wherein, the attention value corresponding to each translation memory statement can be obtained by calculation of an attention function, specifically referring to formula 3:
(Formula 3)
where h(v_i, t+1), the state of translation memory node v_i at time t+1, is obtained through a self-attention mechanism from the state of node v_i at time t and the states of its neighbor nodes at time t. Meanwhile, a super node can be configured for each translation memory library; its function is to enable information exchange and interaction between different translation memory sentences, while the nodes within a single translation memory cannot directly see the content of surrounding nodes, which preserves semantic integrity and makes it convenient to invoke the original translation memory sentences in different usage environments. Through translation memory fusion, the attention mechanism captures the similarity between different translation memories and preserves the diversity of the translation memory (i.e., the diversity of the training samples), so that the trained translation model is more accurate and achieves a better translation effect.
In some embodiments of the present invention, in order to accelerate the training speed of the translation model, the translation memory sentences with the same attention value may also be discarded, and only the translation memory sentences with different attention values may be retained, so as to reduce the volume of the training sample set.
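One way to realize such a fusion step is a single round of scaled dot-product self-attention over the representations of the retrieved memory sentences; the PyTorch sketch below is an assumed illustration of the mechanism, not the patent's exact network.

    import torch
    import torch.nn.functional as F

    def fuse_translation_memories(tm_states):
        # tm_states: (num_tm, hidden), one vector per retrieved translation memory sentence.
        # Each node state at step t+1 is computed from its own state and the other memory
        # states at step t via self-attention, mirroring the node update described above.
        d = tm_states.size(-1)
        scores = tm_states @ tm_states.t() / d ** 0.5        # pairwise attention scores
        attn = F.softmax(scores, dim=-1)                     # attention value per memory sentence
        return attn @ tm_states                              # updated node states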
Step 403: and the translation model training device acquires the corresponding source terminal sentences from the translation memory library based on each target translation memory sentence.
Step 404: the translation model training device combines the source-end sentence and each target translation memory sentence into a training sample, and combines different training samples into a training sample set.
In some embodiments of the present invention, when the number of training samples in the training sample set exceeds the threshold of the number of training samples, denoising the training sample set is further required, which specifically includes:
determining a dynamic noise threshold value matching the use environment of the translation model;
denoising the training sample set according to the dynamic noise threshold to form a denoised training sample set matched with the dynamic noise threshold; or
determining a fixed noise threshold corresponding to the translation model, and denoising the training sample set according to the fixed noise threshold to form a denoised training sample set matched with the fixed noise threshold. The dynamic noise threshold matched to the usage environment of the translation model differs across usage environments; for example, in an academic translation environment the dynamic noise threshold needs to be smaller than in an article-reading environment.
In some embodiments of the present invention, when the translation model is fixed in corresponding hardware and the usage environment is spoken-language translation, fixing a noise threshold corresponding to the translation model can effectively increase the training speed of the translation model and reduce the user's waiting time.
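A minimal sketch of this denoising step; the noise-scoring function and the per-environment threshold values are illustrative assumptions.

    def denoise_training_set(samples, noise_score, environment=None, fixed_threshold=None):
        # samples: (source-end sentence, target translation memory sentence) pairs.
        # noise_score: callable returning an estimated noise level for one sample.
        if fixed_threshold is not None:
            threshold = fixed_threshold      # e.g. a fixed deployment such as spoken-language translation
        else:
            # Dynamic threshold chosen per usage environment (the mapping is illustrative).
            threshold = {"academic": 0.1, "article_reading": 0.3}.get(environment, 0.2)
        return [s for s in samples if noise_score(s) <= threshold]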
After the training sample set is determined, execution continues at step 405.
Step 405: the translation model training device obtains initial parameters of a translation model.
Step 406: and the translation model training device responds to the initial parameters of the translation model, processes different training samples in the training sample set through the translation model, and determines the updating parameters of the translation model.
Step 407: and the translation model training device iteratively updates the encoder parameters and the decoder parameters of the translation model through different training samples in the training sample set according to the updating parameters of the translation model.
In some embodiments of the present invention, initializing the decoder network to update parameters of the decoder network may be accomplished by:
encoding the sentence to be translated through the encoder of the decoder network to form an encoding result of the sentence to be translated; decoding the encoding result of the sentence to be translated through the decoder of the decoder network; and determining the parameters of the decoder network when decoding yields the selection probabilities of the translation results corresponding to the sentence to be translated. For example, the decoder network of the translation model, after initialization and training, generates candidate translation result words: translation result a (probability 0.45), translation result b (probability 0.5), and translation result c (probability 0.45), i.e., the probability distribution is {0.45, 0.5, 0.45}.
Therefore, the translation model can output the translation result with the minimum loss value for the corresponding sentence to be translated.
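A minimal sketch of selecting a translation result by its selected probability, mirroring the {0.45, 0.5, 0.45} example above; the candidate strings are placeholders:

from typing import Dict

def select_translation(results: Dict[str, float]) -> str:
    # results maps each candidate translation result to its selected probability.
    return max(results, key=results.get)

candidates = {"translation result a": 0.45,
              "translation result b": 0.50,
              "translation result c": 0.45}
print(select_translation(candidates))  # -> "translation result b"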
In some embodiments of the present invention, in response to the initial parameters of the translation model, processing different training samples in the training sample set by the translation model, and determining the update parameters of the translation model may be implemented by:
substituting different training samples in the training sample set into a loss function corresponding to a self-coding network formed by an encoder and a decoder of the translation model; and determining parameters corresponding to an encoder and corresponding decoder parameters in the translation model when the loss function meets the convergence condition as update parameters of the translation model. Wherein the loss function of the encoder network is represented as:
loss_A = Σ(decoder_A(encoder(warp(x1))) − x1)²; where decoder_A is decoder A, warp is a function acting on the sentence to be translated, x1 is the sentence to be translated, and encoder is the encoder.
In the iterative training process, the sentence to be translated is substituted into the loss function of the encoder network, the parameters of encoder A and decoder A are solved as the loss function decreases along the gradient (for example, the steepest gradient), and the training ends when the loss function converges (that is, when the word-level hidden variables corresponding to the sentence to be translated can be formed).
In the training process of the encoder network, the loss function of the encoder network is represented as: loss_B = Σ(decoder_B(encoder(warp(x2))) − x2)²; where decoder_B is decoder B, warp is a function acting on the sentence to be translated, x2 is the sentence to be translated, and encoder is the encoder.
In the iterative training process, the sentence to be translated is substituted into the loss function of the encoder network, and the parameters of encoder B and decoder B are solved as the loss function decreases along the gradient (for example, the steepest gradient); when the loss function converges (that is, when decoding yields the selected probability of the translation result corresponding to the sentence to be translated), the training ends.
Therefore, the translation model can output the translation result with the minimum loss value for the corresponding sentence to be translated, ensuring the accuracy of the translation result.
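A minimal PyTorch-style sketch of the loss_A / loss_B form above together with one gradient step; encoder, decoder and warp are assumed to be modules and a callable supplied elsewhere, so this illustrates the squared-error reconstruction loss rather than the patent's exact training code:

import torch

def reconstruction_loss(decoder, encoder, warp, x):
    # loss = sum((decoder(encoder(warp(x))) - x) ** 2), matching loss_A / loss_B.
    return torch.sum((decoder(encoder(warp(x))) - x) ** 2)

def train_step(decoder, encoder, warp, x, optimizer):
    optimizer.zero_grad()
    loss = reconstruction_loss(decoder, encoder, warp, x)
    loss.backward()   # compute gradients of the loss
    optimizer.step()  # descend along the gradient
    return loss.item()

Such steps repeat until the loss function converges.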
In some embodiments of the invention, the method further comprises:
negative example processing is carried out on the training sample set to form a negative example sample set corresponding to the training sample set, wherein the negative example sample set is used for adjusting the encoder parameter and the decoder parameter of the translation model.
To further illustrate the obtaining process of the negative example sample, with reference to fig. 5, fig. 5 is an optional flowchart of the method for training the translation model according to the embodiment of the present invention, and it can be understood that the steps shown in fig. 5 may be executed by various electronic devices operating the translation model training apparatus, such as a dedicated terminal with a model training function, a server with a translation model training function, or a server cluster. The following is a description of the steps shown in fig. 5.
Step 501: and determining a supervision function corresponding to the translation model.
Step 502: adjusting a temperature coefficient of the supervisory function.
Step 503: and carrying out negative example processing on the training sample set through the supervision function based on the vector similarity and different temperature coefficients of any two translation memory sentences in the training sample set to form a negative example sample set corresponding to the training sample set. Wherein, the supervision function refers to formula 4:
[Formula 4 appears as an image in the original filing; it defines the supervision function in terms of the similarity sim(y, y') and the temperature coefficient τ.]
Here, sim(y, y') represents the vector similarity of any two translation memory sentences. The similarity may be computed with a standard machine translation evaluation metric (BLEU), Cosine Similarity, or the like; the specific similarity calculation method is not limited in this application. τ is the temperature coefficient, which controls how difficult it is to distinguish positive and negative examples in contrastive learning; by adjusting the temperature coefficient, the proportion of negative example samples can be flexibly controlled.
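Because formula 4 is reproduced only as an image in the original filing, the following is an assumed InfoNCE-style sketch, not the patent's exact formula: a temperature-scaled contrastive term built on the sentence similarity sim(y, y') and the temperature coefficient τ described above:

import math
from typing import List

def contrastive_supervision(sim_pos: float, sim_negs: List[float], tau: float) -> float:
    # Smaller tau sharpens the distribution, making positive and negative
    # examples harder to distinguish, which matches the role of τ above.
    numerator = math.exp(sim_pos / tau)
    denominator = numerator + sum(math.exp(s / tau) for s in sim_negs)
    return -math.log(numerator / denominator)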
In some embodiments of the present invention, the encoder and the decoder corresponding to the translation model may also be bidirectional network models; for example, Bi-GRU (bidirectional GRU) models may be used as the encoder and the decoder of the translation model, where the Bi-GRU model can recognize sentences with an inverted structure. When a user inputs a dialogue sentence, the dialogue sentence may have an inverted structure, that is, a word order different from the normal sentence structure, for example the words of "how is the weather today" arranged in an inverted order. Using the bidirectional Bi-GRU model to recognize dialogue sentences with an inverted structure enriches the functions of the trained model and improves the robustness of the finally trained target model.
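As an illustrative sketch only (the hyper-parameters and module layout are assumptions, not taken from the disclosure), a Bi-GRU encoder of the kind mentioned above could be written as:

import torch.nn as nn

class BiGRUEncoder(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 256, hidden_dim: int = 256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len); outputs: (batch, seq_len, 2 * hidden_dim)
        # Reading the sentence in both directions helps with inverted word order.
        outputs, hidden = self.gru(self.embedding(token_ids))
        return outputs, hidden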
With reference to fig. 6, fig. 6 is an optional flowchart of the text sentence processing method of the translation model provided in the embodiment of the present invention. It can be understood that the steps shown in fig. 6 may be executed by various electronic devices running the translation model training apparatus, for example a dedicated terminal, a server, or a server cluster with a function of processing sentences to be translated. The following describes the steps shown in fig. 6.
Step 601: and determining at least one word-level hidden variable corresponding to the sentence to be translated through an encoder of the translation model.
Step 602: generating, by a decoder of the translation model, a translated term corresponding to the at least one term-level hidden variable and a selected probability of the translated term based on the at least one term-level hidden variable.
Step 603: and selecting at least one translation word to form a translation result corresponding to the sentence to be translated according to the selection probability of the translation result.
Step 604: and outputting the translation result.
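A minimal sketch of steps 601 to 604, where encode and decode stand in as hypothetical callables for the encoder and decoder of the translation model:

from typing import Callable, Dict, List

def translate(sentence: str,
              encode: Callable[[str], List[object]],
              decode: Callable[[object], Dict[str, float]]) -> str:
    hidden_variables = encode(sentence)                            # step 601
    chosen_words = []
    for h in hidden_variables:
        candidates = decode(h)                                     # step 602: words + probabilities
        chosen_words.append(max(candidates, key=candidates.get))   # step 603
    return " ".join(chosen_words)                                  # step 604: output the result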
When testing the trained translation model, refer to fig. 7 to 9, where the model training methods of the related art are Vaswani et al., 2017, Gu et al., 2018, Zhang et al., 2018, Xu et al., 2020, Xia et al., 2019, He et al., 2021 (@s), and Cai et al., 2021 (#2), and the training method of this application is T-Full. Fig. 7 is a schematic diagram of the precision test results of the translation model, fig. 8 is a schematic diagram of the training efficiency test results, and fig. 9 is a schematic diagram of the test results for different numbers of translation memory sentences in the translation model training method. The precision score of T-Full is 58.69-67.76, which is significantly higher than that of the related-art training methods, showing that this method can effectively and fully utilize the gain that existing translation memory sentences bring to model training, so that the translation model can adapt to different usage scenarios.
Fig. 10 is a schematic diagram of a front-end display interface of a translation model according to an embodiment of the present invention; the translation model shown in this embodiment can process a sentence to be translated to generate the corresponding translation text. The target to be translated is the sentence input by the user through a translation applet of the instant messaging client, whose meaning is "when I can find time, I want to go camping and light a bonfire together with my friends."
Through the processing of the translation model, corresponding translation texts for the user to select are formed, together with the selected probability of each translation result.
According to the selected probability of the translation results, translation results are selected to form the translation corresponding to the sentence to be translated; the translation results include the following three candidates:
1) “When I can find time, I want to go camping and light a bonfire with my friends.”
2) “When I find time, go camping and light a bonfire with your friends.”
3) “When you find time, go camping friends and light a campfire.”
Therefore, with the translation model provided by the present invention, a plurality of different translation results can be generated for the same sentence to be translated.
The following describes the use of the translation model provided by the embodiment of the present invention with reference to a specific model structure. Since virtual objects and virtual scenes in a game may use English or Japanese, domestic users often cannot understand their meaning in time; the foreign-language meaning of the virtual objects and virtual scenes can therefore be obtained in time through the translation model. The trained translation model can translate text information in the game scene of a Japanese game server, and the translation model has a Transformer structure.
With continuing reference to fig. 11, fig. 11 is an alternative schematic structural diagram of a translation model in an embodiment of the present invention, where the Encoder includes N = 6 identical layers, each layer containing two sub-layers. The first sub-layer is a multi-head attention layer, followed by a simple fully connected layer. A residual connection and normalization are added to each sub-layer.
The Decoder includes N = 6 identical layers, where each layer is not identical to the encoder layer: it contains three sub-layers, namely a self-attention layer, an encoder-decoder attention layer, and finally a fully connected layer. The first two sub-layers are both based on multi-head attention. Specifically, Nx on the left side represents the structure of one encoder layer, which includes two sub-layers: the first sub-layer is a multi-head attention layer and the second sub-layer is a forward propagation layer. There is an association between the input and output of each sub-layer, with the output of the current sub-layer serving as an input of the next sub-layer. Each sub-layer is followed by a normalization operation, which can increase the convergence speed of the model. Nx on the right side represents the structure of one decoder layer. The first sub-layer is a multi-head attention sub-layer controlled by a mask matrix, used for modeling the generated target sentence vectors; during training, the mask matrix is required so that each multi-head attention computation only attends to the first t-1 words. The second sub-layer is a multi-head attention sub-layer that implements the attention mechanism between the encoder and the decoder, that is, it searches for relevant semantic information in the source text, and this layer is computed with dot products. The third sub-layer is a forward propagation sub-layer, computed in the same way as the forward propagation sub-layer in the encoder. The sub-layers of the decoder are likewise connected, with the output of the current sub-layer serving as an input of the next sub-layer, and each decoder sub-layer is also followed by a normalization operation to speed up model convergence.
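For illustration, a minimal PyTorch sketch of one encoder layer as described (a multi-head attention sub-layer and a feed-forward sub-layer, each with a residual connection and normalization), stacked N = 6 times; the hidden sizes are illustrative assumptions:

import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)      # residual connection + normalization
        x = self.norm2(x + self.ffn(x))   # feed-forward sub-layer, same treatment
        return x

encoder = nn.Sequential(*[EncoderLayer() for _ in range(6)])  # N = 6 identical layers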
With continuing reference to fig. 12, fig. 12 is a diagram illustrating an alternative translation process of the translation model in the embodiment of the present invention, in which the encoder portion and the decoder portion each include 6 layers. The input to the first encoder combines the word embedding and the positional embedding. After passing through the 6 encoders, the result is output to each decoder of the decoder portion. The sentence to be translated is a Japanese game dialogue sentence. The translation memory stores an original Japanese translation memory sentence together with its corresponding Chinese source-end sentence, whose meaning is: "Cangtian Xianglong Zhao Yun says: 'An oath of courage, rather than death! I will not fear, and will soar over the sky!'". After the Chinese source-end sentence corresponding to the Japanese target translation memory sentence is obtained from the translation memory library, the translation model is trained, and the translation result finally output by the translation model is: "Magic cloth without double: 'From this moment, the battlefield is dominated by one person! Who dares to war with me!'"
With continuing reference to FIG. 13, FIG. 13 is an alternative block diagram of an encoder in a translation model according to an embodiment of the present invention, where the input consists of queries (Q) and keys (K) of dimension d and values (V) of dimension d; the dot products of the query with all keys are computed, and a softmax function is applied to obtain the weights on the values.
With continued reference to FIG. 13, FIG. 13 is a vector diagram of an encoder in the translation model according to an embodiment of the present invention, where Q, K, and V are obtained by multiplying the input vector x of the encoder by W^Q, W^K, and W^V respectively. In this embodiment, the dimensions of W^Q, W^K, and W^V are (512, 64); assuming the input dimension is (m, 512), where m represents the number of words, the dimension of Q, K, and V obtained after multiplying the input vector by W^Q, W^K, and W^V is (m, 64).
With continued reference to fig. 14, fig. 14 is a schematic diagram of vector splicing of an encoder in a translation model according to an embodiment of the present invention, where Z0 to Z7 are the corresponding 8 parallel heads (each of dimension (m, 64)); concatenating the 8 heads yields dimension (m, 512). After the final multiplication with W^O, an output matrix of dimension (m, 512) is obtained, consistent with the input dimension of the next encoder.
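A minimal sketch of the dimensions discussed in figs. 13 and 14, using random matrices purely to show the shapes; the real W^Q, W^K, W^V and W^O are learned model parameters:

import torch

m, d_model, d_head, n_heads = 10, 512, 64, 8
x = torch.randn(m, d_model)                      # input of shape (m, 512), m = number of words

heads = []
for _ in range(n_heads):
    W_Q, W_K, W_V = (torch.randn(d_model, d_head) for _ in range(3))
    Q, K, V = x @ W_Q, x @ W_K, x @ W_V          # each of shape (m, 64)
    weights = torch.softmax(Q @ K.T / d_head ** 0.5, dim=-1)  # dot products + softmax
    heads.append(weights @ V)                    # one head: (m, 64)

W_O = torch.randn(n_heads * d_head, d_model)
output = torch.cat(heads, dim=-1) @ W_O          # concat to (m, 512), then project to (m, 512)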
With continued reference to fig. 15, fig. 15 is a schematic diagram of the encoding process of the encoder in the translation model according to the embodiment of the present invention, wherein the tensor output by self-attention further passes through the residual connection and Layer Norm, and then enters the fully connected feed-forward network, which performs the same operations of residual processing and normalization. The final output tensor enters the next encoder; this is iterated 6 times, and the result of the iterative processing enters the decoder.
With continuing reference to fig. 16, fig. 16 is a schematic diagram of a decoding process of a decoder in the translation model according to an embodiment of the present invention, showing the decoder inputs, outputs, and the decoding process:
Output: the probability distribution of the output word at position i;
Input: the output of the encoder and the output of the decoder at position i-1. Therefore, the middle attention is not self-attention: its K and V come from the encoder, and Q comes from the output of the decoder at the previous position.
Fig. 17 is a schematic diagram illustrating an output effect of a translation model in an embodiment of the present invention, and after the translation model is trained by using the translation model training method provided in the present application, user input in a game scene and foreign language contents displayed to a user can be translated, which is convenient for the user to understand and use.
In summary, the embodiments of the present invention have the following technical effects:
the technical scheme provided by the present invention includes: obtaining a sentence to be translated, and obtaining at least two original translation memory sentences through contrast retrieval based on the sentence to be translated; performing translation memory fusion processing on the obtained at least two original translation memory sentences to obtain target translation memory sentences; acquiring the corresponding source-end sentence from a translation memory library based on each target translation memory sentence; forming training samples from the source-end sentence and each target translation memory sentence, and forming different training samples into a training sample set; acquiring initial parameters of a translation model; in response to the initial parameters of the translation model, processing different training samples in the training sample set through the translation model to determine the update parameters of the translation model; and according to the update parameters of the translation model, iteratively updating the encoder parameters and decoder parameters of the translation model through different training samples in the training sample set. In this way, translation memory sentences similar to the sentence to be translated are selected through contrast retrieval, which reduces the problems in the related art of a complex network structure, slowed training, and overly long translation time caused by additional memory networks. Meanwhile, for redundant translation memory sentences, translation memory fusion based on an attention mechanism can capture the similarity of different translation memory sentences and ensure the diversity of the translation memory (that is, the diversity of the training samples), so that the trained translation model is more accurate, the translation effect is better, and the user experience is improved. At the same time, the gain that existing translation memory sentences bring to model training can be effectively and fully utilized, so that the translation model can adapt to different usage scenarios.
The above description is only exemplary of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (15)

1. A method for training a translation model, the method comprising:
acquiring a target translation memory statement;
obtaining a source-end statement corresponding to the target translation memory statement in a translation memory library;
forming training samples by the source-end sentences and the target translation memory sentences, and forming training sample sets by different training samples;
acquiring initial parameters of a translation model;
responding to initial parameters of the translation model, processing different training samples in the training sample set through the translation model, and determining updating parameters of the translation model;
and according to the updating parameters of the translation model, iteratively updating the encoder parameters and the decoder parameters of the translation model through different training samples in the training sample set.
2. The method of claim 1, wherein obtaining the target translation memory statement comprises:
obtaining a sentence to be translated;
retrieving based on the sentence similarity between the sentence to be translated and the original translation memory sentence to obtain at least two original translation memory sentences matched with the sentence to be translated;
and performing translation memory fusion processing on the obtained at least two original translation memory sentences to obtain the target translation memory sentence.
3. The method according to claim 2, wherein the retrieving based on the sentence similarity between the sentence to be translated and the original translation memory sentence to obtain at least two original translation memory sentences matched with the sentence to be translated comprises:
acquiring the maximum length of the statement to be translated and the maximum length of any translation memory statement;
obtaining the word element distance between the sentence to be translated and any translation memory sentence,
determining the similarity between the sentence to be translated and any translation memory sentence based on the word element distance, the maximum length of the sentence to be translated and the maximum length of any translation memory sentence;
and when the similarity is greater than or equal to a similarity threshold value, determining that any translation memory statement is an original translation memory statement corresponding to the statement to be translated.
4. The method according to claim 1, wherein the performing translation memory fusion processing on the obtained at least two original translation memory statements to obtain a target translation memory statement comprises:
calculating an attention value corresponding to each translation memory statement through an attention function;
fusing the translation memory sentences with the same attention value into the same translation memory sentence; or
fusing the translation memory sentences with the same attention value into different training samples in the training sample subset.
5. The method of claim 1, further comprising:
determining a dynamic noise threshold value matched with the use environment of the translation model;
denoising the training sample set according to the dynamic noise threshold value to form a denoising training sample set matched with the dynamic noise threshold value; alternatively,
and determining a fixed noise threshold corresponding to the translation model, and denoising the training sample set according to the fixed noise threshold to form a denoising training sample set matched with the fixed noise threshold.
6. The method of claim 1, further comprising:
and carrying out negative example processing on the training sample set to form a negative example sample set corresponding to the training sample set, wherein the negative example sample set is used for adjusting the encoder parameters and the decoder parameters of the translation model.
7. The method of claim 6, wherein the negative case processing the set of training samples comprises:
determining a supervision function corresponding to the translation model;
adjusting a temperature coefficient of the supervisory function;
and carrying out negative example processing on the training sample set through the supervision function based on the vector similarity and different temperature coefficients of any two translation memory sentences in the training sample set to form a negative example sample set corresponding to the training sample set.
8. The method of claim 6, wherein the negative case processing the set of training samples comprises:
randomly combining statements to be output in a decoder of the translation model to form a negative example sample set corresponding to the training sample set; alternatively,
and carrying out random deletion processing or replacement processing on the sentences to be output in a decoder of the translation model to form a negative example sample set corresponding to the training sample set.
9. The method of claim 1, wherein the determining updated parameters of the translation model by processing different training samples in the set of training samples by the translation model in response to initial parameters of the translation model comprises:
substituting different training samples in the training sample set into a loss function corresponding to a self-coding network formed by an encoder and a decoder of the translation model;
and determining parameters corresponding to an encoder and corresponding decoder parameters in the translation model when the loss function meets the convergence condition as update parameters of the translation model.
10. A sentence translation method, the method comprising:
determining at least one word-level hidden variable corresponding to a sentence to be translated through an encoder of a translation model;
generating, by a decoder of the translation model, a translated term corresponding to the word-level hidden variable and a selected probability of the translated term according to the at least one word-level hidden variable;
selecting at least one translation word to form a translation result corresponding to the sentence to be translated according to the selection probability of the translation result;
outputting the translation result;
wherein the translation model is trained based on the method of any one of claims 1 to 9.
11. A training apparatus for a translation model, the training apparatus comprising:
the data transmission module is used for acquiring a target translation memory statement;
the translation model training module is used for acquiring a source end sentence corresponding to the target translation memory sentence from a translation memory library;
the translation model training module is used for forming training samples by the source-end sentences and the target translation memory sentences and forming training sample sets by different training samples;
the translation model training module is used for acquiring initial parameters of a translation model;
the translation model training module is used for responding to initial parameters of the translation model, processing different training samples in the training sample set through the translation model, and determining updating parameters of the translation model;
and the translation model training module is used for carrying out iterative updating on the encoder parameter and the decoder parameter of the translation model through different training samples in the training sample set according to the updating parameter of the translation model.
12. A sentence translation apparatus, characterized in that the apparatus comprises:
the encoder module is used for determining at least one word-level hidden variable corresponding to the sentence to be translated through an encoder of the translation model;
a decoder module, configured to generate, by a decoder of the translation model, a translated term corresponding to the hidden variable at the term level and a selected probability of the translated term according to the hidden variable at the term level;
the decoder module is used for selecting at least one translation word to form a translation result corresponding to the sentence to be translated according to the selection probability of the translation result;
and the decoder module is used for outputting the translation result.
13. An electronic device, characterized in that the electronic device comprises:
a memory for storing executable instructions;
a processor for implementing the method of training a translation model according to any one of claims 1 to 9 or implementing the method of sentence translation according to claim 10 when executing the executable instructions stored in the memory.
14. A computer program product comprising a computer program or instructions which, when executed by a processor, implement the method of training a translation model according to any one of claims 1 to 9 or implement the method of sentence translation according to claim 10.
15. A computer readable storage medium storing executable instructions which, when executed by a processor, implement the method for training a translation model according to any one of claims 1 to 9, or implement the method for sentence translation according to claim 10.
CN202210220466.7A 2022-03-08 2022-03-08 Translation model training method, sentence translation method, device, equipment and program Pending CN114757210A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210220466.7A CN114757210A (en) 2022-03-08 2022-03-08 Translation model training method, sentence translation method, device, equipment and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210220466.7A CN114757210A (en) 2022-03-08 2022-03-08 Translation model training method, sentence translation method, device, equipment and program

Publications (1)

Publication Number Publication Date
CN114757210A true CN114757210A (en) 2022-07-15

Family

ID=82325860

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210220466.7A Pending CN114757210A (en) 2022-03-08 2022-03-08 Translation model training method, sentence translation method, device, equipment and program

Country Status (1)

Country Link
CN (1) CN114757210A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115860015A (en) * 2022-12-29 2023-03-28 北京中科智加科技有限公司 Translation memory-based transcribed text translation method and computer equipment
CN116992894A (en) * 2023-09-26 2023-11-03 北京澜舟科技有限公司 Training method of machine translation model and computer readable storage medium
CN116992894B (en) * 2023-09-26 2024-01-16 北京澜舟科技有限公司 Training method of machine translation model and computer readable storage medium

Similar Documents

Publication Publication Date Title
CN112487182B (en) Training method of text processing model, text processing method and device
CN112131366B (en) Method, device and storage medium for training text classification model and text classification
Alvarez-Melis et al. A causal framework for explaining the predictions of black-box sequence-to-sequence models
WO2022007823A1 (en) Text data processing method and device
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
WO2023160472A1 (en) Model training method and related device
CN114565104A (en) Language model pre-training method, result recommendation method and related device
CN112214591B (en) Dialog prediction method and device
JP7335300B2 (en) Knowledge pre-trained model training method, apparatus and electronic equipment
CN110598224A (en) Translation model training method, text processing device and storage medium
CN114757210A (en) Translation model training method, sentence translation method, device, equipment and program
CN111488742B (en) Method and device for translation
Natarajan et al. Sentence2signgesture: a hybrid neural machine translation network for sign language video generation
Dewangan et al. Experience of neural machine translation between Indian languages
CN110705273A (en) Information processing method and device based on neural network, medium and electronic equipment
CN116432019A (en) Data processing method and related equipment
CN115221846A (en) Data processing method and related equipment
Yonglan et al. English-Chinese Machine Translation Model Based on Bidirectional Neural Network with Attention Mechanism.
CN114330483A (en) Data processing method, model training method, device, equipment and storage medium
CN113761883A (en) Text information identification method and device, electronic equipment and storage medium
CN115114939B (en) Training method of translation model, sentence translation method, sentence translation device, sentence translation equipment and sentence translation program
Lin et al. Chinese story generation of sentence format control based on multi-channel word embedding and novel data format
CN110083842B (en) Translation quality detection method, device, machine translation system and storage medium
CN115114937A (en) Text acquisition method and device, computer equipment and storage medium
CN115130461A (en) Text matching method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination