WO2023011260A1 - Translation processing method, apparatus, device and medium - Google Patents

Translation processing method, apparatus, device and medium

Info

Publication number
WO2023011260A1
WO2023011260A1 (PCT/CN2022/107981)
Authority
WO
WIPO (PCT)
Prior art keywords
model
multilingual
translation
representation
corpus
Prior art date
Application number
PCT/CN2022/107981
Other languages
English (en)
French (fr)
Inventor
孙泽维
王明轩
李磊
Original Assignee
北京有竹居网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023011260A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/40 Processing or translation of natural language
    • G06F 40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Definitions

  • the present disclosure relates to the technical field of deep learning, and in particular to a translation processing method, apparatus, device, and medium.
  • the present disclosure provides a translation processing method, apparatus, device, and medium.
  • the present disclosure provides a translation processing method, the method comprising:
  • the target model is trained according to the bilingual corpora in the multiple languages to generate a second translation model, and the target information to be processed is translated according to the second translation model.
  • the generating a multilingual representation model by training on the monolingual corpus of each of multiple languages includes:
  • Obtaining, from the monolingual corpus of each language, corpora containing vacant slot information, and obtaining labeled filling information corresponding to the vacant slot information;
  • the multilingual representation model is generated by training model parameters of a preset model according to the monolingual corpora containing the vacant slot information, the corresponding filling-information corpora, and a preset first loss function.
  • the generating a multilingual generation model according to the monolingual corpora of each language includes:
  • Obtaining, from the monolingual corpus of each language, corpora containing predicted next slot information, and obtaining labeled prediction information corresponding to the predicted next slot information;
  • the multilingual generation model is generated by training model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corresponding prediction-information corpora, and a preset second loss function.
  • the multilingual representation model includes one or more cascaded representation sublayers, wherein each representation sublayer includes: a self-attention layer connected to a feedforward network layer.
  • the multilingual generation model includes one or more cascaded generation sublayers, wherein each generation sublayer includes: a self-attention layer connected to a feedforward network layer.
  • the first translation model includes an encoder, a decoder, and an output layer, wherein,
  • the encoder includes: one or more encoding sublayers cascaded, wherein each encoding sublayer includes: a self-attention layer connected to a feedforward network layer;
  • the decoder includes one or more cascaded decoding sublayers, wherein each decoding sublayer includes: a self-attention layer connected to a cross-language attention layer, and the cross-language attention layer connected to a feedforward network layer,
  • the feed-forward network layer in the last encoding sublayer is connected with the cross-language attention layer in the last decoding sublayer.
  • the splicing of the multilingual representation model and the multilingual generation model with the first translation model to generate the target model to be trained includes:
  • the training of the target model according to the bilingual corpora in the multiple languages to generate the second translation model includes:
  • a second translation model is generated according to the post-training model parameters of the multilingual representation model and the first translation model, and the pre-training model parameters of the multilingual generation model.
  • the present disclosure provides a translation processing device, the device comprising:
  • the first generation module is used to train a multilingual representation model on the monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language;
  • the second generation module is used to splice the multilingual representation model and the multilingual generation model with the first translation model respectively to generate a target model to be trained;
  • the third generating module is configured to train the target model to generate a second translation model according to the bilingual corpora in the multiple languages, and perform translation processing on the target information to be processed according to the second translation model.
  • the present disclosure provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is made to implement the above method.
  • the present disclosure provides an electronic device, which includes: a processor; a memory for storing instructions executable by the processor; and the processor is configured to read the executable instructions from the memory and execute the instructions to implement the above method.
  • the present disclosure provides a computer program product, where the computer program product includes a computer program/instruction, and when the computer program/instruction is executed by a processor, the above method is implemented.
  • the model is trained using monolingual corpora of multiple languages, which enhances the ability of the model to process corpora in different languages; the vectors generated by the trained multilingual representation model can more accurately extract the features of the sentences to be translated and more accurately represent their semantics, and the trained multilingual generation model can more accurately represent the already-translated sentences as vectors and extract their sentence features, and thus more accurately predict the subsequent sentences to be translated.
  • the above two models are spliced with the first translation model and trained to obtain the second translation model.
  • the second translation model combines the multilingual representation model and the multilingual generation model to improve the accuracy of translation and thus improve the translation quality.
  • FIG. 1 is a schematic flowchart of a translation processing method provided by an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of another translation processing method provided by an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of a multilingual representation model provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of a multilingual generation model provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic structural diagram of a first translation model provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an object model provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of a translation processing device provided by an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the term “comprise” and its variations are open-ended, ie “including but not limited to”.
  • the term “based on” is “based at least in part on”.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one further embodiment”; the term “some embodiments” means “at least some embodiments.” Relevant definitions of other terms will be given in the description below.
  • an embodiment of the present disclosure provides a translation processing method, which will be introduced in conjunction with specific embodiments below.
  • FIG. 1 is a schematic flow chart of a translation processing method provided by an embodiment of the present disclosure.
  • the method can be executed by a translation processing device, where the device can be implemented by software and/or hardware, and generally can be integrated into an electronic device. As shown in Figure 1, the method includes:
  • Step 101: train a multilingual representation model on the monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language.
  • the training corpus in each embodiment of the present disclosure includes a monolingual corpus of each of the multiple languages.
  • a monolingual corpus refers to a corpus whose corpus is a single type of language.
  • the model is trained using the monolingual corpora of each of the multiple languages, where each language can correspond to one or more monolingual corpora; for example, if the multiple languages are English and Japanese, then the corresponding monolingual corpora include an English monolingual corpus and a Japanese monolingual corpus with as rich a vocabulary as possible.
  • the multilingual representation model and the multilingual generation model are trained using the monolingual corpora of each of the multiple languages. Therefore, the multilingual representation model trained on a monolingual corpus that is as rich as possible for each language can more accurately represent the sentences to be translated as vectors and extract their features, and thus more accurately represent the semantics of the sentences to be translated. At the same time, the multilingual generation model trained on a monolingual corpus that is as rich as possible for each language can more accurately represent the already-translated sentences as vectors and extract their features, and thus more accurately predict the subsequent sentences to be translated.
  • the multilingual representation model can be selected according to the application scenario, which is not limited in this embodiment, for example: multilingual Bidirectional Encoder Representations from Transformers (mBERT).
  • the multilingual generation model can be selected according to the application scenario, which is not limited in this embodiment, for example: multilingual Generative Pre-Training (mGPT) or a generative adversarial network model.
  • Step 102: splice the multilingual representation model and the multilingual generation model with the first translation model, respectively, to generate a target model to be trained.
  • the first translation model is a sequence-to-sequence model, and there are many kinds of first translation models, which can be selected and designed by those skilled in the art according to application scenarios, and this embodiment does not limit it.
  • the first translation model includes an encoder and a decoder, wherein the encoder is used to encode the sentence to be translated to obtain a vector; the decoder is used to decode the vector to obtain a translation result.
  • the first translation model can be used to translate the input corpus, but since the model training of the first translation model is usually limited by bilingual corpora with a relatively limited vocabulary, the translation accuracy of the first translation model still needs to be improved. Therefore, the multilingual representation model and the multilingual generation model, trained on monolingual corpora whose vocabulary is richer than that of the bilingual corpora, can be spliced with the first translation model to generate the target model to be trained, so as to improve the accuracy of the model's subsequent translation processing.
  • For example, the encoder of the first translation model is spliced after the multilingual representation model, and the decoder of the first translation model is spliced after the multilingual generation model.
  • Step 103: train the target model according to the bilingual corpora in the multiple languages to generate a second translation model, and translate the target information to be processed according to the second translation model.
  • a bilingual corpus refers to a corpus that exists in two languages, and the corpora of the two languages are in a translation relationship, for example: a Chinese-Japanese bilingual corpus, a Chinese-English bilingual corpus.
  • For example, if the corpus is a Chinese-English bilingual corpus and the Chinese-English bilingual corpus contains the Chinese corpus "我爱你", the corresponding English corpus is "I love you".
  • the target model is trained according to the bilingual corpus in multiple languages. Among the corresponding two corpora, one is used as the input of the target model, and the other is used as the output of the target model.
  • the translation processing method of the embodiment of the present disclosure trains the model using monolingual corpora of multiple languages, which enhances the model's ability to process corpora in different languages; the vectors generated by the trained multilingual representation model can more accurately extract the features of the sentences to be translated and more accurately represent their semantics, and the trained multilingual generation model can more accurately represent the already-translated sentences as vectors and extract their features, and thus more accurately predict the subsequent sentences to be translated; the above two models are spliced with the first translation model and trained to obtain the second translation model. On the basis of the translation capability of the first translation model, the second translation model combines the multilingual representation model and the multilingual generation model, which improves the accuracy of translation and thus improves the translation quality.
  • a multilingual representation model and a multilingual generation model are generated by training on monolingual corpora, and a second translation model is generated by training on multilingual corpora.
  • Multiple kinds of corpora are used to train the models from the perspective of multiple languages, thereby improving the translation capability of the second translation model.
  • FIG. 2 is a schematic flowchart of another translation processing method provided by an embodiment of the present disclosure. Specifically, the translation processing method of this embodiment of the present disclosure includes:
  • Step 201: acquire corpora containing vacant slot information from the monolingual corpus of each language, and acquire the labeled filling information corresponding to the vacant slot information.
  • in order to enable the multilingual representation model to accurately represent the sentences to be translated as vectors and extract the features of the sentences to be translated, so as to accurately represent the semantics of the sentences to be translated, the multilingual representation model is trained in combination with the context.
  • the model is trained using the corpus with vacant slot information and the corpus with corresponding filling information to generate a multilingual representation model.
  • vacancy processing is performed on the corpus in the monolingual corpus of each language.
  • the vacancy processing refers to vacating one or more words in the corpus.
  • the position of the one or more words in the corpus is not limited in this embodiment.
  • the corpus with vacant slot information and the filling information corresponding to the vacant slot information are obtained, wherein the vacant slot information refers to the information recording the vacant position in the corpus, and the filling information refers to the corpus obtained after the vacant position is filled.
  • For example, if the corpus before vacancy processing is "I ate buns at noon.", the corpus containing vacant slot information can be "I * buns at noon", where "*" represents the vacant slot information, and the filling information corresponding to the vacant slot information is "I ate buns at noon".
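  • As an illustration of this masking step, the following is a minimal data-preparation sketch in Python. The function name make_vacancy_example, the "*" placeholder, and the whitespace tokenization are illustrative assumptions only; the patent does not prescribe a particular implementation.

```python
import random

def make_vacancy_example(sentence, mask_token="*"):
    """Leave one randomly chosen word vacant and return (corpus with vacant
    slot information, slot position, filling information)."""
    words = sentence.split()                 # naive whitespace tokenization
    slot = random.randrange(len(words))      # position of the vacant slot
    masked = list(words)
    masked[slot] = mask_token                # record the vacancy with "*"
    return " ".join(masked), slot, sentence  # filling information = original corpus

# e.g. ("I * buns at noon", 1, "I ate buns at noon") when the slot falls on "ate"
print(make_vacancy_example("I ate buns at noon"))
```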
  • Step 202: train the model parameters of a preset model according to the monolingual corpora containing vacant slot information, the corresponding filling-information corpora, and a preset first loss function, to generate the multilingual representation model.
  • the first loss function is used to evaluate the difference between the predicted value of the model and the real value, so that the training result of the model converges.
  • Under the constraint of the first loss function, the model parameters of the preset model are trained using the monolingual corpora containing the vacant slot information and the corresponding filling-information corpora to generate the multilingual representation model.
  • the vector generated by the multilingual representation model combines the context, so the semantics represented by the vector are more appropriate to the real semantics to be translated.
  • the corpus for training the preset model may include multiple languages, such as Japanese and English.
  • the multilingual representation model includes one or more cascaded representation sublayers. Specifically: if the multilingual representation model consists of one representation sublayer, then that one representation sublayer is the multilingual representation model; if the multilingual representation model consists of two representation sublayers, namely a first representation sublayer and a second representation sublayer, the multilingual representation model is obtained by cascading the two representation sublayers, specifically: the input of the multilingual representation model is the input of the first representation sublayer, the output of the first representation sublayer is used as the input of the second representation sublayer, and the output of the second representation sublayer is the output of the multilingual representation model; if the multilingual representation model contains more representation sublayers, the cascading of the multiple representation sublayers is similar to the above cascading of two representation sublayers, and details are not repeated here.
  • each representation sublayer includes: a self-attention layer (self-attention) and a feedforward network layer (feedforward neural network).
  • In an optional implementation, the number of self-attention layers is one and the number of feedforward network layers is one, and the corresponding connection is: the input of the representation sublayer is the input of the self-attention layer, the output of the self-attention layer is used as the input of the feedforward network layer, and the output of the feedforward network layer is the output of the representation sublayer.
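  • The connection just described (one self-attention layer feeding one feedforward network layer, with one or more such sublayers cascaded) can be sketched in PyTorch as below. The hidden sizes, residual connections, layer normalization, and class names are illustrative assumptions rather than the patent's prescribed implementation, and the same structure applies to the generation sublayers described later.

```python
import torch
from torch import nn

class RepresentationSublayer(nn.Module):
    """One representation sublayer: a self-attention layer connected to a feedforward network layer."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1, self.norm2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        # sublayer input -> self-attention layer
        attn_out, _ = self.self_attn(x, x, x, attn_mask=attn_mask)
        x = self.norm1(x + attn_out)
        # self-attention output -> feedforward network layer -> sublayer output
        return self.norm2(x + self.ffn(x))

class MultilingualRepresentationModel(nn.Module):
    """One or more cascaded representation sublayers (mBERT-style stack)."""
    def __init__(self, vocab_size, n_layers=6, d_model=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList([RepresentationSublayer(d_model) for _ in range(n_layers)])

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for layer in self.layers:   # the output of each sublayer is the input of the next
            x = layer(x)
        return x
```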
  • the multilingual representation model is mBERT.
  • Step 203: acquire corpora containing the predicted next slot information from the monolingual corpus of each language, and acquire the labeled prediction information corresponding to the predicted next slot information.
  • prediction processing is performed on the corpus in the monolingual corpus of each language, and the prediction processing refers to removing one or more words at the end of the corpus.
  • the slot of the word to be predicted is the slot information corresponding to the next word at the end of the corpus
  • the prediction information is the word corresponding to the slot information.
  • For example, if the corpus before the prediction processing is "I finished eating lunch at noon and am resting", the corpus containing the predicted next slot information can be "I finished eating *", where "*" indicates the next slot information, and the prediction information corresponding to the next slot information is "lunch".
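  • As with the masking step, the next-slot example can be prepared with a short helper. The sketch below truncates the end of a corpus and keeps the removed word as the labeled prediction information; the function name, the "*" slot marker, and the whitespace tokenization are illustrative assumptions.

```python
def make_next_slot_example(sentence, slot_token="*"):
    """Remove the final word and return (corpus with predicted next slot
    information, prediction information for that slot)."""
    words = sentence.split()
    context, target = words[:-1], words[-1]       # target = word to be predicted
    return " ".join(context) + " " + slot_token, target

# e.g. ("I finished eating *", "lunch")
print(make_next_slot_example("I finished eating lunch"))
```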
  • Step 204: train the model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corresponding prediction-information corpora, and a preset second loss function, to generate the multilingual generation model.
  • the second loss function is used to evaluate the difference between the predicted value of the model and the real value, so that the training result of the model converges.
  • Under the constraint of the second loss function, the monolingual corpora containing the predicted next slot information and the corresponding prediction-information corpora are used to train the model parameters of the preset model.
  • the trained language generation model can combine the existing corpus, that is, predict the following text based on the preceding context, and generate highly accurate sentences.
  • the corpus for training the preset model can be in multiple languages, such as Japanese and English.
  • the multilingual generation model includes one or more cascaded generation sublayers. If the multilingual generation model consists of one generation sublayer, then that one generation sublayer is the multilingual generation model; if the multilingual generation model consists of two generation sublayers, namely a first generation sublayer and a second generation sublayer, the multilingual generation model is obtained by cascading the two generation sublayers, specifically: the input of the multilingual generation model is the input of the first generation sublayer, the output of the first generation sublayer is used as the input of the second generation sublayer, and the output of the second generation sublayer is the output of the multilingual generation model; if the multilingual generation model is composed of more generation sublayers, the cascading of the multiple generation sublayers is similar to the above cascading of two generation sublayers, and will not be repeated here.
  • each generation sublayer includes: a self-attention layer and a feedforward network layer.
  • the number of self-attention layers is one and the number of feedforward network layers is one
  • the corresponding connection method is: the input of the generation sublayer is the input of the self-attention layer, the output of the self-attention layer is used as the input of the feedforward network layer, and the output of the feedforward network layer is the output of the generation sublayer .
  • the multilingual generation model is mGPT.
  • Step 205: connect the feedforward network layer in the last representation sublayer of the multilingual representation model to the self-attention layer in the first encoding sublayer of the encoder, connect the feedforward network layer in the last generation sublayer of the multilingual generation model to the self-attention layer of the first decoding sublayer of the decoder, and connect the multilingual generation model to the output layer, to generate the target model to be trained.
  • the first translation model includes: an encoder, a decoder, and an output layer, wherein:
  • the encoder is used for encoding and includes one or more cascaded encoding sublayers, each of which includes a self-attention layer and a feedforward network layer.
  • In an optional implementation, the encoding sublayer is composed of a self-attention layer connected to a feedforward network layer.
  • the decoder is used for decoding and includes one or more cascaded decoding sublayers, each of which includes a self-attention layer connected to a cross-language attention layer (cross attention), and the cross-language attention layer connected to a feedforward network layer.
  • In an optional implementation, the decoding sublayer is composed of a self-attention layer connected to a cross-language attention layer, and the cross-language attention layer connected to a feedforward network layer. Since the decoder includes the cross-language attention layer, it can capture the source-side information.
  • There are various output layers in the first translation model, which can be selected according to the application scenario and are not limited in this embodiment, for example: a sigmoid function or a softmax function.
  • the output layer is a softmax function.
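  • To make the layer wiring of the first translation model concrete, the following is a minimal PyTorch sketch of one decoding sublayer (a self-attention layer, a cross-language attention layer over the encoder output, and a feedforward network layer) and of a softmax output layer; the encoding sublayer has the same structure as the representation sublayer sketched earlier. The hidden sizes, vocabulary size, residual connections, and class names are illustrative assumptions, not the patent's implementation.

```python
import torch
from torch import nn

class DecodingSublayer(nn.Module):
    """Self-attention -> cross-language attention (over encoder output) -> feedforward network."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))

    def forward(self, y, encoder_out, tgt_mask=None):
        y = y + self.self_attn(y, y, y, attn_mask=tgt_mask)[0]
        # the cross-language attention layer captures the source-side information
        y = y + self.cross_attn(y, encoder_out, encoder_out)[0]
        return y + self.ffn(y)

# Output layer: project decoder states to the target vocabulary and apply softmax.
output_layer = nn.Sequential(nn.Linear(512, 32000), nn.Softmax(dim=-1))
```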
  • the multilingual representation model is connected to the encoder of the first translation model, and the multilingual generation model is connected to the decoder of the first translation model, as shown in FIG. 6, specifically:
  • Connect the multilingual representation model to the encoder of the first translation model, i.e., connect the feedforward network layer in the last representation sublayer of the multilingual representation model to the self-attention layer in the first encoding sublayer of the encoder.
  • Connect the multilingual generation model to the decoder of the first translation model, i.e., connect the feedforward network layer in the last generation sublayer of the multilingual generation model to the self-attention layer of the first decoding sublayer of the decoder.
  • the output of the multilingual generation model and the output of the decoder are jointly used as the input of the output layer, and the output of the output layer is used as the input of the multilingual generation model, so as to connect the multilingual generation model and the output layer and generate the target model to be trained.
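  • A minimal sketch of this splicing is shown below, under the assumption that the pretrained multilingual representation model, the multilingual generation model, and the first translation model's encoder, decoder, and output layer are already instantiated as PyTorch modules with matching hidden sizes; the class and attribute names are illustrative, the autoregressive feedback of the output layer into the generation model at decoding time is omitted for brevity, and combining the generation-model output and the decoder output by addition is one simple choice for feeding them jointly into the output layer.

```python
from torch import nn

class TargetModel(nn.Module):
    """Target model to be trained: representation model -> encoder,
    generation model -> decoder -> output layer."""
    def __init__(self, repr_model, gen_model, encoder, decoder, output_layer):
        super().__init__()
        self.repr_model = repr_model      # pretrained multilingual representation model
        self.gen_model = gen_model        # pretrained multilingual generation model
        self.encoder = encoder            # first translation model encoder
        self.decoder = decoder            # first translation model decoder
        self.output_layer = output_layer  # e.g. linear projection + softmax

    def forward(self, src_ids, tgt_ids):
        # last representation sublayer output feeds the first encoding sublayer
        memory = self.encoder(self.repr_model(src_ids))
        # last generation sublayer output feeds the first decoding sublayer
        gen_states = self.gen_model(tgt_ids)
        dec_states = self.decoder(gen_states, memory)
        # generation-model output and decoder output jointly feed the output layer
        return self.output_layer(dec_states + gen_states)
```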
  • Step 206: train the model parameters of the multilingual representation model and the first translation model in the target model according to the bilingual corpora in the multiple languages and a preset third loss function.
  • Step 207: generate a second translation model according to the post-training model parameters of the multilingual representation model and the first translation model and the pre-training model parameters of the multilingual generation model, and perform translation processing on the target information to be processed according to the second translation model.
  • the third loss function is used to evaluate the difference between the predicted value and the real value of the target model, so that the training result of the target model converges, and the third loss function can be selected according to the application scenario, which is not limited in this embodiment.
  • In order to improve the accuracy of the vector representation of already-translated sentences, the multilingual representation model and the first translation model in the target model are trained, and the corpora used for training are bilingual corpora in multiple languages. Therefore, under the constraint of the third loss function, during training, a corpus in the bilingual corpus can be used as the input of the target model and the corresponding translated corpus as the output of the target model, that is, the parameters of the multilingual representation model and the first translation model in the target model are trained.
  • Then, a second translation model is generated according to the post-training model parameters of the multilingual representation model and the first translation model and the pre-training model parameters of the multilingual generation model, and the target information to be processed is then translated according to the second translation model.
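  • The following sketch illustrates Steps 206 and 207 under the assumption that target_model is the TargetModel sketched above and that bilingual_batches yields (source token ids, target input ids, target output ids) from the bilingual corpora; the optimizer, learning rate, and padding id are illustrative. Freezing the generation model's parameters reflects the use of its pre-training parameters in the second translation model.

```python
import torch
from torch import nn

# keep the multilingual generation model at its pre-training parameters
for p in target_model.gen_model.parameters():
    p.requires_grad = False

# third loss function: negative log-likelihood of the reference target tokens
third_loss_fn = nn.NLLLoss(ignore_index=0)          # 0 assumed to be the padding id
trainable = [p for p in target_model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)

for src_ids, tgt_in_ids, tgt_out_ids in bilingual_batches:
    probs = target_model(src_ids, tgt_in_ids)                     # softmax probabilities
    loss = third_loss_fn(torch.log(probs + 1e-9).flatten(0, 1),   # corpus as input,
                         tgt_out_ids.flatten())                   # its translation as output
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Step 207: the trained representation model, encoder, decoder and output layer, together
# with the untouched generation model, constitute the second translation model.
second_translation_model = target_model
```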
  • the translation processing method of the embodiment of the present disclosure introduces in detail an optional technical solution for connecting the multilingual representation model, the first translation model, and the multilingual generation model. The second translation model generated according to this technical solution generates language with higher accuracy, improving the translation quality, and the accuracy of the generated translation sentences is further enhanced by using, in the second translation model, the model parameters of the untrained multilingual generation model from the target model.
  • FIG. 7 is a schematic structural diagram of a translation processing device provided by an embodiment of the present disclosure.
  • the device can be implemented by software and/or hardware, and generally can be integrated into an electronic device.
  • the translation processing device 700 includes:
  • the first generation module 701 is used to train a multilingual representation model on the monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language;
  • the second generation module 702 is configured to splice the multilingual representation model and the multilingual generation model with the first translation model to generate a target model to be trained;
  • the third generating module 703 is configured to train the target model according to the bilingual corpus in the multiple languages to generate a second translation model, and perform translation processing on the target information to be processed according to the second translation model.
  • the first generating module 701 is configured to:
  • Obtaining, from the monolingual corpus of each language, corpora containing vacant slot information, and obtaining labeled filling information corresponding to the vacant slot information;
  • the multilingual representation model is generated by training model parameters of a preset model according to the monolingual corpora containing the vacant slot information, the corresponding filling-information corpora, and a preset first loss function.
  • the first generating module 701 is configured to:
  • Obtaining, from the monolingual corpus of each language, corpora containing predicted next slot information, and obtaining labeled prediction information corresponding to the predicted next slot information;
  • the multilingual generation model is generated by training model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corresponding prediction-information corpora, and a preset second loss function.
  • the multilingual representation model includes one or more cascade representation sublayers, wherein each representation sublayer includes: a self-attention layer connected to a feedforward network layer.
  • the multilingual generation model includes one or more generation sublayer cascades, wherein each generation sublayer includes: a self-attention layer connected to a feedforward network layer.
  • the first translation model includes an encoder, a decoder and an output layer, wherein,
  • the encoder includes: one or more encoding sublayers cascaded, wherein each encoding sublayer includes: a self-attention layer connected to a feedforward network layer;
  • the decoder includes one or more cascaded decoding sublayers, wherein each decoding sublayer includes: a self-attention layer connected to a cross-language attention layer, and the cross-language attention layer connected to a feedforward network layer,
  • the feed-forward network layer in the last encoding sublayer is connected with the cross-language attention layer in the last decoding sublayer.
  • the second generation module 702 is configured to:
  • the third generating module 703 is configured to:
  • a second translation model is generated according to the post-training model parameters of the multilingual representation model and the first translation model, and the pre-training model parameters of the multilingual generation model.
  • the translation processing device provided in the embodiment of the present disclosure can execute the translation processing method provided in any embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • An embodiment of the present disclosure further provides a computer program product, including a computer program/instruction, and when the computer program/instruction is executed by a processor, the translation processing method provided in any embodiment of the present disclosure is implemented.
  • FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • FIG. 8 it shows a schematic structural diagram of an electronic device 800 suitable for implementing an embodiment of the present disclosure.
  • the electronic device 800 in the embodiment of the present disclosure may include, but is not limited to, mobile phones, notebook computers, digital broadcast receivers, PDAs (Personal Digital Assistants), PADs (Tablet Computers), PMPs (Portable Multimedia Players), vehicle-mounted terminals ( Mobile terminals such as car navigation terminals) and stationary terminals such as digital TVs, desktop computers and the like.
  • the electronic device shown in FIG. 8 is only an example, and should not limit the functions and scope of use of the embodiments of the present disclosure.
  • an electronic device 800 may include a processing device (such as a central processing unit, a graphics processing unit, etc.) 801, which can execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the electronic device 800 are also stored.
  • the processing device 801, ROM 802, and RAM 803 are connected to each other through a bus 804.
  • An input/output (I/O) interface 805 is also connected to the bus 804 .
  • the following devices can be connected to the I/O interface 805: an input device 806 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 807 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 808 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 809.
  • the communication means 809 may allow the electronic device 800 to communicate with other devices wirelessly or by wire to exchange data. While FIG. 8 shows electronic device 800 having various means, it is to be understood that implementing or having all of the means shown is not a requirement. More or fewer means may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer readable medium, where the computer program includes program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from a network via communication means 809, or from storage means 808, or from ROM 802.
  • When the computer program is executed by the processing device 801, the above-mentioned functions defined in the translation processing method of the embodiment of the present disclosure are executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • a computer readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable Programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, which can transmit, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the client and the server can communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication (e.g., a communication network) in any form or medium.
  • Examples of communication networks include local area networks ("LANs"), wide area networks ("WANs"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the above-mentioned one or more programs are executed by the electronic device, the electronic device is caused to: train a multilingual representation model on the monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language; splice the multilingual representation model and the multilingual generation model with the first translation model, respectively, to generate a target model to be trained; and train the target model according to the bilingual corpora in the multiple languages to generate a second translation model, and perform translation processing on the target information to be processed according to the second translation model.
  • the multilingual representation model can more accurately extract features of sentences to be translated, and the multilingual generation model can more accurately predict subsequent sentences to be translated. Therefore, since the second translation model combines the multilingual representation model and the multilingual generation model, the accuracy of translation is improved, thereby improving the translation quality.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented by a dedicated hardware-based system that performs the specified functions or operations , or may be implemented by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. Wherein, the name of a unit does not constitute a limitation of the unit itself under certain circumstances.
  • For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in conjunction with an instruction execution system, apparatus, or device.
  • a machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • a machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatus, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer discs, hard drives, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), optical fiber, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • the present disclosure provides a translation processing method, including:
  • the target model is trained according to the bilingual corpus in the multiple languages to generate a second translation model, and the target information to be processed is translated according to the second translation model.
  • the generating a multilingual representation model by training on the monolingual corpus of each of multiple languages includes:
  • Obtaining, from the monolingual corpus of each language, corpora containing vacant slot information, and obtaining labeled filling information corresponding to the vacant slot information;
  • the multilingual representation model is generated by training model parameters of a preset model according to the monolingual corpora containing the vacant slot information, the corresponding filling-information corpora, and a preset first loss function.
  • the generating a multilingual generation model based on the monolingual corpus of each language includes:
  • Obtaining, from the monolingual corpus of each language, corpora containing predicted next slot information, and obtaining labeled prediction information corresponding to the predicted next slot information;
  • the multilingual generation model is generated by training model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corresponding prediction-information corpora, and a preset second loss function.
  • the multilingual representation model includes one or more cascaded representation sublayers, wherein each representation sublayer includes: a self-attention layer connected to a feedforward network layer.
  • the multilingual generation model includes one or more cascaded generation sublayers, wherein each generation sublayer includes: a self-attention layer connected to a feedforward network layer.
  • the first translation model includes an encoder, a decoder, and an output layer, wherein,
  • the encoder includes: one or more encoding sublayers cascaded, wherein each encoding sublayer includes: a self-attention layer connected to a feedforward network layer;
  • the decoder includes one or more cascaded decoding sublayers, wherein each decoding sublayer includes: a self-attention layer connected to a cross-language attention layer, and the cross-language attention layer connected to a feedforward network layer,
  • the feed-forward network layer in the last encoding sublayer is connected with the cross-language attention layer in the last decoding sublayer.
  • the multilingual representation model and the multilingual generation model are respectively spliced with the first translation model to generate a target model to be trained, including:
  • the training of the target model according to the bilingual corpora in the multiple languages to generate the second translation model includes:
  • a second translation model is generated according to the post-training model parameters of the multilingual representation model and the first translation model, and the pre-training model parameters of the multilingual generation model.
  • the present disclosure provides a translation processing device, including:
  • the first generation module is used to train a multilingual representation model on the monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language;
  • the second generation module is used to splice the multilingual representation model and the multilingual generation model with the first translation model respectively to generate a target model to be trained;
  • the third generation module is used to train the target model according to the bilingual corpus in the multiple languages to generate a second translation model, and perform translation processing on the target information to be processed according to the second translation model.
  • the first generating module is configured to:
  • Obtaining, from the monolingual corpus of each language, corpora containing vacant slot information, and obtaining labeled filling information corresponding to the vacant slot information;
  • the multilingual representation model is generated by training model parameters of a preset model according to the monolingual corpora containing the vacant slot information, the corresponding filling-information corpora, and a preset first loss function.
  • the first generation module is configured to:
  • Obtaining, from the monolingual corpus of each language, corpora containing predicted next slot information, and obtaining labeled prediction information corresponding to the predicted next slot information;
  • the multilingual generation model is generated by training model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corresponding prediction-information corpora, and a preset second loss function.
  • the multilingual representation model includes one or more cascaded representation sublayers, wherein each representation sublayer includes: a self-attention layer connected to a feedforward network layer.
  • the multilingual generation model includes one or more cascaded generation sublayers, wherein each generation sublayer includes: a self-attention layer connected to a feedforward network layer.
  • the first translation model includes an encoder, a decoder, and an output layer, wherein,
  • the encoder includes: one or more encoding sublayers cascaded, wherein each encoding sublayer includes: a self-attention layer connected to a feedforward network layer;
  • the decoder includes one or more cascaded decoding sublayers, wherein each decoding sublayer includes: a self-attention layer connected to a cross-language attention layer, and the cross-language attention layer connected to a feedforward network layer,
  • the feed-forward network layer in the last encoding sublayer is connected with the cross-language attention layer in the last decoding sublayer.
  • the second generating module is configured to:
  • the third generation module is configured to:
  • a second translation model is generated according to the post-training model parameters of the multilingual representation model and the first translation model, and the pre-training model parameters of the multilingual generation model.
  • the present disclosure provides an electronic device, including:
  • the processor is configured to read the executable instruction from the memory, and execute the instruction to implement any translation processing method provided in the present disclosure.
  • the present disclosure provides a computer-readable storage medium, the storage medium stores a computer program, and the computer program is used to perform any translation processing method provided in the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

Provided are a translation processing method, apparatus, device, and medium, wherein the method includes: training a multilingual representation model on the monolingual corpus of each of multiple languages, and generating a multilingual generation model according to the monolingual corpus of each language (101); splicing the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained (102); and training the target model according to bilingual corpora in the multiple languages to generate a second translation model, and performing translation processing on target information to be processed according to the second translation model (103).

Description

Translation processing method, apparatus, device and medium
Cross-Reference to Related Applications
This application is based on and claims priority to the Chinese patent application No. 202110888353.X, filed on August 3, 2021 and entitled "翻译处理方法、装置、设备及介质" ("Translation processing method, apparatus, device and medium"), the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of deep learning, and in particular to a translation processing method, apparatus, device, and medium.
Background
With the development of deep learning technology, natural language can be translated by a translation model using deep learning. However, the sentences generated by existing translation models are still insufficient in accuracy.
Therefore, improving the accuracy of the translated sentences generated by a translation model, and thereby improving translation quality, is a problem that urgently needs to be solved.
Summary
In order to solve the above technical problem, or at least partially solve the above technical problem, the present disclosure provides a translation processing method, apparatus, device, and medium.
In a first aspect, the present disclosure provides a translation processing method, the method comprising:
training a multilingual representation model on the monolingual corpus of each of multiple languages, and generating a multilingual generation model according to the monolingual corpus of each language;
splicing the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained;
training the target model according to bilingual corpora in the multiple languages to generate a second translation model, and performing translation processing on target information to be processed according to the second translation model.
In an optional implementation, the training a multilingual representation model on the monolingual corpus of each of multiple languages comprises:
acquiring corpora containing vacant slot information from the monolingual corpus of each language, and acquiring labeled filling information corresponding to the vacant slot information;
training model parameters of a preset model according to the monolingual corpora containing the vacant slot information, the corresponding filling-information corpora, and a preset first loss function, to generate the multilingual representation model.
In an optional implementation, the generating a multilingual generation model according to the monolingual corpus of each language comprises:
acquiring corpora containing predicted next slot information from the monolingual corpus of each language, and acquiring labeled prediction information corresponding to the predicted next slot information;
training model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corresponding prediction-information corpora, and a preset second loss function, to generate the multilingual generation model.
In an optional implementation, the multilingual representation model comprises one or more cascaded representation sublayers, wherein each representation sublayer comprises a self-attention layer connected to a feedforward network layer.
In an optional implementation, the multilingual generation model comprises one or more cascaded generation sublayers, wherein each generation sublayer comprises a self-attention layer connected to a feedforward network layer.
In an optional implementation, the first translation model comprises an encoder, a decoder, and an output layer, wherein
the encoder comprises one or more cascaded encoding sublayers, wherein each encoding sublayer comprises a self-attention layer connected to a feedforward network layer;
the decoder comprises one or more cascaded decoding sublayers, wherein each decoding sublayer comprises a self-attention layer connected to a cross-language attention layer, and the cross-language attention layer connected to a feedforward network layer,
wherein the feedforward network layer in the last encoding sublayer is connected to the cross-language attention layer in the last decoding sublayer.
In an optional implementation, the splicing the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained comprises:
connecting the feedforward network layer in the last representation sublayer of the multilingual representation model to the self-attention layer in the first encoding sublayer of the encoder, and
connecting the feedforward network layer in the last generation sublayer of the multilingual generation model to the self-attention layer of the first decoding sublayer of the decoder, and connecting the multilingual generation model to the output layer, to generate the target model to be trained.
In an optional implementation, the training the target model according to the bilingual corpora in the multiple languages to generate a second translation model comprises:
training model parameters of the multilingual representation model and the first translation model in the target model according to the bilingual corpora in the multiple languages and a preset third loss function;
generating the second translation model according to the post-training model parameters of the multilingual representation model and the first translation model and the pre-training model parameters of the multilingual generation model.
In a second aspect, the present disclosure provides a translation processing apparatus, the apparatus comprising:
a first generation module, configured to train a multilingual representation model on the monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language;
a second generation module, configured to splice the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained;
a third generation module, configured to train the target model according to the bilingual corpora in the multiple languages to generate a second translation model, and perform translation processing on target information to be processed according to the second translation model.
In a third aspect, the present disclosure provides a computer-readable storage medium storing instructions which, when run on a terminal device, cause the terminal device to implement the above method.
In a fourth aspect, the present disclosure provides an electronic device comprising: a processor; and a memory for storing instructions executable by the processor; the processor being configured to read the executable instructions from the memory and execute the instructions to implement the above method.
In a fifth aspect, the present disclosure provides a computer program product comprising a computer program/instructions which, when executed by a processor, implement the above method.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have the following advantages:
In the embodiments of the present disclosure, the models are trained using monolingual corpora of multiple languages, which enhances the models' ability to process corpora in different languages. The vectors generated by the trained multilingual representation model can more accurately extract the features of the sentences to be translated and more accurately represent their semantics, and the trained multilingual generation model can more accurately represent the already-translated sentences as vectors and extract their sentence features, thereby more accurately predicting the subsequent sentences to be translated. The above two models are spliced with the first translation model and trained to obtain the second translation model. On the basis of the translation capability of the first translation model, the second translation model, by splicing the multilingual representation model and the multilingual generation model, improves the accuracy of translation and thus improves the translation quality.
Brief Description of the Drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent in conjunction with the accompanying drawings and with reference to the following detailed description. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale.
FIG. 1 is a schematic flowchart of a translation processing method provided by an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of another translation processing method provided by an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a multilingual representation model provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of a multilingual generation model provided by an embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a first translation model provided by an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a target model provided by an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a translation processing apparatus provided by an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method implementations of the present disclosure may be performed in different orders and/or in parallel. In addition, the method implementations may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "comprise" and its variations as used herein are open-ended, i.e., "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order of or interdependence between the functions performed by these apparatuses, modules, or units.
It should be noted that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless otherwise clearly indicated in the context, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the implementations of the present disclosure are for illustrative purposes only and are not used to limit the scope of these messages or information.
In order to solve the above problem, an embodiment of the present disclosure provides a translation processing method, which is introduced below in conjunction with specific embodiments.
FIG. 1 is a schematic flowchart of a translation processing method provided by an embodiment of the present disclosure. The method may be executed by a translation processing apparatus, which may be implemented by software and/or hardware and may generally be integrated into an electronic device. As shown in FIG. 1, the method includes:
Step 101: train a multilingual representation model on the monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language.
In order to enable the trained translation model to translate as many words as possible in each language more accurately, the training corpora in the embodiments of the present disclosure include a monolingual corpus of each of multiple languages. A monolingual corpus refers to a corpus whose corpora are in a single language. In order to give the model the ability to process multiple languages, the model is trained using the monolingual corpora of each of the multiple languages, where each language may correspond to one or more monolingual corpora. For example, if the multiple languages are English and Japanese, the corresponding monolingual corpora include an English monolingual corpus and a Japanese monolingual corpus with as rich a vocabulary as possible.
Further, the multilingual representation model and the multilingual generation model are trained using the monolingual corpora of each of the multiple languages. Therefore, the multilingual representation model trained on a monolingual corpus that is as rich as possible for each language can more accurately represent the sentences to be translated as vectors and extract the features of the sentences to be translated, and thus more accurately represent the semantics of the sentences to be translated. At the same time, the multilingual generation model trained on a monolingual corpus that is as rich as possible for each language can more accurately represent the already-translated sentences as vectors and extract their features, and thus more accurately predict the subsequent sentences to be translated. Therefore, the multilingual representation model and the multilingual generation model trained on monolingual corpora that are as rich as possible convert the words involved in translation into more accurate vectors and extract more accurate semantic features, making the translation result more accurate. It should be noted that, in this embodiment, the multilingual representation model may be selected according to the application scenario, which is not limited by this embodiment, for example: multilingual Bidirectional Encoder Representations from Transformers (mBERT). In this embodiment, the multilingual generation model may also be selected according to the application scenario, which is not limited by this embodiment, for example: multilingual Generative Pre-Training (mGPT) or a generative adversarial network model.
Step 102: splice the multilingual representation model and the multilingual generation model with the first translation model, respectively, to generate a target model to be trained.
In this embodiment, the first translation model is a sequence-to-sequence model. There are many kinds of first translation models, which may be selected and designed by those skilled in the art according to the application scenario, and this embodiment does not limit this. The first translation model includes an encoder and a decoder, where the encoder is used to encode the sentence to be translated to obtain a vector, and the decoder is used to decode the vector to obtain the translation result.
Understandably, the first translation model can be used to translate the input corpus. However, since the model training of the first translation model is usually limited by bilingual corpora with a relatively limited vocabulary, the translation accuracy of the first translation model still needs to be improved. Therefore, the multilingual representation model and the multilingual generation model, trained on monolingual corpora whose vocabulary is richer than that of the bilingual corpora, can be spliced with the first translation model respectively to generate the target model to be trained, so as to improve the accuracy of the model's subsequent translation processing.
There are many ways of splicing the multilingual representation model and the multilingual generation model with the first translation model, which are not limited by this embodiment, for example: the encoder of the first translation model is spliced after the multilingual representation model, and the decoder of the first translation model is spliced after the multilingual generation model. It should be noted that multilingual representation models, multilingual generation models, and first translation models with different structures are spliced in different ways, and the target models generated by different splicing methods have different performance; those skilled in the art may select the splicing method according to the model structure and the application scenario, which will not be repeated here.
Step 103: train the target model according to bilingual corpora in the multiple languages to generate a second translation model, and perform translation processing on the target information to be processed according to the second translation model.
In order to train the target model so that it has the ability to translate multiple languages, in the embodiments of the present disclosure, bilingual corpora of multiple languages are used to train the target model. A bilingual corpus refers to a corpus containing two languages whose corpora are translations of each other, for example: a Chinese-Japanese bilingual corpus or a Chinese-English bilingual corpus. For example, if the corpus is a Chinese-English bilingual corpus and the Chinese-English bilingual corpus contains the Chinese corpus "我爱你", the corresponding English corpus is "I love you".
Further, the target model is trained according to the bilingual corpora in the multiple languages. Of the two corresponding corpora, one is used as the input of the target model and the other is used as the output of the target model. The second translation model generated by training has the ability to translate corpora in multiple languages, so the second translation model can be used to perform translation processing on the target information to be processed.
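Once the second translation model is obtained, translating the target information amounts to encoding the source sentence once and generating target tokens step by step, with the output layer's prediction fed back as the next input of the multilingual generation model. A greedy-decoding sketch is given below, assuming a second_translation_model with the same (source ids, target ids) to probabilities interface as the target-model sketch given earlier in this document; the BOS/EOS ids and the maximum length are illustrative assumptions.

```python
import torch

@torch.no_grad()
def translate(second_translation_model, src_ids, bos_id=1, eos_id=2, max_len=128):
    """Greedy decoding of one source sentence with the second translation model."""
    tgt_ids = torch.tensor([[bos_id]])
    for _ in range(max_len):
        probs = second_translation_model(src_ids, tgt_ids)   # (1, T, vocab) probabilities
        next_id = probs[0, -1].argmax().item()               # most probable next token
        tgt_ids = torch.cat([tgt_ids, torch.tensor([[next_id]])], dim=1)
        if next_id == eos_id:                                 # stop at end of sentence
            break
    return tgt_ids[0, 1:]                                     # translated token ids
```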
In summary, the translation processing method of the embodiments of the present disclosure trains the models using monolingual corpora of multiple languages, which enhances the models' ability to process corpora in different languages. The vectors generated by the trained multilingual representation model can more accurately extract the features of the sentences to be translated and more accurately represent their semantics, and the trained multilingual generation model can more accurately represent the already-translated sentences as vectors and extract their sentence features, thereby more accurately predicting the subsequent sentences to be translated. The above two models are spliced with the first translation model and trained to obtain the second translation model. On the basis of the translation capability of the first translation model, the second translation model, by splicing the multilingual representation model and the multilingual generation model, improves the accuracy of translation and thus improves the translation quality.
Moreover, the multilingual representation model and the multilingual generation model are generated by training on monolingual corpora, and the second translation model is generated by training on multilingual corpora; multiple kinds of corpora are used to train the models from the perspective of multiple languages, thereby improving the translation capability of the second translation model.
Based on the above embodiment, in order to more clearly explain how the multilingual representation model and the multilingual generation model are generated by training on multilingual monolingual corpora, and how, after the first translation model is spliced with the multilingual representation model and the multilingual generation model, a second translation model with more accurate translation capability is generated by training, a specific description is given below through the embodiment shown in FIG. 2. FIG. 2 is a schematic flowchart of another translation processing method provided by an embodiment of the present disclosure. Specifically, the translation processing method of this embodiment of the present disclosure includes:
Step 201: acquire corpora containing vacant slot information from the monolingual corpus of each language, and acquire the labeled filling information corresponding to the vacant slot information.
In this embodiment, in order to enable the multilingual representation model to accurately represent the sentences to be translated as vectors and extract the features of the sentences to be translated, so as to accurately represent the semantics of the sentences to be translated, the multilingual representation model is trained in combination with the context. In an optional implementation, the model is trained using corpora with vacant slot information and the corresponding filling-information corpora to generate the multilingual representation model.
In this implementation, vacancy processing is performed on the corpora in the monolingual corpus of each language. Vacancy processing refers to leaving one or more words in a corpus vacant, and the position of the one or more words in the corpus is not limited by this embodiment. After the vacancy processing, a corpus with vacant slot information and the filling information corresponding to the vacant slot information are obtained, where the vacant slot information refers to information recording the vacant position in the corpus, and the filling information refers to the corpus obtained after the vacant position is filled. For example, if the corpus before vacancy processing is "我中午吃了包子。" ("I ate buns at noon."), the corpus containing vacant slot information can be "我中午*了包子", where "*" represents the vacant slot information, and the filling information corresponding to the vacant slot information is "我中午吃了包子".
Step 202: train the model parameters of a preset model according to the monolingual corpora containing vacant slot information, the corpora of the corresponding filling information, and a preset first loss function, to generate the multilingual representation model.
In this implementation, the first loss function evaluates the difference between the model's prediction and the true value, so that the training of the model converges. Under the constraint of the first loss function, the model parameters of the preset model are trained with the monolingual corpora containing vacant slot information and the corpora of the corresponding filling information, to generate the multilingual representation model. Understandably, the vectors generated by this multilingual representation model incorporate the surrounding context, so the semantics they represent are closer to the true semantics to be translated. It should be noted that the corpora used to train the preset model may include multiple languages, for example Japanese and English.
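For illustration only, the following Python (PyTorch) sketch shows one way such vacant-slot training could be set up. The names `preset_model`, `mask_id` and the vacancy ratio are assumptions introduced for the example, and cross-entropy is assumed as the first loss function; the disclosure itself does not prescribe a specific loss function or vacating scheme.

```python
import torch
import torch.nn.functional as F

def vacate_random_slots(token_ids: torch.Tensor, mask_id: int, vacancy_prob: float = 0.15):
    """Vacating processing: replace a random subset of tokens with a slot marker.

    Returns the corpus with vacant slot information and the labels (the annotated
    filling information); positions that were not vacated are ignored in the loss.
    """
    vacant = torch.rand_like(token_ids, dtype=torch.float) < vacancy_prob
    labels = token_ids.clone()
    labels[~vacant] = -100                      # ignored by cross_entropy
    inputs = token_ids.clone()
    inputs[vacant] = mask_id                    # the "*" slot marker
    return inputs, labels

def representation_training_step(preset_model, optimizer, token_ids, mask_id):
    """One step of training the preset model toward the multilingual representation model."""
    inputs, labels = vacate_random_slots(token_ids, mask_id)
    logits = preset_model(inputs)               # (batch, seq_len, vocab_size)
    # First loss function: difference between the predicted and annotated filling information.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1), ignore_index=-100)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```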
It should be noted that there are many possible structures for the multilingual representation model, and those skilled in the art may choose one according to the application scenario. In an optional implementation, as shown in FIG. 3, the multilingual representation model contains a cascade of one or more representation sublayers. Specifically: if the multilingual representation model consists of one representation sublayer, that representation sublayer is the multilingual representation model. If it consists of two representation sublayers, namely a first representation sublayer and a second representation sublayer, the multilingual representation model is obtained by cascading the two: the input of the multilingual representation model is the input of the first representation sublayer, the output of the first representation sublayer is used as the input of the second representation sublayer, and the output of the second representation sublayer is the output of the multilingual representation model. If the multilingual representation model contains more than two representation sublayers, they are cascaded in a manner similar to the cascading of two representation sublayers described above, which is not repeated here.
In the above multilingual representation model, each representation sublayer includes a self-attention layer and a feedforward neural network layer. In an optional implementation, there is one self-attention layer and one feedforward network layer, connected as follows: the input of the representation sublayer is the input of the self-attention layer, the output of the self-attention layer is used as the input of the feedforward network layer, and the output of the feedforward network layer is the output of the representation sublayer. Optionally, the multilingual representation model is mBERT.
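As a non-authoritative illustration, a representation sublayer of the kind described above could be sketched as follows. The residual connections, layer normalization and dimension values (`d_model`, `n_heads`, `d_ff`) are assumptions added for the example rather than requirements of the disclosure.

```python
import torch
import torch.nn as nn

class RepresentationSublayer(nn.Module):
    """One representation sublayer: a self-attention layer feeding a feedforward network layer."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The sublayer input is the input of the self-attention layer.
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        # The feedforward network layer's output is the sublayer output.
        return self.norm2(x + self.ffn(x))

# A cascade of such sublayers (six here, as an arbitrary example) forms the representation model.
representation_model = nn.Sequential(*[RepresentationSublayer() for _ in range(6)])
```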
Step 203: obtain corpora containing predicted next slot information from the monolingual corpus of each language, and obtain annotated prediction information corresponding to the predicted next slot information.
So that the vectors generated by the second translation model for already-translated sentences represent those sentences more accurately, and the subsequent sentence to be translated can therefore be predicted more accurately, the preset model is trained with corpora containing predicted next slot information and the corresponding prediction information.
In this embodiment, the corpora in the monolingual corpus of each language are subjected to prediction processing, which means removing one or more words at the end of a corpus. After a corpus is processed, the slot to be predicted is the slot information corresponding to the next word following the end of the corpus, and the prediction information is the word corresponding to that slot information. For example, if the corpus before prediction processing is "我中午吃完午饭在休息" (I am resting after eating lunch at noon), the corpus containing predicted next slot information may be "我中午吃完*", where "*" denotes the next slot information, and the prediction information corresponding to that next slot information is "午饭" (lunch).
Step 204: train the model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corpora of the corresponding prediction information, and a preset second loss function, to generate the multilingual generation model.
In this embodiment, the second loss function evaluates the difference between the model's prediction and the true value, so that the training of the model converges. Under the constraint of the second loss function, the model parameters of the preset model are trained with the monolingual corpora containing the predicted next slot information and the corpora of the corresponding prediction information. The trained language generation model can use the existing corpus, that is, the preceding context, to predict the following text and generate highly accurate sentences. It should be noted that the corpora used to train the preset model may be in multiple languages, for example Japanese and English.
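For illustration only, the next-slot training objective described above could be sketched as follows. The name `preset_model` and the use of cross-entropy as the second loss function are assumptions introduced for the example.

```python
import torch
import torch.nn.functional as F

def generation_training_step(preset_model, optimizer, token_ids: torch.Tensor):
    """One step of training the preset model toward the multilingual generation model.

    Each position must predict the word in the next slot from the preceding context
    (the last word of the corpus is only used as a target, never as an input).
    """
    inputs = token_ids[:, :-1]                  # corpus with the final word(s) removed
    targets = token_ids[:, 1:]                  # annotated prediction for each next slot
    logits = preset_model(inputs)               # (batch, seq_len - 1, vocab_size)
    # Second loss function: difference between the predicted next word and the annotation.
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```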
It should be noted that there are many possible structures for the multilingual generation model, and those skilled in the art may choose one according to the application scenario. In an optional implementation, as shown in FIG. 4, the multilingual generation model contains a cascade of one or more generation sublayers. If the multilingual generation model consists of one generation sublayer, that generation sublayer is the multilingual generation model. If it consists of two generation sublayers, namely a first generation sublayer and a second generation sublayer, the multilingual generation model is obtained by cascading the two: the input of the multilingual generation model is the input of the first generation sublayer, the output of the first generation sublayer is used as the input of the second generation sublayer, and the output of the second generation sublayer is the output of the multilingual generation model. If the multilingual generation model consists of more than two generation sublayers, they are cascaded in a manner similar to the cascading of two generation sublayers described above, which is not repeated here.
In the above multilingual generation model, each generation sublayer includes a self-attention layer and a feedforward network layer. In an optional implementation, there is one self-attention layer and one feedforward network layer, connected as follows: the input of the generation sublayer is the input of the self-attention layer, the output of the self-attention layer is used as the input of the feedforward network layer, and the output of the feedforward network layer is the output of the generation sublayer. Optionally, the multilingual generation model is mGPT.
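As an illustrative sketch, a generation sublayer could reuse the same self-attention-plus-feedforward structure with a causal mask, so that each position attends only to the preceding context. The causal mask, residual connections, layer normalization and dimension values are assumptions consistent with next-slot prediction, not explicit requirements of the disclosure.

```python
import torch
import torch.nn as nn

class GenerationSublayer(nn.Module):
    """One generation sublayer: causally masked self-attention feeding a feedforward network layer."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        seq_len = x.size(1)
        # Causal mask: True marks positions that may NOT be attended to (future slots).
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device),
                            diagonal=1)
        attn_out, _ = self.self_attn(x, x, x, attn_mask=causal)
        x = self.norm1(x + attn_out)
        return self.norm2(x + self.ffn(x))

# A cascade of such sublayers (again six, arbitrarily) forms the generation model.
generation_model = nn.Sequential(*[GenerationSublayer() for _ in range(6)])
```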
Step 205: connect the feedforward network layer in the last representation sublayer of the multilingual representation model to the self-attention layer in the first encoding sublayer of the encoder, connect the feedforward network layer in the last generation sublayer of the multilingual generation model to the self-attention layer of the first decoding sublayer of the decoder, and connect the multilingual generation model to the output layer, to generate the target model to be trained.
In this embodiment, the first translation model includes an encoder, a decoder and an output layer, where:
The encoder performs encoding and includes a cascade of one or more encoding sublayers, each of which includes a self-attention layer and a feedforward network layer. In an optional implementation, as shown for the encoding sublayer in FIG. 5, an encoding sublayer consists of one self-attention layer connected to one feedforward network layer.
The decoder performs decoding and includes a cascade of one or more decoding sublayers, each of which includes a self-attention layer, a cross-lingual attention layer (cross attention) and a feedforward network. In an optional implementation, as shown for the decoding sublayer in FIG. 5, a decoding sublayer consists of one self-attention layer connected to one cross-lingual attention layer, which in turn is connected to one feedforward network. It should be noted that, because the decoder includes the cross-lingual attention layer, it can capture source-side information.
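For illustration, a decoding sublayer of the kind just described (self-attention, then cross-lingual attention over the encoder output, then a feedforward network) could be sketched as follows; the residual connections, normalization and dimensions are assumptions added for the example.

```python
import torch
import torch.nn as nn

class DecodingSublayer(nn.Module):
    """One decoding sublayer: self-attention, then cross-lingual attention over the encoder
    output (this is how source-side information is captured), then a feedforward network."""

    def __init__(self, d_model: int = 512, n_heads: int = 8, d_ff: int = 2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.norm3 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, encoder_out: torch.Tensor) -> torch.Tensor:
        attn_out, _ = self.self_attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Cross-lingual attention: queries from the target side, keys/values from the encoder.
        cross_out, _ = self.cross_attn(x, encoder_out, encoder_out)
        x = self.norm2(x + cross_out)
        return self.norm3(x + self.ffn(x))
```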
There are many possible output layers for the first translation model, which may be selected according to the application scenario and are not limited in this embodiment, for example a sigmoid function or a softmax function. In an optional implementation, the output layer is a softmax function.
It should be noted that, when the encoder and the decoder are connected, as shown in FIG. 5, the feedforward network layer in the last encoding sublayer is connected to the cross-lingual attention layer in the last decoding sublayer.
In this example, the multilingual representation model is connected to the encoder of the first translation model, and the multilingual generation model is connected to the decoder of the first translation model, as shown in FIG. 6. Specifically:
Connecting the multilingual representation model to the encoder of the first translation model means connecting the feedforward network layer in the last representation sublayer of the multilingual representation model to the self-attention layer in the first encoding sublayer of the encoder. Connecting the multilingual generation model to the decoder of the first translation model means connecting the feedforward network layer in the last generation sublayer of the multilingual generation model to the self-attention layer of the first decoding sublayer of the decoder. In addition, the output of the multilingual generation model and the output of the decoder jointly serve as the input of the output layer, and the output of the output layer serves as the input of the multilingual generation model, so that the multilingual generation model is connected to the output layer, generating the target model to be trained.
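A minimal, non-authoritative sketch of the resulting target model is given below. It assumes `representation_model`, `generation_model`, `encoder` and `decoder` are modules built from sublayers like those sketched above, fuses the generation model's output with the decoder's output by simple addition before a softmax output layer (an assumption made for the example), and shows only the teacher-forced forward pass; the feedback of the output layer's output into the generation model during autoregressive decoding is not shown.

```python
import torch
import torch.nn as nn

class TargetModel(nn.Module):
    """Target model to be trained: representation model feeding the encoder, generation model
    feeding the decoder, and the generation model's and decoder's outputs jointly feeding
    the output layer."""

    def __init__(self, representation_model, generation_model, encoder, decoder,
                 d_model: int = 512, vocab_size: int = 32000):
        super().__init__()
        self.representation_model = representation_model   # e.g. cascade of representation sublayers
        self.generation_model = generation_model           # e.g. cascade of generation sublayers
        self.encoder = encoder                              # cascade of encoding sublayers
        self.decoder = decoder                              # module taking (target_states, encoder_out)
        self.output_layer = nn.Sequential(nn.Linear(d_model, vocab_size), nn.LogSoftmax(dim=-1))

    def forward(self, src_embed: torch.Tensor, tgt_embed: torch.Tensor) -> torch.Tensor:
        # Last representation sublayer feeds the first encoding sublayer's self-attention.
        enc_out = self.encoder(self.representation_model(src_embed))
        # Last generation sublayer feeds the first decoding sublayer's self-attention.
        gen_out = self.generation_model(tgt_embed)
        dec_out = self.decoder(gen_out, enc_out)
        # Generation-model output and decoder output jointly enter the output layer.
        return self.output_layer(gen_out + dec_out)
```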
Step 206: train the model parameters of the multilingual representation model and the first translation model in the target model according to the bilingual corpora in the multiple languages and a preset third loss function.
Step 207: generate a second translation model according to the model parameters of the multilingual representation model and the first translation model after training and the model parameters of the multilingual generation model before training, and translate the target information to be processed according to the second translation model.
In this embodiment, the third loss function evaluates the difference between the target model's prediction and the true value, so that the training of the target model converges; the third loss function may be selected according to the application scenario and is not limited in this embodiment. In this embodiment, to improve the accuracy of the vector representation of the sentences to be translated, the multilingual representation model and the first translation model in the target model are trained, and the training corpora are bilingual corpora of the multiple languages. Therefore, under the constraint of the third loss function, during training a corpus in the bilingual corpus serves as the input of the target model and the corresponding translated corpus serves as its output, i.e. the parameters of the multilingual representation model and the first translation model in the target model are trained. Then a second translation model is generated according to the model parameters of the multilingual representation model and the first translation model after training and the model parameters of the multilingual generation model before training, and the target information to be processed is translated according to the second translation model.
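For illustration only, one fine-tuning step on a bilingual pair could look as follows, with the multilingual generation model's parameters kept at their pre-trained values so that only the representation model and the first translation model are updated; the loss and optimizer handling are assumptions made for the example.

```python
import torch
import torch.nn.functional as F

def finetune_step(target_model, optimizer, src_embed, tgt_embed, tgt_labels):
    """One bilingual fine-tuning step: only the representation model and the first translation
    model are updated; the multilingual generation model keeps its pre-trained parameters."""
    # In practice the freezing (and the optimizer's parameter list) would be set up once,
    # outside the training loop; it is shown here for clarity.
    for p in target_model.generation_model.parameters():
        p.requires_grad = False
    log_probs = target_model(src_embed, tgt_embed)          # (batch, seq_len, vocab_size)
    # Third loss function: difference between the prediction and the reference translation.
    loss = F.nll_loss(log_probs.reshape(-1, log_probs.size(-1)), tgt_labels.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```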
In summary, the translation processing method of this embodiment of the present disclosure describes in detail an optional technical solution for connecting the multilingual representation model, the first translation model and the multilingual generation model. The second translation model generated according to this solution generates language with higher precision, so translation quality is improved; and by adopting, in the second translation model, the model parameters of the multilingual generation model of the target model that were not further trained, the accuracy of the generated translated sentences is further enhanced.
FIG. 7 is a schematic structural diagram of a translation processing apparatus provided by an embodiment of the present disclosure. The apparatus may be implemented in software and/or hardware and may generally be integrated in an electronic device. As shown in FIG. 7, the translation processing apparatus 700 includes:
a first generation module 701, configured to train a multilingual representation model according to a monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language;
a second generation module 702, configured to concatenate the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained; and
a third generation module 703, configured to train the target model according to bilingual corpora in the multiple languages to generate a second translation model, and translate the target information to be processed according to the second translation model.
Optionally, the first generation module 701 is configured to:
obtain corpora containing vacant slot information from the monolingual corpus of each language, and obtain annotated filling information corresponding to the vacant slot information; and
train the model parameters of a preset model according to the monolingual corpora containing vacant slot information, the corpora of the corresponding filling information, and a preset first loss function, to generate the multilingual representation model.
Optionally, the first generation module 701 is configured to:
obtain corpora containing predicted next slot information from the monolingual corpus of each language, and obtain annotated prediction information corresponding to the predicted next slot information; and
train the model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corpora of the corresponding prediction information, and a preset second loss function, to generate the multilingual generation model.
Optionally, the multilingual representation model contains a cascade of one or more representation sublayers, where each representation sublayer includes a self-attention layer connected to a feedforward network layer.
Optionally, the multilingual generation model contains a cascade of one or more generation sublayers, where each generation sublayer includes a self-attention layer connected to a feedforward network layer.
Optionally, the first translation model includes an encoder, a decoder and an output layer, where
the encoder includes a cascade of one or more encoding sublayers, where each encoding sublayer includes a self-attention layer connected to a feedforward network layer;
the decoder includes a cascade of one or more decoding sublayers, where each decoding sublayer includes a self-attention layer connected to a cross-lingual attention layer, and the cross-lingual attention layer is connected to a feedforward network,
where the feedforward network layer in the last encoding sublayer is connected to the cross-lingual attention layer in the last decoding sublayer.
Optionally, the second generation module 702 is configured to:
connect the feedforward network layer in the last representation sublayer of the multilingual representation model to the self-attention layer in the first encoding sublayer of the encoder; and
connect the feedforward network layer in the last generation sublayer of the multilingual generation model to the self-attention layer of the first decoding sublayer of the decoder, and connect the multilingual generation model to the output layer, to generate the target model to be trained.
Optionally, the third generation module 703 is configured to:
train the model parameters of the multilingual representation model and the first translation model in the target model according to the bilingual corpora in the multiple languages and a preset third loss function; and
generate a second translation model according to the model parameters of the multilingual representation model and the first translation model after training and the model parameters of the multilingual generation model before training.
The translation processing apparatus provided by the embodiments of the present disclosure can perform the translation processing method provided by any embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to performing the method.
An embodiment of the present disclosure further provides a computer program product including a computer program/instructions which, when executed by a processor, implement the translation processing method provided by any embodiment of the present disclosure.
FIG. 8 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Referring specifically to FIG. 8, it shows a schematic structural diagram of an electronic device 800 suitable for implementing an embodiment of the present disclosure. The electronic device 800 in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (e.g. vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 8 is only an example and should not impose any limitation on the functionality and scope of use of the embodiments of the present disclosure.
As shown in FIG. 8, the electronic device 800 may include a processing apparatus (e.g. a central processing unit, a graphics processor, etc.) 801, which may perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage apparatus 808 into a random access memory (RAM) 803. Various programs and data required for the operation of the electronic device 800 are also stored in the RAM 803. The processing apparatus 801, the ROM 802 and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Generally, the following apparatuses may be connected to the I/O interface 805: an input apparatus 806 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; an output apparatus 807 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; a storage apparatus 808 including, for example, a magnetic tape and a hard disk; and a communication apparatus 809. The communication apparatus 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 8 shows an electronic device 800 having various apparatuses, it should be understood that it is not required to implement or possess all of the apparatuses shown; more or fewer apparatuses may alternatively be implemented or possessed.
In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product including a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication apparatus 809, or installed from the storage apparatus 808, or installed from the ROM 802. When the computer program is executed by the processing apparatus 801, the above functions defined in the translation processing method of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium containing or storing a program that can be used by, or in combination with, an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by, or in combination with, an instruction execution system, apparatus or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to: electric wire, optical cable, RF (radio frequency), etc., or any suitable combination of the above.
In some implementations, the client and the server may communicate using any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication (e.g. a communication network) in any form or medium. Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g. the Internet) and a peer-to-peer network (e.g. an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be contained in the above electronic device, or may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: train a multilingual representation model according to a monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language; concatenate the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained; and train the target model according to bilingual corpora in the multiple languages to generate a second translation model, and translate the target information to be processed according to the second translation model. In the embodiments of the present disclosure, the multilingual representation model can extract the features of the sentence to be translated more precisely, and the multilingual generation model can predict the subsequent sentence to be translated more accurately. Therefore, because the multilingual representation model and the multilingual generation model are concatenated with it, the second translation model improves translation precision and thus translation quality.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment or part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments described in the present disclosure may be implemented in software or in hardware, and the name of a unit does not in some cases constitute a limitation on the unit itself.
The functions described herein above may be performed at least in part by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include: field programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by, or in combination with, an instruction execution system, apparatus or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared or semiconductor systems, apparatuses or devices, or any suitable combination of the above. More specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
According to one or more embodiments of the present disclosure, the present disclosure provides a translation processing method, including:
training a multilingual representation model according to a monolingual corpus of each of multiple languages, and generating a multilingual generation model according to the monolingual corpus of each language;
concatenating the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained; and
training the target model according to bilingual corpora in the multiple languages to generate a second translation model, and translating the target information to be processed according to the second translation model.
According to one or more embodiments of the present disclosure, in the translation processing method provided by the present disclosure, the training a multilingual representation model according to a monolingual corpus of each of multiple languages includes:
obtaining corpora containing vacant slot information from the monolingual corpus of each language, and obtaining annotated filling information corresponding to the vacant slot information; and
training the model parameters of a preset model according to the monolingual corpora containing vacant slot information, the corpora of the corresponding filling information, and a preset first loss function, to generate the multilingual representation model.
According to one or more embodiments of the present disclosure, in the translation processing method provided by the present disclosure, the generating a multilingual generation model according to the monolingual corpus of each language includes:
obtaining corpora containing predicted next slot information from the monolingual corpus of each language, and obtaining annotated prediction information corresponding to the predicted next slot information; and
training the model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corpora of the corresponding prediction information, and a preset second loss function, to generate the multilingual generation model.
According to one or more embodiments of the present disclosure, in the translation processing method provided by the present disclosure, the multilingual representation model contains a cascade of one or more representation sublayers, where each representation sublayer includes a self-attention layer connected to a feedforward network layer.
According to one or more embodiments of the present disclosure, in the translation processing method provided by the present disclosure, the multilingual generation model contains a cascade of one or more generation sublayers, where each generation sublayer includes a self-attention layer connected to a feedforward network layer.
According to one or more embodiments of the present disclosure, in the translation processing method provided by the present disclosure, the first translation model includes an encoder, a decoder and an output layer, where
the encoder includes a cascade of one or more encoding sublayers, where each encoding sublayer includes a self-attention layer connected to a feedforward network layer;
the decoder includes a cascade of one or more decoding sublayers, where each decoding sublayer includes a self-attention layer connected to a cross-lingual attention layer, and the cross-lingual attention layer is connected to a feedforward network,
where the feedforward network layer in the last encoding sublayer is connected to the cross-lingual attention layer in the last decoding sublayer.
According to one or more embodiments of the present disclosure, in the translation processing method provided by the present disclosure, the concatenating the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained includes:
connecting the feedforward network layer in the last representation sublayer of the multilingual representation model to the self-attention layer in the first encoding sublayer of the encoder; and
connecting the feedforward network layer in the last generation sublayer of the multilingual generation model to the self-attention layer of the first decoding sublayer of the decoder, and connecting the multilingual generation model to the output layer, to generate the target model to be trained.
According to one or more embodiments of the present disclosure, in the translation processing method provided by the present disclosure, the training the target model according to the bilingual corpora in the multiple languages to generate a second translation model includes:
training the model parameters of the multilingual representation model and the first translation model in the target model according to the bilingual corpora in the multiple languages and a preset third loss function; and
generating a second translation model according to the model parameters of the multilingual representation model and the first translation model after training and the model parameters of the multilingual generation model before training.
According to one or more embodiments of the present disclosure, the present disclosure provides a translation processing apparatus, including:
a first generation module, configured to train a multilingual representation model according to a monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language;
a second generation module, configured to concatenate the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained; and
a third generation module, configured to train the target model according to bilingual corpora in the multiple languages to generate a second translation model, and translate the target information to be processed according to the second translation model.
According to one or more embodiments of the present disclosure, in the translation processing apparatus provided by the present disclosure, the first generation module is configured to:
obtain corpora containing vacant slot information from the monolingual corpus of each language, and obtain annotated filling information corresponding to the vacant slot information; and
train the model parameters of a preset model according to the monolingual corpora containing vacant slot information, the corpora of the corresponding filling information, and a preset first loss function, to generate the multilingual representation model.
According to one or more embodiments of the present disclosure, in the translation processing apparatus provided by the present disclosure, the first generation module is configured to:
obtain corpora containing predicted next slot information from the monolingual corpus of each language, and obtain annotated prediction information corresponding to the predicted next slot information; and
train the model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corpora of the corresponding prediction information, and a preset second loss function, to generate the multilingual generation model.
According to one or more embodiments of the present disclosure, in the translation processing apparatus provided by the present disclosure, the multilingual representation model contains a cascade of one or more representation sublayers, where each representation sublayer includes a self-attention layer connected to a feedforward network layer.
According to one or more embodiments of the present disclosure, in the translation processing apparatus provided by the present disclosure, the multilingual generation model contains a cascade of one or more generation sublayers, where each generation sublayer includes a self-attention layer connected to a feedforward network layer.
According to one or more embodiments of the present disclosure, in the translation processing apparatus provided by the present disclosure, the first translation model includes an encoder, a decoder and an output layer, where
the encoder includes a cascade of one or more encoding sublayers, where each encoding sublayer includes a self-attention layer connected to a feedforward network layer;
the decoder includes a cascade of one or more decoding sublayers, where each decoding sublayer includes a self-attention layer connected to a cross-lingual attention layer, and the cross-lingual attention layer is connected to a feedforward network,
where the feedforward network layer in the last encoding sublayer is connected to the cross-lingual attention layer in the last decoding sublayer.
According to one or more embodiments of the present disclosure, in the translation processing apparatus provided by the present disclosure, the second generation module is configured to:
connect the feedforward network layer in the last representation sublayer of the multilingual representation model to the self-attention layer in the first encoding sublayer of the encoder; and
connect the feedforward network layer in the last generation sublayer of the multilingual generation model to the self-attention layer of the first decoding sublayer of the decoder, and connect the multilingual generation model to the output layer, to generate the target model to be trained.
According to one or more embodiments of the present disclosure, in the translation processing apparatus provided by the present disclosure, the third generation module is configured to:
train the model parameters of the multilingual representation model and the first translation model in the target model according to the bilingual corpora in the multiple languages and a preset third loss function; and
generate a second translation model according to the model parameters of the multilingual representation model and the first translation model after training and the model parameters of the multilingual generation model before training.
According to one or more embodiments of the present disclosure, the present disclosure provides an electronic device, including:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement any of the translation processing methods provided by the present disclosure.
According to one or more embodiments of the present disclosure, the present disclosure provides a computer-readable storage medium storing a computer program which is used to perform any of the translation processing methods provided by the present disclosure.
The above description is merely a description of the preferred embodiments of the present disclosure and the technical principles applied. Those skilled in the art should understand that the scope of the disclosure involved in the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalent features without departing from the above disclosed concept, for example technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.
Furthermore, although the operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment; conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the subject matter has been described in language specific to structural features and/or methodological logical acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above; rather, the specific features and acts described above are merely example forms of implementing the claims.

Claims (12)

  1. A translation processing method, characterized by comprising:
    training a multilingual representation model according to a monolingual corpus of each of multiple languages, and generating a multilingual generation model according to the monolingual corpus of each language;
    concatenating the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained; and
    training the target model according to bilingual corpora in the multiple languages to generate a second translation model, and translating target information to be processed according to the second translation model.
  2. The method according to claim 1, characterized in that the training a multilingual representation model according to a monolingual corpus of each of multiple languages comprises:
    obtaining corpora containing vacant slot information from the monolingual corpus of each language, and obtaining annotated filling information corresponding to the vacant slot information; and
    training model parameters of a preset model according to the monolingual corpora containing vacant slot information, the corpora of the corresponding filling information, and a preset first loss function, to generate the multilingual representation model.
  3. The method according to claim 1, characterized in that the generating a multilingual generation model according to the monolingual corpus of each language comprises:
    obtaining corpora containing predicted next slot information from the monolingual corpus of each language, and obtaining annotated prediction information corresponding to the predicted next slot information; and
    training model parameters of a preset model according to the monolingual corpora containing the predicted next slot information, the corpora of the corresponding prediction information, and a preset second loss function, to generate the multilingual generation model.
  4. The method according to claim 1, characterized in that the multilingual representation model comprises a cascade of one or more representation sublayers, wherein each representation sublayer comprises a self-attention layer connected to a feedforward network layer.
  5. The method according to claim 4, characterized in that the multilingual generation model comprises a cascade of one or more generation sublayers, wherein each generation sublayer comprises a self-attention layer connected to a feedforward network layer.
  6. The method according to claim 5, characterized in that the first translation model comprises an encoder, a decoder and an output layer, wherein
    the encoder comprises a cascade of one or more encoding sublayers, wherein each encoding sublayer comprises a self-attention layer connected to a feedforward network layer;
    the decoder comprises a cascade of one or more decoding sublayers, wherein each decoding sublayer comprises a self-attention layer connected to a cross-lingual attention layer, and the cross-lingual attention layer is connected to a feedforward network,
    wherein the feedforward network layer in the last encoding sublayer is connected to the cross-lingual attention layer in the last decoding sublayer.
  7. The method according to claim 6, characterized in that the concatenating the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained comprises:
    connecting the feedforward network layer in the last representation sublayer of the multilingual representation model to the self-attention layer in the first encoding sublayer of the encoder; and
    connecting the feedforward network layer in the last generation sublayer of the multilingual generation model to the self-attention layer of the first decoding sublayer of the decoder, and connecting the multilingual generation model to the output layer, to generate the target model to be trained.
  8. The method according to claim 1, characterized in that the training the target model according to bilingual corpora in the multiple languages to generate a second translation model comprises:
    training model parameters of the multilingual representation model and the first translation model in the target model according to the bilingual corpora in the multiple languages and a preset third loss function; and
    generating the second translation model according to the model parameters of the multilingual representation model and the first translation model after training and the model parameters of the multilingual generation model before training.
  9. A translation processing apparatus, characterized by comprising:
    a first generation module, configured to train a multilingual representation model according to a monolingual corpus of each of multiple languages, and generate a multilingual generation model according to the monolingual corpus of each language;
    a second generation module, configured to concatenate the multilingual representation model and the multilingual generation model with a first translation model, respectively, to generate a target model to be trained; and
    a third generation module, configured to train the target model according to bilingual corpora in the multiple languages to generate a second translation model, and translate target information to be processed according to the second translation model.
  10. An electronic device, characterized in that the electronic device comprises:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the translation processing method according to any one of claims 1-8.
  11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores instructions which, when run on a terminal device, cause the terminal device to implement the translation processing method according to any one of claims 1-8.
  12. A computer program product, characterized in that the computer program product comprises a computer program/instructions which, when executed by a processor, implement the translation processing method according to any one of claims 1-8.
PCT/CN2022/107981 2021-08-03 2022-07-26 Translation processing method, apparatus, device and medium WO2023011260A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110888353.XA CN113591498B (zh) 2021-08-03 2021-08-03 Translation processing method, apparatus, device and medium
CN202110888353.X 2021-08-03

Publications (1)

Publication Number Publication Date
WO2023011260A1 true WO2023011260A1 (zh) 2023-02-09

Family

ID=78254695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/107981 WO2023011260A1 (zh) 2021-08-03 2022-07-26 翻译处理方法、装置、设备及介质

Country Status (2)

Country Link
CN (1) CN113591498B (zh)
WO (1) WO2023011260A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591498B (zh) 2021-08-03 2023-10-03 北京有竹居网络技术有限公司 Translation processing method, apparatus, device and medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170060855A1 (en) * 2015-08-25 2017-03-02 Alibaba Group Holding Limited Method and system for generation of candidate translations
CN108829685A (zh) * 2018-05-07 2018-11-16 内蒙古工业大学 Mongolian-Chinese mutual translation method based on monolingual corpus training
CN110874537A (zh) * 2018-08-31 2020-03-10 阿里巴巴集团控股有限公司 Generation method for multilingual translation model, translation method and device
CN111159416A (zh) * 2020-04-02 2020-05-15 腾讯科技(深圳)有限公司 Language task model training method and apparatus, electronic device and storage medium
CN111178094A (zh) * 2019-12-20 2020-05-19 沈阳雅译网络技术有限公司 Pre-training based training method for low-resource neural machine translation
CN111382580A (zh) * 2020-01-21 2020-07-07 沈阳雅译网络技术有限公司 Encoder-decoder framework pre-training method for neural machine translation
CN113591498A (zh) * 2021-08-03 2021-11-02 北京有竹居网络技术有限公司 Translation processing method, apparatus, device and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111460838B (zh) * 2020-04-23 2023-09-22 腾讯科技(深圳)有限公司 Pre-training method and apparatus for an intelligent translation model, and storage medium
CN111680529A (zh) * 2020-06-11 2020-09-18 汪金玲 Machine translation algorithm and apparatus based on layer aggregation


Also Published As

Publication number Publication date
CN113591498B (zh) 2023-10-03
CN113591498A (zh) 2021-11-02


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22851974

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE