WO2022228221A1 - Information translation method, apparatus, device, and storage medium - Google Patents

Information translation method, apparatus, device, and storage medium

Info

Publication number
WO2022228221A1
Authority
WO
WIPO (PCT)
Prior art keywords
translation
network
sample
trained
target
Application number
PCT/CN2022/087801
Inventor
林泽辉
吴礼蔚
王明轩
李磊
Original Assignee
北京有竹居网络技术有限公司
Application filed by 北京有竹居网络技术有限公司
Publication of WO2022228221A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/42 Data-driven translation
    • G06F40/47 Machine-assisted translation, e.g. using translation memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the present disclosure relates to the field of computer technology, for example, to an information translation method, apparatus, device, and storage medium.
  • the present disclosure provides an information translation method, apparatus, device and storage medium to improve the translation accuracy of a multilingual translation model.
  • the present disclosure provides an information translation method, including:
  • the target translation type is used to indicate the source language and target language of the information to be translated
  • the pre-trained translation model is obtained by training on parallel corpora of at least two sample translation types and the corresponding network masks, and each network mask is used to control the sub-network in the pre-trained translation model that processes the corpus of the corresponding sample translation type.
  • the present disclosure provides an information translation device, including:
  • a first acquisition module configured to acquire information to be translated and a target translation type specified for the information to be translated; wherein, the target translation type is used to indicate the source language and target language of the information to be translated;
  • a translation module configured to translate the information to be translated into translation information corresponding to the target language through a sub-network corresponding to the target translation type in the pre-trained translation model;
  • the pre-trained translation model is obtained by training on parallel corpora of at least two sample translation types and the corresponding network masks, and each network mask is used to control the sub-network in the pre-trained translation model that processes the corpus of the corresponding sample translation type.
  • the present disclosure provides an information translation device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the above-mentioned information translation method when executing the computer program.
  • the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-mentioned information translation method.
  • FIG. 1 is a schematic flowchart of an information translation method provided by an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of a network architecture of a multilingual translation model in the related art.
  • FIG. 3 is a schematic diagram of a network architecture of a multilingual translation model provided by an embodiment of the present disclosure.
  • FIG. 4 is a schematic flowchart of another information translation method provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic flowchart of a training process of a pre-trained translation model provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic flowchart of a process of generating a network mask according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an information translation apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of an information translation device according to an embodiment of the present disclosure.
  • method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this regard.
  • the term “including” and variations thereof are open-ended, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • a traditional multilingual translation model is obtained by modeling the corpora of multiple language pairs in a single model.
  • the corpora of different language pairs often interfere with one another. In particular, language pairs with rich corpora (for example, common English-centric language pairs) are affected by the corpora of other language pairs, which lowers the translation performance of the multilingual translation model. The technical solutions provided by the embodiments of the present disclosure therefore aim to improve the translation performance of the multilingual translation model.
  • the execution subject of the following method embodiments may be an information translation apparatus, which may be implemented by software, hardware, or a combination of software and hardware as part or all of an electronic device.
  • the electronic device may be a client, including but not limited to a smart phone, a tablet computer, an e-book reader, a vehicle terminal, and the like.
  • the electronic device may also be an independent server or a server cluster, and the embodiment of the present disclosure does not limit the form of the electronic device.
  • the following method embodiments are described by taking the execution subject being an electronic device as an example.
  • FIG. 1 is a schematic flowchart of an information translation method provided by an embodiment of the present disclosure. This embodiment relates to the process of how the electronic device uses the trained multilingual translation model to translate information. As shown in Figure 1, the method may include:
  • the information to be translated is information that needs to be translated from one language into another.
  • the information to be translated can be in any source language, and the translated information is in the corresponding target language; for example, if the source language is English, the target language can be Chinese. The information to be translated may also be of any modality, for example, at least one of images, texts, videos, or audios.
  • the electronic device can select the information to be translated from a database, or obtain information to be translated that the user enters through translation software installed on the electronic device. This embodiment does not limit how the information to be translated is acquired.
  • the above target translation type is used to indicate the source language and target language of the information to be translated. For example, if the target translation type is English to Chinese, this translation operation needs to translate the English information to be translated into Chinese information with the same semantics.
  • the target translation type can be specified by the user, or it can be randomly specified according to a set rule.
  • the electronic device may acquire the information input by the user, and then determine the target translation type designated for the information to be translated.
  • the pre-trained translation model is obtained by training on parallel corpora of at least two sample translation types and the corresponding network masks, and each network mask is used to control the sub-network in the pre-trained translation model that processes the corpus of its sample translation type.
  • the parallel corpus of the above-mentioned sample translation type is the corpus data required for the training of the pre-trained translation model, and the parallel corpus includes a pair of source-end corpus and target-end corpus.
  • the source-end corpus can be understood as the corpus before translation
  • the target-end corpus can be understood as the translated corpus of the source-end corpus.
  • for example, a Chinese-English parallel corpus includes a Chinese document and the corresponding English document. If the translation model is used to translate from English to Chinese, the English document is the source-end corpus and the Chinese document is the target-end corpus.
  • the above parallel corpus can also be obtained from monolingual corpus through back-translation technology.
  • each sub-network is represented by a network mask.
  • for example, sub-network 1 in the pre-trained translation model is represented by the network mask "xxxxxx",
  • and sub-network 2 in the pre-trained translation model is represented by the network mask "yyyyyy".
  • the corpus of each sample translation type is processed through its corresponding sub-network, so that the corpus of a sample translation type trains only the sub-network allocated to it. In this way, the training of each sub-network is relatively independent, which greatly reduces the interference between the corpora of different sample translation types during modeling and improves the translation performance of the pre-trained translation model.
  • different sub-networks can be allocated to different sample translation types; that is, different sample translation types correspond to different network masks, so different sub-networks in the pre-trained translation model process the corpora of different sample translation types.
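To make the idea concrete, the following sketch treats a network mask as a binary vector over a flattened weight list, so that applying the mask selects one sub-network of a shared model. The flattened-weight representation, the toy linear layer, and the names `masked_weights`, `forward`, and `MASKS` are illustrative assumptions, not the implementation described in the disclosure.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: all model weights flattened into one list,
# plus a binary network mask of the same length (1 = keep, 0 = cut).
Weights = List[float]
Mask = List[int]
TranslationType = Tuple[str, str]  # (source language, target language)


def masked_weights(weights: Weights, mask: Mask) -> Weights:
    """Return the weights of the sub-network selected by `mask`."""
    if len(weights) != len(mask):
        raise ValueError("weights and mask must have the same length")
    return [w * m for w, m in zip(weights, mask)]


def forward(weights: Weights, mask: Mask, x: List[float]) -> float:
    """Toy single-output linear layer that uses only the masked connections."""
    return sum(w * xi for w, xi in zip(masked_weights(weights, mask), x))


# Hypothetical masks, one per sample translation type.
MASKS: Dict[TranslationType, Mask] = {
    ("en", "zh"): [1, 1, 0, 1],  # sub-network 1
    ("de", "en"): [0, 1, 1, 1],  # sub-network 2
}

if __name__ == "__main__":
    shared_weights = [0.5, -1.0, 2.0, 0.25]
    x = [1.0, 1.0, 1.0, 1.0]
    print(forward(shared_weights, MASKS[("en", "zh")], x))  # -0.25
    print(forward(shared_weights, MASKS[("de", "en")], x))  # 1.25
```

Both translation types read the same shared weight list; only the mask decides which connections take part in each computation.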
  • for example, suppose the sample translation types include sample translation type 1 (i.e., English to Chinese) and sample translation type 2 (i.e., German to English).
  • the electronic device can assign sub-network 1 in the multilingual translation model to sample translation type 1 through the network mask "xxxxxx",
  • and assign sub-network 2 in the multilingual translation model to sample translation type 2 through the network mask "yyyyyy",
  • yielding the network architecture of the pre-trained translation model shown in FIG. 3.
  • when the parallel corpora of sample translation type 1 and sample translation type 2 are used to train the pre-trained translation model, the corpus of sample translation type 1 trains only sub-network 1, and the corpus of sample translation type 2 trains only sub-network 2.
  • the training of sub-network 1 and sub-network 2 is therefore relatively independent, which reduces the interference between the corpora of sample translation type 1 and sample translation type 2 during modeling and improves the translation performance of the pre-trained translation model.
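A minimal sketch of why the training stays relatively independent, assuming plain SGD on a flattened weight list and a hypothetical helper `masked_sgd_step`: a connection whose mask bit is 0 receives no update, so the corpus of one sample translation type cannot change connections that belong only to another type's sub-network. The fixed gradients below are made-up toy values.

```python
from typing import List


def masked_sgd_step(
    weights: List[float], grads: List[float], mask: List[int], lr: float
) -> List[float]:
    """One SGD step that updates only the connections kept by `mask`."""
    return [w - lr * g * m for w, g, m in zip(weights, grads, mask)]


if __name__ == "__main__":
    weights = [1.0, 1.0, 1.0]
    grads = [1.0, 1.0, 1.0]   # toy gradients, identical for both corpora
    mask_type1 = [1, 1, 0]    # connection 0 exclusive to sub-network 1
    mask_type2 = [0, 1, 1]    # connection 2 exclusive to sub-network 2
    # Connection 1 is shared, so it is the only one both corpora update.
    weights = masked_sgd_step(weights, grads, mask_type1, lr=0.5)
    print(weights)  # [0.5, 0.5, 1.0]
    weights = masked_sgd_step(weights, grads, mask_type2, lr=0.5)
    print(weights)  # [0.5, 0.0, 0.5]
```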
  • the electronic device can translate the information to be translated through the sub-network corresponding to the target translation type in the pre-trained translation model, thereby translating it into translation information in the specified target language.
  • the translation information refers to the translated information.
  • the above-mentioned pre-trained translation model may include a sequence-to-sequence (seq2seq) model, which is a neural network with an encoder-decoder structure whose input is a sequence and whose output is also a sequence. The encoder converts a variable-length input sequence into a fixed-length vector representation, and the decoder converts that fixed-length vector representation into a variable-length target sequence, thereby mapping variable-length input to variable-length output.
  • the sequence-to-sequence model can be of various types, for example, a seq2seq model based on a recurrent neural network (RNN) or a seq2seq model based on convolution (CONV).
  • the type of the pre-trained translation model is not limited.
  • the network structure of the pre-trained translation model shown in FIG. 3 is only an example, and the embodiment of the present disclosure does not limit the network structure of the pre-trained translation model, and a corresponding network structure can be selected based on actual requirements.
  • the information translation method acquires the information to be translated and the target translation type specified for it, and translates the information to be translated into translation information in the target language through the sub-network corresponding to the target translation type in the pre-trained translation model. The pre-trained translation model is obtained by training on parallel corpora of at least two sample translation types and the corresponding network masks, and each network mask controls the sub-network in the pre-trained translation model that processes the corpus of its sample translation type; that is, each sample translation type is assigned a corresponding sub-network in the pre-trained translation model.
  • the corpus of each sample translation type trains only the sub-network assigned to it, which greatly reduces the interference between the corpora of different sample translation types during modeling and thereby improves the translation performance of each sub-network in the pre-trained translation model. Because the translation performance of each sub-network has been improved, translating the information to be translated through the sub-network corresponding to the target translation type improves the accuracy of the translation results.
  • a multilingual translation model is usually trained on corpora of common language pairs, such as English-centric pairs.
  • the target translation type may differ from all of the sample translation types that participated in training the pre-trained translation model.
  • in this case, the information to be translated can be translated by referring to the process described in the following embodiment.
  • the above S102 may include:
  • the target language of the first sample translation type is the same as the source language of the second sample translation type.
  • Both the first sample translation type and the second sample translation type refer to the translation types corresponding to the corpus participating in the training of the pre-trained translation model.
  • from all the sample translation types that participated in training the pre-trained translation model, the electronic device can select a first sample translation type whose source language is the same as that of the target translation type and a second sample translation type whose target language is the same as that of the target translation type, while ensuring that the target language of the selected first sample translation type is the same as the source language of the second sample translation type.
  • for example, if the target translation type is German to French, the source language of the target translation type is German and the target language is French.
  • the electronic device can select the sample translation type with German as the source language and English as the target language as the first sample translation type, and the sample translation type with English as the source language and French as the target language as the second sample translation type.
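The selection of the first and second sample translation types can be sketched as a search over (source, target) pairs. This is an illustrative sketch only; the function name `find_pivot_pair` and the two-letter language codes are hypothetical, and it simply returns the first matching pair it finds.

```python
from typing import List, Optional, Tuple

TranslationType = Tuple[str, str]  # (source language, target language)


def find_pivot_pair(
    target: TranslationType, sample_types: List[TranslationType]
) -> Optional[Tuple[TranslationType, TranslationType]]:
    """Find (first, second) sample types that chain source -> pivot -> target."""
    src, tgt = target
    for first in sample_types:
        if first[0] != src:  # first type must share the target type's source language
            continue
        for second in sample_types:
            # second type must share the target language, and the two types
            # must meet at the same pivot language
            if second[1] == tgt and first[1] == second[0]:
                return first, second
    return None


if __name__ == "__main__":
    trained = [("en", "zh"), ("de", "en"), ("en", "fr")]
    print(find_pivot_pair(("de", "fr"), trained))  # (('de', 'en'), ('en', 'fr'))
```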
  • the first sub-network refers to the network in the pre-trained translation model that processes the corpus of the first sample translation type
  • the second sub-network refers to the network in the pre-trained translation model that processes the corpus of the second sample translation type.
  • a corresponding sub-network is assigned to each sample translation type in advance.
  • because the source language of the target translation type is the same as that of the first sample translation type, and its target language is the same as that of the second sample translation type, the information to be translated of the target translation type can be translated through the first sub-network corresponding to the first sample translation type and the second sub-network corresponding to the second sample translation type.
  • the target sub-network refers to the network in the pre-trained translation model that processes the information of the target translation type (that is, the information to be translated).
  • the electronic device may splice part of the first sub-network and part of the second sub-network to obtain the target sub-network.
  • the above-mentioned basic translation model includes an encoder and a decoder.
  • the encoder can perform feature extraction on the input sequence to obtain a feature vector, and the decoder decodes the feature vector according to the context information to obtain a corresponding output sequence.
  • the process of the foregoing S403 may be: combining the encoder of the first sub-network and the decoder of the second sub-network to obtain the target sub-network in the pre-trained translation model corresponding to the target translation type.
  • because the source language of the target translation type is the same as that of the first sample translation type and its target language is the same as that of the second sample translation type, the encoder of the first sub-network and the decoder of the second sub-network can be selected and combined into the target sub-network corresponding to the target translation type.
  • the electronic device inputs the information to be translated into the target sub-network in the pre-trained translation model, and performs language translation on the information to be translated through the target sub-network, thereby obtaining translation information corresponding to the target language.
  • for example, the electronic device can select the encoder of the sub-network corresponding to the German-English pair and the decoder of the sub-network corresponding to the English-French pair, and the selected encoder and decoder form a new network, namely the target sub-network corresponding to the German-French pair.
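Assuming, for illustration only, that each sub-network's mask is stored as separate encoder and decoder parts (a hypothetical `{"encoder": ..., "decoder": ...}` layout), splicing the target sub-network amounts to taking the encoder mask of the first sub-network and the decoder mask of the second. The helper name `splice_target_subnetwork` and the mask values are made up.

```python
from typing import Dict, List

# Hypothetical layout: one binary mask per model part.
SubNetMask = Dict[str, List[int]]


def splice_target_subnetwork(first: SubNetMask, second: SubNetMask) -> SubNetMask:
    """Combine the encoder of the first sub-network with the decoder of the second."""
    return {
        "encoder": list(first["encoder"]),   # reads the source language (e.g. German)
        "decoder": list(second["decoder"]),  # writes the target language (e.g. French)
    }


if __name__ == "__main__":
    de_en = {"encoder": [1, 0, 1], "decoder": [1, 1, 0]}
    en_fr = {"encoder": [0, 1, 1], "decoder": [0, 1, 1]}
    print(splice_target_subnetwork(de_en, en_fr))
    # {'encoder': [1, 0, 1], 'decoder': [0, 1, 1]}
```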
  • the electronic device inputs the information to be translated in German into the pre-trained translation model, and can translate the information to be translated into French information of the same meaning through the target sub-network.
  • the sub-network corresponding to a zero-resource target translation type is obtained by splicing the sub-networks corresponding to existing sample translation types in the pre-trained translation model.
  • because the sub-networks corresponding to the existing sample translation types are trained relatively independently, their translation performance is high; the target sub-network combined from these high-performing sub-networks therefore also performs well, which improves the translation of the information to be translated in the zero-resource case.
  • a training process of a pre-trained translation model is also provided.
  • the method may further include:
  • the parallel corpus set includes parallel corpora of at least two sample translation types.
  • the corpus database stores a large number of corpora, so the electronic device can directly acquire multiple parallel corpora from it, and these parallel corpora cover at least two sample translation types.
  • the initial translation model is obtained by training on the parallel corpus set.
  • the network mask is used to control the sub-network that processes the corpus of the sample translation type in the pre-trained translation model, that is, the sub-network is represented by the network mask.
  • when the network masks are the same, the sub-networks are the same; when the network masks differ, the sub-networks differ.
  • the electronic device can use a parallel corpus including at least two sample translation types to train to obtain an initial translation model, and the initial translation model can be understood as having learned at least two sample translation types The grammatical structure and lexical association between the source language and the target language.
  • the corpora of the at least two sample translation types interfere with each other, which lowers the translation performance of the trained initial translation model. For example, English-centric corpus resources are relatively abundant.
  • a corresponding sub-network can be allocated in advance to each sample translation type in the pre-trained translation model, and the corpus of each sample translation type is used only to train its own assigned sub-network, so that the training of each sub-network is relatively independent. This greatly reduces the interference between the corpora of different sample translation types during modeling and improves the translation performance of the finally trained pre-trained translation model.
  • the process of the above S502 may be: for each sample translation type, take the source-end corpus in the parallel corpus of that sample translation type as the input of the initial translation model and the corresponding target-end corpus as the expected output, and use a preset loss function to train the sub-network represented by the corresponding network mask in the initial translation model.
  • the above loss function may be a maximum likelihood loss function or a cross entropy loss function, or the like.
  • the electronic device can input the source-end corpus in the parallel corpus of the sample translation type into the initial translation model, process it through the sub-network represented by the network mask corresponding to that sample translation type to obtain a predicted corpus, and calculate the loss value of the above loss function based on the predicted corpus and the target-end corpus.
  • if the loss value is greater than a preset threshold, the parameters of the sub-network represented by the network mask are updated, and the updated sub-network continues to process the source-end corpus until the loss value of the loss function is less than or equal to the preset threshold.
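The threshold-based training loop can be sketched as follows. To keep the example short and runnable, it swaps in a toy linear model with a mean-squared-error loss in place of the seq2seq model and the maximum likelihood or cross-entropy loss mentioned above; the function `train_subnetwork`, its parameters, and the toy data are hypothetical. The mask multiplies the gradient, so only the sub-network represented by the mask is updated.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # (toy source-side features, target value)


def train_subnetwork(
    weights: List[float],
    mask: List[int],
    data: List[Example],
    lr: float = 0.1,
    threshold: float = 1e-4,
    max_steps: int = 10_000,
) -> Tuple[List[float], float]:
    """Gradient descent on the masked sub-network until loss <= threshold.

    Returns the weights and the most recently computed loss.
    """
    w = list(weights)
    loss = float("inf")
    for _ in range(max_steps):
        loss = 0.0
        grads = [0.0] * len(w)
        for x, y in data:
            pred = sum(wi * mi * xi for wi, mi, xi in zip(w, mask, x))
            err = pred - y
            loss += err * err / len(data)
            for i in range(len(w)):
                # mask[i] == 0 zeroes the gradient, so cut connections never change
                grads[i] += 2 * err * mask[i] * x[i] / len(data)
        if loss <= threshold:  # stop once the loss reaches the preset threshold
            break
        w = [wi - lr * gi for wi, gi in zip(w, grads)]
    return w, loss


if __name__ == "__main__":
    data = [([1.0, 0.0, 1.0], 2.0), ([0.0, 1.0, 1.0], 1.0)]
    # Connection 1 lies outside this sub-network's mask and keeps its value 5.0.
    w, loss = train_subnetwork([0.0, 5.0, 0.0], [1, 0, 1], data)
    print(w, loss)
```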
  • take as an example the case where sample translation type 1 corresponds to sub-network 1 and sample translation type 2 corresponds to sub-network 2.
  • the electronic device uses the corpus of sample translation type 1 to train sub-network 1 in the pre-trained translation model, and the corpus of sample translation type 2 to train sub-network 2.
  • in this way, the corpora of sample translation type 1 and sample translation type 2 train not only the exclusive branches (in FIG. 3, the connection lines exclusive to sub-network 1 and those exclusive to sub-network 2) but also the shared branches (whose parameters are shared by sub-network 1 and sub-network 2).
  • as a result, the final pre-trained translation model learns the grammatical structure and lexical associations between the source and target languages of sample translation types 1 and 2 and can translate among multiple languages, while the interference between the corpora of the two types is reduced, improving the model's translation performance across languages.
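Given two network masks, the exclusive and shared branches can be read off directly. A small sketch, assuming masks are equal-length 0/1 lists indexed by connecting line; the helper `branch_roles` and the example masks are hypothetical.

```python
from typing import List, Tuple


def branch_roles(
    mask1: List[int], mask2: List[int]
) -> Tuple[List[int], List[int], List[int]]:
    """Return (shared, exclusive to mask1, exclusive to mask2) connection indices."""
    if len(mask1) != len(mask2):
        raise ValueError("masks must have the same length")
    pairs = list(enumerate(zip(mask1, mask2)))
    shared = [i for i, (a, b) in pairs if a and b]
    only1 = [i for i, (a, b) in pairs if a and not b]
    only2 = [i for i, (a, b) in pairs if b and not a]
    return shared, only1, only2


if __name__ == "__main__":
    print(branch_roles([1, 1, 0, 1, 0], [0, 1, 1, 1, 0]))  # ([1, 3], [0], [2])
```

Connections in neither mask (index 4 here) belong to no sub-network and are trained by neither corpus.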
  • training the sub-networks represented by the corresponding network masks in the initial translation model with parallel corpora of at least two sample translation types makes the training of each sub-network relatively independent, which reduces the mutual interference between corpora of different sample translation types during modeling and improves the translation performance of the trained pre-trained translation model.
  • when the pre-trained translation model is further trained on a new sample translation type and then used to translate information, it can produce translation results with high accuracy while having little impact on the translation performance of the original sample translation types.
  • before the pre-trained translation model is trained, the method further includes generating a network mask corresponding to each sample translation type. The generation process of these network masks is introduced next. As shown in FIG. 6, before the above S502, the method further includes:
  • the basic translation model can include a sequence-to-sequence model, which is a neural network with an encoder-decoder structure, where the input is a sequence and the output is also a sequence.
  • the electronic device trains the basic translation model with the parallel corpora of at least two sample translation types and obtains an initial translation model once the model convergence condition is reached. At this point, the initial translation model has learned the grammatical structure and lexical associations between the source and target languages in the at least two sample translation types.
  • after obtaining the initial translation model trained on the multilingual corpus, for each sample translation type, the electronic device micro-trains (fine-tunes) the initial translation model with the parallel corpus of that sample translation type; the micro-training enlarges the weights of branches in the initial translation model (a branch here can be understood as a connection line between neurons in different layers).
  • the micro-trained initial translation model is then trimmed to retain the branches that are more important to the sample translation type and cut the branches that have less impact on it, thereby obtaining the network mask corresponding to the sample translation type.
  • the electronic device may trim the trained initial translation model based on the weights of the connecting lines between different layers in the trained initial translation model to generate a network mask corresponding to the sample translation type.
  • the above connecting lines represent the connections between the layers of neurons in the trained initial translation model. After the initial translation model is micro-trained with the parallel corpus of the sample translation type, the weights of the connecting lines between different layers change, and these weights indicate which connecting lines have a greater impact on processing the corpus of that sample translation type and which have less impact. The connecting lines that are more important for the sample translation type are retained and those with less impact are cut, yielding the network mask corresponding to the sample translation type.
  • the electronic device may sort the connecting lines between multiple layers in the initial translation model after training according to the weights of the connecting lines, to obtain a sorting result.
  • the N connecting lines with the largest weights are selected from the sorting results, and the first flag codes corresponding to the N connecting lines are generated to indicate that the N connecting lines are reserved in the initial translation model after training.
  • a connection line that needs to be reserved can be represented by the number 1.
  • second flag codes are generated for the remaining connection lines to indicate that these connection lines are cut from the trained initial translation model.
  • a connection line that needs to be trimmed can be represented by the number 0.
  • the generated first flag code and the second flag code are combined to obtain the network mask of the sample translation type.
  • the above N is a natural number greater than 1, and the value of N may be set based on actual requirements, which is not limited in this embodiment.
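The top-N procedure can be sketched as follows, assuming the weights of the connecting lines are available as a flat list. The text ranks by the weights themselves, which is what the sketch does; magnitude-based pruning, common in practice, would rank by `abs(weights[i])` instead. The helper name `top_n_mask` is hypothetical.

```python
from typing import List


def top_n_mask(weights: List[float], n: int) -> List[int]:
    """Flag the N connecting lines with the largest weights as 1 (kept), the rest as 0 (cut)."""
    if not 1 < n <= len(weights):
        raise ValueError("N must satisfy 1 < N <= number of connecting lines")
    # Sort connection indices by weight, largest first.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    keep = set(ranked[:n])  # indices that receive the first flag code
    return [1 if i in keep else 0 for i in range(len(weights))]


if __name__ == "__main__":
    print(top_n_mask([0.9, -0.2, 0.5, 0.1], n=2))  # [1, 0, 1, 0]
```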
  • the electronic device may also generate the corresponding network mask of the sample translation type according to the following process:
  • Step A: determine whether the weight of each connecting line between different layers in the trained initial translation model is greater than or equal to a preset threshold.
  • the preset threshold may be set based on actual requirements, which is not limited in this embodiment. If the weight of a connecting line is greater than or equal to the preset threshold, perform step B below; if it is less than the preset threshold, perform step C below.
  • Step B: generate the first flag code corresponding to the connection line.
  • the first flag code is used to indicate that the connection line is reserved in the initial translation model after training.
  • the first flag code can be set to 1 to indicate that the connection line is reserved.
  • Step C: generate a second flag code corresponding to the connection line.
  • the second flag code is used to indicate that the connecting line is cut from the trained initial translation model.
  • the second flag code can be set to 0 to indicate that the connecting line is cut.
  • Step D Combine all the generated first flag codes and all the second flag codes to obtain a network mask corresponding to the sample translation type.
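Steps A to D can be sketched in the same way. This sketch assumes a single weight matrix between two layers stored as nested lists; a full model would presumably repeat the comparison for every inter-layer weight matrix. `threshold_mask`, `KEEP`, and `PRUNE` are illustrative names only, and the weight values are made up.

```python
from typing import List, Sequence

KEEP = 1   # first flag code: the connection is retained
PRUNE = 0  # second flag code: the connection is pruned


def threshold_mask(weights: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Build a network mask by comparing each connection weight with a preset threshold."""
    mask = []
    for row in weights:
        # Step A compares the weight with the threshold;
        # Step B emits KEEP (weight >= threshold), Step C emits PRUNE (weight < threshold).
        mask.append([KEEP if w >= threshold else PRUNE for w in row])
    # Step D: the combined flag codes form the network mask.
    return mask


# Weight matrix between two layers (illustrative values only).
layer_weights = [[0.8, 0.05], [0.3, 0.6]]
print(threshold_mask(layer_weights, threshold=0.3))  # [[1, 0], [1, 1]]
```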
  • The network mask controls the sub-network in the initial translation model that processes the corpus of the sample translation type. In this way, when the initial translation model is trained with corpora of different sample translation types, the corpus of each sample translation type trains only the sub-network represented by its corresponding network mask, which reduces mutual interference between corpora of different sample translation types and improves the translation performance of the pre-trained translation model.
  • Fine-tuning the initial translation model with the corpus of the sample translation type and then trimming the fine-tuned model retains the connections that matter more for processing the corpus of that sample translation type and prunes the connections that have less influence on it. The sub-network represented by the generated network mask can therefore process the corpus of that sample translation type more accurately, which improves the multilingual translation performance of the pre-trained translation model.
  • FIG. 7 is a schematic structural diagram of an information translation apparatus provided by an embodiment of the present disclosure. As shown in FIG. 7 , the apparatus may include: a first acquisition module 701 and a translation module 702 .
  • The first acquisition module 701 is configured to acquire the information to be translated and the target translation type specified for it, wherein the target translation type is used to indicate the source language and target language of the information to be translated. The translation module 702 is configured to translate the information to be translated into translation information corresponding to the target language through the sub-network corresponding to the target translation type in the pre-trained translation model, wherein the pre-trained translation model is obtained by training on parallel corpora of at least two sample translation types and the corresponding network masks, and the network masks are used to control the sub-network in the pre-trained translation model that processes the corpus of each sample translation type.
  • The information translation apparatus acquires the information to be translated and the target translation type specified for it, and translates the information to be translated into translation information corresponding to the target language through the sub-network corresponding to the target translation type in the pre-trained translation model. Because the pre-trained translation model is obtained by training on parallel corpora of at least two sample translation types and the corresponding network masks, and the network masks control the sub-networks in the pre-trained translation model that process the corpora of the sample translation types, each sample translation type is assigned its own sub-network in the pre-trained translation model.
  • The corpus of each sample translation type trains only the sub-network assigned to it, which greatly reduces interference between corpora of different sample translation types during modeling and thereby improves the translation performance of each sub-network in the pre-trained translation model. Since the translation performance of each sub-network is improved, translating the information to be translated through the sub-network corresponding to the target translation type improves the accuracy of the translation result.
  • the network masks corresponding to different sample translation types are different, so as to control different sub-networks in the pre-trained translation model to process the corpus of the different sample translation types.
  • the translation module 702 may include: a first acquisition unit, a second acquisition unit, a determination unit, and a translation unit.
  • The first acquisition unit is configured to acquire a first sample translation type whose source language is the same as that of the target translation type, and a second sample translation type whose target language is the same as that of the target translation type, wherein the target language of the first sample translation type is the same as the source language of the second sample translation type. The second acquisition unit is configured to acquire the first sub-network corresponding to the first sample translation type in the pre-trained translation model and the second sub-network corresponding to the second sample translation type in the pre-trained translation model. The determination unit is configured to determine, according to the first sub-network and the second sub-network, the target sub-network corresponding to the target translation type in the pre-trained translation model. The translation unit is configured to translate the information to be translated into translation information corresponding to the target language through the target sub-network.
  • The pre-trained translation model includes an encoder and a decoder; the determination unit is configured to combine the encoder of the first sub-network with the decoder of the second sub-network to obtain the target sub-network corresponding to the target translation type in the pre-trained translation model.
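The following sketch illustrates how a target sub-network might be assembled for a translation type that has no mask of its own, for example Chinese to French given masks for Chinese to English and English to French. It assumes masks are stored per parameter name and that encoder and decoder parameters can be distinguished by hypothetical `encoder.` and `decoder.` name prefixes; the function names, language codes, and mask values are made up for illustration.

```python
from typing import Dict, List, Tuple

Mask = Dict[str, List[int]]  # parameter name -> 0/1 flags (hypothetical layout)
LangPair = Tuple[str, str]   # (source language, target language)


def find_pivot_types(target: LangPair, sample_types: List[LangPair]) -> Tuple[LangPair, LangPair]:
    """Find a first type with the same source and a second type with the same target,
    where the first type's target language equals the second type's source language."""
    src, tgt = target
    for first in sample_types:
        for second in sample_types:
            if first[0] == src and second[1] == tgt and first[1] == second[0]:
                return first, second
    raise LookupError(f"no pivot path for {src}->{tgt}")


def compose_mask(first_mask: Mask, second_mask: Mask) -> Mask:
    """Take the encoder part of the first sub-network and the decoder part of the second."""
    combined = {k: v for k, v in first_mask.items() if k.startswith("encoder.")}
    combined.update({k: v for k, v in second_mask.items() if k.startswith("decoder.")})
    return combined


masks = {
    ("zh", "en"): {"encoder.layer0": [1, 0, 1], "decoder.layer0": [0, 1, 1]},
    ("en", "fr"): {"encoder.layer0": [0, 1, 0], "decoder.layer0": [1, 1, 0]},
}
first, second = find_pivot_types(("zh", "fr"), list(masks))
target_mask = compose_mask(masks[first], masks[second])
print(first, second, target_mask)
```

Splitting at the encoder/decoder boundary reflects the text: the encoder is tied to the source language and the decoder to the target language.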
  • the apparatus further includes: a second acquisition module and a first training module.
  • The second acquisition module is configured to acquire a parallel corpus before the first acquisition module 701 acquires the information to be translated and the target translation type specified for it, wherein the parallel corpus includes parallel corpora of at least two sample translation types. The first training module is configured to train the initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model, wherein the initial translation model is obtained by training on the parallel corpus.
  • the apparatus further includes: a second training module, a third training module and a cropping module.
  • The second training module is configured to train a preset basic translation model according to the parallel corpus to obtain the initial translation model, before the first training module trains the initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model.
  • The third training module is configured to, for each sample translation type, train the initial translation model with the parallel corpus of that sample translation type to obtain a trained initial translation model.
  • the trimming module is configured to trim the trained initial translation model for each sample translation type to generate a network mask corresponding to the sample translation type.
  • The trimming module is configured to trim the trained initial translation model based on the weights of the connections between different layers in the trained initial translation model to generate the network mask corresponding to the sample translation type.
  • The trimming module is configured to: determine whether the weight of each connection between different layers in the trained initial translation model is greater than or equal to a preset threshold; in response to determining that the weight is greater than or equal to the preset threshold, generate the first flag code corresponding to the connection; in response to determining that the weight is less than the preset threshold, generate the second flag code corresponding to the connection; and combine all generated first flag codes and second flag codes to obtain the network mask corresponding to the sample translation type. The first flag code indicates that the connection is retained in the trained initial translation model, and the second flag code indicates that the connection is pruned from the trained initial translation model.
  • The first training module is configured to, for each sample translation type, take the source-side corpus in the parallel corpus of that sample translation type as the input of the initial translation model and the target-side corpus corresponding to the source-side corpus as the expected output, and train, with a preset loss function, the sub-network represented by the corresponding network mask in the initial translation model.
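To make the masked training concrete, here is a toy sketch in which a single linear layer stands in for the translation model. It assumes a squared-error loss and plain SGD as stand-ins for the unspecified preset loss function and optimizer (a neural machine translation model would more typically use cross-entropy over an encoder-decoder network). The only point it illustrates is that each update reads and modifies the connections flagged 1 in the sample translation type's mask and leaves the rest unchanged. `masked_sgd_step` and all values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def masked_sgd_step(W: Matrix, mask: List[List[int]], x: List[float],
                    y: List[float], lr: float = 0.1) -> float:
    """One SGD step on a toy linear layer y_hat = (W * mask) x.

    Only connections whose mask flag is 1 (the sub-network) are read or updated.
    The loss 0.5 * sum((y_hat - y)^2) is an assumed stand-in for the
    "preset loss function". Returns the loss before the update.
    """
    # Forward pass through the sub-network only: pruned connections contribute 0.
    y_hat = [sum(W[i][j] * mask[i][j] * x[j] for j in range(len(x)))
             for i in range(len(W))]
    err = [p - t for p, t in zip(y_hat, y)]
    loss = 0.5 * sum(e * e for e in err)
    # Backward pass: dL/dW[i][j] = err[i] * x[j], applied only where the mask is 1.
    for i in range(len(W)):
        for j in range(len(x)):
            if mask[i][j]:
                W[i][j] -= lr * err[i] * x[j]
    return loss


# Illustrative values: one source-side input x and its target-side output y.
W = [[0.5, 0.5], [0.5, 0.5]]
mask = [[1, 0], [0, 1]]
for step in range(50):
    loss = masked_sgd_step(W, mask, x=[1.0, 2.0], y=[1.0, 0.0])
print(round(loss, 6), W)  # masked entries W[0][1] and W[1][0] stay at 0.5
```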
  • FIG. 8 shows a schematic structural diagram of an electronic device (i.e., an information translation device) 800 suitable for implementing the embodiments of the present disclosure.
  • The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (PDAs), tablet computers (PADs), portable media players (PMPs), and in-vehicle terminals (e.g., in-vehicle navigation terminals), as well as fixed terminals such as digital televisions (TVs) and desktop computers.
  • The electronic device 800 may include a processing device 801 (such as a central processing unit or a graphics processor), which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage device 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800.
  • the processing device 801, the ROM 802, and the RAM 803 are connected to each other through a bus 804.
  • An Input/Output (I/O) interface 805 is also connected to the bus 804 .
  • The following devices can be connected to the I/O interface 805: input devices 806 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; output devices 807 including, for example, a liquid crystal display (LCD), speaker, and vibrator; storage devices 808 including, for example, a magnetic tape and a hard disk; and a communication device 809.
  • The communication device 809 may allow the electronic device 800 to communicate wirelessly or by wire with other devices to exchange data.
  • Although FIG. 8 shows the electronic device 800 with various devices, it is not required to implement or include all of the illustrated devices; more or fewer devices may alternatively be implemented or included.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 809, or from the storage device 808, or from the ROM 802.
  • When the computer program is executed by the processing device 801, the above functions defined in the methods of the embodiments of the present disclosure are performed.
  • the computer-readable medium described above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above.
  • Examples of computer-readable storage media may include, but are not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, RAM, ROM, erasable programmable read-only memory (EPROM) or flash memory, optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • the program code embodied on the computer-readable medium may be transmitted by any suitable medium, including but not limited to: electric wire, optical fiber cable, radio frequency (RF), etc., or any suitable combination of the above.
  • Clients and servers can communicate using any currently known or future-developed network protocol, such as HyperText Transfer Protocol (HTTP), and can be interconnected with digital data communication in any form or medium (e.g., a communication network).
  • Examples of communication networks include local area networks (LANs), wide area networks (WANs), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire information to be translated and a target translation type specified for it, wherein the target translation type is used to indicate the source language and target language of the information to be translated; and translate the information to be translated into translation information corresponding to the target language through the sub-network corresponding to the target translation type in the pre-trained translation model, wherein the pre-trained translation model is obtained by training on parallel corpora of at least two sample translation types and the corresponding network masks, and the network masks are used to control the sub-network in the pre-trained translation model that processes the corpus of each sample translation type.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any kind of network, including a LAN or WAN, or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • Each block in the flowchart or block diagrams may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations can be implemented in dedicated hardware-based systems that perform the specified functions or operations , or can be implemented in a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure may be implemented in a software manner, and may also be implemented in a hardware manner.
  • In some cases, the name of a unit does not constitute a limitation on the unit itself; for example, the first acquisition unit may also be described as "a unit that acquires at least two Internet Protocol addresses".
  • Exemplary types of hardware logic components include field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard parts (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. Examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, RAM, ROM, EPROM or flash memory, optical fiber, CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • an information translation device comprising a memory and a processor, the memory stores a computer program, and the processor implements when executing the computer program:
  • wherein the target translation type is used to indicate the source language and target language of the information to be translated;
  • wherein the pre-trained translation model is obtained by training on parallel corpora of at least two sample translation types and the corresponding network masks, and the network masks are used to control the sub-network in the pre-trained translation model that processes the corpus of each sample translation type.
  • the network masks corresponding to different sample translation types are different, so as to control different sub-networks in the pre-trained translation model to process the corpus of the different sample translation types.
  • When executing the computer program, the processor further implements: acquiring a first sample translation type whose source language is the same as that of the target translation type and a second sample translation type whose target language is the same as that of the target translation type; acquiring the first sub-network corresponding to the first sample translation type in the pre-trained translation model and the second sub-network corresponding to the second sample translation type in the pre-trained translation model; determining, according to the first sub-network and the second sub-network, the target sub-network corresponding to the target translation type in the pre-trained translation model; and translating, through the target sub-network, the information to be translated into translation information corresponding to the target language; wherein the target language of the first sample translation type is the same as the source language of the second sample translation type.
  • the pre-trained translation model includes an encoder and a decoder
  • When executing the computer program, the processor further implements: combining the encoder of the first sub-network with the decoder of the second sub-network to obtain the target sub-network corresponding to the target translation type in the pre-trained translation model.
  • When executing the computer program, the processor further implements: acquiring a parallel corpus, wherein the parallel corpus includes parallel corpora of at least two sample translation types; and training the initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model, wherein the initial translation model is obtained by training on the parallel corpus.
  • When executing the computer program, the processor further implements: training a preset basic translation model according to the parallel corpus to obtain the initial translation model; for each sample translation type, training the initial translation model with the parallel corpus of that sample translation type to obtain a trained initial translation model; and trimming the trained initial translation model to generate the network mask corresponding to that sample translation type.
  • When executing the computer program, the processor further implements: trimming the trained initial translation model based on the weights of the connections between different layers in the trained initial translation model to generate the network mask corresponding to the sample translation type.
  • When executing the computer program, the processor further implements: determining whether the weight of each connection between different layers in the trained initial translation model is greater than or equal to a preset threshold; if the weight is greater than or equal to the preset threshold, generating the first flag code corresponding to the connection; if the weight is less than the preset threshold, generating the second flag code corresponding to the connection; and combining all generated first flag codes and second flag codes to obtain the network mask corresponding to the sample translation type;
  • wherein the first flag code is used to indicate that the connection is retained in the trained initial translation model;
  • and the second flag code is used to indicate that the connection is pruned from the trained initial translation model.
  • When executing the computer program, the processor further implements: for each sample translation type, taking the source-side corpus in the parallel corpus of that sample translation type as the input of the initial translation model and the target-side corpus corresponding to the source-side corpus as the expected output, and training, with a preset loss function, the sub-network represented by the corresponding network mask in the initial translation model.
  • A computer-readable storage medium storing a computer program which, when executed by a processor, implements:
  • wherein the target translation type is used to indicate the source language and target language of the information to be translated;
  • wherein the pre-trained translation model is obtained by training on parallel corpora of at least two sample translation types and the corresponding network masks, and the network masks are used to control the sub-network in the pre-trained translation model that processes the corpus of each sample translation type.
  • the information translation apparatus, device, and storage medium provided in the above embodiments can execute the information translation method provided by any embodiment of the present disclosure, and have corresponding functional modules and effects for executing the method.
  • an information translation method comprising:
  • wherein the target translation type is used to indicate the source language and target language of the information to be translated;
  • wherein the pre-trained translation model is obtained by training on parallel corpora of at least two sample translation types and the corresponding network masks, and the network masks are used to control the sub-network in the pre-trained translation model that processes the corpus of each sample translation type.
  • the network masks corresponding to different sample translation types are different, so as to control different sub-networks in the pre-trained translation model to process the corpus of the different sample translation types.
  • The above information translation method further includes: acquiring a first sample translation type whose source language is the same as that of the target translation type, and a second sample translation type whose target language is the same as that of the target translation type; acquiring the first sub-network corresponding to the first sample translation type in the pre-trained translation model and the second sub-network corresponding to the second sample translation type in the pre-trained translation model; determining, according to the first sub-network and the second sub-network, the target sub-network corresponding to the target translation type in the pre-trained translation model; and translating, through the target sub-network, the information to be translated into translation information corresponding to the target language; wherein the target language of the first sample translation type is the same as the source language of the second sample translation type.
  • The pre-trained translation model includes an encoder and a decoder. According to one or more embodiments of the present disclosure, the above information translation method further includes: combining the encoder of the first sub-network with the decoder of the second sub-network to obtain the target sub-network corresponding to the target translation type in the pre-trained translation model.
  • The above information translation method further includes: acquiring a parallel corpus, wherein the parallel corpus includes parallel corpora of at least two sample translation types; and training the initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model, wherein the initial translation model is obtained by training on the parallel corpus.
  • The above information translation method further includes: training a preset basic translation model according to the parallel corpus to obtain the initial translation model; for each sample translation type, training the initial translation model with the parallel corpus of that sample translation type to obtain a trained initial translation model; and trimming the trained initial translation model to generate the network mask corresponding to that sample translation type.
  • The above information translation method further includes: trimming the trained initial translation model based on the weights of the connections between different layers in the trained initial translation model to generate the network mask corresponding to the sample translation type.
  • The above information translation method further includes: determining whether the weight of each connection between different layers in the trained initial translation model is greater than or equal to a preset threshold; if the weight is greater than or equal to the preset threshold, generating the first flag code corresponding to the connection; if the weight is less than the preset threshold, generating the second flag code corresponding to the connection; and combining all generated first flag codes and second flag codes to obtain the network mask corresponding to the sample translation type; wherein the first flag code is used to indicate that the connection is retained in the trained initial translation model, and the second flag code is used to indicate that the connection is pruned from the trained initial translation model.
  • The above information translation method further includes: for each sample translation type, taking the source-side corpus in the parallel corpus of that sample translation type as the input of the initial translation model and the target-side corpus corresponding to the source-side corpus as the expected output, and training, with a preset loss function, the sub-network represented by the corresponding network mask in the initial translation model.

Abstract

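An information translation method, apparatus, device, and storage medium. The information translation method includes: acquiring information to be translated and a target translation type specified for the information to be translated, where the target translation type indicates the source language and the target language of the information to be translated; and translating the information to be translated into translated information corresponding to the target language through the sub-network of a pre-trained translation model that corresponds to the target translation type. The pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.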

Description

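Information translation method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 202110474872.1, filed with the Chinese Patent Office on April 29, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of computer technology, for example, to an information translation method, apparatus, device, and storage medium.
Background
With the continuous development of neural networks and the explosive growth of data, a wide range of translation software has emerged and become an important channel through which people obtain external information. Translation software, and multilingual translation models in particular, can translate among multiple languages. However, the translation performance of multilingual translation models still falls short of expectations.
Summary
The present disclosure provides an information translation method, apparatus, device, and storage medium to improve the translation accuracy of a multilingual translation model.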
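The present disclosure provides an information translation method, including:
acquiring information to be translated and a target translation type specified for the information to be translated, where the target translation type indicates the source language and the target language of the information to be translated;
translating the information to be translated into translated information corresponding to the target language through the sub-network of a pre-trained translation model that corresponds to the target translation type;
where the pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.
The present disclosure provides an information translation apparatus, including:
a first acquisition module, configured to acquire information to be translated and a target translation type specified for the information to be translated, where the target translation type indicates the source language and the target language of the information to be translated;
a translation module, configured to translate the information to be translated into translated information corresponding to the target language through the sub-network of a pre-trained translation model that corresponds to the target translation type;
where the pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.
The present disclosure provides an information translation device, including a memory and a processor, where the memory stores a computer program, and the processor implements the above information translation method when executing the computer program.
The present disclosure provides a computer-readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the above information translation method.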
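Brief Description of the Drawings
FIG. 1 is a schematic flowchart of an information translation method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the network architecture of a multilingual translation model in the related art;
FIG. 3 is a schematic diagram of the network architecture of a multilingual translation model according to an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of another information translation method according to an embodiment of the present disclosure;
FIG. 5 is a schematic flowchart of a training process of a pre-trained translation model according to an embodiment of the present disclosure;
FIG. 6 is a schematic flowchart of a process of generating a network mask according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an information translation apparatus according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an information translation device according to an embodiment of the present disclosure.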
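Detailed Description
Embodiments of the present disclosure are described below with reference to the accompanying drawings. Although some embodiments of the present disclosure are shown in the drawings, the present disclosure may be implemented in many forms, and these embodiments are provided to aid understanding of the present disclosure. The drawings and embodiments of the present disclosure are for illustrative purposes only.
The steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, the method embodiments may include additional steps and/or omit some of the steps shown. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "include but are not limited to". The term "based on" means "based at least in part on". The term "an embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
Concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different apparatuses, modules, or units. They do not limit the order or interdependence of the functions performed by these apparatuses, modules, or units.
The modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive. Those skilled in the art should understand that, unless the context clearly indicates otherwise, these modifiers should be read as "one or more".
The names of messages or information exchanged between the apparatuses in the embodiments of the present disclosure are for illustrative purposes only and do not limit the scope of these messages or information.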
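A conventional multilingual translation model is built by modeling the corpora of multiple language pairs within a single model. During modeling, the corpora of different language pairs tend to interfere with one another. In particular, language pairs with abundant corpora (such as English-centric pairs of widely used languages) are affected by the corpora of other language pairs, which degrades the translation performance of the multilingual translation model. The technical solutions provided by the embodiments of the present disclosure are therefore intended to improve the translation performance of multilingual translation models.
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The method embodiments below may be executed by an information translation apparatus. The apparatus may be implemented in software, hardware, or a combination of both, and may form part or all of an electronic device. Optionally, the electronic device may be a client, including but not limited to a smartphone, a tablet computer, an e-book reader, and an in-vehicle terminal. The electronic device may also be a standalone server or a server cluster; the embodiments of the present disclosure do not limit the form of the electronic device. The method embodiments below use an electronic device as the executing entity for illustration.
FIG. 1 is a schematic flowchart of an information translation method according to an embodiment of the present disclosure. This embodiment concerns how an electronic device uses a trained multilingual translation model to translate information. As shown in FIG. 1, the method may include: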
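S101: Acquire information to be translated and a target translation type specified for the information to be translated.
The information to be translated is information that needs to be translated from one language into another. It may be in any source language, and the translated information is in the corresponding target language. For example, if the source language is English, the corresponding target language may be Chinese. The information to be translated may also be of any modality, for example, at least one of an image, text, video, or audio. As an example, the electronic device may select the information to be translated from a database, or may acquire information entered by a user through translation software installed on the device. This embodiment does not limit how the information to be translated is acquired.
The target translation type indicates the source language and the target language of the information to be translated. For example, if the target translation type is English-to-Chinese, the translation operation converts English information into Chinese information with the same meaning. Typically, the target translation type may be specified by a user, or may be specified randomly according to a set rule. The electronic device may acquire information entered by the user and thereby determine the target translation type specified for the information to be translated.
S102: Translate the information to be translated into translated information corresponding to the target language through the sub-network of the pre-trained translation model that corresponds to the target translation type.
The pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.
The parallel corpus of a sample translation type is the corpus data required to train the pre-trained translation model, and it includes paired source-side and target-side corpora. The source-side corpus can be understood as the corpus before translation, and the target-side corpus as the translation of the source-side corpus. Taking text corpora as an example, a Chinese-English parallel corpus includes a Chinese document and a corresponding English document. If the translation model performs English-to-Chinese translation, the English document is the source-side corpus and the Chinese document is the target-side corpus. The parallel corpus may also be obtained from a monolingual corpus through back-translation.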
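In practice, corpora of different sample translation types tend to interfere with one another while the pre-trained translation model is being built. To address this, a corresponding sub-network can be allocated in advance to each sample translation type in the pre-trained translation model, with each sub-network represented by a network mask. For example, the network mask "xxxxxx" represents sub-network 1 of the pre-trained translation model, and the network mask "yyyyyy" represents sub-network 2. Because the corpus of each sample translation type is processed by its corresponding sub-network, the corpus of a sample translation type trains only the sub-network allocated to it. The sub-networks are thus trained relatively independently, which greatly reduces interference between corpora of different sample translation types during modeling and improves the translation performance of the pre-trained translation model.
To reduce interference between corpora of different sample translation types during modeling, on the basis of the above embodiment, different sub-networks may optionally be allocated to different sample translation types. That is, different sample translation types correspond to different network masks, so that different sub-networks of the pre-trained translation model process the corpora of different sample translation types.
For example, assume that the sample translation types include sample translation type 1 (English-to-Chinese) and sample translation type 2 (German-to-English). Taking the network architecture of the multilingual translation model shown in FIG. 2 as an example, the electronic device may allocate sub-network 1 of the multilingual translation model to sample translation type 1 through the network mask "xxxxxx", and allocate sub-network 2 to sample translation type 2 through the network mask "yyyyyy", thereby obtaining the network architecture of the pre-trained translation model shown in FIG. 3. When the pre-trained translation model is then trained with the parallel corpora of sample translation types 1 and 2, the corpus of sample translation type 1 trains only sub-network 1 and the corpus of sample translation type 2 trains only sub-network 2. Sub-networks 1 and 2 are therefore trained relatively independently, which reduces interference between the corpora of the two sample translation types during modeling and improves the translation performance of the pre-trained translation model.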
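As a minimal sketch of how one set of shared weights can serve two sample translation types through different network masks, the Python code below assumes that a single linear layer stands in for the whole model and that each network mask is a 0/1 matrix with the same shape as that layer's weights. The masks, the weight values, and the helper names `apply_mask` and `forward` are hypothetical stand-ins for "xxxxxx" and "yyyyyy"; the disclosure does not specify how masks are encoded.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Mask = List[List[int]]

def apply_mask(weights: Matrix, mask: Mask) -> Matrix:
    """Element-wise product: connections whose mask bit is 0 are switched off."""
    return [[w * m for w, m in zip(w_row, m_row)] for w_row, m_row in zip(weights, mask)]

def forward(weights: Matrix, mask: Mask, x: List[float]) -> List[float]:
    """One linear layer y = (W * M) x, evaluated only over the masked sub-network."""
    masked = apply_mask(weights, mask)
    return [sum(w * xi for w, xi in zip(row, x)) for row in masked]

# Shared weights of one hypothetical 2x3 layer.
W: Matrix = [[0.5, -1.0, 2.0],
             [1.5, 0.25, -0.5]]

# Hypothetical masks, one per sample translation type.
MASKS: Dict[Tuple[str, str], Mask] = {
    ("en", "zh"): [[1, 0, 1], [0, 1, 1]],  # sub-network 1
    ("de", "en"): [[1, 1, 0], [1, 0, 1]],  # sub-network 2
}

x = [1.0, 2.0, 3.0]
print(forward(W, MASKS[("en", "zh")], x))  # [6.5, -1.0]
print(forward(W, MASKS[("de", "en")], x))  # [-1.5, 0.0]
```

Only the mask changes between the two calls. Because the weights themselves are shared, two masks may overlap, which is what produces the shared branches discussed for FIG. 3.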
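After acquiring the information to be translated and the target translation type, the electronic device translates the information through the sub-network of the pre-trained translation model that corresponds to the target translation type, producing translated information in the specified target language. Here, translated information means the information after translation. Continuing with the pre-trained translation model shown in FIG. 3, assume the target translation type is English-to-Chinese. Because sub-network 1 of the pre-trained translation model handles English-to-Chinese translation, the electronic device can use sub-network 1 to translate the information "I love to sing" into the corresponding Chinese "我爱唱歌" ("I love singing").
Optionally, the pre-trained translation model may include a sequence-to-sequence (seq2seq) model. This is a neural network with an encoder-decoder structure whose input and output are both sequences. The encoder converts a variable-length sequence into a fixed-length vector representation, and the decoder converts this fixed-length vector representation into a variable-length target sequence, so that variable-length inputs are mapped to variable-length outputs. Sequence-to-sequence models come in many types, for example, seq2seq models based on recurrent neural networks (RNN) and seq2seq models based on convolution (CONV). This embodiment does not limit the type of the pre-trained translation model.
The network architecture of the pre-trained translation model shown in FIG. 3 is merely an example. The embodiments of the present disclosure do not limit the network structure of the pre-trained translation model, and a suitable structure may be chosen based on actual needs.
In the information translation method provided by the embodiments of the present disclosure, information to be translated and a target translation type specified for it are acquired. The information is then translated into translated information corresponding to the target language through the sub-network of the pre-trained translation model that corresponds to the target translation type. The pre-trained translation model is trained with parallel corpora of at least two sample translation types and corresponding network masks, and each network mask controls the sub-network that processes the corpus of its sample translation type. In other words, each sample translation type is allocated its own sub-network in the pre-trained translation model. The corpus of a sample translation type therefore trains only its own sub-network, which greatly reduces interference between corpora of different sample translation types during modeling and improves the translation performance of each sub-network. Because each sub-network performs better, translating the information through only the sub-network corresponding to the target translation type improves the accuracy of the translation result.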
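In practice, there is another scenario. A multilingual translation model is usually trained on corpora of English-centric pairs of widely used languages. The target translation type may therefore differ from every sample translation type used to train the pre-trained translation model; that is, the training data contains no corpus of the target translation type (for example, when both the source language and the target language of the target translation type are languages other than English). In this scenario, the information to be translated can be translated following the process described in the embodiment below. On the basis of the above embodiment, optionally, as shown in FIG. 4, S102 may include:
S401: Acquire a first sample translation type whose source language is the same as that of the target translation type, and a second sample translation type whose target language is the same as that of the target translation type.
The target language of the first sample translation type is the same as the source language of the second sample translation type.
Both the first and the second sample translation types are translation types of corpora used to train the pre-trained translation model. From all sample translation types used in training, the electronic device may select a first sample translation type with the same source language as the target translation type and a second sample translation type with the same target language as the target translation type, while ensuring that the target language of the selected first sample translation type is the same as the source language of the second sample translation type.
For example, assume the target translation type is German-to-French, that is, its source language is German and its target language is French. The electronic device may then select the sample translation type with German as the source language and English as the target language as the first sample translation type, and the sample translation type with English as the source language and French as the target language as the second sample translation type.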
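The selection in S401 can be sketched as a search over the trained sample translation types for a pivot language. This is a minimal sketch under several assumptions. Translation types are modeled as hypothetical (source, target) language-code tuples, and `find_pivot_types` is an illustrative name. The first matching pivot in sorted order is returned, whereas the disclosure does not say how to choose among several candidate pivots.

```python
from typing import Iterable, Optional, Tuple

TranslationType = Tuple[str, str]  # hypothetical encoding: (source language, target language)

def find_pivot_types(
    trained: Iterable[TranslationType], target: TranslationType
) -> Optional[Tuple[TranslationType, TranslationType]]:
    """Return (first, second) such that first shares the target's source language,
    second shares the target's target language, and first's target language equals
    second's source language. Return None when no pivot exists."""
    src, tgt = target
    available = set(trained)
    for first in sorted(available):  # sorted only to make the choice deterministic
        if first[0] != src or first[1] == tgt:
            continue
        second = (first[1], tgt)
        if second in available:
            return first, second
    return None  # no trained pair links src to tgt through a pivot

trained_types = [("de", "en"), ("en", "fr"), ("en", "zh"), ("zh", "en")]
print(find_pivot_types(trained_types, ("de", "fr")))  # (('de', 'en'), ('en', 'fr'))
```

Returning `None` leaves it to the caller to decide what happens when no pivot language connects the two languages. The text does not cover that case.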
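S402: Acquire the first sub-network corresponding to the first sample translation type in the pre-trained translation model, and the second sub-network corresponding to the second sample translation type in the pre-trained translation model.
The first sub-network is the network in the pre-trained translation model that processes the corpus of the first sample translation type, and the second sub-network is the network that processes the corpus of the second sample translation type. During training of the pre-trained translation model, each sample translation type was allocated its own sub-network in advance. The target translation type shares its source language with the first sample translation type and its target language with the second sample translation type. The first sub-network and the second sub-network can therefore be used to translate information of the target translation type.
S403: Determine, according to the first sub-network and the second sub-network, a target sub-network corresponding to the target translation type in the pre-trained translation model.
The target sub-network is the network in the pre-trained translation model that processes information of the target translation type (that is, the information to be translated). After acquiring the first and second sub-networks, the electronic device may splice part of the first sub-network with part of the second sub-network to obtain the target sub-network.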
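Optionally, the pre-trained translation model includes an encoder and a decoder. The encoder extracts features from the input sequence to obtain a feature vector, and the decoder decodes the feature vector according to context information to obtain the corresponding output sequence. On this basis, and on the basis of the above embodiment, S403 may optionally be: combining the encoder of the first sub-network with the decoder of the second sub-network to obtain the target sub-network corresponding to the target translation type in the pre-trained translation model.
The target translation type has the same source language as the first sample translation type and the same target language as the second sample translation type. The encoder of the first sub-network and the decoder of the second sub-network can therefore be selected and combined into the target sub-network corresponding to the target translation type.
S404: Translate the information to be translated into translated information corresponding to the target language through the target sub-network.
The electronic device inputs the information to be translated into the target sub-network of the pre-trained translation model, which translates it into translated information corresponding to the target language.
Continuing the example from S401, the electronic device may select the encoder of the sub-network for the German-English pair and the decoder of the sub-network for the English-French pair, and combine them into a new network. This yields the target sub-network for the German-French pair. When German information to be translated is input into the pre-trained translation model, the target sub-network translates it into French information with the same meaning.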
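The German-to-French example can be sketched as a mask composition. This assumes that each sub-network's network mask can be split into an encoder part and a decoder part; the disclosure states only that the encoder of the first sub-network and the decoder of the second are combined. The mask values and the helper name `compose_zero_shot_mask` are made up for illustration.

```python
from typing import Dict, List

# Hypothetical layout: a network mask split into its encoder and decoder parts.
NetworkMask = Dict[str, List[int]]

def compose_zero_shot_mask(first: NetworkMask, second: NetworkMask) -> NetworkMask:
    """Target sub-network = encoder of the first sub-network + decoder of the second.

    `first` shares the source language with the target translation type and
    `second` shares its target language, so each contributes the half that
    handles that language."""
    return {"encoder": list(first["encoder"]), "decoder": list(second["decoder"])}

de_en: NetworkMask = {"encoder": [1, 0, 1, 1], "decoder": [0, 1, 1, 0]}  # made-up values
en_fr: NetworkMask = {"encoder": [0, 1, 1, 0], "decoder": [1, 1, 0, 1]}  # made-up values
de_fr = compose_zero_shot_mask(de_en, en_fr)
print(de_fr)  # {'encoder': [1, 0, 1, 1], 'decoder': [1, 1, 0, 1]}
```

The lists are copied, so later changes to the German-English or English-French masks do not silently alter the composed German-French mask.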
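In this embodiment, for the zero-resource scenario, the sub-network corresponding to a zero-resource target translation type is obtained by splicing the sub-networks of existing sample translation types in the pre-trained translation model. Because the sub-networks of existing sample translation types are trained relatively independently, they achieve high translation performance. A target sub-network assembled from these high-performing sub-networks therefore also performs well, which improves the translation of zero-resource information.
In one embodiment, a training process of the pre-trained translation model is further provided. On the basis of the above embodiment, optionally, as shown in FIG. 5, before S101, the method may further include:
S501: Acquire a parallel corpus set.
The parallel corpus set includes parallel corpora of at least two sample translation types.
A corpus database typically stores a large amount of corpora, so the electronic device can acquire multiple parallel corpora, covering at least two sample translation types, directly from the corpus database.
S502: Train an initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model.
The initial translation model is obtained by training with the parallel corpus set. A network mask controls the sub-network of the pre-trained translation model that processes the corpus of a sample translation type; that is, each sub-network is represented by a network mask. Identical network masks represent identical sub-networks, and different network masks represent different sub-networks.
To obtain a multilingual translation model with high translation performance, the electronic device may first train an initial translation model with a parallel corpus set that includes at least two sample translation types. The initial translation model can be understood as having learned the grammatical structures and lexical associations between the source and target languages of these sample translation types. However, while the initial translation model is being trained, the corpora of the different sample translation types interfere with one another, which lowers its translation performance. For example, English-centric corpora are relatively abundant. When the initial translation model is trained simultaneously on English-centric corpora and on corpora centered on other, less widely used languages, the latter interfere with the English-centric language pairs, and the trained initial translation model predicts English-centric translation types less well.
Therefore, to reduce mutual interference between corpora of different sample translation types, a corresponding sub-network can be allocated in advance to each sample translation type in the pre-trained translation model, and the corpus of each sample translation type is used to train only the sub-network allocated to it. Each sub-network is thus trained relatively independently, which greatly reduces interference between corpora of different sample translation types during modeling and improves the translation performance of the resulting pre-trained translation model.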
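Optionally, S502 may be: for each sample translation type, taking the source-side corpus in the parallel corpus of the sample translation type as the input of the initial translation model and the target-side corpus corresponding to the source-side corpus as the expected output, and training, with a preset loss function, the sub-network represented by the corresponding network mask in the initial translation model.
The loss function may be, for example, a maximum-likelihood loss or a cross-entropy loss. The electronic device inputs the source-side corpus of the parallel corpus of a sample translation type into the initial translation model. The sub-network represented by the network mask corresponding to that sample translation type processes the corpus to produce a predicted corpus, and the loss value is computed from the predicted corpus and the target-side corpus. While the loss value is greater than a preset threshold, the parameters of the sub-network represented by the network mask are updated, and the updated sub-network processes the source-side corpus again. This repeats until the loss value is less than or equal to the preset threshold.
Each of the other sample translation types likewise trains only its own allocated sub-network through the same process, until the convergence condition of that sub-network is met.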
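The masked training loop of S502 can be sketched on a toy linear model in plain Python. Several assumptions simplify the arithmetic. A squared-error loss replaces the maximum-likelihood or cross-entropy loss named above, plain gradient descent is the optimizer, and `train_subnetwork` is a hypothetical helper. The property being illustrated is that gradients are multiplied by the mask, so parameters outside the sub-network never change.

```python
from typing import List, Sequence, Tuple

Example = Tuple[List[float], float]  # (source-side features, target-side value)

def train_subnetwork(weights: Sequence[float], mask: Sequence[int],
                     data: Sequence[Example], lr: float = 0.05,
                     threshold: float = 1e-4, max_steps: int = 10_000) -> List[float]:
    """Gradient descent that only updates the parameters selected by `mask`."""
    w = list(weights)
    n = len(data)
    for _ in range(max_steps):
        loss = 0.0
        grad = [0.0] * len(w)
        for x, y in data:
            # Forward pass through the masked sub-network only.
            pred = sum(wi * mi * xi for wi, mi, xi in zip(w, mask, x))
            err = pred - y
            loss += err * err / n
            # The gradient is zero wherever mask_i == 0.
            for i, (mi, xi) in enumerate(zip(mask, x)):
                grad[i] += 2.0 * err * mi * xi / n
        if loss <= threshold:  # stop once the loss reaches the preset threshold
            break
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

# Hypothetical sub-network: parameters 0 and 2 belong to it, parameter 1 does not.
mask = [1, 0, 1]
data = [([1.0, 1.0, 0.0], 2.0), ([0.0, 1.0, 1.0], 3.0), ([1.0, 1.0, 1.0], 5.0)]
trained = train_subnetwork([0.0, 5.0, 0.0], mask, data)
print(trained)  # trained[1] is still exactly 5.0; trained[0] and trained[2] approach 2 and 3
```

In a real framework the same effect is usually achieved by multiplying weights or gradients by a mask tensor. The toy version only shows the principle that the corpus of one sample translation type leaves parameters outside its sub-network untouched.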
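Take the network architecture shown in FIG. 3 as an example, where sample translation type 1 corresponds to sub-network 1 and sample translation type 2 corresponds to sub-network 2. The electronic device trains sub-network 1 of the pre-trained translation model with the corpus of sample translation type 1 and trains sub-network 2 with the corpus of sample translation type 2, so that sub-networks 1 and 2 are trained relatively independently. Sub-networks 1 and 2 may have shared branches, such as the connections shared by sub-networks 1 and 2 in FIG. 3. In that case, the corpora of sample translation types 1 and 2 train the exclusive branches (such as the connections used only by sub-network 1 and those used only by sub-network 2), and they also train the shared branches, whose parameters are shared by sub-networks 1 and 2. The resulting pre-trained translation model therefore learns the grammatical structures and lexical associations between the source and target languages of both sample translation types, which enables translation among multiple languages. At the same time, interference between the corpora of sample translation types 1 and 2 is reduced, which improves the model's multilingual translation performance.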
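The shared and exclusive branches described for FIG. 3 fall out of simple bitwise operations. This assumes that both masks are flat 0/1 lists over the same ordered set of connections, which is a hypothetical layout. The helper name `split_branches` is illustrative only.

```python
from typing import List, Tuple

def split_branches(mask_1: List[int], mask_2: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Split two 0/1 masks over the same connections into shared and exclusive branches."""
    shared = [a & b for a, b in zip(mask_1, mask_2)]        # used by both sub-networks
    only_1 = [a & (1 - b) for a, b in zip(mask_1, mask_2)]  # exclusive to sub-network 1
    only_2 = [(1 - a) & b for a, b in zip(mask_1, mask_2)]  # exclusive to sub-network 2
    return shared, only_1, only_2

mask_1 = [1, 1, 0, 1, 0]  # hypothetical mask of sub-network 1
mask_2 = [1, 0, 1, 1, 0]  # hypothetical mask of sub-network 2
shared, only_1, only_2 = split_branches(mask_1, mask_2)
print(shared, only_1, only_2)  # [1, 0, 0, 1, 0] [0, 1, 0, 0, 0] [0, 0, 1, 0, 0]
```

During masked training, a shared connection receives updates from the corpora of both sample translation types. An exclusive connection receives updates only from the type whose mask contains it.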
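In this embodiment, the sub-networks represented by the corresponding network masks in the initial translation model are trained with the parallel corpora of at least two sample translation types, so that each sub-network is trained relatively independently. This reduces mutual interference between corpora of different sample translation types during modeling and improves the translation performance of the resulting pre-trained translation model. In addition, when the pre-trained translation model is further trained with the corpus of a new sample translation type, that corpus trains only the sub-network allocated to the new type. The pre-trained translation model can therefore produce accurate translations for information of the new sample translation type, with little impact on the translation performance of the existing sample translation types.
Optionally, before the pre-trained translation model is trained, the method further includes generating a network mask corresponding to each sample translation type. The process of generating the network mask for a sample translation type is described next. As shown in FIG. 6, before S502, the method further includes:
S601: Train a preset base translation model according to the parallel corpus set to obtain the initial translation model.
The base translation model may include a sequence-to-sequence model, which is a neural network with an encoder-decoder structure whose input and output are both sequences. The electronic device trains the base translation model with a parallel corpus set that includes at least two sample translation types and obtains the initial translation model once the model convergence condition is met. At this point, the initial translation model has learned the grammatical structures and lexical associations between the source and target languages of the at least two sample translation types.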
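S602: For each sample translation type, train the initial translation model with the parallel corpus of the sample translation type to obtain a trained initial translation model, and prune the trained initial translation model to generate the network mask corresponding to the sample translation type.
After obtaining the initial translation model trained on the multilingual corpora, the electronic device fine-tunes the initial translation model with the parallel corpus of each sample translation type. The fine-tuning amplifies the weights of the branches of the initial translation model that are important for that sample translation type. A branch here can be understood as a connection between neurons in different layers. The trained initial translation model is then pruned. Branches that matter for the sample translation type are retained and branches with little influence on it are pruned away, which yields the network mask corresponding to the sample translation type.
Optionally, the electronic device may prune the trained initial translation model based on the weights of the connections between different layers in the trained initial translation model, to generate the network mask corresponding to the sample translation type.
These connections represent the connectivity between neurons in the multiple layers of the trained initial translation model. After the initial translation model is fine-tuned with the parallel corpus of a sample translation type, the weights of the connections between different layers change. The weight of each connection then shows which connections strongly influence the processing of the corpus of the sample translation type and which have little influence. The important connections are retained and the less influential ones are pruned away, which yields the network mask corresponding to the sample translation type.
As one optional implementation, the electronic device may sort the connections between the layers of the trained initial translation model by weight to obtain a sorted result. It then selects the N connections with the largest weights from the sorted result and generates a first flag code for each of them, indicating that these N connections are retained in the trained initial translation model. For example, a retained connection may be represented by the digit 1. For each of the remaining connections, it generates a second flag code indicating that the connection is pruned from the trained initial translation model. For example, a pruned connection may be represented by the digit 0. The generated first and second flag codes are combined to obtain the network mask of the sample translation type. N is a natural number greater than 1 whose value may be set based on actual needs; this embodiment does not limit it.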
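The top-N variant can be sketched as follows. The sketch makes three assumptions. First, the connection weights are flattened into one list in a fixed order. Second, connections are ranked by their raw weight value, as the text describes; ranking by absolute value is a common alternative in magnitude pruning, but the disclosure does not say so. Third, ties are broken by original position. The helper name `top_n_mask` is hypothetical.

```python
from typing import List

KEEP = 1   # first flag code: connection retained
PRUNE = 0  # second flag code: connection pruned

def top_n_mask(weights: List[float], n: int) -> List[int]:
    """Keep the N connections with the largest weights and prune the rest."""
    if not 1 < n <= len(weights):
        raise ValueError("N must be greater than 1 and at most the number of connections")
    # Indices sorted by weight, largest first; sorted() stays stable with reverse=True,
    # so equal weights keep their original order.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept = set(order[:n])
    return [KEEP if i in kept else PRUNE for i in range(len(weights))]

weights = [0.8, -0.1, 0.05, 1.3, 0.4]
print(top_n_mask(weights, 2))  # [1, 0, 0, 1, 0]
```

Fixing N fixes the size of every sub-network. The threshold-based variant described next instead lets the sub-network size depend on how the fine-tuned weights are distributed.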
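As another optional implementation, the electronic device may generate the network mask corresponding to the sample translation type through the following process:
Step A: Determine whether the weight of each connection between different layers in the trained initial translation model is greater than or equal to a preset threshold.
The preset threshold may be set based on actual needs; this embodiment does not limit it. If the weight of a connection between different layers in the trained initial translation model is greater than or equal to the preset threshold, step B below is performed. If the weight is less than the preset threshold, step C below is performed.
Step B: Generate a first flag code corresponding to the connection.
The first flag code indicates that the connection is retained in the trained initial translation model. For example, the first flag code may be set to 1 to indicate that the connection is retained.
Step C: Generate a second flag code corresponding to the connection.
The second flag code indicates that the connection is pruned from the trained initial translation model. For example, the second flag code may be set to 0 to indicate that the connection is pruned.
Step D: Combine all generated first flag codes and all generated second flag codes to obtain the network mask corresponding to the sample translation type.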
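Steps A to D can be sketched as below. The sketch assumes that the trained model's weights are given as one matrix per pair of adjacent layers, keyed by a hypothetical layer name. It also assumes that combining the flag codes in step D simply means keeping them in the same layout as the weights. The function and key names are illustrative, not taken from the disclosure.

```python
from typing import Dict, List

FIRST_FLAG = 1   # step B: retain the connection
SECOND_FLAG = 0  # step C: prune the connection

def threshold_mask(layer_weights: Dict[str, List[List[float]]],
                   threshold: float) -> Dict[str, List[List[int]]]:
    """Build a network mask by comparing every inter-layer connection weight with a threshold."""
    mask: Dict[str, List[List[int]]] = {}
    for name, matrix in layer_weights.items():
        # Step A: compare each connection weight with the preset threshold,
        # then emit the first (step B) or second (step C) flag code accordingly.
        mask[name] = [[FIRST_FLAG if w >= threshold else SECOND_FLAG for w in row]
                      for row in matrix]
    # Step D: the flag codes of all connections, kept in the weights' layout, form the mask.
    return mask

weights = {
    "layer0->layer1": [[0.9, 0.1], [0.3, 0.6]],
    "layer1->layer2": [[0.5, 0.49]],
}
print(threshold_mask(weights, 0.5))
# {'layer0->layer1': [[1, 0], [0, 1]], 'layer1->layer2': [[1, 0]]}
```

A weight exactly equal to the threshold, such as the 0.5 above, is retained. This matches the "greater than or equal to" test in step A.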
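A network mask is generated for each sample translation type and used to control the sub-network of the initial translation model that processes the corpus of that type. As a result, when the initial translation model is trained on corpora of different sample translation types, each type's corpus trains only the sub-network represented by its network mask. This reduces mutual interference between corpora of different sample translation types and improves the translation performance of the pre-trained translation model.
In addition, the initial translation model is fine-tuned with each sample translation type, and the fine-tuned model is then pruned. This retains the connections that are important for processing that type's corpus and prunes those with little influence, so the sub-network represented by the resulting network mask can process the corpus of the sample translation type more accurately. The multilingual translation performance of the pre-trained translation model is improved accordingly.
FIG. 7 is a schematic structural diagram of an information translation apparatus according to an embodiment of the present disclosure. As shown in FIG. 7, the apparatus may include a first acquisition module 701 and a translation module 702.
The first acquisition module 701 is configured to acquire information to be translated and a target translation type specified for the information to be translated, where the target translation type indicates the source language and the target language of the information to be translated. The translation module 702 is configured to translate the information to be translated into translated information corresponding to the target language through the sub-network of a pre-trained translation model that corresponds to the target translation type. The pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.
The information translation apparatus provided by the embodiments of the present disclosure acquires information to be translated and a target translation type specified for it. It then translates the information into translated information corresponding to the target language through the sub-network of the pre-trained translation model that corresponds to the target translation type. The pre-trained translation model is trained with parallel corpora of at least two sample translation types and corresponding network masks, and each network mask controls the sub-network that processes the corpus of its sample translation type. In other words, each sample translation type is allocated its own sub-network in the pre-trained translation model. The corpus of a sample translation type therefore trains only its own sub-network, which greatly reduces interference between corpora of different sample translation types during modeling and improves the translation performance of each sub-network. Because each sub-network performs better, translating the information through only the sub-network corresponding to the target translation type improves the accuracy of the translation result.
Optionally, different sample translation types correspond to different network masks, so that different sub-networks of the pre-trained translation model process the corpora of different sample translation types.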
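On the basis of the above embodiment, optionally, the translation module 702 may include a first acquisition unit, a second acquisition unit, a determination unit, and a translation unit.
The first acquisition unit is configured to acquire a first sample translation type whose source language is the same as that of the target translation type and a second sample translation type whose target language is the same as that of the target translation type, where the target language of the first sample translation type is the same as the source language of the second sample translation type. The second acquisition unit is configured to acquire the first sub-network corresponding to the first sample translation type in the pre-trained translation model and the second sub-network corresponding to the second sample translation type in the pre-trained translation model. The determination unit is configured to determine, according to the first sub-network and the second sub-network, a target sub-network corresponding to the target translation type in the pre-trained translation model. The translation unit is configured to translate the information to be translated into translated information corresponding to the target language through the target sub-network.
Optionally, the pre-trained translation model includes an encoder and a decoder, and the determination unit is configured to combine the encoder of the first sub-network with the decoder of the second sub-network to obtain the target sub-network corresponding to the target translation type in the pre-trained translation model.
On the basis of the above embodiment, optionally, the apparatus further includes a second acquisition module and a first training module.
The second acquisition module is configured to acquire a parallel corpus set before the first acquisition module 701 acquires the information to be translated and the target translation type specified for it, where the parallel corpus set includes parallel corpora of at least two sample translation types. The first training module is configured to train an initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model, where the initial translation model is obtained by training with the parallel corpus set.
On the basis of the above embodiment, optionally, the apparatus further includes a second training module, a third training module, and a pruning module.
The second training module is configured to train a preset base translation model according to the parallel corpus set to obtain the initial translation model. It does so before the first training module trains the initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model. The third training module is configured to, for each sample translation type, train the initial translation model with the parallel corpus of the sample translation type to obtain a trained initial translation model. The pruning module is configured to, for each sample translation type, prune the trained initial translation model to generate the network mask corresponding to the sample translation type.
On the basis of the above embodiment, optionally, the pruning module is configured to prune the trained initial translation model based on the weights of the connections between different layers in the trained initial translation model, to generate the network mask corresponding to the sample translation type.
On the basis of the above embodiment, optionally, the pruning module is configured to: determine whether the weight of each connection between different layers in the trained initial translation model is greater than or equal to a preset threshold; generate a first flag code corresponding to the connection when the weight is determined to be greater than or equal to the preset threshold; generate a second flag code corresponding to the connection when the weight is determined to be less than the preset threshold; and combine all generated first flag codes and all generated second flag codes to obtain the network mask corresponding to the sample translation type. The first flag code indicates that the connection is retained in the trained initial translation model, and the second flag code indicates that the connection is pruned from the trained initial translation model.
On the basis of the above embodiment, optionally, the first training module is configured to, for each sample translation type, take the source-side corpus in the parallel corpus of the sample translation type as the input of the initial translation model and the target-side corpus corresponding to the source-side corpus as the expected output, and train, with a preset loss function, the sub-network represented by the corresponding network mask in the initial translation model.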
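Referring now to FIG. 8, it shows a schematic structural diagram of an electronic device (that is, an information translation device) 800 suitable for implementing the embodiments of the present disclosure. Electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals and fixed terminals. Mobile terminals include mobile phones, laptops, digital broadcast receivers, personal digital assistants (PDA), tablet computers (PAD), portable media players (PMP), and in-vehicle terminals (for example, in-vehicle navigation terminals). Fixed terminals include digital televisions (TV) and desktop computers. The electronic device shown in FIG. 8 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in FIG. 8, the electronic device 800 may include a processing apparatus 801 (for example, a central processing unit or a graphics processing unit). The processing apparatus 801 may perform a variety of appropriate actions and processes according to a program stored in a read-only memory (ROM) 802 or a program loaded from a storage apparatus 808 into a random access memory (RAM) 803. The RAM 803 also stores various programs and data required for the operation of the electronic device 800. The processing apparatus 801, the ROM 802, and the RAM 803 are connected to one another through a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
Typically, the following apparatuses may be connected to the I/O interface 805:
- an input apparatus 806 including, for example, a touchscreen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope;
- an output apparatus 807 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator;
- a storage apparatus 808 including, for example, a magnetic tape and a hard disk;
- a communication apparatus 809.
The communication apparatus 809 may allow the electronic device 800 to communicate with other devices, wirelessly or by wire, to exchange data. Although FIG. 8 shows the electronic device 800 with various apparatuses, it is not required to implement or include all of them; more or fewer apparatuses may alternatively be implemented or included.
According to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product. The product includes a computer program carried on a non-transitory computer-readable medium, and the computer program contains program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 809, installed from the storage apparatus 808, or installed from the ROM 802. When the computer program is executed by the processing apparatus 801, the above functions defined in the methods of the embodiments of the present disclosure are performed.
The computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two.

A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. Examples of computer-readable storage media may include, but are not limited to:
- an electrical connection having one or more wires;
- a portable computer diskette;
- a hard disk;
- a RAM;
- a ROM;
- an erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber;
- a portable compact disc read-only memory (CD-ROM);
- an optical storage device;
- a magnetic storage device;
- any suitable combination of the above.
In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program usable by or in connection with an instruction execution system, apparatus, or device.

In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to wire, optical cable, radio frequency (RF), or any suitable combination of the above.
In some implementations, clients and servers may communicate using any currently known or future-developed network protocol, such as the HyperText Transfer Protocol (HTTP). They may also be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include local area networks (LAN), wide area networks (WAN), internetworks (for example, the Internet), and peer-to-peer networks (for example, ad hoc peer-to-peer networks), as well as any currently known or future-developed network.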
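The computer-readable medium may be included in the electronic device described above, or may exist separately without being assembled into the electronic device.
The computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: acquire information to be translated and a target translation type specified for the information to be translated, where the target translation type indicates the source language and the target language of the information to be translated; and translate the information to be translated into translated information corresponding to the target language through the sub-network of a pre-trained translation model that corresponds to the target translation type. The pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. These include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a standalone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a LAN or a WAN. Alternatively, it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. Each block of the block diagrams and/or flowcharts, and each combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or operations. It may also be implemented by a combination of dedicated hardware and computer instructions.
The units involved in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not limit the unit itself; for example, the first acquisition unit may also be described as "a unit that acquires at least two Internet Protocol addresses".
The functions described above herein may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard parts (ASSP), systems on chip (SOC), and complex programmable logic devices (CPLD).
In the context of the present disclosure, a machine-readable medium may be a tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a RAM, a ROM, an EPROM or flash memory, an optical fiber, a CD-ROM, an optical storage device, a magnetic storage device, or any suitable combination of the above.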
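In one embodiment, an information translation device is provided. The device includes a memory and a processor, where the memory stores a computer program, and the processor implements the following when executing the computer program:
acquiring information to be translated and a target translation type specified for the information to be translated, where the target translation type indicates the source language and the target language of the information to be translated;
translating the information to be translated into translated information corresponding to the target language through the sub-network of a pre-trained translation model that corresponds to the target translation type;
where the pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.
Optionally, different sample translation types correspond to different network masks, so that different sub-networks of the pre-trained translation model process the corpora of different sample translation types.
In one embodiment, the processor further implements the following when executing the computer program: acquiring a first sample translation type whose source language is the same as that of the target translation type and a second sample translation type whose target language is the same as that of the target translation type; acquiring the first sub-network corresponding to the first sample translation type in the pre-trained translation model and the second sub-network corresponding to the second sample translation type in the pre-trained translation model; determining, according to the first sub-network and the second sub-network, a target sub-network corresponding to the target translation type in the pre-trained translation model; and translating the information to be translated into translated information corresponding to the target language through the target sub-network. The target language of the first sample translation type is the same as the source language of the second sample translation type.
Optionally, the pre-trained translation model includes an encoder and a decoder;
In one embodiment, the processor further implements the following when executing the computer program: combining the encoder of the first sub-network with the decoder of the second sub-network to obtain the target sub-network corresponding to the target translation type in the pre-trained translation model.
In one embodiment, the processor further implements the following when executing the computer program: acquiring a parallel corpus set, where the parallel corpus set includes parallel corpora of at least two sample translation types; and training an initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model. The initial translation model is obtained by training with the parallel corpus set.
In one embodiment, the processor further implements the following when executing the computer program: training a preset base translation model according to the parallel corpus set to obtain the initial translation model; for each sample translation type, training the initial translation model with the parallel corpus of the sample translation type to obtain a trained initial translation model; and pruning the trained initial translation model to generate the network mask corresponding to the sample translation type.
In one embodiment, the processor further implements the following when executing the computer program: pruning the trained initial translation model based on the weights of the connections between different layers in the trained initial translation model, to generate the network mask corresponding to the sample translation type.
In one embodiment, the processor further implements the following when executing the computer program: determining whether the weight of each connection between different layers in the trained initial translation model is greater than or equal to a preset threshold; if the weight of a connection is greater than or equal to the preset threshold, generating a first flag code corresponding to the connection; if the weight of a connection is less than the preset threshold, generating a second flag code corresponding to the connection; and combining all generated first flag codes and all generated second flag codes to obtain the network mask corresponding to the sample translation type. The first flag code indicates that the connection is retained in the trained initial translation model, and the second flag code indicates that the connection is pruned from the trained initial translation model.
In one embodiment, the processor further implements the following when executing the computer program: for each sample translation type, taking the source-side corpus in the parallel corpus of the sample translation type as the input of the initial translation model and the target-side corpus corresponding to the source-side corpus as the expected output, and training, with a preset loss function, the sub-network represented by the corresponding network mask in the initial translation model.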
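In one embodiment, a computer-readable storage medium is further provided. It stores a computer program that, when executed by a processor, implements the following:
acquiring information to be translated and a target translation type specified for the information to be translated, where the target translation type indicates the source language and the target language of the information to be translated;
translating the information to be translated into translated information corresponding to the target language through the sub-network of a pre-trained translation model that corresponds to the target translation type;
where the pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.
The information translation apparatus, device, and storage medium provided in the above embodiments can perform the information translation method provided by any embodiment of the present disclosure, and they have the functional modules and effects corresponding to performing the method. For technical details not described exhaustively in the above embodiments, refer to the information translation method provided by any embodiment of the present disclosure.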
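According to one or more embodiments of the present disclosure, an information translation method is provided, including:
acquiring information to be translated and a target translation type specified for the information to be translated, where the target translation type indicates the source language and the target language of the information to be translated;
translating the information to be translated into translated information corresponding to the target language through the sub-network of a pre-trained translation model that corresponds to the target translation type;
where the pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.
Optionally, different sample translation types correspond to different network masks, so that different sub-networks of the pre-trained translation model process the corpora of different sample translation types.
According to one or more embodiments of the present disclosure, the above information translation method is provided, further including: acquiring a first sample translation type whose source language is the same as that of the target translation type and a second sample translation type whose target language is the same as that of the target translation type; acquiring the first sub-network corresponding to the first sample translation type in the pre-trained translation model and the second sub-network corresponding to the second sample translation type in the pre-trained translation model; determining, according to the first sub-network and the second sub-network, a target sub-network corresponding to the target translation type in the pre-trained translation model; and translating the information to be translated into translated information corresponding to the target language through the target sub-network. The target language of the first sample translation type is the same as the source language of the second sample translation type.
Optionally, the pre-trained translation model includes an encoder and a decoder. According to one or more embodiments of the present disclosure, the above information translation method is provided, further including: combining the encoder of the first sub-network with the decoder of the second sub-network to obtain the target sub-network corresponding to the target translation type in the pre-trained translation model.
According to one or more embodiments of the present disclosure, the above information translation method is provided, further including: acquiring a parallel corpus set, where the parallel corpus set includes parallel corpora of at least two sample translation types; and training an initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model. The initial translation model is obtained by training with the parallel corpus set.
According to one or more embodiments of the present disclosure, the above information translation method is provided, further including: training a preset base translation model according to the parallel corpus set to obtain the initial translation model; for each sample translation type, training the initial translation model with the parallel corpus of the sample translation type to obtain a trained initial translation model; and pruning the trained initial translation model to generate the network mask corresponding to the sample translation type.
According to one or more embodiments of the present disclosure, the above information translation method is provided, further including: pruning the trained initial translation model based on the weights of the connections between different layers in the trained initial translation model, to generate the network mask corresponding to the sample translation type.
According to one or more embodiments of the present disclosure, the above information translation method is provided, further including: determining whether the weight of each connection between different layers in the trained initial translation model is greater than or equal to a preset threshold; if the weight of a connection is greater than or equal to the preset threshold, generating a first flag code corresponding to the connection; if the weight of a connection is less than the preset threshold, generating a second flag code corresponding to the connection; and combining all generated first flag codes and all generated second flag codes to obtain the network mask corresponding to the sample translation type. The first flag code indicates that the connection is retained in the trained initial translation model, and the second flag code indicates that the connection is pruned from the trained initial translation model.
According to one or more embodiments of the present disclosure, the above information translation method is provided, further including: for each sample translation type, taking the source-side corpus in the parallel corpus of the sample translation type as the input of the initial translation model and the target-side corpus corresponding to the source-side corpus as the expected output, and training, with a preset loss function, the sub-network represented by the corresponding network mask in the initial translation model.
In addition, although operations are depicted in a particular order, this should not be understood as requiring that these operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.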

Claims (12)

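  1. An information translation method, comprising:
    acquiring information to be translated and a target translation type specified for the information to be translated, wherein the target translation type is used to indicate a source language and a target language of the information to be translated;
    translating, through a sub-network of a pre-trained translation model corresponding to the target translation type, the information to be translated into translated information corresponding to the target language;
    wherein the pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.
  2. The method according to claim 1, wherein different sample translation types correspond to different network masks, so as to control different sub-networks of the pre-trained translation model to process the corpora of the different sample translation types.
  3. The method according to claim 1, wherein the translating, through a sub-network of a pre-trained translation model corresponding to the target translation type, the information to be translated into translated information corresponding to the target language comprises:
    acquiring a first sample translation type whose source language is the same as the source language of the target translation type, and a second sample translation type whose target language is the same as the target language of the target translation type, wherein a target language of the first sample translation type is the same as a source language of the second sample translation type;
    acquiring a first sub-network corresponding to the first sample translation type in the pre-trained translation model, and a second sub-network corresponding to the second sample translation type in the pre-trained translation model;
    determining, according to the first sub-network and the second sub-network, a target sub-network corresponding to the target translation type in the pre-trained translation model;
    translating, through the target sub-network, the information to be translated into translated information corresponding to the target language.
  4. The method according to claim 3, wherein the pre-trained translation model comprises an encoder and a decoder;
    the determining, according to the first sub-network and the second sub-network, a target sub-network corresponding to the target translation type in the pre-trained translation model comprises:
    combining an encoder of the first sub-network and a decoder of the second sub-network to obtain the target sub-network corresponding to the target translation type in the pre-trained translation model.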
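  5. The method according to claim 1, further comprising, before the acquiring information to be translated and a target translation type specified for the information to be translated:
    acquiring a parallel corpus set, wherein the parallel corpus set comprises parallel corpora of at least two sample translation types;
    training an initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model, wherein the initial translation model is obtained by training with the parallel corpus set.
  6. The method according to claim 5, further comprising, before the training an initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model:
    training a preset base translation model according to the parallel corpus set to obtain the initial translation model;
    for each sample translation type, training the initial translation model with the parallel corpus of the sample translation type to obtain a trained initial translation model, and pruning the trained initial translation model to generate the network mask corresponding to the sample translation type.
  7. The method according to claim 6, wherein the pruning the trained initial translation model to generate the network mask corresponding to the sample translation type comprises:
    pruning the trained initial translation model based on weights of connections between different layers in the trained initial translation model, to generate the network mask corresponding to the sample translation type.
  8. The method according to claim 7, wherein the pruning the trained initial translation model based on weights of connections between different layers in the trained initial translation model, to generate the network mask corresponding to the sample translation type comprises:
    determining whether a weight of a connection between different layers in the trained initial translation model is greater than or equal to a preset threshold;
    in response to the weight of the connection between different layers in the trained initial translation model being greater than or equal to the preset threshold, generating a first flag code corresponding to the connection, wherein the first flag code is used to indicate that the connection is retained in the trained initial translation model;
    in response to the weight of the connection between different layers in the trained initial translation model being less than the preset threshold, generating a second flag code corresponding to the connection, wherein the second flag code is used to indicate that the connection is pruned from the trained initial translation model;
    combining all generated first flag codes and all generated second flag codes to obtain the network mask corresponding to the sample translation type.
  9. The method according to any one of claims 5 to 8, wherein the training an initial translation model according to the parallel corpora of the at least two sample translation types and the corresponding network masks to obtain the pre-trained translation model comprises:
    for each sample translation type, using a source-side corpus in the parallel corpus of the sample translation type as an input of the initial translation model and a target-side corpus corresponding to the source-side corpus as an expected output, and training, with a preset loss function, the sub-network represented by the corresponding network mask in the initial translation model.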
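  10. An information translation apparatus, comprising:
    a first acquisition module, configured to acquire information to be translated and a target translation type specified for the information to be translated, wherein the target translation type is used to indicate a source language and a target language of the information to be translated;
    a translation module, configured to translate, through a sub-network of a pre-trained translation model corresponding to the target translation type, the information to be translated into translated information corresponding to the target language;
    wherein the pre-trained translation model is obtained by training with parallel corpora of at least two sample translation types and corresponding network masks, and the network mask is used to control the sub-network of the pre-trained translation model that processes the corpus of the sample translation type.
  11. An information translation device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the information translation method according to any one of claims 1 to 9.
  12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the information translation method according to any one of claims 1 to 9.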
PCT/CN2022/087801 2021-04-29 2022-04-20 Information translation method, apparatus, device, and storage medium WO2022228221A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110474872.1A CN113204977B (zh) 2021-04-29 2021-04-29 Information translation method, apparatus, device, and storage medium
CN202110474872.1 2021-04-29

Publications (1)

Publication Number Publication Date
WO2022228221A1 true WO2022228221A1 (zh) 2022-11-03

Family

ID=77029436

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/087801 WO2022228221A1 (zh) 2021-04-29 2022-04-20 Information translation method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN113204977B (zh)
WO (1) WO2022228221A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
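CN113204977B (zh) * 2021-04-29 2023-09-26 北京有竹居网络技术有限公司 Information translation method, apparatus, device, and storage medium
CN114495112B (zh) * 2022-01-20 2024-07-19 北京字节跳动网络技术有限公司 Method and apparatus for processing text in an image, readable medium, and electronic device
CN114818748B (zh) * 2022-05-10 2023-04-21 北京百度网讯科技有限公司 Method for generating a translation model, translation method, and apparatus
CN116957991B (zh) * 2023-09-19 2023-12-15 北京渲光科技有限公司 Three-dimensional model completion method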

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180052828A1 (en) * 2016-08-16 2018-02-22 Samsung Electronics Co., Ltd. Machine translation method and apparatus
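CN110543643A (zh) * 2019-08-21 2019-12-06 语联网(武汉)信息技术有限公司 Training method and apparatus for a text translation model
CN111008533A (zh) * 2019-12-09 2020-04-14 北京字节跳动网络技术有限公司 Method, apparatus, device, and storage medium for obtaining a translation model
CN111046677A (zh) * 2019-12-09 2020-04-21 北京字节跳动网络技术有限公司 Method, apparatus, device, and storage medium for obtaining a translation model
CN111709249A (zh) * 2020-05-29 2020-09-25 北京百度网讯科技有限公司 Training method and apparatus for a multilingual model, electronic device, and storage medium
CN112270200A (zh) * 2020-11-11 2021-01-26 北京有竹居网络技术有限公司 Text information translation method and apparatus, electronic device, and storage medium
CN112633017A (zh) * 2020-12-24 2021-04-09 北京百度网讯科技有限公司 Translation model training and translation processing method, apparatus, device, and storage medium
CN113204977A (zh) * 2021-04-29 2021-08-03 北京有竹居网络技术有限公司 Information translation method, apparatus, device, and storage medium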

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
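CN100527125C (zh) * 2007-05-29 2009-08-12 中国科学院计算技术研究所 Online translation model selection method and system in statistical machine translation
CN110472251B (zh) * 2018-05-10 2023-05-30 腾讯科技(深圳)有限公司 Translation model training method, sentence translation method, device, and storage medium
CN110874537B (zh) * 2018-08-31 2023-06-27 阿里巴巴集团控股有限公司 Multilingual translation model generation method, translation method, and device
CN109190134B (zh) * 2018-11-21 2023-05-30 科大讯飞股份有限公司 Text translation method and apparatus
CN111259850B (zh) * 2020-01-23 2022-12-16 同济大学 Person re-identification method fusing random batch masking and multi-scale representation learning
CN111814975B (zh) * 2020-07-09 2023-07-28 广东工业大学 Pruning-based neural network model construction method and related apparatus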

Also Published As

Publication number Publication date
CN113204977B (zh) 2023-09-26
CN113204977A (zh) 2021-08-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22794692

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22794692

Country of ref document: EP

Kind code of ref document: A1