WO2023082900A1 - Method, device and medium for machine translation - Google Patents

Method, device and medium for machine translation

Info

Publication number
WO2023082900A1
WO2023082900A1 PCT/CN2022/123788 CN2022123788W WO2023082900A1 WO 2023082900 A1 WO2023082900 A1 WO 2023082900A1 CN 2022123788 W CN2022123788 W CN 2022123788W WO 2023082900 A1 WO2023082900 A1 WO 2023082900A1
Authority
WO
WIPO (PCT)
Prior art keywords
bilingual
source language
phrase
bilingual phrase
phrases
Prior art date
Application number
PCT/CN2022/123788
Other languages
English (en)
French (fr)
Inventor
王明轩
蒋庆男
孙泽维
曹军
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2023082900A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • the present disclosure relates to methods, devices and media for machine translation.
  • prompt-based learning has emerged as an attractive approach for adapting pre-trained models (PTMs) to specific tasks. With either manually created or automatically created prompts, a PTM can achieve good performance on many downstream tasks without fine-tuning. Unlike fine-tuning and feature-based adaptation, prompt-based learning requires no additional training for downstream tasks: it formulates the downstream task as a language-model fill-in-the-blank task with prompts.
  • in prompt-based learning, using a pre-trained language model to make predictions for a specific task involves three stages: (i) constructing a prompt with some unfilled slots based on the input; (ii) filling the unfilled slots with the pre-trained model; and (iii) deriving the final prediction from the filled slots.
  • the prompt format depends on the pre-trained model and the downstream task. There are two main classes of prompts: cloze prompts, where the unfilled slots are predefined blanks, and prefix prompts, where filling the slots continues a generation process from the prefix. Cloze prompts are typically used for natural language understanding tasks, while prefix prompts are mainly used for natural language generation tasks.
  • a method for machine translation, comprising: obtaining original training data including source language sentences for training and target language sentences for training;
  • concatenating bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts, wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and pre-training a machine translation model using at least the new training data with bilingual phrase prompts.
  • an apparatus for machine translation, including: an original training data acquisition unit configured to acquire original training data including source language sentences for training and target language sentences for training; a new training data generation unit configured to concatenate bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts,
  • wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and a pre-training unit configured to pre-train a machine translation model using at least the new training data with bilingual phrase prompts.
  • an electronic device comprising: a memory; and a processor coupled to the memory, the memory having stored therein instructions which, when executed by the processor, cause the processor to perform the method according to the embodiments of the present disclosure.
  • a non-transitory computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method according to the embodiments of the present disclosure is implemented.
  • a computer program product comprising instructions which, when executed by a processor, implement the method according to the embodiments of the present disclosure.
  • a computer program including instructions, which implement the method according to the embodiments of the present disclosure when executed by a processor.
  • FIG. 1 is a schematic diagram showing a comparison between a method for machine translation according to an embodiment of the present disclosure and an existing Vanilla method
  • FIG. 2 is a flowchart illustrating a method for machine translation according to an embodiment of the present disclosure
  • FIG. 3 is a flow chart showing a method for retrieving bilingual phrases from a bilingual phrase database according to an embodiment of the present disclosure
  • FIG. 4 is a flowchart illustrating a process of translating source language phrases into target language phrases based on a bilingual phrase database
  • FIG. 5 shows an example of English-German translation in the medical field according to an embodiment of the present disclosure
  • FIG. 6 is a block diagram illustrating an apparatus for machine translation according to an embodiment of the present disclosure
  • FIG. 7 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure.
  • FIG. 8 is a block diagram showing an example structure of a computer system employable in an embodiment of the present disclosure.
  • the term "comprising" and its variants used in the present disclosure are open terms meaning at least including the following elements/features without excluding other elements/features, i.e., "comprising but not limited to".
  • likewise, the term "including" and its variants used in the present disclosure are open terms meaning at least including the following elements/features without excluding other elements/features, i.e., "including but not limited to". Thus, "comprising" is synonymous with "including".
  • the term “based on” means “based at least in part on”.
  • references throughout this specification to "one embodiment", "some embodiments", or "an embodiment" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure.
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments.”
  • appearances of the phrases "in one embodiment", "in some embodiments", or "in an embodiment" in various places throughout the specification do not necessarily all refer to the same embodiment, but they may also refer to the same embodiment.
  • Embodiments of the present disclosure can effectively solve the difficulties encountered when applying prompt-based learning to machine translation.
  • Embodiments of the present disclosure utilize bilingual phrase prompts and apply prompt-aware pre-training (PAPT) to prompt-based machine translation.
  • the embodiments of the present disclosure construct bilingual phrase prompts for machine translation.
  • Bilingual phrase prompts are formed by concatenating phrase-level translation examples (i.e., bilingual phrases) to alleviate the sparsity of sentence-level translation examples.
  • embodiments of the present disclosure are able to build input-related prompts that can provide useful knowledge for translation generation.
  • the prompts can be made known during pre-training. Therefore, a prompt-aware pre-training task can be designed, which can be a sequence-to-sequence generation task.
  • Embodiments of the present disclosure pre-train a prompt-aware model for machine translation, thereby mitigating the inconsistency between pre-training and prompt-based prediction.
  • FIG. 1 is a schematic diagram showing a comparison between a method for machine translation according to an embodiment of the present disclosure and an existing vanilla method.
  • the input is the source language sentence in the training sample
  • the output is the target language sentence in the training sample
  • the prompt is the bilingual phrase prompt.
  • PAPT is the prompt-aware pre-training method according to an embodiment of the present disclosure. In PAPT, not only is each training sample including a source language sentence and a target language sentence used for training; for each training sample, a new training sample, formed by concatenating the bilingual phrase prompt to the source language sentence together with the target language sentence, is also used for training.
  • concatenating the bilingual phrase prompt to the source language sentence refers to using the bilingual phrase prompt as a prefix of the source language sentence.
  • the embodiments of the present disclosure are not limited thereto, for example, the bilingual phrase prompt may be used as a suffix rather than a prefix of the source language sentence.
  • FIG. 2 is a flowchart illustrating a method 200 for machine translation according to an embodiment of the present disclosure.
  • original training data including source language sentences for training and target language sentences for training are obtained.
  • the original training data can be translation data in a general domain, for example, the WMT14 EN-DE dataset.
  • in step S220, bilingual phrase prompts related to the source language sentences for training are concatenated to the source language sentences for training to generate new training data with bilingual phrase prompts.
  • the bilingual phrase prompt includes one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase.
  • Bilingual phrase hints can provide useful knowledge for machine translation.
  • Multiple bilingual phrases may be separated by a first tag (e.g., <r>).
  • the source language phrase and the corresponding target language phrase in a bilingual phrase may be separated by a second tag (e.g., <q>).
  • Bilingual phrase prompts related to the source language sentences for training and the source language sentences for training themselves may be separated by a third tag (e.g., <p>), as in the sketch below.
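  • As an illustration of the tag format above, the following is a minimal sketch of assembling a prompted input; the helper name, the spacing around the tags, and the sample phrases are assumptions, not taken from the patent.

```python
def build_prompted_source(bilingual_phrases, source_sentence):
    """Concatenate a bilingual phrase prompt to a source sentence.

    bilingual_phrases: list of (source_phrase, target_phrase) pairs.
    Source and target phrase are joined with <q>, phrase pairs with <r>,
    and the whole prompt is attached to the sentence with <p>.
    """
    prompt = " <r> ".join(f"{src} <q> {tgt}" for src, tgt in bilingual_phrases)
    return f"{prompt} <p> {source_sentence}"

# Hypothetical English-German example:
phrases = [("heart attack", "Herzinfarkt"), ("blood pressure", "Blutdruck")]
print(build_prompted_source(phrases, "The patient suffered a heart attack."))
# heart attack <q> Herzinfarkt <r> blood pressure <q> Blutdruck <p> The patient suffered a heart attack.
```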
  • Bilingual phrase prompts can be constructed by retrieving bilingual phrases from a pre-built database of bilingual phrases.
  • for the source language sentences used for training, bilingual phrases related to them can be retrieved from the pre-built bilingual phrase database to construct bilingual phrase prompts related to the source language sentences used for training.
  • for a source language sentence to be translated, bilingual phrases related to it can be retrieved from the pre-built bilingual phrase database to construct bilingual phrase prompts related to the source language sentence to be translated.
  • Bilingual phrase databases can be pre-built and can be offline.
  • the multilingual BERT proposed by Devlin et al. in "BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. of NAACL-HLT, pages 4171-4186" can be used to extract bilingual phrases from parallel translation data and to compute contextual representations of source language phrases to build the bilingual phrase database.
  • the contextual representation of the source language phrase and the corresponding bilingual phrase are stored as key-value pairs in the bilingual phrase database.
  • the context representation of the source language phrase is used as the key in the key-value pair, and the corresponding bilingual phrase is used as the value in the key-value pair.
  • a bilingual phrase database is a collection of key-value pairs created from parallel translation data.
  • the method for extracting bilingual phrases may include: first, the awesome-align method described by Dou et al. in "Word alignment by fine-tuning embeddings on parallel corpora. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2112-2128" can be used to extract word alignments; the algorithm described in "Statistical Machine Translation" by Koehn et al. can then be used to extract bilingual phrases from the word alignments.
  • the contextual representation of a phrase can be computed by mean-pooling the hidden states of the words in the phrase. Subword segmentation can be performed using the joint byte pair encoding method described by Sennrich et al. in "Neural machine translation of rare words with subword units. In Proc. of ACL, pages 1715-1725" with 32k merge operations. A minimal sketch of such phrase pooling follows.
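  • As a sketch of this pooling step, the code below mean-pools multilingual BERT hidden states over a phrase span; it assumes the Hugging Face transformers interface, and the offset-based span selection is an illustrative choice rather than the patent's specification.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModel.from_pretrained("bert-base-multilingual-cased")

def phrase_representation(sentence: str, phrase: str) -> torch.Tensor:
    """Mean-pool mBERT hidden states over the subword tokens of `phrase`."""
    enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
    offsets = enc.pop("offset_mapping")[0]          # (seq_len, 2) char spans
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    start = sentence.index(phrase)
    end = start + len(phrase)
    # Keep subword tokens whose character span overlaps the phrase span.
    mask = (offsets[:, 0] < end) & (offsets[:, 1] > start)
    return hidden[mask].mean(dim=0)                 # (hidden_dim,)
```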
  • FIG. 3 is a flowchart illustrating a method 300 for retrieving bilingual phrases from a bilingual phrase database according to an embodiment of the present disclosure.
  • in step S310, the source language phrases in the bilingual phrase database are loaded into a dictionary tree (trie).
  • in step S320, the source language phrases in the source language sentence that exist in the dictionary tree are extracted.
  • in step S330, the contextual representations of the source language phrases in the source language sentence are computed.
  • in step S340, bilingual phrases are retrieved from the bilingual phrase database based on the contextual representations of the source language phrases in the source language sentence.
  • In general, the most similar bilingual phrases can be retrieved from the bilingual phrase database based on the L2 distance between contextual representations of source language phrases.
  • However, when the bilingual phrase database is built from the original training data, the second most similar bilingual phrase is retrieved instead, to avoid retrieving the bilingual phrase extracted from the current training sample and overfitting to the retrieved bilingual phrases. A sketch of this retrieval step is given below.
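  • The sketch below walks through steps S310-S340, assuming the stored phrase representations are indexed with FAISS (the patent mentions an IVFPQ index; a flat L2 index is used here for brevity), reusing the phrase_representation helper sketched earlier, and reducing the dictionary tree to naive n-gram matching.

```python
import numpy as np
import faiss  # pip install faiss-cpu

def build_index(phrase_vectors: np.ndarray) -> faiss.Index:
    """Index the stored contextual representations (the database keys)."""
    index = faiss.IndexFlatL2(phrase_vectors.shape[1])
    index.add(phrase_vectors.astype(np.float32))
    return index

def extract_matching_phrases(sentence, db_phrases, max_len=4):
    """S310/S320: n-gram matching against the stored source phrases
    (a dictionary tree makes this efficient; a set lookup suffices here)."""
    words = sentence.split()
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, min(i + 1 + max_len, len(words) + 1))
            if " ".join(words[i:j]) in db_phrases]

def retrieve_bilingual_phrases(sentence, db_phrases, db_values, index,
                               built_from_training_data=False):
    """db_values: list of (source_phrase, target_phrase) pairs aligned
    with the index rows."""
    retrieved = []
    for phrase in extract_matching_phrases(sentence, db_phrases):
        # S330: contextual representation of the phrase in this sentence.
        query = phrase_representation(sentence, phrase).numpy()[None, :]
        # S340: L2 nearest neighbours; take the second hit when the
        # database was built from the original training data, to avoid
        # retrieving the pair extracted from the current sample.
        _, ids = index.search(query.astype(np.float32), k=2)
        hit = ids[0][1] if built_from_training_data else ids[0][0]
        retrieved.append(db_values[hit])
    return retrieved
```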
  • the new training data includes source language sentences and target language sentences concatenated with bilingual phrase prompts.
  • in step S230, the machine translation model is pre-trained using at least the new training data with bilingual phrase prompts.
  • both original training data and new training data with bilingual phrase prompts can be used to pre-train the machine translation model to obtain a model with better prediction accuracy.
  • the machine translation model may be an encoder-decoder model, which can be pre-trained for prompt-based learning in machine translation. As described in "Attention is all you need. In Proc. of NeurIPS, pages 5998-6008" by Vaswani et al., the encoder-decoder model can use the Transformer architecture. The machine translation model can be implemented based on the Fairseq toolkit described by Ott et al. in "fairseq: A fast, extensible toolkit for sequence modeling. In Proc. of NAACL-Demonstrations, pages 48-53".
  • the method 200 for machine translation may further include step S240.
  • in step S240, a source language sentence to be translated is received, and the pre-trained machine translation model is used to translate it into a target language sentence to be output.
  • a pre-trained machine translation model can translate source language sentences with bilingual phrase prompts.
  • in this case, the bilingual phrase prompt associated with the source language sentence to be translated is concatenated to that sentence to generate a source language sentence to be translated with a bilingual phrase prompt.
  • the source language sentence to be translated with the bilingual phrase prompt is then fed into the pre-trained machine translation model.
  • a pre-trained machine translation model can also translate source language sentences without bilingual phrase prompts. In this case, the source language sentence to be translated is fed directly into the pre-trained machine translation model.
  • bilingual phrase prompts can also be created manually for translation intervention.
  • for example, lexical constraints can be expressed as bilingual phrase prompts to intervene in lexical choice during translation. Suppose a lexical constraint requires translating a word x in the input sentence into a target word y in the output sentence. This lexical constraint can be expressed as a bilingual phrase prompt "x <q> y", and the input sentence with this prompt is translated, as in the example below.
  • lexical constraints can be specified as soft lexical constraints. Compared with hard lexical constraints, the advantage of soft lexical constraints is that the word form of the phrase need not be specified.
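  • A tiny illustration of such a constraint prompt, reusing the hypothetical build_prompted_source helper from the earlier sketch:

```python
# Soft lexical constraint (hypothetical medical-domain example):
# translate "angina" as "Angina pectoris".
constrained_input = build_prompted_source(
    [("angina", "Angina pectoris")],
    "The symptoms of angina worsened overnight.",
)
# The model remains free to inflect the target phrase, which is what
# makes the constraint "soft" rather than "hard".
```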
  • FIG. 4 is a flowchart illustrating a process 400 of translating source language phrases into target language phrases based on a bilingual phrase database.
  • the source language phrase 402 in the source language sentence 401 to be translated is extracted, and the bilingual phrase 405 is retrieved from the bilingual phrase database 404 to construct a bilingual phrase prompt 406.
  • the most similar bilingual phrase 405 may be retrieved from a bilingual phrase database 404 by computing a contextual representation 403 of the source language phrase 402 and based on the contextual representation 403 .
  • the constructed bilingual phrase prompt 406 is concatenated to the source language sentence 401 to be translated to generate a source language sentence 407 with a bilingual phrase prompt.
  • the source language sentence 407 with bilingual phrase prompts is input into the pre-trained machine translation model 408 to be translated into the target language sentence 409 to be output.
  • Table 1 compares the BLEU scores of the PAPT scheme of the embodiment of the present disclosure and the existing Vanilla scheme.
  • the database size indicates the number of bilingual phrases in the database.
  • PAPT (without prompts) means that, at translation time, no bilingual phrase prompt is concatenated to the source language sentence to be translated.
  • PAPT (with prompts) means that, at translation time, the bilingual phrase prompt is concatenated to the source language sentence to be translated.
  • Table 2 shows the sizes of the datasets used for training, development and testing respectively.
  • the PAPT (without prompts) scheme has performance comparable to the Vanilla scheme when translating English to German and German to English in specific domains.
  • the PAPT (with prompts) scheme achieves average BLEU scores 6.7 and 5.6 points higher than the Vanilla scheme, without additional training.
  • the results show that bilingual phrase prompts are helpful for domain-specific machine translation.
  • Table 3 compares the performance of the PAPT scheme of the embodiments of the present disclosure and the existing Vanilla scheme on English-German translation under lexical constraints.
  • the test sets used in Table 3 are extracted from the Wiktionary and IATE (Interactive Terminology for Europe) terminology databases.
  • as shown in Table 3, compared to Vanilla, the PAPT scheme of the disclosed embodiments achieves accuracy improvements of 10.7% and 12.3% on Wiktionary and IATE, respectively, without additional training.
  • the precision represents the ratio of occurrences of the target phrase in the translation output.
  • Overall translation performance as measured by the BLEU score improves slightly. This shows that PAPT is able to effectively incorporate lexical constraints into the translation process. Moreover, by concatenating multiple bilingual phrases as hints, PAPT can conveniently incorporate multiple lexical constraints.
  • FIG. 5 shows an example of English-German translation in the medical field according to an embodiment of the present disclosure. As can be seen from FIG. 5, PAPT outputs different target language sentences when the bilingual phrase prompts differ. Thus, the translation process can be intervened in by modifying the bilingual phrase prompts.
  • FIG. 6 is a block diagram illustrating an apparatus 600 for machine translation according to an embodiment of the present disclosure.
  • the apparatus 600 includes an original training data acquisition unit 601, a new training data generation unit 602, and a pre-training unit 603.
  • the original training data acquiring unit 601 is configured to acquire original training data including source language sentences for training and target language sentences for training.
  • the new training data generating unit 602 is configured to concatenate bilingual phrase hints related to the source language sentences for training to the source language sentences for training, so as to generate new training data with bilingual phrase hints.
  • the bilingual phrase prompt includes one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase.
  • the pre-training unit 603 is configured to at least use new training data with bilingual phrase prompts to pre-train the machine translation model.
  • the apparatus 600 may further include a translation unit 604, configured to receive a source language sentence to be translated and to use the pre-trained machine translation model to translate it into a target language sentence to be output.
  • the present disclosure identifies difficulties in applying hint-based learning to machine translation.
  • the present disclosure proposes effective methods to address this difficulty.
  • the data show that the technical solution disclosed in the present disclosure can effectively improve machine translation in a specific field and machine translation based on lexical constraints without additional training.
  • each of the above units may be implemented as an independent physical entity, or may also be implemented by a single entity (for example, a processor (CPU or DSP, etc.), an integrated circuit, etc.).
  • the above-mentioned units are shown with dotted lines in the drawings to indicate that these units may not actually exist, and the operations/functions realized by them may be realized by the processing circuit itself.
  • the device may also include a memory that can store various information generated in operation by the device and the units included in the device, programs and data for operations, data to be transmitted by a communication unit, and the like.
  • the memory can be volatile memory and/or non-volatile memory.
  • memory may include, but is not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), read only memory (ROM), flash memory.
  • the device may also include a communication unit, which may be used to communicate with other devices.
  • the communication unit may be implemented in an appropriate manner known in the art, for example including communication components such as antenna arrays and/or radio frequency links, various types of interfaces, communication units and the like. It will not be described in detail here.
  • the device may also include other components not shown, such as a radio frequency link, a baseband processing unit, a network interface, a processor, a controller, and the like. It will not be described in detail here.
  • FIG. 7 is a block diagram illustrating an electronic device according to some embodiments of the present disclosure.
  • the electronic device 700 can be various types of devices, such as, but not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as stationary terminals such as digital TVs and desktop computers.
  • the electronic device 700 may include a display panel for displaying data and/or execution results utilized in the solutions according to the present disclosure.
  • the display panel can be in various shapes, such as a rectangular panel, an oval panel, or a polygonal panel, and the like.
  • the display panel can be not only a flat panel, but also a curved panel, or even a spherical panel.
  • the electronic device 700 of this embodiment includes: a memory 701 and a processor 702 coupled to the memory 701 .
  • the processor 702 may control other components in the electronic device 700 to perform desired functions.
  • memory 701 is used to store one or more computer readable instructions.
  • the processor 702 is used to execute the computer-readable instructions, which, when executed by the processor 702, implement the method according to any of the foregoing embodiments.
  • for the specific implementation and related explanation of each step of the method, reference may be made to the above-mentioned embodiments, and repeated descriptions are omitted here.
  • processor 702 and the memory 701 may directly or indirectly communicate with each other.
  • processor 702 and memory 701 may communicate over a network.
  • a network may include a wireless network, a wired network, and/or any combination of a wireless network and a wired network.
  • the processor 702 and the memory 701 may also communicate with each other through a system bus, which is not limited in the present disclosure.
  • the processor 702 can be embodied as various appropriate processors or processing devices, such as a central processing unit (CPU), a graphics processing unit (GPU), or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • the central processing unit (CPU) may be an X86 or ARM architecture or the like.
  • memory 701 may include any combination of various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the memory 701 may include, for example, a system memory that stores an operating system, application programs, a boot loader, a database, and other programs. Various application programs, various data, and the like can also be stored in the storage medium.
  • FIG. 8 is a block diagram showing an example structure of a computer system employable in an embodiment of the present disclosure.
  • a central processing unit (CPU) 801 executes various processes according to programs stored in a read only memory (ROM) 802 or loaded from a storage section 808 to a random access memory (RAM) 803 .
  • the central processing unit is only exemplary, and it may also be other types of processors, such as the various processors mentioned above.
  • the ROM 802, RAM 803, and storage section 808 may be various forms of computer-readable storage media, as described below. It should be noted that although the ROM 802, RAM 803, and storage section 808 are shown separately in FIG. 8, one or more of them may be combined or located in the same or different memory or storage modules.
  • the CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804.
  • the input/output interface 805 is also connected to the bus 804 .
  • the following components are connected to the input/output interface 805: an input section 806, such as a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, or gyroscope; an output section 807, including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage section 808, including a hard disk, magnetic tape, etc.; and a communication section 809, including a network interface card such as a LAN card or a modem.
  • the communication section 809 allows communication processing to be performed via a network such as the Internet. It is easy to understand that although FIG. 8 shows the devices or modules in the computer system 800 communicating through the bus 804, they may also communicate through a network or other means, where the network may include a wireless network, a wired network, and/or any combination of the two.
  • a drive 810 is also connected to the input/output interface 805 as needed.
  • a removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read therefrom is installed into the storage section 808 as necessary.
  • programs constituting the software can be installed from a network such as the Internet or a storage medium such as the removable medium 811 .
  • embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
  • the computer program may be downloaded and installed from a network via communication means 809, or from storage means 808, or from ROM 802.
  • when the computer program is executed by the CPU 801, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are performed.
  • a computer-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • a computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two.
  • a computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein.
  • Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist independently without being incorporated into the electronic device.
  • a computer program including: instructions, which when executed by a processor cause the processor to execute the method of any one of the above embodiments.
  • instructions may be embodied as computer program code.
  • the computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof; the programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the modules, components, or units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The name of a module, component, or unit does not, under certain circumstances, constitute a limitation on the module, component, or unit itself.
  • exemplary hardware logic components include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
  • a method for machine translation, comprising: obtaining original training data including source language sentences for training and target language sentences for training;
  • concatenating bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts, wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and pre-training a machine translation model using at least the new training data with bilingual phrase prompts.
  • the one or more bilingual phrases are separated by a first token,
  • the source language phrase and the corresponding target language phrase are separated by a second token, and
  • the bilingual phrase prompt and the source language sentence for training are separated by a third token.
  • bilingual phrases are retrieved from a bilingual phrase database to construct the bilingual phrase prompt.
  • the context representation of the source language phrase of each bilingual phrase and the bilingual phrase are stored as key-value pairs in the bilingual phrase database.
  • retrieving bilingual phrases from the bilingual phrase database includes: loading the source language phrases in the bilingual phrase database into a dictionary tree; extracting the source language phrases in the source language sentence that exist in the dictionary tree; computing the contextual representations of the source language phrases in the source language sentence; and retrieving bilingual phrases from the bilingual phrase database based on the contextual representations of the source language phrases in the source language sentence.
  • when the bilingual phrase database is not constructed based on the original training data, the most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase; when the bilingual phrase database is constructed based on the original training data, the second most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase.
  • the machine translation model is an encoder-decoder model.
  • At least using the new training data with bilingual phrase prompts to pre-train the machine translation model includes: using both the original training data and the new training data with bilingual phrase prompts to pre-train the machine translation model.
  • a source language sentence to be translated is received, and a pre-trained machine translation model is used to translate the source language sentence to be translated into a target language sentence to be output.
  • bilingual phrase prompts associated with the source language sentence to be translated are concatenated to the source language sentence to be translated to generate a source language sentence to be translated with bilingual phrase prompts; and the source language sentence to be translated with bilingual phrase prompts is fed into the pre-trained machine translation model.
  • the bilingual phrase prompts related to the source language sentences to be translated are created manually.
  • source language phrases in the source language sentences to be translated are extracted; and bilingual phrases are retrieved from a bilingual phrase database to construct bilingual phrase prompts related to the source language sentences to be translated.
  • contextual representations of the source language phrases in the source language sentence to be translated are computed, wherein retrieving bilingual phrases from the bilingual phrase database includes retrieving the most similar bilingual phrases from the bilingual phrase database based on the contextual representations of the source language phrases in the source language sentence to be translated.
  • an electronic device including: a memory; and a processor coupled to the memory, the memory storing instructions that, when executed by the processor, cause the processor to perform a method according to an embodiment of the present disclosure.
  • a non-transitory computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the method according to the embodiments of the present disclosure is implemented.
  • an apparatus for machine translation including: an original training data acquisition unit configured to acquire original training data including source language sentences for training and target language sentences for training; a new training data generation unit configured to concatenate bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts, wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and a pre-training unit configured to pre-train a machine translation model using at least the new training data with bilingual phrase prompts.
  • a computer program comprising: instructions, which when executed by a processor cause the processor to perform a method according to an embodiment of the present disclosure.
  • a computer program product comprising instructions which when executed by a processor implement a method according to an embodiment of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A method, device and medium for machine translation. The method includes: obtaining original training data including source language sentences for training and target language sentences for training (S210); concatenating bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts (S220), wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and pre-training a machine translation model using at least the new training data with bilingual phrase prompts (S230).

Description

Method, device and medium for machine translation
This application claims priority to Chinese patent application No. 202111325941.9, filed on November 10, 2021 and entitled "Method, device and medium for machine translation", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to methods, devices and media for machine translation.
Background
Pre-trained models (PTMs) have significantly advanced natural language processing. In recent years, prompt-based learning has emerged as an attractive approach for adapting PTMs to specific tasks. With manually created prompts or automatically created prompts, a PTM can achieve good performance on many downstream tasks without fine-tuning. Unlike fine-tuning and feature-based adaptation, prompt-based learning requires no additional training for downstream tasks; it formulates the downstream task as a language-model fill-in-the-blank task with prompts. In general, in prompt-based learning, using a pre-trained language model to make predictions for a specific task involves three stages: (i) constructing a prompt with some unfilled slots based on the input; (ii) filling the unfilled slots with the pre-trained model; and (iii) deriving the final prediction from the filled slots.
The prompt format depends on the pre-trained model and the downstream task. There are two main classes of prompts: cloze prompts, where the unfilled slots are predefined blanks, and prefix prompts, where filling the slots continues the generation process from the prefix. Cloze prompts are typically used for natural language understanding tasks, while prefix prompts are mainly used for natural language generation tasks.
Summary
This Summary is provided to introduce concepts in a simplified form that are described in detail in the Detailed Description below. This Summary is not intended to identify key or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
According to some embodiments of the present disclosure, a method for machine translation is provided, including: obtaining original training data including source language sentences for training and target language sentences for training; concatenating bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts, wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and pre-training a machine translation model using at least the new training data with bilingual phrase prompts.
According to some embodiments of the present disclosure, an apparatus for machine translation is provided, including: an original training data acquisition unit configured to acquire original training data including source language sentences for training and target language sentences for training; a new training data generation unit configured to concatenate bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts, wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and a pre-training unit configured to pre-train a machine translation model using at least the new training data with bilingual phrase prompts.
According to some embodiments of the present disclosure, an electronic device is provided, including: a memory; and a processor coupled to the memory, the memory storing instructions that, when executed by the processor, cause the processor to perform the method according to the embodiments of the present disclosure.
According to some embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, the program implementing the method according to the embodiments of the present disclosure when executed by a processor.
According to some embodiments of the present disclosure, a computer program product is provided, including instructions that implement the method according to the embodiments of the present disclosure when executed by a processor.
According to some embodiments of the present disclosure, a computer program is provided, including instructions that implement the method according to the embodiments of the present disclosure when executed by a processor.
Other features, aspects and advantages of the present disclosure will become apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Brief Description of the Drawings
Preferred embodiments of the present disclosure are described below with reference to the accompanying drawings. The drawings described here are provided for a further understanding of the present disclosure; together with the following detailed description, they are incorporated in and constitute a part of this specification and serve to explain the present disclosure. It should be understood that the drawings in the following description relate only to some embodiments of the present disclosure and do not limit it. In the drawings:
FIG. 1 is a schematic diagram showing a comparison between a method for machine translation according to an embodiment of the present disclosure and the existing Vanilla method;
FIG. 2 is a flowchart illustrating a method for machine translation according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a method for retrieving bilingual phrases from a bilingual phrase database according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a process of translating source language phrases into target language phrases based on a bilingual phrase database;
FIG. 5 shows an example of English-German translation in the medical field according to an embodiment of the present disclosure;
FIG. 6 is a block diagram illustrating an apparatus for machine translation according to an embodiment of the present disclosure;
FIG. 7 is a block diagram illustrating an electronic device according to an embodiment of the present disclosure; and
FIG. 8 is a block diagram showing an example structure of a computer system employable in embodiments of the present disclosure.
It should be understood that, for ease of description, the dimensions of the parts shown in the drawings are not necessarily drawn to actual scale. The same or similar reference numerals are used throughout the drawings to denote the same or similar components. Therefore, once an item is defined in one drawing, it may not be further discussed in subsequent drawings.
Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. The following description of the embodiments is merely illustrative and in no way limits the present disclosure or its application or use. It should be understood that the present disclosure may be implemented in various forms and should not be construed as being limited to the embodiments set forth here.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. Furthermore, the method embodiments may include additional steps and/or omit the illustrated steps. The scope of the present disclosure is not limited in this respect. Unless specifically stated otherwise, the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these embodiments should be construed as merely exemplary and not as limiting the scope of the present disclosure.
The term "comprising" and its variants as used in the present disclosure are open terms meaning at least including the following elements/features without excluding other elements/features, i.e., "comprising but not limited to". In addition, the term "including" and its variants as used in the present disclosure are open terms meaning at least including the following elements/features without excluding other elements/features, i.e., "including but not limited to". Thus, "comprising" is synonymous with "including". The term "based on" means "based at least in part on".
References throughout this specification to "one embodiment", "some embodiments", or "an embodiment" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. For example, the term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; and the term "some embodiments" means "at least some embodiments". Moreover, appearances of the phrases "in one embodiment", "in some embodiments", or "in an embodiment" in various places throughout the specification do not necessarily all refer to the same embodiment, but they may also refer to the same embodiment.
It should be noted that the concepts of "first", "second", etc. mentioned in the present disclosure are only used to distinguish different apparatuses, modules, or units, and are not used to limit the order or interdependence of the functions performed by these apparatuses, modules, or units. Unless otherwise specified, the concepts of "first", "second", etc. are not intended to imply that the objects so described must be in a given order in time, space, ranking, or any other manner.
It should be noted that the modifiers "a/an" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".
The names of the messages or information exchanged between multiple apparatuses in the embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of these messages or information.
Embodiments of the present disclosure are described in detail below with reference to the accompanying drawings, but the present disclosure is not limited to these specific embodiments. The specific embodiments below may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Furthermore, in one or more embodiments, particular features, structures, or characteristics may be combined in any suitable manner that will be clear to those of ordinary skill in the art from the present disclosure.
There are two main difficulties in applying prompt-based learning to machine translation. First, it is difficult to create effective prompts for machine translation. Brown et al., in "Language models are few-shot learners. In Advances in Neural Information Processing Systems, volume 33, pages 1877-1901", propose constructing prompts by concatenating some in-context translation examples. Liu et al., in "What makes good in-context examples for GPT-3? arXiv preprint arXiv:2101.06804", show that downstream task performance depends heavily on the choice of in-context examples. However, sentence-level translation examples are sparse: it is difficult to find sentence-level translation examples related to the input sentence to construct effective prompts. Second, the pre-training task of the language model is not designed jointly with prompt-based machine translation prediction. The inconsistency between pre-training and prediction limits the potential of prompt-based learning.
Embodiments of the present disclosure can effectively solve the difficulties encountered in applying prompt-based learning to machine translation. Embodiments of the present disclosure utilize bilingual phrase prompts and apply prompt-aware pre-training (PAPT) to prompt-based machine translation. Experiments show that, without additional training, embodiments of the present disclosure can improve the BLEU score of domain-specific machine translation by 6.2 and improve the accuracy of lexically constrained machine translation by 11.5%.
First, embodiments of the present disclosure construct bilingual phrase prompts for machine translation. A bilingual phrase prompt is formed by concatenating phrase-level translation examples (i.e., bilingual phrases) to alleviate the sparsity of sentence-level translation examples. By retrieving relevant bilingual phrases from a pre-built bilingual phrase database, embodiments of the present disclosure can construct input-related prompts that provide useful knowledge for translation generation. Then, to alleviate the inconsistency between pre-training and prompt-based prediction, the prompts can be made known during pre-training. To this end, a prompt-aware pre-training task can be designed, which can be a sequence-to-sequence generation task. Embodiments of the present disclosure pre-train a prompt-aware model for machine translation, thereby alleviating the inconsistency between pre-training and prompt-based prediction.
FIG. 1 is a schematic diagram showing a comparison between a method for machine translation according to an embodiment of the present disclosure and the existing Vanilla method. In FIG. 1, the input is the source language sentence of a training sample, the output is the target language sentence of the training sample, and the prompt is the bilingual phrase prompt. In the existing Vanilla method, only each training sample consisting of a source language sentence and a target language sentence is used for training, without bilingual phrase prompts. PAPT is the prompt-aware pre-training method according to an embodiment of the present disclosure. In PAPT, not only is each training sample consisting of a source language sentence and a target language sentence used for training, but for each training sample a new training sample, formed by concatenating the bilingual phrase prompt to the source language sentence together with the target language sentence, is also used for training. For the purpose of illustrating embodiments of the present disclosure, concatenating the bilingual phrase prompt to the source language sentence means using the bilingual phrase prompt as a prefix of the source language sentence. However, embodiments of the present disclosure are not limited thereto; for example, the bilingual phrase prompt may be used as a suffix rather than a prefix of the source language sentence.
FIG. 2 is a flowchart illustrating a method 200 for machine translation according to an embodiment of the present disclosure. In step S210, original training data including source language sentences for training and target language sentences for training is obtained. The original training data can be translation data in a general domain, for example, the WMT14 EN-DE dataset.
In step S220, bilingual phrase prompts related to the source language sentences for training are concatenated to the source language sentences for training to generate new training data with bilingual phrase prompts.
A bilingual phrase prompt includes one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase. Bilingual phrase prompts can provide useful knowledge for machine translation.
Multiple bilingual phrases may be separated by a first tag (e.g., <r>). The source language phrase and the corresponding target language phrase in a bilingual phrase may be separated by a second tag (e.g., <q>). The bilingual phrase prompt related to the source language sentence for training and the source language sentence for training may be separated by a third tag (e.g., <p>).
Bilingual phrases can be retrieved from a pre-built bilingual phrase database to construct the bilingual phrase prompts. For a source language sentence for training, bilingual phrases related to it can be retrieved from the pre-built bilingual phrase database to construct a bilingual phrase prompt related to that sentence. For a source language sentence to be translated, bilingual phrases related to it can be retrieved from the pre-built bilingual phrase database to construct a bilingual phrase prompt related to that sentence.
The bilingual phrase database can be pre-built and can be offline. For example, the multilingual BERT proposed by Devlin et al. in "BERT: Pre-training of deep bidirectional transformers for language understanding. In Proc. of NAACL-HLT, pages 4171-4186" can be used to extract bilingual phrases from parallel translation data and to compute contextual representations of source language phrases to build the bilingual phrase database. The contextual representation of a source language phrase and the corresponding bilingual phrase are stored as a key-value pair in the bilingual phrase database: the contextual representation of the source language phrase serves as the key, and the corresponding bilingual phrase serves as the value. The bilingual phrase database is thus a collection of key-value pairs created from parallel translation data.
The method for extracting bilingual phrases may include: first, the awesome-align method described by Dou et al. in "Word alignment by fine-tuning embeddings on parallel corpora. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2112-2128" can be used to extract word alignments; then, the algorithm described by Koehn et al. in "Statistical Machine Translation" can be used to extract bilingual phrases from the word alignments. The contextual representation of a phrase can be computed by mean-pooling the hidden states of the words in the phrase. Subword segmentation can be performed with 32k merge operations using the joint byte pair encoding method described by Sennrich et al. in "Neural machine translation of rare words with subword units. In Proc. of ACL, pages 1715-1725". A sketch of alignment-consistent phrase extraction is given below.
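As an illustration of the Koehn-style extraction step, the following is a simplified sketch of consistency-based phrase-pair extraction from a word alignment; the algorithm in "Statistical Machine Translation" handles unaligned words more carefully, so this should be read as an approximation under stated assumptions.

```python
def extract_phrase_pairs(src_words, tgt_words, alignment, max_len=4):
    """Extract phrase pairs consistent with a word alignment.

    alignment: set of (i, j) pairs, src_words[i] aligned to tgt_words[j].
    A source span and a target span form a phrase pair if no alignment
    link crosses the box boundary (the standard consistency criterion).
    """
    pairs = []
    for i1 in range(len(src_words)):
        for i2 in range(i1, min(i1 + max_len, len(src_words))):
            tgt_pos = [j for (i, j) in alignment if i1 <= i <= i2]
            if not tgt_pos:
                continue
            j1, j2 = min(tgt_pos), max(tgt_pos)
            if j2 - j1 >= max_len:
                continue
            # Consistency: no target word in [j1, j2] aligns outside [i1, i2].
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            pairs.append((" ".join(src_words[i1:i2 + 1]),
                          " ".join(tgt_words[j1:j2 + 1])))
    return pairs

# Hypothetical example:
src, tgt = "the heart attack".split(), "der Herzinfarkt".split()
print(extract_phrase_pairs(src, tgt, {(0, 0), (1, 1), (2, 1)}))
# [('the', 'der'), ('the heart attack', 'der Herzinfarkt'), ('heart attack', 'Herzinfarkt')]
```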
FIG. 3 is a flowchart illustrating a method 300 for retrieving bilingual phrases from a bilingual phrase database according to an embodiment of the present disclosure. In step S310, the source language phrases in the bilingual phrase database are loaded into a dictionary tree. In step S320, the source language phrases in the source language sentence that exist in the dictionary tree are extracted. In step S330, the contextual representations of the source language phrases in the source language sentence are computed. In step S340, bilingual phrases are retrieved from the bilingual phrase database based on the contextual representations of the source language phrases in the source language sentence.
In general, the most similar bilingual phrases can be retrieved from the bilingual phrase database based on the L2 distance between contextual representations of source language phrases. However, when the bilingual phrase database is built from the original training data, the second most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase, so as to avoid retrieving the bilingual phrase extracted from the current translation sample and overfitting to the retrieved bilingual phrases.
After the bilingual phrases are retrieved, they can be used to construct a bilingual phrase prompt, and the constructed bilingual phrase prompt is concatenated to the source language sentence for training to generate new training data with bilingual phrase prompts. The new training data includes the source language sentence concatenated with the bilingual phrase prompt and the target language sentence.
Returning to FIG. 2, in step S230, the machine translation model is pre-trained using at least the new training data with bilingual phrase prompts. In some embodiments, both the original training data and the new training data with bilingual phrase prompts can be used to pre-train the machine translation model, to obtain a model with better prediction accuracy.
The machine translation model can be pre-trained for one or more epochs (e.g., 10 epochs) to obtain a model with better prediction accuracy. A cross-entropy loss can be used during pre-training. The machine translation model can be an encoder-decoder model, which can be pre-trained for prompt-based learning in machine translation. As described by Vaswani et al. in "Attention is all you need. In Proc. of NeurIPS, pages 5998-6008", the encoder-decoder model can use the Transformer architecture. The machine translation model can be implemented based on the Fairseq toolkit described by Ott et al. in "fairseq: A fast, extensible toolkit for sequence modeling. In Proc. of NAACL-Demonstrations, pages 48-53". For efficient bilingual phrase retrieval, an IVFPQ index with FAISS can be built with reference to Johnson et al., "Billion-scale similarity search with GPUs. arXiv preprint arXiv:1702.08734". A sketch of assembling the mixed pre-training data follows.
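The sketch below shows one plausible way to assemble this mixed pre-training corpus (original samples plus prompted samples); the tag convention follows the earlier examples, and prompt_fn stands in for the database retrieval described above.

```python
def build_pretraining_corpus(parallel_data, prompt_fn):
    """parallel_data: iterable of (source_sentence, target_sentence) pairs.
    prompt_fn: maps a source sentence to its bilingual phrase prompt string
    (e.g., built from retrieved phrases), or None when nothing matches.
    Returns parallel (source_lines, target_lines) for seq2seq pre-training.
    """
    src_lines, tgt_lines = [], []
    for src, tgt in parallel_data:
        # Original sample, as in Vanilla training.
        src_lines.append(src)
        tgt_lines.append(tgt)
        # Prompt-aware sample: "prompt <p> source" -> target.
        prompt = prompt_fn(src)
        if prompt:
            src_lines.append(f"{prompt} <p> {src}")
            tgt_lines.append(tgt)
    return src_lines, tgt_lines
```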
In some embodiments of the present disclosure, the method 200 for machine translation may further include step S240. In step S240, a source language sentence to be translated is received, and the pre-trained machine translation model is used to translate the source language sentence to be translated into a target language sentence to be output. In some embodiments of the present disclosure, the pre-trained machine translation model can translate source language sentences with bilingual phrase prompts. In this case, the bilingual phrase prompt related to the source language sentence to be translated is concatenated to the source language sentence to be translated to generate a source language sentence to be translated with a bilingual phrase prompt. Then, the source language sentence to be translated with the bilingual phrase prompt is input into the pre-trained machine translation model. In some embodiments of the present disclosure, the pre-trained machine translation model can translate source language sentences without bilingual phrase prompts. In this case, the source language sentence to be translated is input into the pre-trained machine translation model.
In some embodiments of the present disclosure, bilingual phrase prompts are created manually for translation intervention. For example, lexical constraints can be expressed as bilingual phrase prompts to intervene in lexical choice during translation. Suppose a lexical constraint requires translating a word x in the input sentence into a target word y in the output sentence. This lexical constraint can be expressed as a bilingual phrase prompt "x<q>y", and the input sentence with this prompt is translated. Lexical constraints can be specified as soft lexical constraints. Compared with hard lexical constraints, the advantage of soft lexical constraints is that the word form of the phrase need not be specified.
In some embodiments of the present disclosure, the bilingual phrase prompt related to the source language sentence to be translated is constructed by retrieving bilingual phrases from the bilingual phrase database. FIG. 4 is a flowchart illustrating a process 400 of translating source language phrases into target language phrases based on a bilingual phrase database. First, the source language phrase 402 in the source language sentence 401 to be translated is extracted, and the bilingual phrase 405 is retrieved from the bilingual phrase database 404 to construct a bilingual phrase prompt 406. The most similar bilingual phrase 405 can be retrieved from the bilingual phrase database 404 by computing a contextual representation 403 of the source language phrase 402 and searching based on the contextual representation 403. Then, the constructed bilingual phrase prompt 406 is concatenated to the source language sentence 401 to be translated to generate a source language sentence 407 with a bilingual phrase prompt. Finally, the source language sentence 407 with the bilingual phrase prompt is input into the pre-trained machine translation model 408 to be translated into the target language sentence 409 to be output. A sketch of this inference path is given below.
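A minimal sketch of this inference path, assuming a trained checkpoint exposed through Fairseq's TransformerModel hub interface (the directory names, checkpoint file, and prompt are hypothetical):

```python
from fairseq.models.transformer import TransformerModel

model = TransformerModel.from_pretrained(
    "checkpoints/papt",                    # hypothetical model directory
    checkpoint_file="checkpoint_best.pt",
    data_name_or_path="data-bin/wmt14_en_de",
)

sentence = "The patient suffered a heart attack."
prompt = "heart attack <q> Herzinfarkt"    # retrieved or manually created
prompted = f"{prompt} <p> {sentence}"      # FIG. 4: prompt 406 + sentence 401 -> 407
print(model.translate(prompted))           # sentence 407 -> output 409
```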
Table 1 compares the BLEU scores of the PAPT scheme of the embodiments of the present disclosure and the existing Vanilla scheme. In Table 1, the database size indicates the number of bilingual phrases in the database. PAPT (without prompts) means that, at translation time, no bilingual phrase prompt is concatenated to the source language sentence to be translated; PAPT (with prompts) means that the bilingual phrase prompt is concatenated to the source language sentence to be translated. Table 2 shows the sizes of the datasets used for training, development, and testing, respectively.
As shown in Table 1, the PAPT (without prompts) scheme has performance comparable to the Vanilla scheme when translating English to German and German to English in specific domains. The PAPT (with prompts) scheme achieves average BLEU scores 6.7 and 5.6 points higher than the Vanilla scheme without additional training. The results show that bilingual phrase prompts are helpful for domain-specific machine translation.
Table 1
[Table 1 of the original publication is provided as an image (PCTCN2022123788-appb-000001) and is not reproduced here.]
Table 2
[Table 2 of the original publication is provided as an image (PCTCN2022123788-appb-000002) and is not reproduced here.]
Table 3 compares the performance of the PAPT scheme of the embodiments of the present disclosure and the existing Vanilla scheme on English-German translation under lexical constraints. The test sets used in Table 3 are those extracted by Susanto et al. in "Lexically constrained neural machine translation with Levenshtein transformer. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 3536-3543" from the Wiktionary and IATE (Interactive Terminology for Europe) terminology databases. As shown in Table 3, compared to Vanilla, the PAPT scheme of the embodiments of the present disclosure achieves accuracy improvements of 10.7% and 12.3% on Wiktionary and IATE, respectively, without additional training. The accuracy represents the rate at which the target phrase appears in the translation output. The overall translation performance measured by BLEU also improves slightly. This shows that PAPT can effectively incorporate lexical constraints into the translation process. Moreover, by concatenating multiple bilingual phrases as a prompt, PAPT can conveniently incorporate multiple lexical constraints.
Table 3
[Table 3 of the original publication is provided as an image (PCTCN2022123788-appb-000003) and is not reproduced here.]
FIG. 5 shows an example of English-German translation in the medical field according to an embodiment of the present disclosure. As can be seen from FIG. 5, PAPT outputs different target language sentences when the bilingual phrase prompts differ. Thus, the translation process can be intervened in by modifying the bilingual phrase prompts.
FIG. 6 is a block diagram illustrating an apparatus 600 for machine translation according to an embodiment of the present disclosure. As shown in FIG. 6, the apparatus 600 includes an original training data acquisition unit 601, a new training data generation unit 602, and a pre-training unit 603. The original training data acquisition unit 601 is configured to acquire original training data including source language sentences for training and target language sentences for training. The new training data generation unit 602 is configured to concatenate bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts. A bilingual phrase prompt includes one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase. The pre-training unit 603 is configured to pre-train the machine translation model using at least the new training data with bilingual phrase prompts.
In some embodiments of the present disclosure, the apparatus 600 may further include a translation unit 604, configured to receive a source language sentence to be translated and to use the pre-trained machine translation model to translate the source language sentence to be translated into a target language sentence to be output.
Since the specific implementation of the operations performed by each unit in FIG. 6 has been described in detail above, it is not repeated here.
As described above, the present disclosure identifies the difficulties in applying prompt-based learning to machine translation and proposes effective methods to solve them. The data show that the technical solution of the present disclosure can effectively improve domain-specific machine translation and lexically constrained machine translation without additional training.
It should be noted that the above units are merely logical modules divided according to the specific functions they implement and are not intended to limit specific implementations; they may be implemented, for example, in software, hardware, or a combination of both. In actual implementation, the above units may be implemented as independent physical entities, or may be implemented by a single entity (e.g., a processor (CPU, DSP, etc.), an integrated circuit, etc.). Furthermore, the above units are shown with dashed lines in the drawings to indicate that these units may not actually exist; the operations/functions they implement may be implemented by the processing circuit itself.
In addition, although not shown, the device may also include a memory that can store various information generated in operation by the device and by each unit included in the device, programs and data for operations, data to be transmitted by a communication unit, and the like. The memory may be volatile memory and/or non-volatile memory. For example, the memory may include, but is not limited to, random access memory (RAM), dynamic random access memory (DRAM), static random access memory (SRAM), read-only memory (ROM), and flash memory. The memory may also be located outside the device. Optionally, although not shown, the device may also include a communication unit, which may be used to communicate with other apparatuses. In one example, the communication unit may be implemented in an appropriate manner known in the art, for example including communication components such as antenna arrays and/or radio-frequency links, various types of interfaces, communication units, and the like. It is not described in detail here. In addition, the device may also include other components not shown, such as a radio-frequency link, a baseband processing unit, a network interface, a processor, a controller, and the like. They are not described in detail here.
Some embodiments of the present disclosure also provide an electronic device. FIG. 7 is a block diagram illustrating an electronic device according to some embodiments of the present disclosure. For example, in some embodiments, the electronic device 700 can be various types of devices, such as, but not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), as well as stationary terminals such as digital TVs and desktop computers. For example, the electronic device 700 may include a display panel for displaying the data and/or execution results utilized in the solutions according to the present disclosure. For example, the display panel may have various shapes, such as a rectangular panel, an elliptical panel, or a polygonal panel. In addition, the display panel may be not only a flat panel but also a curved panel, or even a spherical panel.
As shown in FIG. 7, the electronic device 700 of this embodiment includes a memory 701 and a processor 702 coupled to the memory 701. It should be noted that the components of the electronic device 700 shown in FIG. 7 are only exemplary, not restrictive; the electronic device 700 may also have other components according to actual application needs. The processor 702 may control other components in the electronic device 700 to perform desired functions.
In some embodiments, the memory 701 is used to store one or more computer-readable instructions. The processor 702 is used to run the computer-readable instructions, which, when run by the processor 702, implement the method according to any of the above embodiments. For the specific implementation and related explanation of each step of the method, reference may be made to the above embodiments, and repeated descriptions are omitted here.
For example, the processor 702 and the memory 701 may communicate with each other directly or indirectly. For example, the processor 702 and the memory 701 may communicate over a network. The network may include a wireless network, a wired network, and/or any combination of the two. The processor 702 and the memory 701 may also communicate with each other through a system bus, which is not limited in the present disclosure.
For example, the processor 702 may be embodied as various appropriate processors or processing devices, such as a central processing unit (CPU), a graphics processing unit (GPU), or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The central processing unit (CPU) may be of an X86 or ARM architecture, etc. For example, the memory 701 may include any combination of various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The memory 701 may include, for example, a system memory, which stores, for example, an operating system, application programs, a boot loader, a database, and other programs. Various application programs, various data, and the like can also be stored in the storage medium.
In addition, according to some embodiments of the present disclosure, when the various operations/processes according to the present disclosure are implemented by software and/or firmware, a program constituting the software can be installed from a storage medium or a network into a computer system having a dedicated hardware structure, for example, the computer system 800 shown in FIG. 8, which, when various programs are installed, can perform various functions, including those described above. FIG. 8 is a block diagram showing an example structure of a computer system employable in embodiments of the present disclosure.
In FIG. 8, a central processing unit (CPU) 801 performs various processes according to programs stored in a read-only memory (ROM) 802 or loaded from a storage section 808 into a random access memory (RAM) 803. The RAM 803 also stores, as needed, data required when the CPU 801 performs various processes. The central processing unit is only exemplary; it may also be another type of processor, such as the various processors described above. The ROM 802, RAM 803, and storage section 808 may be various forms of computer-readable storage media, as described below. It should be noted that although the ROM 802, RAM 803, and storage section 808 are shown separately in FIG. 8, one or more of them may be combined or located in the same or different memory or storage modules.
The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output interface 805 is also connected to the bus 804.
The following components are connected to the input/output interface 805: an input section 806, such as a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, or gyroscope; an output section 807, including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage section 808, including a hard disk, magnetic tape, etc.; and a communication section 809, including a network interface card such as a LAN card or a modem. The communication section 809 allows communication processing to be performed via a network such as the Internet. It is easy to understand that although FIG. 8 shows the devices or modules in the computer system 800 communicating through the bus 804, they may also communicate through a network or other means, where the network may include a wireless network, a wired network, and/or any combination of the two.
A drive 810 is also connected to the input/output interface 805 as needed. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
In the case where the above series of processes is implemented by software, the program constituting the software can be installed from a network such as the Internet or from a storage medium such as the removable medium 811.
According to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 809, or installed from the storage section 808, or installed from the ROM 802. When the computer program is executed by the CPU 801, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that, in the context of the present disclosure, a computer-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or may exist separately without being assembled into the electronic device.
In some embodiments, a computer program is also provided, including instructions that, when executed by a processor, cause the processor to perform the method of any of the above embodiments. For example, the instructions may be embodied as computer program code.
In embodiments of the present disclosure, the computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof; the programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In cases involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules, components, or units involved in the embodiments described in the present disclosure may be implemented by software or by hardware. The name of a module, component, or unit does not, under certain circumstances, constitute a limitation on the module, component, or unit itself.
The functions described above may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
According to some embodiments of the present disclosure, a method for machine translation is provided, including: obtaining original training data including source language sentences for training and target language sentences for training; concatenating bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts, wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and pre-training a machine translation model using at least the new training data with bilingual phrase prompts.
According to some embodiments of the present disclosure, the one or more bilingual phrases are separated by a first tag, the source language phrase and the corresponding target language phrase are separated by a second tag, and the bilingual phrase prompt and the source language sentence for training are separated by a third tag.
According to some embodiments of the present disclosure, bilingual phrases are retrieved from a bilingual phrase database to construct the bilingual phrase prompt.
According to some embodiments of the present disclosure, the contextual representation of the source language phrase of each bilingual phrase and the bilingual phrase are stored as a key-value pair in the bilingual phrase database.
According to some embodiments of the present disclosure, retrieving bilingual phrases from the bilingual phrase database includes: loading the source language phrases in the bilingual phrase database into a dictionary tree; extracting the source language phrases in the source language sentence that exist in the dictionary tree; computing the contextual representations of the source language phrases in the source language sentence; and retrieving bilingual phrases from the bilingual phrase database based on the contextual representations of the source language phrases in the source language sentence.
According to some embodiments of the present disclosure, when the bilingual phrase database is not constructed based on the original training data, the most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase; when the bilingual phrase database is constructed based on the original training data, the second most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase.
According to some embodiments of the present disclosure, the machine translation model is an encoder-decoder model.
According to some embodiments of the present disclosure, pre-training the machine translation model using at least the new training data with bilingual phrase prompts includes: pre-training the machine translation model using both the original training data and the new training data with bilingual phrase prompts.
According to some embodiments of the present disclosure, a source language sentence to be translated is received, and the pre-trained machine translation model is used to translate the source language sentence to be translated into a target language sentence to be output.
According to some embodiments of the present disclosure, a bilingual phrase prompt related to the source language sentence to be translated is concatenated to the source language sentence to be translated to generate a source language sentence to be translated with a bilingual phrase prompt; and the source language sentence to be translated with the bilingual phrase prompt is input into the pre-trained machine translation model.
According to some embodiments of the present disclosure, the bilingual phrase prompt related to the source language sentence to be translated is created manually.
According to some embodiments of the present disclosure, source language phrases in the source language sentence to be translated are extracted; and bilingual phrases are retrieved from the bilingual phrase database to construct the bilingual phrase prompt related to the source language sentence to be translated.
According to some embodiments of the present disclosure, the contextual representations of the source language phrases in the source language sentence to be translated are computed, wherein retrieving bilingual phrases from the bilingual phrase database includes retrieving the most similar bilingual phrases from the bilingual phrase database based on the contextual representations of the source language phrases in the source language sentence to be translated.
According to some embodiments of the present disclosure, an electronic device is provided, including: a memory; and a processor coupled to the memory, the memory storing instructions that, when executed by the processor, cause the processor to perform the method according to the embodiments of the present disclosure.
According to some embodiments of the present disclosure, a non-transitory computer-readable storage medium is provided, on which a computer program is stored, the program implementing the method according to the embodiments of the present disclosure when executed by a processor.
According to some embodiments of the present disclosure, an apparatus for machine translation is provided, including: an original training data acquisition unit configured to acquire original training data including source language sentences for training and target language sentences for training; a new training data generation unit configured to concatenate bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts, wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and a pre-training unit configured to pre-train a machine translation model using at least the new training data with bilingual phrase prompts.
According to some embodiments of the present disclosure, a computer program is provided, including instructions that, when executed by a processor, cause the processor to perform the method according to the embodiments of the present disclosure.
According to some embodiments of the present disclosure, a computer program product is provided, including instructions that implement the method according to the embodiments of the present disclosure when executed by a processor.
The above description is only an explanation of some embodiments of the present disclosure and of the technical principles employed. Those skilled in the art should understand that the scope of the disclosure involved in the present disclosure is not limited to technical solutions formed by specific combinations of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed (but not limited to those disclosed) in the present disclosure.
In the description provided here, many specific details are set forth. However, it should be understood that embodiments of the present disclosure may be practiced without these specific details. In other cases, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this description.
Furthermore, although operations are depicted in a specific order, this should not be understood as requiring that these operations be performed in the specific order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although several specific implementation details are included in the above discussion, these should not be construed as limiting the scope of the present disclosure. Certain features described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although some specific embodiments of the present disclosure have been described in detail by way of example, those skilled in the art should understand that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. Those skilled in the art should understand that the above embodiments may be modified without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (18)

  1. A method for machine translation, comprising:
    obtaining original training data including source language sentences for training and target language sentences for training;
    concatenating bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts, wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and
    pre-training a machine translation model using at least the new training data with bilingual phrase prompts.
  2. The method of claim 1, wherein the one or more bilingual phrases are separated by a first tag, the source language phrase and the corresponding target language phrase are separated by a second tag, and the bilingual phrase prompt and the source language sentence for training are separated by a third tag.
  3. The method of claim 1, further comprising:
    retrieving bilingual phrases from a bilingual phrase database to construct the bilingual phrase prompt.
  4. The method of claim 3, wherein the contextual representation of the source language phrase of each bilingual phrase and the bilingual phrase are stored as a key-value pair in the bilingual phrase database.
  5. The method of claim 3, wherein retrieving bilingual phrases from the bilingual phrase database comprises:
    loading the source language phrases in the bilingual phrase database into a dictionary tree;
    extracting the source language phrases in the source language sentence that exist in the dictionary tree;
    computing contextual representations of the source language phrases in the source language sentence; and
    retrieving bilingual phrases from the bilingual phrase database based on the contextual representations of the source language phrases in the source language sentence.
  6. The method of claim 3, wherein
    when the bilingual phrase database is not constructed based on the original training data, the most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase, and
    when the bilingual phrase database is constructed based on the original training data, the second most similar bilingual phrase is retrieved from the bilingual phrase database based on the contextual representation of the source language phrase.
  7. The method of claim 1, wherein the machine translation model is an encoder-decoder model.
  8. The method of claim 1, wherein pre-training the machine translation model using at least the new training data with bilingual phrase prompts comprises:
    pre-training the machine translation model using both the original training data and the new training data with bilingual phrase prompts.
  9. The method of claim 1, further comprising:
    receiving a source language sentence to be translated, and using the pre-trained machine translation model to translate the source language sentence to be translated into a target language sentence to be output.
  10. The method of claim 9, further comprising:
    concatenating a bilingual phrase prompt related to the source language sentence to be translated to the source language sentence to be translated to generate a source language sentence to be translated with a bilingual phrase prompt; and
    feeding the source language sentence to be translated with the bilingual phrase prompt into the pre-trained machine translation model.
  11. The method of claim 10, wherein the bilingual phrase prompt related to the source language sentence to be translated is created manually.
  12. The method of claim 10, further comprising:
    extracting source language phrases in the source language sentence to be translated; and
    retrieving bilingual phrases from a bilingual phrase database to construct the bilingual phrase prompt related to the source language sentence to be translated.
  13. The method of claim 12, further comprising:
    computing contextual representations of the source language phrases in the source language sentence to be translated,
    wherein retrieving bilingual phrases from the bilingual phrase database comprises retrieving the most similar bilingual phrases from the bilingual phrase database based on the contextual representations of the source language phrases in the source language sentence to be translated.
  14. An electronic device, comprising:
    a memory; and
    a processor coupled to the memory, the memory storing instructions that, when executed by the processor, cause the processor to perform the method according to any one of claims 1-13.
  15. A non-transitory computer-readable storage medium on which a computer program is stored, the program implementing the method according to any one of claims 1-13 when executed by a processor.
  16. An apparatus for machine translation, comprising:
    an original training data acquisition unit configured to acquire original training data including source language sentences for training and target language sentences for training;
    a new training data generation unit configured to concatenate bilingual phrase prompts related to the source language sentences for training to the source language sentences for training to generate new training data with bilingual phrase prompts, wherein the bilingual phrase prompts include one or more bilingual phrases, and each bilingual phrase includes a source language phrase and a corresponding target language phrase; and
    a pre-training unit configured to pre-train a machine translation model using at least the new training data with bilingual phrase prompts.
  17. A computer program product, including instructions that implement the method according to any one of claims 1-13 when executed by a processor.
  18. A computer program, including instructions that implement the method according to any one of claims 1-13 when executed by a processor.
PCT/CN2022/123788 2021-11-10 2022-10-08 Method, device and medium for machine translation WO2023082900A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111325941.9A CN113887253A (zh) 2021-11-10 2021-11-10 Method, device and medium for machine translation
CN202111325941.9 2021-11-10

Publications (1)

Publication Number Publication Date
WO2023082900A1 true WO2023082900A1 (zh) 2023-05-19

Family

ID=79017160

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/123788 WO2023082900A1 (zh) 2021-11-10 2022-10-08 用于机器翻译的方法、设备和介质

Country Status (2)

Country Link
CN (1) CN113887253A (zh)
WO (1) WO2023082900A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887253A (zh) * 2021-11-10 2022-01-04 北京有竹居网络技术有限公司 用于机器翻译的方法、设备和介质
CN114444523A (zh) * 2022-02-10 2022-05-06 北京间微科技有限责任公司 一种便携式的离线机器翻译智能盒子

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106372187A (zh) * 2016-08-31 2017-02-01 中译语通科技(北京)有限公司 A cross-language retrieval method for big data
CN111401079A (zh) * 2018-12-14 2020-07-10 波音公司 Training method, apparatus and storage medium for a neural network machine translation model
CN111460838A (zh) * 2020-04-23 2020-07-28 腾讯科技(深圳)有限公司 Pre-training method, apparatus and storage medium for an intelligent translation model
WO2020242567A1 (en) * 2019-05-27 2020-12-03 Microsoft Technology Licensing, Llc Cross-lingual task training
CN113297841A (zh) * 2021-05-24 2021-08-24 哈尔滨工业大学 Neural machine translation method based on pre-trained bilingual word vectors
CN113887253A (zh) * 2021-11-10 2022-01-04 北京有竹居网络技术有限公司 Method, device and medium for machine translation


Also Published As

Publication number Publication date
CN113887253A (zh) 2022-01-04

Similar Documents

Publication Publication Date Title
  • KR102401942B1 (ko) Method and apparatus for evaluating translation quality
  • WO2023082900A1 (zh) Method, device and medium for machine translation
US20200410396A1 (en) Implicit bridging of machine learning tasks
  • CN112256860B (zh) Semantic retrieval method, system, device and storage medium for customer-service dialogue content
US20200387677A1 (en) Electronic device and method for controlling the electronic device thereof
  • CN109522552B (zh) Normalization method, apparatus, medium and electronic device for medical information
US11308286B2 (en) Method and device for retelling text, server, and storage medium
  • CN107861954B (zh) Artificial-intelligence-based information output method and apparatus
  • CN110457708B (zh) Artificial-intelligence-based vocabulary mining method, apparatus, server and storage medium
US20220293092A1 (en) Method and apparatus of training natural language processing model, and method and apparatus of processing natural language
  • WO2023082916A1 (zh) Training method, speech translation method, device and computer-readable medium
  • CN109710951B (zh) Translation-history-based assisted translation method, apparatus, device and storage medium
  • WO2021184769A1 (zh) Operation method, apparatus, device and medium for a neural network text translation model
  • CN115983294B (zh) Training method for a translation model, translation method, and device
  • WO2023082931A1 (zh) Method, device and storage medium for punctuation restoration in speech recognition
  • CN113139391A (zh) Training method, apparatus, device and storage medium for a translation model
  • CN110889295B (zh) Machine translation model, and method, system and device for determining pseudo-professional parallel corpora
US20210073480A1 (en) Automatic preprocessing for black box translation
  • KR20210125449A (ko) Method for incrementing industry text, related apparatus, and computer program stored on a medium
  • CN110472241B (zh) Method for generating redundancy-removed sentence vectors and related device
  • CN113591498B (zh) Translation processing method, apparatus, device and medium
  • CN115620726A (zh) Speech-to-text generation method, and training method and apparatus for a speech-to-text generation model
  • WO2024146328A1 (zh) Training method for a translation model, translation method, and device
Pozo et al. A hand-held multimedia translation and interpretation system for diet management
US11314725B2 (en) Integrated review and revision of digital content

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22891701

Country of ref document: EP

Kind code of ref document: A1