WO2019113783A1 - Numerical generalization method for machine translation and system, computer and computer program thereof - Google Patents

Numerical generalization method for machine translation and system, computer and computer program thereof

Info

Publication number
WO2019113783A1
Authority
WO
WIPO (PCT)
Prior art keywords
translation
generalization
digital
training
word
Prior art date
Application number
PCT/CN2017/115691
Other languages
English (en)
French (fr)
Inventor
贝超
程国艮
Original Assignee
中译语通科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中译语通科技股份有限公司
Priority to US16/315,655 (US10929619B2)
Publication of WO2019113783A1

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/40 Processing or translation of natural language > G06F40/42 Data-driven translation > G06F40/45 Example-based machine translation; Alignment
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/40 Processing or translation of natural language > G06F40/42 Data-driven translation > G06F40/47 Machine-assisted translation, e.g. using translation memory
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/205 Parsing > G06F40/216 Parsing using statistical methods
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/279 Recognition of textual entities
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/20 Natural language analysis > G06F40/279 Recognition of textual entities > G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/30 Semantic analysis
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06F ELECTRIC DIGITAL DATA PROCESSING > G06F40/00 Handling natural language data > G06F40/40 Processing or translation of natural language > G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N3/00 Computing arrangements based on biological models > G06N3/02 Neural networks > G06N3/08 Learning methods
    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS > G06N7/00 Computing arrangements based on specific mathematical models > G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • The invention belongs to the technical field of computer software, and in particular relates to a numerical generalization method and system for machine translation, a computer, and a computer program.
  • Machine translation is the process of using machine learning techniques to translate one natural language into another.
  • Existing machine translation is primarily neural machine translation with an attention-based encoder-decoder structure.
  • The shortcoming of neural machine translation is that its content is difficult to control, and it is difficult to manually adjust the output of the model.
  • The most obvious problem is number translation errors: the numbers in the source text and in the translation are inconsistent, or numbers are omitted or added in the translation.
  • In summary, the problems in the prior art are: the content produced by current neural network models is difficult to control, the output of the model is difficult to adjust manually, and the numbers in the source text and the translation are inconsistent, or numbers are omitted or added in the translation.
  • To address the problems in the prior art, the present invention provides a numerical generalization method and system for machine translation, a computer, and a computer program.
  • The present invention is implemented as a numerical generalization method for machine translation, which comprises: a training phase, in which the training corpus is specially processed and normal training is performed without changing the structure of the neural network model; and a translation phase, in which the generalization tags in the resulting translation are replaced with the normal translation.
  • The training phase includes:
  • The translation phase specifically includes:
  • Another object of the present invention is to provide a numerical generalization system for machine translation that implements the numerical generalization method, the system comprising:
  • a training module for performing special processing on the training corpus;
  • a translation module for replacing the generalization tags in the resulting translation with the normal translation.
  • The training module further includes:
  • a first word finding unit for finding words or phrases containing numbers;
  • a first replacement unit, by which the parallel corpus of words or phrases containing numbers provides replacement translations for the translation phase;
  • a training unit for performing normal training on the corpus in which numbers have been replaced with numerical generalization tags.
  • The translation module further includes:
  • a second word finding unit for processing the source text in the same way as the training corpus, replacing the words or phrases containing numbers with generalization tags so that the format is consistent with that of the training corpus;
  • Another object of the present invention is to provide a computer program for implementing the numerical generalization method for machine translation.
  • Another object of the present invention is to provide a computer on which the computer program is installed.
  • Another object of the present invention is to provide a computer readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the numerical generalization method for machine translation.
  • The invention extends the application of generalization techniques in neural machine translation.
  • In phrase-based statistical machine translation the model can be intervened in directly, but a neural network model cannot use the same strategy.
  • Generalization can be applied by changing only the preprocessing and postprocessing, without changing the neural machine translation model, which extends the application of generalization techniques in neural machine translation.
  • The method is better adapted to the new machine translation model structure. It can translate words or phrases containing numbers more accurately, and it can replace the numbers in the vocabulary with generalization tags, which reduces the size of the vocabulary and improves the training efficiency of the neural network model.
  • FIG. 1 is a flow chart of the numerical generalization method for machine translation according to an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of the numerical generalization system for machine translation according to an embodiment of the present invention.
  • In the figures: 1, training module; 1-1, first word finding unit; 1-2, first replacement unit; 1-3, training unit; 2, translation module; 2-1, second word finding unit; 2-2, second replacement unit; 2-3, replacement translation unit.
  • The numerical generalization of the present invention can alleviate such problems simply and effectively. It can translate words or phrases containing numbers more accurately, and it can reduce the size of the vocabulary by replacing the numbers in the vocabulary with generalization tags, thereby improving the efficiency of training.
  • The numerical generalization method for machine translation includes the following steps:
  • S101: training phase: find words after word segmentation; align and replace with tags; train normally;
  • S102: translation phase: find words; replace the words with tags and translate; replace the tags in the output with the translation.
  • The numerical generalization system for machine translation provided by the embodiment of the present invention includes:
  • a training module 1 for performing special processing on the training corpus;
  • a translation module 2 for replacing the generalization tags in the resulting translation with the normal translation.
  • The training module 1 further includes:
  • The first word finding unit 1-1 is used to find words or phrases containing numbers.
  • The first replacement unit 1-2 is the unit by which the parallel corpus of words or phrases containing numbers provides replacement translations for the translation phase.
  • The training unit 1-3 is used to perform normal training on the corpus in which numbers have been replaced with numerical generalization tags.
  • The translation module 2 further includes:
  • The second word finding unit 2-1 is configured to process the source text in the same way as the training corpus, replacing the words or phrases containing numbers with generalization tags so that the format is consistent with that of the training corpus.
  • The second replacement unit 2-2 is configured to translate the generalized source text.
  • The training phase includes: first, finding words after word segmentation; second, aligning and replacing with tags; finally, normal training;
  • The translation phase includes: first, finding words after word segmentation; then replacing the words with tags and translating; and finally replacing the tags in the output with the translation.
  • The training phase specifically includes:
  • First, the corpus is segmented normally using a word segmentation algorithm based on the hidden Markov model, and regular expressions chosen according to the characteristics of each language are used to find words or phrases containing numbers. The main purpose of this step is to find those words or phrases in preparation for the word alignment in the next step.
  • Then an alignment tool is used to perform word alignment: the frequency of co-occurring word pairs in the bilingual corpus is counted, the translation corresponding to each word or phrase containing a number is found, and both are replaced with numerical tags. The parallel corpus of words or phrases containing numbers can provide replacement translations for the translation phase.
  • The translation phase specifically includes:
  • First, the source text is segmented and regular expressions are used to find the words or phrases containing numbers, which are replaced with numerical generalization tags. This step is similar to the first step of the training phase: the source text is processed in the same way as the training corpus.
  • The words or phrases containing numbers are replaced with generalization tags so that the format is consistent with that of the training corpus.
  • After the translation is obtained, the attention information in the neural network model is used to find the source text corresponding to each numerical generalization tag in the translation. The tag is then replaced with the translation according to the word pair information obtained with the word alignment tool in the training phase, which yields the final translation.
  • The computer program product comprises one or more computer instructions.
  • When the computer program instructions are loaded or executed on a computer, the processes or functions described in accordance with embodiments of the present invention are generated in whole or in part.
  • The computer can be a general purpose computer, a special purpose computer, a computer network, or another programmable device.
  • The computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave).
  • The computer readable storage medium may be any available medium that the computer can access, or a data storage device such as a server or data center that integrates one or more available media.
  • The available media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

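A numerical generalization method and system for machine translation, a computer, and a computer program, belonging to the technical field of computer software. The method comprises the following steps: a training phase, in which the training corpus is specially processed and normal training is performed without changing the structure of the neural network model; and a translation phase, in which the generalization tags in the resulting translation are replaced with the normal translation. The method applies generalization by changing only the preprocessing and postprocessing. It extends the application of generalization techniques in neural machine translation and is better adapted to the new machine translation model structure. It can translate words or phrases containing numbers more accurately, and it can replace the numbers in the vocabulary with generalization tags, which reduces the vocabulary size and improves the training efficiency of the neural network model.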

Description

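Numerical generalization method for machine translation and system, computer and computer program thereof
Technical Field
The invention belongs to the technical field of computer software, and in particular relates to a numerical generalization method and system for machine translation, a computer, and a computer program.
Background
Machine translation is the process of using machine learning techniques to translate one natural language into another. As an important branch of computational linguistics, it involves disciplines such as cognitive science and linguistics, and it is one of the ultimate goals of artificial intelligence. Unlike phrase-based statistical machine translation, existing machine translation is mainly neural machine translation with an attention-based encoder-decoder structure. The shortcoming of neural machine translation is that its content is difficult to control, and it is difficult to manually adjust the output of the model. The most obvious problem is number translation errors: the numbers in the source text and in the translation are inconsistent, or numbers are omitted or added in the translation. It is difficult to control number translation inside the model, and it is also difficult to post-process the translation to make up for the errors. Avoiding this simple number translation problem and further improving translation quality is therefore an urgent problem that is not easy to solve. When numerical generalization is used in neural machine translation, it is possible neither to specify that a tag be left untranslated nor to specify that it be replaced with the source text; this results from the poor controllability of neural network models.
In summary, the problems in the prior art are: the content produced by current neural network models is difficult to control, the output of the model is difficult to adjust manually, and the numbers in the source text and the translation are inconsistent, or numbers are omitted or added in the translation.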
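Summary of the Invention
To address the problems in the prior art, the present invention provides a numerical generalization method and system for machine translation, a computer, and a computer program.
The present invention is implemented as a numerical generalization method for machine translation, which comprises: a training phase, in which the training corpus is specially processed and normal training is performed without changing the structure of the neural network model; and a translation phase, in which the generalization tags in the resulting translation are replaced with the normal translation;
Further, the training phase includes:
(1) segmenting the corpus normally using a word segmentation algorithm based on the hidden Markov model, and using regular expressions chosen according to the language to find words or phrases containing numbers;
(2) performing word alignment, counting the frequency of co-occurring word pairs in the bilingual corpus, finding the translation corresponding to each word or phrase containing a number, and replacing them with numerical tags;
(3) performing normal training on the corpus in which numbers have been replaced with numerical generalization tags.
Further, the translation phase specifically includes:
(1) segmenting the source text, finding the words or phrases containing numbers, and replacing them with numerical generalization tags;
(2) translating the generalized source text with the neural network model trained on the generalized corpus;
(3) after the translation is obtained, using the attention information in the neural network model to find the source text corresponding to each numerical generalization tag in the translation, and replacing the tag with the translation according to the word pair information obtained with the word alignment tool in the training phase, to obtain the final translation.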
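Another object of the present invention is to provide a numerical generalization system for machine translation that implements the numerical generalization method, the system comprising:
a training module for performing special processing on the training corpus;
a translation module for replacing the generalization tags in the resulting translation with the normal translation.
The training module further includes:
a first word finding unit for finding words or phrases containing numbers;
a first replacement unit, by which the parallel corpus of words or phrases containing numbers provides replacement translations for the translation phase;
a training unit for performing normal training on the corpus in which numbers have been replaced with numerical generalization tags.
The translation module further includes:
a second word finding unit for processing the source text in the same way as the training corpus, replacing the words or phrases containing numbers with generalization tags so that the format is consistent with that of the training corpus;
a second replacement unit for translating the generalized source text;
a replacement translation unit for replacing the numerical generalization tags in the translation to obtain the normal translation.
Another object of the present invention is to provide a computer program for implementing the numerical generalization method for machine translation.
Another object of the present invention is to provide a computer on which the computer program is installed.
Another object of the present invention is to provide a computer readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the numerical generalization method for machine translation.
The present invention extends the application of generalization techniques in neural machine translation. In phrase-based statistical machine translation the model can be intervened in directly, whereas a neural network model cannot use the same strategy. Because neural network models are hard to intervene in manually, the invention applies generalization by changing only the preprocessing and postprocessing, without changing the neural machine translation model. This extends the application of generalization techniques in neural machine translation and is better adapted to the new machine translation model structure. Words or phrases containing numbers can be translated more accurately, and the numbers in the vocabulary can be replaced with generalization tags, which reduces the vocabulary size and improves the training efficiency of the neural network model.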
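The vocabulary-size claim can be illustrated with a minimal sketch. It assumes whitespace-delimited tokens, a hypothetical tag string `<num>`, and a simple "contains an ASCII digit" test; none of these specifics are given in the patent text.

```python
import re

# Hypothetical tag name; the patent does not specify the tag's surface form.
NUM_TAG = "<num>"
# Assumed test: a token "contains a number" if it has at least one ASCII digit.
HAS_DIGIT = re.compile(r"[0-9]")


def build_vocab(sentences, generalize=False):
    """Collect distinct tokens, optionally mapping number tokens to NUM_TAG."""
    vocab = set()
    for sentence in sentences:
        for token in sentence.split():
            if generalize and HAS_DIGIT.search(token):
                token = NUM_TAG
            vocab.add(token)
    return vocab


corpus = [
    "the price is 1,200 dollars",
    "the price is 3,500 dollars",
    "chapter 12 starts on page 340",
]
raw_vocab = build_vocab(corpus)
generalized_vocab = build_vocab(corpus, generalize=True)
print(len(raw_vocab), len(generalized_vocab))
```

Every distinct number occupies its own vocabulary entry in the raw corpus, while the generalized corpus needs only one entry for all of them.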
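Brief Description of the Drawings
FIG. 1 is a flow chart of the numerical generalization method for machine translation according to an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of the numerical generalization system for machine translation according to an embodiment of the present invention;
In the figures: 1, training module; 1-1, first word finding unit; 1-2, first replacement unit; 1-3, training unit; 2, translation module; 2-1, second word finding unit; 2-2, second replacement unit; 2-3, replacement translation unit.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to embodiments. It should be understood that the specific embodiments described here are only intended to explain the invention and are not intended to limit it.
The numerical generalization of the present invention can alleviate such problems simply and effectively. It can translate words or phrases containing numbers more accurately, and it can reduce the size of the vocabulary by replacing the numbers in the vocabulary with generalization tags, which improves the efficiency of training.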
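The application principle of the present invention is described in detail below with reference to the drawings.
As shown in FIG. 1, the numerical generalization method for machine translation provided by the embodiment of the present invention includes the following steps:
S101: training phase: find words after word segmentation; align and replace with tags; train normally;
S102: translation phase: find words; replace the words with tags and translate; replace the tags in the output with the translation.
As shown in FIG. 2, the numerical generalization system for machine translation provided by the embodiment of the present invention includes:
a training module 1 for performing special processing on the training corpus;
a translation module 2 for replacing the generalization tags in the resulting translation with the normal translation.
The training module 1 further includes:
a first word finding unit 1-1 for finding words or phrases containing numbers.
a first replacement unit 1-2, by which the parallel corpus of words or phrases containing numbers provides replacement translations for the translation phase.
a training unit 1-3 for performing normal training on the corpus in which numbers have been replaced with numerical generalization tags.
The translation module 2 further includes:
a second word finding unit 2-1 for processing the source text in the same way as the training corpus, replacing the words or phrases containing numbers with generalization tags so that the format is consistent with that of the training corpus.
a second replacement unit 2-2 for translating the generalized source text.
a replacement translation unit 2-3 for replacing the numerical generalization tags in the translation to obtain the normal translation.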
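The application principle of the present invention is further described below with reference to a specific embodiment.
The numerical generalization method for machine translation provided by the embodiment of the present invention includes the following steps:
a training phase, in which the training corpus is specially processed and normal training is performed without changing the structure of the neural network model;
a translation phase, in which the generalization tags in the resulting translation are replaced with the normal translation;
The training phase includes: first, finding words after word segmentation; second, aligning and replacing with tags; finally, normal training;
The translation phase includes: first, finding words after word segmentation; then replacing the words with tags and translating; finally replacing the tags in the output with the translation.
Further, the training phase specifically includes:
First, the corpus is segmented normally using a word segmentation algorithm based on the hidden Markov model, and regular expressions chosen according to the characteristics of each language are used to find words or phrases containing numbers. The main purpose of this step is to find those words or phrases in preparation for the word alignment in the next step.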
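The word finding step might look like the following sketch. It assumes the text has already been segmented into tokens (the HMM-based segmenter is out of scope here), and the per-language patterns in `NUMBER_PATTERNS` are illustrative assumptions; the patent only states that regular expressions are chosen per language.

```python
import re

# Assumed, illustrative patterns; real patterns would be tuned per language.
NUMBER_PATTERNS = {
    "en": re.compile(r"[0-9]"),
    # Chinese text often mixes ASCII and full-width digits.
    "zh": re.compile(r"[0-9０-９]"),
}


def find_number_tokens(tokens, lang):
    """Return (index, token) pairs for tokens that contain a number."""
    pattern = NUMBER_PATTERNS[lang]
    return [(i, tok) for i, tok in enumerate(tokens) if pattern.search(tok)]


zh_hits = find_number_tokens(["他", "买", "了", "3本", "书"], "zh")
en_hits = find_number_tokens(["he", "bought", "3", "books", "in", "2017"], "en")
print(zh_hits, en_hits)
```

Keeping the token index alongside the token makes it straightforward to replace the token in place in a later step.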
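Then an alignment tool is used to perform word alignment: the frequency of co-occurring word pairs in the bilingual corpus is counted, the translation corresponding to each word or phrase containing a number is found, and both are replaced with numerical tags. The parallel corpus of words or phrases containing numbers can provide replacement translations for the translation phase.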
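The co-occurrence counting can be sketched as below. This is a deliberately simplified stand-in for a real word alignment tool, which the patent does not name: it counts how often each source number token appears in the same sentence pair as each target number token and keeps the most frequent partner. The function names and the digit test are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed test for "contains a number"; see the word finding step.
DIGIT = re.compile(r"[0-9]")


def cooccurrence_counts(sentence_pairs):
    """Count sentence-level co-occurrences of source and target number tokens."""
    counts = Counter()
    for src_tokens, tgt_tokens in sentence_pairs:
        src_nums = {t for t in src_tokens if DIGIT.search(t)}
        tgt_nums = {t for t in tgt_tokens if DIGIT.search(t)}
        for s in src_nums:
            for t in tgt_nums:
                counts[(s, t)] += 1
    return counts


def best_translations(counts):
    """For each source token, keep the target token it co-occurs with most often."""
    best = {}
    for (s, t), c in counts.items():
        if s not in best or c > best[s][1]:
            best[s] = (t, c)
    return {s: t for s, (t, _) in best.items()}


pairs = [
    (["3本", "书"], ["3", "books"]),
    (["3本", "书", "和", "2支", "笔"], ["3", "books", "and", "2", "pens"]),
    (["2支", "笔"], ["2", "pens"]),
]
translation_table = best_translations(cooccurrence_counts(pairs))
print(translation_table)
```

The resulting table plays the role of the word pair information that the translation phase later uses to turn a tag back into a concrete translation.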
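Finally, normal training is performed on the corpus in which numbers have been replaced with numerical generalization tags.
Further, the translation phase specifically includes:
First, the source text is segmented and regular expressions are used to find the words or phrases containing numbers, which are replaced with numerical generalization tags. This step is similar to the first step of the training phase: the source text is processed in the same way as the training corpus, and the words or phrases containing numbers are replaced with generalization tags so that the format is consistent with that of the training corpus.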
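A minimal sketch of this preprocessing step follows, assuming pre-segmented tokens and a hypothetical tag string `<num>`. It records which source positions were replaced so that the original words can be recovered after translation.

```python
import re

DIGIT = re.compile(r"[0-9]")  # assumed test for "contains a number"
NUM_TAG = "<num>"             # hypothetical tag string


def generalize(tokens):
    """Replace number tokens with NUM_TAG and remember the originals by position."""
    generalized = []
    originals = {}
    for i, tok in enumerate(tokens):
        if DIGIT.search(tok):
            originals[i] = tok
            generalized.append(NUM_TAG)
        else:
            generalized.append(tok)
    return generalized, originals


generalized_src, originals = generalize(["他", "买", "了", "3本", "书"])
print(generalized_src, originals)
```

Using the same tag and the same number test as in training keeps the input format consistent with the training corpus, as the text requires.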
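The generalized source text is translated with the neural network model trained on the generalized corpus;
After the translation is obtained, the attention information in the neural network model is used to find the source text corresponding to each numerical generalization tag in the translation. The tag is then replaced with the translation according to the word pair information obtained with the word alignment tool in the training phase, which yields the final translation.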
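The postprocessing step can be sketched as follows. The sketch assumes the model exposes an attention matrix `attention[j][i]` (weight of target position `j` on source position `i`), that `originals` maps generalized source positions to the original words, and that `translation_table` holds the word pairs collected during training. All names and the hand-built attention matrix are hypothetical.

```python
NUM_TAG = "<num>"  # hypothetical tag string


def restore_tags(target_tokens, attention, originals, translation_table):
    """Replace each tag in the output with the translation of its attended source word."""
    restored = []
    for j, tok in enumerate(target_tokens):
        if tok != NUM_TAG or not originals:
            restored.append(tok)
            continue
        # Among generalized source positions, pick the one with the highest attention.
        src_pos = max(originals, key=lambda i: attention[j][i])
        src_word = originals[src_pos]
        # Fall back to the source word itself if no word pair was recorded.
        restored.append(translation_table.get(src_word, src_word))
    return restored


# Generalized source: 他 买 了 <num> 书 和 <num> 笔 (positions 3 and 6 were replaced).
originals = {3: "3本", 6: "2支"}
translation_table = {"3本": "3", "2支": "2"}
target = ["he", "bought", NUM_TAG, "books", "and", NUM_TAG, "pens"]

# Toy attention matrix: 7 target positions by 8 source positions.
attention = [[0.0] * 8 for _ in range(len(target))]
attention[2][3] = 0.9  # first tag attends to source position 3
attention[5][6] = 0.8  # second tag attends to source position 6

final = restore_tags(target, attention, originals, translation_table)
print(final)
```

Restricting the search to positions that were actually generalized avoids mapping a tag to an ordinary source word when the attention distribution is diffuse.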
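The above embodiments may be implemented wholly or partly in software, hardware, firmware, or any combination thereof. When implemented wholly or partly in the form of a computer program product, the computer program product comprises one or more computer instructions. When the computer program instructions are loaded or executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are generated in whole or in part. The computer can be a general purpose computer, a special purpose computer, a computer network, or another programmable device. The computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer readable storage medium may be any available medium that the computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, tape), optical media (e.g., DVD), or semiconductor media (e.g., solid state disk (SSD)).
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.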

Claims (9)

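  1. A numerical generalization method for machine translation, characterized in that the method comprises: a training phase, in which the training corpus is specially processed and normal training is performed without changing the structure of the neural network model; and a translation phase, in which the generalization tags in the resulting translation are replaced with the normal translation.
  2. The numerical generalization method for machine translation according to claim 1, characterized in that the training phase includes:
    (1) segmenting the corpus normally using a word segmentation algorithm based on the hidden Markov model, and using regular expressions chosen according to the language to find words or phrases containing numbers;
    (2) performing word alignment, counting the frequency of co-occurring word pairs in the bilingual corpus, finding the translation corresponding to each word or phrase containing a number, and replacing them with numerical tags;
    (3) performing normal training on the corpus in which numbers have been replaced with numerical generalization tags.
  3. The numerical generalization method for machine translation according to claim 1, characterized in that the translation phase specifically includes:
    (1) segmenting the source text, finding the words or phrases containing numbers, and replacing them with numerical generalization tags;
    (2) translating the generalized source text with the neural network model trained on the generalized corpus;
    (3) after the translation is obtained, using the attention information in the neural network model to find the source text corresponding to each numerical generalization tag in the translation, and replacing the tag with the translation according to the word pair information obtained with the word alignment tool in the training phase, to obtain the final translation.
  4. A numerical generalization system for machine translation implementing the numerical generalization method according to claim 1, characterized in that the system comprises:
    a training module for performing special processing on the training corpus;
    a translation module for replacing the generalization tags in the resulting translation with the normal translation.
  5. The numerical generalization system for machine translation according to claim 4, characterized in that the training module further includes:
    a first word finding unit for finding words or phrases containing numbers;
    a first replacement unit, by which the parallel corpus of words or phrases containing numbers provides replacement translations for the translation phase;
    a training unit for performing normal training on the corpus in which numbers have been replaced with numerical generalization tags.
  6. The numerical generalization system for machine translation according to claim 4, characterized in that the translation module further includes:
    a second word finding unit for processing the source text in the same way as the training corpus, replacing the words or phrases containing numbers with generalization tags so that the format is consistent with that of the training corpus;
    a second replacement unit for translating the generalized source text;
    a replacement translation unit for replacing the numerical generalization tags in the translation to obtain the normal translation.
  7. A computer program implementing the numerical generalization method for machine translation according to any one of claims 1 to 3.
  8. A computer on which the computer program according to claim 7 is installed.
  9. A computer readable storage medium comprising instructions which, when executed on a computer, cause the computer to perform the numerical generalization method for machine translation according to any one of claims 1 to 3.
PCT/CN2017/115691 2017-12-11 2017-12-12 Numerical generalization method for machine translation and system, computer and computer program thereof WO2019113783A1 (zh)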

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/315,655 US10929619B2 (en) 2017-12-11 2017-12-12 Numerical generalization method for machine translation and system, computer and computer program thereof

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711309873.0 2017-12-11
CN201711309873.0A CN107967263A (zh) 2017-12-11 2017-12-11 Numerical generalization method for machine translation and system, computer and computer program thereof

Publications (1)

Publication Number Publication Date
WO2019113783A1 (zh) 2019-06-20

Family

ID=61999626

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/115691 WO2019113783A1 (zh) 2017-12-11 2017-12-12 Numerical generalization method for machine translation and system, computer and computer program thereof

Country Status (3)

Country Link
US (1) US10929619B2 (zh)
CN (1) CN107967263A (zh)
WO (1) WO2019113783A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417897A (zh) * 2020-11-30 2021-02-26 上海携旅信息技术有限公司 词对齐模型训练、文本处理的方法、系统、设备和介质

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11466554B2 (en) * 2018-03-20 2022-10-11 QRI Group, LLC Data-driven methods and systems for improving oil and gas drilling and completion processes
US11506052B1 (en) 2018-06-26 2022-11-22 QRI Group, LLC Framework and interface for assessing reservoir management competency
CN109359304B (zh) * 2018-08-22 2023-04-18 新译信息科技(深圳)有限公司 限定性神经网络机器翻译方法及存储介质
CN109558599B (zh) * 2018-11-07 2023-04-18 北京搜狗科技发展有限公司 一种转换方法、装置和电子设备
CN109871550B (zh) * 2019-01-31 2022-11-22 沈阳雅译网络技术有限公司 一种基于后处理技术的提高数字翻译质量的方法
CN111563387B (zh) * 2019-02-12 2023-05-02 阿里巴巴集团控股有限公司 语句相似度确定方法及装置、语句翻译方法及装置
CN109902314B (zh) * 2019-04-18 2023-11-24 中译语通科技股份有限公司 一种术语的翻译方法和装置
CN110765792A (zh) * 2019-11-01 2020-02-07 北京中献电子技术开发有限公司 基于词类别的神经网络机器翻译方法及系统、训练方法
CN111178088B (zh) * 2019-12-20 2023-06-02 沈阳雅译网络技术有限公司 一种面向xml文档的可配置神经机器翻译方法
CN113255337B (zh) * 2021-05-21 2024-02-02 广州欢聚时代信息科技有限公司 词表构建方法、机器翻译方法及其装置、设备与介质
CN115130481A (zh) * 2022-06-16 2022-09-30 京东科技信息技术有限公司 一种模型训练、机器翻译方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068998A (zh) * 2015-07-29 2015-11-18 百度在线网络技术(北京)有限公司 基于神经网络模型的翻译方法及装置
CN106663092A (zh) * 2014-10-24 2017-05-10 谷歌公司 具有罕见词处理的神经机器翻译系统
CN106815215A (zh) * 2015-11-30 2017-06-09 华为技术有限公司 生成标注库的方法和装置
CN107329960A (zh) * 2017-06-29 2017-11-07 哈尔滨工业大学 一种上下文敏感的神经网络机器翻译中未登录词翻译装置和方法
US20170323203A1 (en) * 2016-05-06 2017-11-09 Ebay Inc. Using meta-information in neural machine translation

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4419871B2 (ja) * 2005-03-02 2010-02-24 富士ゼロックス株式会社 翻訳依頼装置およびプログラム
US8145473B2 (en) * 2006-10-10 2012-03-27 Abbyy Software Ltd. Deep model statistics method for machine translation
US9798720B2 (en) * 2008-10-24 2017-10-24 Ebay Inc. Hybrid machine translation
US20140163951A1 (en) * 2012-12-07 2014-06-12 Xerox Corporation Hybrid adaptation of named entity recognition
CN104298662B (zh) * 2014-04-29 2017-10-10 中国专利信息中心 一种基于有机物命名实体的机器翻译方法及翻译系统
CN106484682B (zh) * 2015-08-25 2019-06-25 阿里巴巴集团控股有限公司 基于统计的机器翻译方法、装置及电子设备
CN106126507B (zh) * 2016-06-22 2019-08-09 哈尔滨工业大学深圳研究生院 一种基于字符编码的深度神经翻译方法及系统
CN107391501A (zh) * 2017-09-11 2017-11-24 南京大学 一种基于词预测的神经机器翻译方法
WO2019060353A1 (en) * 2017-09-21 2019-03-28 Mz Ip Holdings, Llc SYSTEM AND METHOD FOR TRANSLATION OF KEYBOARD MESSAGES

Also Published As

Publication number Publication date
US10929619B2 (en) 2021-02-23
CN107967263A (zh) 2018-04-27
US20200302125A1 (en) 2020-09-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17934998; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17934998; Country of ref document: EP; Kind code of ref document: A1)