WO2022242535A1 - Translation method, translation apparatus, translation device, and storage medium - Google Patents

Translation method, translation apparatus, translation device, and storage medium Download PDF

Info

Publication number
WO2022242535A1
WO2022242535A1 (PCT/CN2022/092392; CN2022092392W)
Authority
WO
WIPO (PCT)
Prior art keywords
word
sentence
probability
original
target
Prior art date
Application number
PCT/CN2022/092392
Other languages
English (en)
French (fr)
Inventor
程善伯
王明轩
李磊
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司 filed Critical 北京有竹居网络技术有限公司
Publication of WO2022242535A1 publication Critical patent/WO2022242535A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Definitions

  • the present application relates to the field of machine learning, and in particular to a translation method, a translation apparatus, a translation device, and a storage medium.
  • in the field of machine learning, trained neural network models are mainly used to replace repetitive manual operations.
  • machine learning can also be applied to translation: neural network models can be trained to perform translation.
  • neural machine translation (NMT) offers high translation quality and has become the most widely used machine translation technology.
  • the main translation process is as follows: the original sentence s is input into the NMT model, the encoder in the NMT model encodes the original sentence s into a vector, the model then computes from that vector the vector of the translated sentence t of the original sentence s, and a decoder decodes the vector of the translated sentence t to obtain the translated sentence t.
  • to address the problem of inaccurate translation in the prior art, the embodiments of the present application provide a translation method that can improve translation accuracy.
  • An embodiment of the present application provides a translation method, the method comprising:
  • acquiring a target sentence pair, where the target sentence pair includes a first original sentence and a first translated sentence, and the first translated sentence is a translation of the first original sentence;
  • acquiring a reference sentence pair according to the first original sentence, where the reference sentence pair includes a second original sentence and a second translated sentence, the second original sentence is semantically similar to the first original sentence, and the second translated sentence is a translation of the second original sentence;
  • determining a target original word included in the first original sentence, where the target original word also belongs to the second original sentence;
  • in response to a first translated word differing from a second translated word, determining a target translated word of the target original word according to a first probability and a second probability of the target original word, where the first translated word is the word corresponding to the target original word in the first translated sentence, the second translated word is the word corresponding to the target original word in the second translated sentence, the first probability is the probability that the target original word is translated into the first translated word in the first translated sentence, and the second probability is the probability that the target original word is translated into the second translated word in the second translated sentence;
  • determining a target translated sentence of the first original sentence according to the target translated word of the target original word.
  • in some embodiments, determining the target translated word of the target original word according to the first probability and the second probability of the target original word includes at least one of the following:
  • determining the second translated word as the target translated word of the target original word in response to the first probability being less than the second probability; determining the first translated word as the target translated word in response to the first probability being greater than the second probability; or determining the first translated word or the second translated word as the target translated word in response to the first probability being equal to the second probability.
  • the method also includes:
  • the first original sentence is input into a translation model to obtain the first probability output by the translation model.
  • in some embodiments, the method further includes: inputting the second original sentence and the second translated sentence into the translation model to obtain the second probability output by the translation model.
  • in some embodiments, determining the target translated word of the target original word according to the first probability and the second probability of the target original word includes:
  • obtaining a fourth probability according to the second probability and a third probability, the third probability being the probability, obtained by inputting the first original sentence into the translation model, that the target original word is translated into the second translated word;
  • determining the target translated word of the target original word according to the first probability and the fourth probability.
  • the embodiments of the present application also provide a translation apparatus, the apparatus including:
  • a first acquiring unit configured to acquire a target sentence pair, the target sentence pair including a first original sentence and a first translated sentence, the first translated sentence being a translation of the first original sentence;
  • a second acquiring unit configured to acquire a reference sentence pair according to the first original sentence, the reference sentence pair including a second original sentence and a second translated sentence, the second original sentence being semantically similar to the first original sentence, and the second translated sentence being a translation of the second original sentence;
  • a first determining unit configured to determine a target original word included in the first original sentence, the target original word also belonging to the second original sentence;
  • a second determining unit configured to, in response to a first translated word differing from a second translated word, determine a target translated word of the target original word according to a first probability and a second probability of the target original word, where the first translated word is the word corresponding to the target original word in the first translated sentence, the second translated word is the word corresponding to the target original word in the second translated sentence, the first probability is the probability that the target original word is translated into the first translated word in the first translated sentence, and the second probability is the probability that the target original word is translated into the second translated word in the second translated sentence;
  • a third determining unit configured to determine a target translated sentence of the first original sentence according to the target translated word of the target original word.
  • the second determining unit determines the target translated word of the target original word according to the first probability and the second probability of the target original word, including at least one of the following:
  • the second determining unit determines the second translated word as the target translated word of the target original word in response to the first probability being smaller than the second probability;
  • the second determining unit determines the first translated word as a target translated word of the target original word in response to the first probability being greater than the second probability;
  • the second determining unit determines the first translated word or the second translated word as a target translated word of the target original word in response to the first probability being equal to the second probability.
  • the device also includes:
  • the first input unit is configured to input the first original sentence into the translation model to obtain the first probability output by the translation model.
  • the device also includes:
  • a second input unit configured to input the second original sentence and the second translated sentence into the translation model to obtain the second probability output by the translation model.
  • the second determining unit determines the target translated word of the target original word according to the first probability and the second probability of the target original word, including:
  • the second determining unit obtains a fourth probability according to the second probability and a third probability, the third probability being the probability, obtained by inputting the first original sentence into the translation model, that the target original word is translated into the second translated word;
  • the second determination unit determines the target translated word of the target original word according to the first probability and the fourth probability.
  • the embodiment of the present application also provides a translation device, the device includes: a processor and a memory;
  • the memory is used to store instructions
  • the processor is configured to execute the instructions in the memory, and execute the methods described in the foregoing embodiments.
  • Embodiments of the present application also provide a computer-readable storage medium, including instructions, which, when run on a computer, cause the computer to execute the method described in the foregoing embodiments.
  • in the translation method provided in the embodiments of the present application, a second original sentence that is semantically similar to the first original sentence is first determined, and a target original word that appears in both the first original sentence and the second original sentence is then determined. If the target original word is translated differently in the first and second sentence pairs, the target translated word of the target original word is determined according to the probabilities with which the target original word is translated into the first translated word and the second translated word. The translation method of the embodiments of the present application therefore uses not only the information of the second translated sentence of the second original sentence but also the information of the second original sentence itself: the translation of the first original sentence is corrected according to the vocabulary shared by the second original sentence and the first original sentence, which can improve translation accuracy.
  • FIG. 1 is a flow chart of an embodiment of a translation method provided by an embodiment of the present application
  • FIG. 2 is a schematic diagram of a translation matrix provided by an embodiment of the present application.
  • FIG. 3 is a structural block diagram of a translation device provided by an embodiment of the present application.
  • FIG. 4 is a structural block diagram of a translation device provided by an embodiment of the present application.
  • in the field of machine learning, trained neural network models are mainly used to replace repetitive manual operations.
  • machine learning can also be applied to translation: neural network models can be trained to perform translation.
  • neural machine translation (NMT) offers high translation quality and has become the most widely used machine translation technology.
  • in the prior art, the main translation process is as follows: the original sentence s is input into the NMT model and its encoder encodes the original sentence s into a vector; a similar sentence s' that is semantically similar to the original sentence s is then retrieved, together with its similar translated sentence t'; an additional encoder is introduced into the NMT model to encode the similar translated sentence t' into a vector; the vector of the original sentence s and the vector of the similar translated sentence t' are processed by the model to obtain the vector of the translated sentence t of the original sentence s, and a decoder decodes that vector to obtain the translated sentence t.
  • the inventors have found that prior-art translation of this kind refers only to the information of the similar translated sentence t' and ignores the information of the similar sentence s' of the original sentence s, which leads to inaccurate translations during the translation process.
  • moreover, prior-art translation needs to introduce an additional new encoder to encode the similar translated sentence t', which increases the number of parameters in the model computation and raises the translation cost.
  • if the information of the similar sentence s' is to be taken into account, the translation model also has to be retrained, which consumes too much time.
  • therefore, the embodiments of the present application provide a translation method in which a second original sentence that is semantically similar to the first original sentence is first determined, and a target original word that appears in both the first original sentence and the second original sentence is then determined. If the target original word is translated differently in the first and second sentence pairs, the target translated word of the target original word is determined according to the probabilities with which the target original word is translated into the first translated word and the second translated word. The method therefore uses not only the information of the second translated sentence of the second original sentence but also the information of the second original sentence itself: the translation of the first original sentence is corrected according to the vocabulary shared by the second original sentence and the first original sentence, which can improve translation accuracy. In addition, the translation method of the embodiments of the present application does not need to introduce a new encoder into the translation model to encode the translated sentence t' into a vector, which reduces the translation cost and increases translation efficiency.
  • referring to FIG. 1, which is a flow chart of a translation method provided by an embodiment of the present application.
  • S101: a target sentence pair is obtained. The target sentence pair is a sentence pair that has already been translated by a translation model.
  • the target sentence pair includes a first original sentence and a first translated sentence.
  • the first translated sentence is the sentence obtained by translating the first original sentence with the translation model, and it may contain translation errors.
  • as an example, the first original sentence may be the Chinese sentence "她是一个来自中国的学生" (literally, "She is a student from China"),
  • and the first translated sentence may be "She is a student to China", in which the word "to" is a translation error.
  • the translation model used in the embodiments of the present application may be a neural machine translation (Neural Machine Translation, NMT) model.
  • S102: a reference sentence pair is obtained according to the first original sentence. The reference sentence pair includes a second original sentence and a second translated sentence, where the second translated sentence is the translation of the second original sentence, and it is a correct translation.
  • the first original sentence is semantically similar to the second original sentence, and a second original sentence that is semantically similar to the first original sentence can be retrieved from a translation memory.
  • in practice, semantic similarity can be determined in the following two ways. The first way is to compare the first original sentence and the second original sentence to obtain the proportion of shared vocabulary: when the shared vocabulary reaches a predetermined proportion, the first original sentence and the second original sentence are considered semantically similar. The second way is to encode the first original sentence and the second original sentence into vectors and compute the similarity between the vectors: when the similarity between the vectors reaches a predetermined threshold, the first original sentence and the second original sentence are considered semantically similar.
  • as an example, the first original sentence may be "她是一个来自中国的学生" and the first translated sentence may be "She is a student to China"; the second original sentence may be "他是一个来自美国的老师" (literally, "He is a teacher from the United States") and the second translated sentence may be "He is a teacher from America".
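  • as a rough illustration only, the two similarity checks described above could be prototyped as in the sketch below; it is a minimal sketch, not the patented implementation, and the tokenization, the sentence encoder that produces the vectors, and the threshold values are all assumptions made here for illustration.

    import math
    from typing import Sequence

    def lexical_overlap(first_original: Sequence[str], second_original: Sequence[str]) -> float:
        """First way: proportion of the first sentence's distinct words that also occur in the second."""
        first_words = set(first_original)
        shared = first_words & set(second_original)
        return len(shared) / max(len(first_words), 1)

    def cosine_similarity(vec_a: Sequence[float], vec_b: Sequence[float]) -> float:
        """Second way: similarity between sentence vectors produced by some sentence encoder."""
        dot = sum(a * b for a, b in zip(vec_a, vec_b))
        norm = math.sqrt(sum(a * a for a in vec_a)) * math.sqrt(sum(b * b for b in vec_b))
        return dot / norm if norm else 0.0

    def is_semantically_similar(first_tokens, second_tokens, first_vec, second_vec,
                                overlap_ratio=0.5, cosine_threshold=0.8) -> bool:
        # Either check reaching its predetermined ratio/threshold counts as "semantically similar".
        return (lexical_overlap(first_tokens, second_tokens) >= overlap_ratio
                or cosine_similarity(first_vec, second_vec) >= cosine_threshold)

  • a translation memory could then be scanned with a check of this kind to pick the reference sentence pair.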
  • S103: a target original word in the first original sentence is determined. The target original word is a word that appears in both the first original sentence and the second original sentence.
  • in practice, the first original sentence and the second original sentence can be compared word by word to obtain the words that appear in both sentences.
  • as an example, the first original sentence may be "她是一个来自中国的学生" and the second original sentence may be "他是一个来自美国的老师"; comparing the sentences word by word, the target original words in the first original sentence are "是", "一个", "来自" and "的" (roughly "is", "a", "from" and the possessive particle rendered as "of").
  • the target original word may also be determined without word-by-word comparison; for example, the first original sentence and the second original sentence may be segmented into several parts according to grammar, and the corresponding parts of the two sentences may be compared to determine the target original word.
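  • the word-by-word comparison described above amounts to intersecting the two token lists; the following minimal sketch illustrates the idea under the assumption that a word-level segmenter for the source language is available.

    def find_target_original_words(first_original, second_original):
        """Return the words that occur in both original sentences, in the order of the first sentence."""
        second_words = set(second_original)
        seen = set()
        targets = []
        for word in first_original:
            if word in second_words and word not in seen:
                targets.append(word)
                seen.add(word)
        return targets

    # Example from the description (tokens already segmented):
    first = ["她", "是", "一个", "来自", "中国", "的", "学生"]
    second = ["他", "是", "一个", "来自", "美国", "的", "老师"]
    print(find_target_original_words(first, second))  # ['是', '一个', '来自', '的']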
  • S104: in response to the first translated word differing from the second translated word, the target translated word of the target original word is determined according to the first probability and the second probability of the target original word.
  • the first translated word is the word corresponding to the target original word in the first translated sentence, and the second translated word is the word corresponding to the target original word in the second translated sentence.
  • the first probability is the probability that the target original word is translated into the first translated word in the first translated sentence, and the second probability is the probability that the target original word is translated into the second translated word in the second translated sentence.
  • the first translated word of the target original word in the first translated sentence may differ from its second translated word in the second translated sentence.
  • in that case, the target translated word of the target original word can be determined according to the first probability of the target original word being translated into the first translated word in the first translated sentence and the second probability of it being translated into the second translated word in the second translated sentence; the target translated word is the correct translation corresponding to the target original word.
  • in practice, the first probability is obtained by inputting the first original sentence into the translation model.
  • the main process is as follows: the first original sentence is input into the translation model; the encoder of the translation model encodes the first original sentence into a vector; translation is performed to obtain the vector of the first translated sentence; and the decoder decodes the vector of the first translated sentence to obtain the first translated sentence. The probability with which the target original word is translated into the first translated word in the first translated sentence is then the first probability.
  • in practice, the second probability is obtained by inputting the second original sentence and the second translated sentence into the translation model. The translation model force-decodes the second original sentence and the second translated sentence, that is, it encodes the second original sentence and the second translated sentence into vectors, and the second probability that the target original word is translated into the second translated word in the second sentence pair is obtained.
  • it can be seen that the translation method in the embodiments of the present application does not need to introduce a new encoder into the translation model to encode the second translated sentence into a vector; it only needs the encoder and decoder already present in the translation model to process the second original sentence and the second translated sentence.
  • the second probability can be obtained by forced decoding of the second translated sentence; in other words, the method of the embodiments of the present application reduces the translation cost and increases translation efficiency.
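  • as a rough illustration of how these probabilities could be read off a sequence-to-sequence model, the sketch below assumes that the model exposes its per-step softmax distributions and a source-to-target word alignment; those data structures and names are assumptions for illustration, not interfaces defined by the application.

    def word_probability(step_distributions, alignment, translated_sentence, source_word):
        """Read the probability assigned to the translated word aligned with `source_word`.

        step_distributions: per-position decoder outputs, one dict (translated word -> probability)
            per target position.
        alignment: mapping from a source word to the target position it aligns with.
        translated_sentence: the decoded hypothesis (or the reference translation) as a token list.
        """
        position = alignment[source_word]
        translated_word = translated_sentence[position]
        return translated_word, step_distributions[position].get(translated_word, 0.0)

    # First probability: decode the first original sentence freely and pass the decoder's per-step
    # distributions, the alignment, and the model's own hypothesis.
    # Second probability: force-decode the reference pair (the second translated sentence is fed to
    # the decoder) and pass the resulting distributions with the second translated sentence.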
  • referring to Table 1, which shows the probabilities, obtained by inputting the first original sentence into the translation model, with which the words of the first original sentence are translated:

    Table 1
                   She     is      a       student  to      China
    她 (she)       0.8     0.02    0.02    0.02     0.02    0.02
    是 (is)        0.02    0.8     0.02    0.02     0.02    0.02
    一个 (a)       0.1     0.1     0.7     0.03     0.03    0.04
    来自 (from)    0.05    0.01    0.01    0.1      0.5     0.08
    中国 (China)   0.02    0.02    0.02    0.02     0.02    0.9
    的             0.1     0.2     0.13    0.15     0.15    0.1
    学生 (student) 0.02    0.02    0.02    0.9      0.02    0.02

  • as Table 1 shows, the first probability of the target original word "一个" in the first original sentence being translated as "a" is 0.7, the first probability of the target original word "来自" being translated as "to" is 0.5, and the probability of "来自" being translated as "from" is 0.3. In other words, when the first original sentence is input into the translation model, the word "来自" has the highest probability, 0.5, of being translated as "to", so the translation model mistranslates the first original sentence, rendering "来自" incorrectly as "to".
  • in the embodiments of the present application, the target translated word of the target original word can be determined in two possible implementations. In the first implementation, the target translated word is determined according to the first probability and the second probability of the target original word, and there are three cases:
  • in the first case, the first probability is less than the second probability, and the second translated word is determined as the target translated word of the target original word. That is, if the probability of the target original word being translated into the first translated word is less than the probability of it being translated into the second translated word, the target translated word is the second translated word.
  • as an example, the first probability of the target original word "来自" being translated into the first translated word "to" is 0.5, and the second probability of "来自" being translated into the second translated word "from" is 0.8; since the first probability 0.5 is less than the second probability 0.8, the target translated word of "来自" is determined to be the second translated word "from".
  • in the second case, the first probability is greater than the second probability, and the first translated word is determined as the target translated word of the target original word. That is, if the probability of the target original word being translated into the first translated word is greater than the probability of it being translated into the second translated word, the target translated word is the first translated word.
  • as an example, if the first probability of "来自" being translated into the first translated word "to" were 0.8 and the second probability of "来自" being translated into the second translated word "from" were 0.5, then, since the first probability 0.8 is greater than the second probability 0.5, the target translated word of "来自" would be determined to be the first translated word "to".
  • in the third case, the first probability is equal to the second probability, and the first translated word or the second translated word is determined as the target translated word of the target original word. That is, if the probability of the target original word being translated into the first translated word equals the probability of it being translated into the second translated word, the target translated word is the first translated word or the second translated word.
  • as an example, if the first probability of "来自" being translated into "to" and the second probability of "来自" being translated into "from" were both 0.7, the target translated word of "来自" would be determined to be the first translated word "to" or the second translated word "from".
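  • the three cases above reduce to picking whichever candidate the model scores at least as highly; a minimal sketch of that selection rule (the function and argument names are chosen here only for illustration) could look like this:

    def choose_target_word(first_word, first_prob, second_word, second_prob):
        """Pick the translated word for a target original word from the two candidates."""
        if first_prob < second_prob:
            return second_word          # case 1: the reference pair's translated word wins
        if first_prob > second_prob:
            return first_word           # case 2: the model's own translated word wins
        return first_word               # case 3: tie, either candidate is acceptable

    # Example from the description: "来自" -> "to" with 0.5 versus "from" with 0.8
    print(choose_target_word("to", 0.5, "from", 0.8))  # from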
  • in the second implementation, a fourth probability is obtained according to the second probability and a third probability, and the target translated word of the target original word is determined according to the first probability and the fourth probability, where the third probability is the probability, obtained by inputting the first original sentence into the translation model, that the target original word is translated into the second translated word.
  • in practice, the second probability and the third probability may be combined in a weighted sum to obtain the fourth probability. Using the fourth probability obtained from the second and third probabilities to assist in determining the target translated word can further improve the translation quality of the method provided by the embodiments of the present application.
  • in practice, the first probability and the fourth probability may be compared, and the translated word corresponding to the larger probability is determined as the target translated word: if the first probability is less than the fourth probability, the second translated word corresponding to the fourth probability is determined as the target translated word; if the first probability is greater than the fourth probability, the first translated word corresponding to the first probability is determined as the target translated word; and if the first probability is equal to the fourth probability, either the first translated word or the second translated word is determined as the target translated word.
  • as an example, when the first original sentence is input into the translation model, the first probability of the target original word "来自" being translated into the first translated word "to" is 0.5 and the third probability of "来自" being translated into the second translated word "from" is 0.3; when the second sentence pair is input into the translation model, the second probability of "来自" being translated into the second translated word "from" is 0.8. The weighted sum of the second probability 0.8 and the third probability 0.3 gives the fourth probability, i.e. the fourth probability = (0.8 + 0.3) × 0.5 = 0.55; since the fourth probability 0.55 is greater than the first probability 0.5, the second translated word "from" corresponding to the fourth probability is determined as the target translated word.
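  • a sketch of this second implementation follows; the equal weighting mirrors the worked example ((0.8 + 0.3) × 0.5 = 0.55), and the weight values are otherwise an assumption made here for illustration.

    def choose_with_fourth_probability(first_word, first_prob, second_word,
                                       second_prob, third_prob, w2=0.5, w3=0.5):
        """Blend the reference-pair probability with the model's own probability for the
        reference word (the fourth probability), then compare against the model's top choice."""
        fourth_prob = w2 * second_prob + w3 * third_prob
        if first_prob < fourth_prob:
            return second_word
        if first_prob > fourth_prob:
            return first_word
        return first_word  # tie: either candidate may be used

    # Worked example from the description:
    print(choose_with_fourth_probability("to", 0.5, "from", 0.8, 0.3))  # from (0.55 > 0.5)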
  • S105: the target translated sentence of the first original sentence is determined according to the target translated word of the target original word.
  • in the embodiments of the present application, the target translated sentence of the first original sentence can be determined according to the target translated word determined for the target original word.
  • as an example, if the first probability of the target original word "来自" being translated into the first translated word "to" is less than the second probability of "来自" being translated into the second translated word "from", the target translated word of "来自" is determined to be the second translated word "from",
  • and the first translated word "to" in the first translated sentence is corrected to the second translated word "from" to obtain the target translated sentence.
  • the embodiments of the present application provide a translation method in which a second original sentence that is semantically similar to the first original sentence is first determined, and a target original word that appears in both the first original sentence and the second original sentence is then determined. If the target original word is translated differently in the first and second sentence pairs, the target translated word of the target original word is determined according to the probabilities with which the target original word is translated into the first translated word and the second translated word. The translation method of the embodiments of the present application therefore uses not only the information of the second translated sentence of the second original sentence but also the information of the second original sentence itself: the translation of the first original sentence is corrected according to the vocabulary shared by the second original sentence and the first original sentence, which can improve translation accuracy. In addition, the translation method in the embodiments of the present application does not need to introduce a new encoder into the translation model to encode the translated sentence t' into a vector, which reduces the translation cost and increases translation efficiency.
  • the embodiment of the present application also provides a translation device, and its working principle will be described in detail below with reference to the accompanying drawings.
  • this figure is a structural block diagram of a translation device provided by an embodiment of the present application.
  • the translation device 300 provided in this embodiment includes:
  • the first acquisition unit 310 is configured to acquire a target sentence pair, the target sentence pair includes a first original sentence and a first translation sentence, and the first translation sentence is a translation of the first original sentence;
  • the second acquiring unit 320 is configured to acquire a reference sentence pair according to the first original sentence, the reference sentence pair including a second original sentence and a second translated sentence, the second original sentence being semantically similar to the first original sentence, and the second translated sentence being a translation of the second original sentence;
  • the first determining unit 330 is configured to determine a target original word in the first original sentence, the target original word being a word that appears in both the first original sentence and the second original sentence;
  • the second determining unit 340 is configured to, in response to a first translated word differing from a second translated word, determine a target translated word of the target original word according to a first probability and a second probability of the target original word, where the first translated word is the word corresponding to the target original word in the first translated sentence, the second translated word is the word corresponding to the target original word in the second translated sentence, the first probability is the probability that the target original word is translated into the first translated word in the first translated sentence, and the second probability is the probability that the target original word is translated into the second translated word in the second translated sentence;
  • the third determining unit 350 is configured to determine a target translated sentence of the first original sentence according to the target translated word of the target original word.
  • the second determining unit determines the target translated word of the target original word according to the first probability and the second probability of the target original word, including at least one of the following:
  • the second determining unit determines the second translated word as the target translated word of the target original word in response to the first probability being smaller than the second probability;
  • the second determining unit determines the first translated word as a target translated word of the target original word in response to the first probability being greater than the second probability;
  • the second determining unit determines the first translated word or the second translated word as a target translated word of the target original word in response to the first probability being equal to the second probability.
  • the device also includes:
  • the first input unit is configured to input the first original sentence into the translation model to obtain the first probability.
  • the device also includes:
  • a second input unit configured to input the second original sentence and the second translated sentence into the translation model to obtain the second probability.
  • the second determining unit determines the target translated word of the target original word according to the first probability and the second probability of the target original word, including:
  • the second determining unit obtains a fourth probability according to the second probability and a third probability, the third probability being the probability, obtained by inputting the first original sentence into the translation model, that the target original word is translated into the second translated word;
  • the second determination unit determines the target translated word of the target original word according to the first probability and the fourth probability.
  • the translation device 400 includes:
  • a processor 410 and a memory 420, where there may be one or more processors. In some embodiments of the present application, the processor and the memory may be connected by a bus or in other ways.
  • the memory which can include read only memory and random access memory, provides instructions and data to the processor.
  • a portion of the memory may also include NVRAM.
  • the memory stores operating systems and operating instructions, executable modules or data structures, or their subsets, or their extended sets, wherein the operating instructions may include various operating instructions for implementing various operations.
  • the operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
  • the processor controls the operation of the terminal device, and the processor may also be referred to as a CPU.
  • the methods disclosed in the foregoing embodiments of the present application may be applied to, or implemented by, a processor.
  • the processor can be an integrated circuit chip with signal processing capability.
  • each step of the above method can be completed by an integrated logic circuit of hardware in a processor or an instruction in the form of software.
  • the above-mentioned processor may be a general-purpose processor, DSP, ASIC, FPGA or other programmable logic devices, discrete gate or transistor logic devices, and discrete hardware components.
  • Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • An embodiment of the present application further provides a computer-readable storage medium for storing a program code, and the program code is used to execute any one implementation manner of a translation method in the foregoing embodiments.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM) or a random access memory (Random Access Memory, RAM), etc.
  • each embodiment in this specification is described in a progressive manner, the same and similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments.
  • the description is relatively simple, and for relevant parts, please refer to part of the description of the method embodiment.
  • the device embodiments described above are only illustrative, and the units and modules described as separate components may or may not be physically separated. In addition, some or all of the units and modules can also be selected according to actual needs to achieve the purpose of the solution of this embodiment. It can be understood and implemented by those skilled in the art without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

A translation method, a translation apparatus, a translation device, and a storage medium. A second original sentence that is semantically similar to a first original sentence is first determined (102), and a target original word that appears in both the first original sentence and the second original sentence is then determined (103). If the target original word is translated differently in the first and second sentence pairs, the target translated word of the target original word is determined according to the probabilities with which the target original word is translated into the first translated word and the second translated word (104). The translation method therefore uses not only the information of the second translated sentence of the second original sentence but also the information of the second original sentence itself: the translation of the first original sentence is corrected according to the vocabulary shared by the second original sentence and the first original sentence, which can improve translation accuracy.

Description

Translation method, translation apparatus, translation device, and storage medium
This application claims priority to Chinese patent application No. 202110560294.3, entitled "Translation method, translation apparatus, translation device, and storage medium", filed with the Chinese Patent Office on May 21, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of machine learning, and in particular to a translation method, a translation apparatus, a translation device, and a storage medium.
Background Art
With the rapid development of computer technology, the field of machine learning has seen many technical advances. In the field of machine learning, trained neural network models are mainly used to replace repetitive manual operations. Machine learning can also be applied to translation: neural network models can be trained to perform translation.
Neural machine translation (Neural Machine Translation, NMT) offers high translation quality and has become the most widely used machine translation technology. The main translation process is as follows: the original sentence s is input into the NMT model, the encoder in the NMT model encodes the original sentence s into a vector, the model then computes from that vector the vector of the translated sentence t of the original sentence s, and a decoder decodes the vector of the translated sentence t to obtain the translated sentence t.
However, when translation is performed in the prior art, the problem of inaccurate translation arises during the translation process.
Summary of the Invention
In order to solve the problem of inaccurate translation in the prior art, the embodiments of the present application provide a translation method that can improve translation accuracy.
An embodiment of the present application provides a translation method, the method including:
acquiring a target sentence pair, the target sentence pair including a first original sentence and a first translated sentence, the first translated sentence being a translation of the first original sentence;
acquiring a reference sentence pair according to the first original sentence, the reference sentence pair including a second original sentence and a second translated sentence, the second original sentence being semantically similar to the first original sentence, and the second translated sentence being a translation of the second original sentence;
determining a target original word included in the first original sentence, the target original word also belonging to the second original sentence;
in response to a first translated word differing from a second translated word, determining a target translated word of the target original word according to a first probability and a second probability of the target original word, where the first translated word is the word corresponding to the target original word in the first translated sentence, the second translated word is the word corresponding to the target original word in the second translated sentence, the first probability is the probability that the target original word is translated into the first translated word in the first translated sentence, and the second probability is the probability that the target original word is translated into the second translated word in the second translated sentence;
determining a target translated sentence of the first original sentence according to the target translated word of the target original word.
Optionally, determining the target translated word of the target original word according to the first probability and the second probability of the target original word includes at least one of the following:
in response to the first probability being less than the second probability, determining the second translated word as the target translated word of the target original word;
in response to the first probability being greater than the second probability, determining the first translated word as the target translated word of the target original word;
in response to the first probability being equal to the second probability, determining the first translated word or the second translated word as the target translated word of the target original word.
Optionally, the method further includes:
inputting the first original sentence into a translation model to obtain the first probability output by the translation model.
Optionally, the method further includes:
inputting the second original sentence and the second translated sentence into the translation model to obtain the second probability output by the translation model.
Optionally, determining the target translated word of the target original word according to the first probability and the second probability of the target original word includes:
obtaining a fourth probability according to the second probability and a third probability, the third probability being the probability, obtained by inputting the first original sentence into the translation model, that the target original word is translated into the second translated word;
determining the target translated word of the target original word according to the first probability and the fourth probability.
An embodiment of the present application further provides a translation apparatus, the apparatus including:
a first acquiring unit configured to acquire a target sentence pair, the target sentence pair including a first original sentence and a first translated sentence, the first translated sentence being a translation of the first original sentence;
a second acquiring unit configured to acquire a reference sentence pair according to the first original sentence, the reference sentence pair including a second original sentence and a second translated sentence, the second original sentence being semantically similar to the first original sentence, and the second translated sentence being a translation of the second original sentence;
a first determining unit configured to determine a target original word included in the first original sentence, the target original word also belonging to the second original sentence;
a second determining unit configured to, in response to a first translated word differing from a second translated word, determine a target translated word of the target original word according to a first probability and a second probability of the target original word, where the first translated word is the word corresponding to the target original word in the first translated sentence, the second translated word is the word corresponding to the target original word in the second translated sentence, the first probability is the probability that the target original word is translated into the first translated word in the first translated sentence, and the second probability is the probability that the target original word is translated into the second translated word in the second translated sentence;
a third determining unit configured to determine a target translated sentence of the first original sentence according to the target translated word of the target original word.
Optionally, the second determining unit determining the target translated word of the target original word according to the first probability and the second probability of the target original word includes at least one of the following:
the second determining unit determining the second translated word as the target translated word of the target original word in response to the first probability being less than the second probability;
the second determining unit determining the first translated word as the target translated word of the target original word in response to the first probability being greater than the second probability;
the second determining unit determining the first translated word or the second translated word as the target translated word of the target original word in response to the first probability being equal to the second probability.
Optionally, the apparatus further includes:
a first input unit configured to input the first original sentence into a translation model to obtain the first probability output by the translation model.
Optionally, the apparatus further includes:
a second input unit configured to input the second original sentence and the second translated sentence into the translation model to obtain the second probability output by the translation model.
Optionally, the second determining unit determining the target translated word of the target original word according to the first probability and the second probability of the target original word includes:
the second determining unit obtaining a fourth probability according to the second probability and a third probability, the third probability being the probability, obtained by inputting the first original sentence into the translation model, that the target original word is translated into the second translated word;
the second determining unit determining the target translated word of the target original word according to the first probability and the fourth probability.
An embodiment of the present application further provides a translation device, the device including a processor and a memory;
the memory being configured to store instructions;
the processor being configured to execute the instructions in the memory to perform the method described in the foregoing embodiments.
An embodiment of the present application further provides a computer-readable storage medium including instructions which, when run on a computer, cause the computer to perform the method described in the foregoing embodiments.
In the translation method provided in the embodiments of the present application, a second original sentence that is semantically similar to the first original sentence is first determined, and a target original word that appears in both the first original sentence and the second original sentence is then determined. If the target original word is translated differently in the first and second sentence pairs, the target translated word of the target original word is determined according to the probabilities with which the target original word is translated into the first translated word and the second translated word. The translation method of the embodiments of the present application therefore uses not only the information of the second translated sentence of the second original sentence but also the information of the second original sentence itself: the translation of the first original sentence is corrected according to the vocabulary shared by the second original sentence and the first original sentence, which can improve translation accuracy.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of an embodiment of a translation method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a translation matrix provided by an embodiment of the present application;
FIG. 3 is a structural block diagram of a translation apparatus provided by an embodiment of the present application;
FIG. 4 is a structural block diagram of a translation device provided by an embodiment of the present application.
Detailed Description of the Embodiments
In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
With the rapid development of computer technology, the field of machine learning has seen many technical advances. In the field of machine learning, trained neural network models are mainly used to replace repetitive manual operations. Machine learning can also be applied to translation: neural network models can be trained to perform translation.
Neural machine translation (Neural Machine Translation, NMT) offers high translation quality and has become the most widely used machine translation technology. In the prior art, the main translation process is as follows: the original sentence s is input into the NMT model and the encoder in the NMT model encodes the original sentence s into a vector; a similar sentence s' that is semantically similar to the original sentence s is then retrieved, together with its similar translated sentence t'; an additional encoder is introduced into the NMT model to encode the similar translated sentence t' into a vector; the vector of the original sentence s and the vector of the similar translated sentence t' are processed by the model to obtain the vector of the translated sentence t of the original sentence s, and a decoder decodes the vector of the translated sentence t to obtain the translated sentence t.
The inventors have found that, when translation is performed in the prior art, only the information of the similar translated sentence t' is referred to, while the information of the similar sentence s' of the original sentence s is ignored, which leads to inaccurate translations during the translation process. Moreover, prior-art translation needs to introduce an additional new encoder to encode the similar translated sentence t', which increases the number of parameters in the model computation and raises the translation cost. If the information of the similar sentence s' is to be taken into account, the translation model also needs to be retrained, which consumes too much time.
Therefore, the embodiments of the present application provide a translation method in which a second original sentence that is semantically similar to the first original sentence is first determined, and a target original word that appears in both the first original sentence and the second original sentence is then determined. If the target original word is translated differently in the first and second sentence pairs, the target translated word of the target original word is determined according to the probabilities with which the target original word is translated into the first translated word and the second translated word. The translation method of the embodiments of the present application therefore uses not only the information of the second translated sentence of the second original sentence but also the information of the second original sentence itself: the translation of the first original sentence is corrected according to the vocabulary shared by the second original sentence and the first original sentence, which can improve translation accuracy. In addition, the translation method in the embodiments of the present application does not need to introduce a new encoder into the translation model to encode the translated sentence t' into a vector, which reduces the translation cost and increases translation efficiency.
Referring to FIG. 1, which is a flow chart of a translation method provided by an embodiment of the present application.
The translation method provided by this embodiment includes the following steps:
S101: acquire a target sentence pair.
In the embodiments of the present application, a target sentence pair is first acquired; the target sentence pair is a sentence pair that has been translated by a translation model. The target sentence pair includes a first original sentence and a first translated sentence; the first translated sentence is the sentence obtained by translating the first original sentence with the translation model, and the first translated sentence may contain translation errors.
As an example, the first original sentence may be "她是一个来自中国的学生" (literally, "She is a student from China"), and the first translated sentence may be "She is a student to China", in which the word "to" is a translation error.
The translation model applied in the embodiments of the present application may be a neural machine translation (Neural Machine Translation, NMT) model.
S102: acquire a reference sentence pair according to the first original sentence.
In the embodiments of the present application, a reference sentence pair is acquired according to the first original sentence. The reference sentence pair includes a second original sentence and a second translated sentence, where the second translated sentence is the translation of the second original sentence and is a correct translation. The first original sentence is semantically similar to the second original sentence, and a second original sentence that is semantically similar to the first original sentence can be retrieved from a translation memory.
In practice, the first original sentence and the second original sentence are semantically similar. Semantic similarity can be implemented in the following two ways: the first way is to compare the first original sentence and the second original sentence to obtain the proportion of shared vocabulary; when the shared vocabulary reaches a predetermined proportion, the first original sentence and the second original sentence are considered semantically similar. The second way is to encode the first original sentence and the second original sentence into vectors and compute the similarity between the vectors; when the similarity between the vectors reaches a predetermined threshold, the first original sentence and the second original sentence are considered semantically similar.
As an example, the first original sentence may be "她是一个来自中国的学生" and the first translated sentence may be "She is a student to China"; the second original sentence may be "他是一个来自美国的老师" (literally, "He is a teacher from the United States") and the second translated sentence may be "He is a teacher from America".
S103: determine a target original word in the first original sentence.
In the embodiments of the present application, the target original word is a word that appears in both the first original sentence and the second original sentence. After the second original sentence that is semantically similar to the first original sentence is determined, the words in the first original sentence and the second original sentence can be compared, and a word that appears both in the first original sentence and in the second original sentence is determined as the target original word.
In practice, when determining the target original word in the first original sentence, the first original sentence and the second original sentence can be compared word by word to obtain the words that appear both in the first original sentence and in the second original sentence.
As an example, the first original sentence may be "她是一个来自中国的学生" and the second original sentence may be "他是一个来自美国的老师". By comparing the sentences word by word, it can be determined that the target original words in the first original sentence are "是", "一个", "来自" and "的".
The target original word may also be determined without word-by-word comparison; for example, the first original sentence and the second original sentence may be segmented into several parts according to grammar, and the corresponding parts of the first original sentence and the second original sentence may be compared to determine the target original word.
S104: in response to a first translated word differing from a second translated word, determine a target translated word of the target original word according to a first probability and a second probability of the target original word.
In the embodiments of the present application, the first translated word is the word corresponding to the target original word in the first translated sentence, the second translated word is the word corresponding to the target original word in the second translated sentence, the first probability is the probability that the target original word is translated into the first translated word in the first translated sentence, and the second probability is the probability that the target original word is translated into the second translated word in the second translated sentence. The first translated word of the target original word in the first translated sentence may differ from its second translated word in the second translated sentence; in that case, the target translated word of the target original word can be determined according to the first probability of the target original word being translated into the first translated word in the first translated sentence and the second probability of it being translated into the second translated word in the second translated sentence, the target translated word being the correct translation corresponding to the target original word. In practice, the first probability is obtained by inputting the first original sentence into the translation model. The main translation process is as follows: the first original sentence is input into the translation model; the encoder in the translation model encodes the first original sentence into a vector; translation is performed to obtain the vector of the first translated sentence; and the decoder decodes the vector of the first translated sentence to obtain the first translated sentence. The probability that the target original word is translated into the first translated word in the first translated sentence is then the first probability.
In practice, the second probability is obtained by inputting the second original sentence and the second translated sentence into the translation model. The second original sentence and the second translated sentence are input into the translation model, and the translation model force-decodes the second original sentence and the second translated sentence, that is, it encodes the second original sentence and the second translated sentence into vectors, and the second probability that the target original word is translated into the second translated word in the second original sentence is obtained. It can thus be seen that the translation method in the embodiments of the present application does not need to introduce a new encoder into the translation model to encode the second translated sentence into a vector; the second probability can be obtained simply by using the encoder and decoder originally present in the translation model to force-decode the second original sentence and the second translated sentence. In other words, the method of the embodiments of the present application reduces the translation cost and increases translation efficiency.
Referring to Table 1, which shows the probabilities, obtained by inputting the first original sentence into the translation model, with which the words in the first original sentence are translated.
Table 1
  She is a student to China
0.8 0.02 0.02 0.02 0.02 0.02
0.02 0.8 0.02 0.02 0.02 0.02
一个 0.1 0.1 0.7 0.03 0.03 0.04
来自 0.05 0.01 0.01 0.1 0.5 0.08
中国 0.02 0.02 0.02 0.02 0.02 0.9
0.1 0.2 0.13 0.15 0.15 0.1
学生 0.02 0.02 0.02 0.9 0.02 0.02
As can be seen from Table 1, the first probability of the target original word "一个" in the first original sentence being translated as "a" is 0.7, the first probability of the target original word "来自" in the first original sentence being translated as "to" is 0.5, and the probability of the target original word "来自" in the first original sentence being translated as "from" is 0.3. In other words, when the first original sentence is input into the translation model, the word "来自" in the first original sentence has the highest probability, 0.5, of being translated as "to", so the translation model mistranslates the first original sentence, rendering "来自" incorrectly as "to".
Referring to Table 2 or FIG. 2, which show the probabilities, obtained by inputting the second original sentence and the second translated sentence into the translation model, with which the words in the second original sentence are translated into the words in the second translated sentence.
Table 2
(Table 2 is provided as an image in the original filing: Figure PCTCN2022092392-appb-000001)
As can be seen from Table 2 or FIG. 2, the probability of the word "他" in the second original sentence being translated as "He" is 0.8, the second probability of the target original word "一个" in the second original sentence being translated as "a" is 0.7, and the second probability of the target original word "来自" in the first original sentence being translated as "from" is 0.8.
In the embodiments of the present application, the target translated word of the target original word can be determined in the following two implementations:
In the first possible implementation, determining the target translated word of the target original word according to the first probability and the second probability of the target original word can involve the following three cases:
The first case is that the first probability is less than the second probability, and the second translated word is determined as the target translated word of the target original word. That is, if the probability of the target original word being translated into the first translated word is less than the probability of it being translated into the second translated word, the target translated word is the second translated word.
As an example, the first probability of the target original word "来自" being translated into the first translated word "to" is 0.5, and the second probability of the target original word "来自" being translated into the second translated word "from" is 0.8; since the first probability 0.5 is less than the second probability 0.8, the target translated word of the target original word "来自" is determined to be the second translated word "from".
The second case is that the first probability is greater than the second probability, and the first translated word is determined as the target translated word of the target original word. That is, if the probability of the target original word being translated into the first translated word is greater than the probability of it being translated into the second translated word, the target translated word is the first translated word.
As an example, if the first probability of the target original word "来自" being translated into the first translated word "to" is 0.8 and the second probability of the target original word "来自" being translated into the second translated word "from" is 0.5, then, since the first probability 0.8 is greater than the second probability 0.5, the target translated word of the target original word "来自" is determined to be the first translated word "to".
The third case is that the first probability is equal to the second probability, and the first translated word or the second translated word is determined as the target translated word of the target original word. That is, if the probability of the target original word being translated into the first translated word is equal to the probability of it being translated into the second translated word, the target translated word is the first translated word or the second translated word.
As an example, if the first probability of the target original word "来自" being translated into the first translated word "to" is 0.7 and the second probability of the target original word "来自" being translated into the second translated word "from" is 0.7, then, since the first probability 0.7 is equal to the second probability 0.7, the target translated word of the target original word "来自" is determined to be the first translated word "to" or the second translated word "from".
In the second possible implementation, a fourth probability is obtained according to the second probability and a third probability, and the target translated word of the target original word is determined according to the first probability and the fourth probability, where the third probability is the probability, obtained by inputting the first original sentence into the translation model, that the target original word is translated into the second translated word. In practice, the second probability and the third probability may be combined in a weighted sum to obtain the fourth probability. Using the fourth probability obtained from the second probability and the third probability to assist in determining the target translated word can enhance the translation quality of the translation method provided by the embodiments of the present application.
In practice, the first probability and the fourth probability may be compared, and the translated word corresponding to the larger probability is determined as the target translated word. If the first probability is less than the fourth probability, the second translated word corresponding to the fourth probability is determined as the target translated word; if the first probability is greater than the fourth probability, the first translated word corresponding to the first probability is determined as the target translated word; if the first probability is equal to the fourth probability, the first translated word corresponding to the first probability or the second translated word corresponding to the fourth probability is determined as the target translated word.
As an example, when the first original sentence is input into the translation model, the first probability of the target original word "来自" being translated into the first translated word "to" is 0.5 and the third probability of the target original word "来自" being translated into the second translated word "from" is 0.3; when the second original sentence is input into the translation model, the second probability of the target original word "来自" being translated into the second translated word "from" is 0.8. The weighted sum of the second probability 0.8 and the third probability 0.3 gives the fourth probability, i.e. the fourth probability = (0.8 + 0.3) × 0.5 = 0.55; since the fourth probability 0.55 is greater than the first probability 0.5, the second translated word "from" corresponding to the fourth probability is determined as the target translated word.
S105: determine a target translated sentence of the first original sentence according to the target translated word of the target original word.
In the embodiments of the present application, the target translated sentence of the first original sentence can be determined according to the determined target translated word of the target original word.
As an example, if the first probability of the target original word "来自" being translated into the first translated word "to" is less than the second probability of the target original word "来自" being translated into the second translated word "from", the target translated word of the target original word "来自" is determined to be the second translated word "from", and the first translated word "to" in the first translated sentence is corrected to the second translated word "from" to obtain the target translated sentence.
The embodiments of the present application provide a translation method in which a second original sentence that is semantically similar to the first original sentence is first determined, and a target original word that appears in both the first original sentence and the second original sentence is then determined. If the target original word is translated differently in the first and second sentence pairs, the target translated word of the target original word is determined according to the probabilities with which the target original word is translated into the first translated word and the second translated word. The translation method of the embodiments of the present application therefore uses not only the information of the second translated sentence of the second original sentence but also the information of the second original sentence itself: the translation of the first original sentence is corrected according to the vocabulary shared by the second original sentence and the first original sentence, which can improve translation accuracy. In addition, the translation method in the embodiments of the present application does not need to introduce a new encoder into the translation model to encode the translated sentence t' into a vector, which reduces the translation cost and increases translation efficiency.
Based on the translation method provided by the above embodiments, an embodiment of the present application further provides a translation apparatus, whose working principle is described in detail below with reference to the drawings.
Referring to FIG. 3, which is a structural block diagram of a translation apparatus provided by an embodiment of the present application.
The translation apparatus 300 provided by this embodiment includes:
a first acquiring unit 310 configured to acquire a target sentence pair, the target sentence pair including a first original sentence and a first translated sentence, the first translated sentence being a translation of the first original sentence;
a second acquiring unit 320 configured to acquire a reference sentence pair according to the first original sentence, the reference sentence pair including a second original sentence and a second translated sentence, the second original sentence being semantically similar to the first original sentence, and the second translated sentence being a translation of the second original sentence;
a first determining unit 330 configured to determine a target original word in the first original sentence, the target original word being a word that appears in both the first original sentence and the second original sentence;
a second determining unit 340 configured to, in response to a first translated word differing from a second translated word, determine a target translated word of the target original word according to a first probability and a second probability of the target original word, where the first translated word is the word corresponding to the target original word in the first translated sentence, the second translated word is the word corresponding to the target original word in the second translated sentence, the first probability is the probability that the target original word is translated into the first translated word in the first translated sentence, and the second probability is the probability that the target original word is translated into the second translated word in the second translated sentence;
a third determining unit 350 configured to determine a target translated sentence of the first original sentence according to the target translated word of the target original word.
Optionally, the second determining unit determining the target translated word of the target original word according to the first probability and the second probability of the target original word includes at least one of the following:
the second determining unit determining the second translated word as the target translated word of the target original word in response to the first probability being less than the second probability;
the second determining unit determining the first translated word as the target translated word of the target original word in response to the first probability being greater than the second probability;
the second determining unit determining the first translated word or the second translated word as the target translated word of the target original word in response to the first probability being equal to the second probability.
Optionally, the apparatus further includes:
a first input unit configured to input the first original sentence into a translation model to obtain the first probability.
Optionally, the apparatus further includes:
a second input unit configured to input the second original sentence and the second translated sentence into the translation model to obtain the second probability.
Optionally, the second determining unit determining the target translated word of the target original word according to the first probability and the second probability of the target original word includes:
the second determining unit obtaining a fourth probability according to the second probability and a third probability, the third probability being the probability, obtained by inputting the first original sentence into the translation model, that the target original word is translated into the second translated word;
the second determining unit determining the target translated word of the target original word according to the first probability and the fourth probability.
Based on the translation method provided by the above embodiments, an embodiment of the present application further provides a translation device. The translation device 400 includes:
a processor 410 and a memory 420, where there may be one or more processors. In some embodiments of the present application, the processor and the memory may be connected by a bus or in other ways.
The memory may include a read-only memory and a random access memory and provides instructions and data to the processor. A portion of the memory may also include an NVRAM. The memory stores an operating system and operating instructions, executable modules or data structures, or a subset or an extended set thereof, where the operating instructions may include various operating instructions for implementing various operations. The operating system may include various system programs for implementing various basic services and processing hardware-based tasks.
The processor controls the operation of the terminal device; the processor may also be referred to as a CPU.
The methods disclosed in the foregoing embodiments of the present application may be applied to, or implemented by, the processor. The processor may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above methods may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly performed and completed by a hardware decoding processor, or performed and completed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
An embodiment of the present application further provides a computer-readable storage medium for storing program code, the program code being used to execute any one of the implementations of the translation method in the foregoing embodiments.
When elements of the various embodiments of the present application are introduced, the articles "a", "an", "the" and "said" are intended to mean that there are one or more of the elements. The terms "comprising", "including" and "having" are inclusive and mean that there may be other elements in addition to the listed elements.
It should be noted that those of ordinary skill in the art can understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The embodiments in this specification are described in a progressive manner; for identical or similar parts of the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the apparatus embodiments are substantially similar to the method embodiments, their description is relatively brief, and reference may be made to the relevant description of the method embodiments. The apparatus embodiments described above are merely illustrative, and the units and modules described as separate components may or may not be physically separated. In addition, some or all of the units and modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments, which can be understood and implemented by those of ordinary skill in the art without creative effort.
The above are only specific embodiments of the present application. It should be noted that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the present application, and such improvements and modifications shall also fall within the scope of protection of the present application.

Claims (12)

  1. A translation method, characterized in that the method comprises:
    acquiring a target sentence pair, the target sentence pair including a first original sentence and a first translated sentence, the first translated sentence being a translation of the first original sentence;
    acquiring a reference sentence pair according to the first original sentence, the reference sentence pair including a second original sentence and a second translated sentence, the second original sentence being semantically similar to the first original sentence, and the second translated sentence being a translation of the second original sentence;
    determining a target original word included in the first original sentence, the target original word also belonging to the second original sentence;
    in response to a first translated word differing from a second translated word, determining a target translated word of the target original word according to a first probability and a second probability of the target original word, wherein the first translated word is the word corresponding to the target original word in the first translated sentence, the second translated word is the word corresponding to the target original word in the second translated sentence, the first probability is the probability that the target original word is translated into the first translated word in the first translated sentence, and the second probability is the probability that the target original word is translated into the second translated word in the second translated sentence;
    determining a target translated sentence of the first original sentence according to the target translated word of the target original word.
  2. The method according to claim 1, characterized in that determining the target translated word of the target original word according to the first probability and the second probability of the target original word comprises at least one of the following:
    in response to the first probability being less than the second probability, determining the second translated word as the target translated word of the target original word;
    in response to the first probability being greater than the second probability, determining the first translated word as the target translated word of the target original word;
    in response to the first probability being equal to the second probability, determining the first translated word or the second translated word as the target translated word of the target original word.
  3. The method according to claim 1, characterized in that the method further comprises:
    inputting the first original sentence into a translation model to obtain the first probability output by the translation model.
  4. The method according to claim 1, characterized in that the method further comprises:
    inputting the second original sentence and the second translated sentence into a translation model to obtain the second probability output by the translation model.
  5. The method according to claim 3, characterized in that determining the target translated word of the target original word according to the first probability and the second probability of the target original word comprises:
    obtaining a fourth probability according to the second probability and a third probability, the third probability being the probability, obtained by inputting the first original sentence into the translation model, that the target original word is translated into the second translated word;
    determining the target translated word of the target original word according to the first probability and the fourth probability.
  6. A translation apparatus, characterized in that the apparatus comprises:
    a first acquiring unit configured to acquire a target sentence pair, the target sentence pair including a first original sentence and a first translated sentence, the first translated sentence being a translation of the first original sentence;
    a second acquiring unit configured to acquire a reference sentence pair according to the first original sentence, the reference sentence pair including a second original sentence and a second translated sentence, the second original sentence being semantically similar to the first original sentence, and the second translated sentence being a translation of the second original sentence;
    a first determining unit configured to determine a target original word included in the first original sentence, the target original word also belonging to the second original sentence;
    a second determining unit configured to, in response to a first translated word differing from a second translated word, determine a target translated word of the target original word according to a first probability and a second probability of the target original word, wherein the first translated word is the word corresponding to the target original word in the first translated sentence, the second translated word is the word corresponding to the target original word in the second translated sentence, the first probability is the probability that the target original word is translated into the first translated word in the first translated sentence, and the second probability is the probability that the target original word is translated into the second translated word in the second translated sentence;
    a third determining unit configured to determine a target translated sentence of the first original sentence according to the target translated word of the target original word.
  7. The apparatus according to claim 6, characterized in that the second determining unit determining the target translated word of the target original word according to the first probability and the second probability of the target original word comprises at least one of the following:
    the second determining unit determining the second translated word as the target translated word of the target original word in response to the first probability being less than the second probability;
    the second determining unit determining the first translated word as the target translated word of the target original word in response to the first probability being greater than the second probability;
    the second determining unit determining the first translated word or the second translated word as the target translated word of the target original word in response to the first probability being equal to the second probability.
  8. The apparatus according to claim 6, characterized in that the apparatus further comprises:
    a first input unit configured to input the first original sentence into a translation model to obtain the first probability output by the translation model.
  9. The apparatus according to claim 6, characterized in that the apparatus further comprises:
    a second input unit configured to input the second original sentence and the second translated sentence into a translation model to obtain the second probability output by the translation model.
  10. The apparatus according to claim 8, characterized in that the second determining unit determining the target translated word of the target original word according to the first probability and the second probability of the target original word comprises:
    the second determining unit obtaining a fourth probability according to the second probability and a third probability, the third probability being the probability, obtained by inputting the first original sentence into the translation model, that the target original word is translated into the second translated word;
    the second determining unit determining the target translated word of the target original word according to the first probability and the fourth probability.
  11. A translation device, characterized in that the device comprises a processor and a memory;
    the memory being configured to store instructions;
    the processor being configured to execute the instructions in the memory to perform the method according to any one of claims 1 to 5.
  12. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 5.
PCT/CN2022/092392 2021-05-21 2022-05-12 Translation method, translation apparatus, translation device, and storage medium WO2022242535A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110560294.3 2021-05-21
CN202110560294.3A CN113191163B (zh) 2021-05-21 2021-05-21 Translation method, translation apparatus, translation device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022242535A1 true WO2022242535A1 (zh) 2022-11-24

Family

ID=76984701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/092392 WO2022242535A1 (zh) 2021-05-21 2022-05-12 一种翻译方法、翻译装置、翻译设备以及存储介质

Country Status (2)

Country Link
CN (1) CN113191163B (zh)
WO (1) WO2022242535A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113191163B (zh) * 2021-05-21 2023-06-30 北京有竹居网络技术有限公司 一种翻译方法、翻译装置、翻译设备以及存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1801141A (zh) * 2004-06-24 2006-07-12 夏普株式会社 一种基于现有译文的储存库的翻译方法及设备
CN101320366A (zh) * 2007-06-07 2008-12-10 株式会社东芝 用于机器翻译的装置和方法
CN103729347A (zh) * 2012-10-10 2014-04-16 株式会社东芝 机器翻译装置、方法及程序
CN108874785A (zh) * 2018-06-01 2018-11-23 清华大学 一种翻译处理方法及系统
CN110991196A (zh) * 2019-12-18 2020-04-10 北京百度网讯科技有限公司 多义词的翻译方法、装置、电子设备及介质
US20200302124A1 (en) * 2017-12-18 2020-09-24 Panasonic Intellectual Property Management Co., Ltd. Translation device, translation method, and program
CN113191163A (zh) * 2021-05-21 2021-07-30 北京有竹居网络技术有限公司 一种翻译方法、翻译装置、翻译设备以及存储介质

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5148583B2 (ja) * 2009-10-27 2013-02-20 株式会社東芝 機械翻訳装置、方法及びプログラム
CN107977356B (zh) * 2017-11-21 2019-10-25 新疆科大讯飞信息科技有限责任公司 识别文本纠错方法及装置
CN111401080A (zh) * 2018-12-14 2020-07-10 波音公司 神经机器翻译方法以及神经机器翻译装置
CN109710952B (zh) * 2018-12-27 2023-06-16 北京百度网讯科技有限公司 基于人工智能的翻译历史检索方法、装置、设备和介质
CN110175336B (zh) * 2019-05-22 2021-05-28 北京百度网讯科技有限公司 翻译方法、装置和电子设备

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1801141A (zh) * 2004-06-24 2006-07-12 夏普株式会社 一种基于现有译文的储存库的翻译方法及设备
CN101320366A (zh) * 2007-06-07 2008-12-10 株式会社东芝 用于机器翻译的装置和方法
CN103729347A (zh) * 2012-10-10 2014-04-16 株式会社东芝 机器翻译装置、方法及程序
US20200302124A1 (en) * 2017-12-18 2020-09-24 Panasonic Intellectual Property Management Co., Ltd. Translation device, translation method, and program
CN108874785A (zh) * 2018-06-01 2018-11-23 清华大学 一种翻译处理方法及系统
CN110991196A (zh) * 2019-12-18 2020-04-10 北京百度网讯科技有限公司 多义词的翻译方法、装置、电子设备及介质
CN113191163A (zh) * 2021-05-21 2021-07-30 北京有竹居网络技术有限公司 一种翻译方法、翻译装置、翻译设备以及存储介质

Also Published As

Publication number Publication date
CN113191163B (zh) 2023-06-30
CN113191163A (zh) 2021-07-30

Similar Documents

Publication Publication Date Title
US10380996B2 (en) Method and apparatus for correcting speech recognition result, device and computer-readable storage medium
TWI664540B (zh) Search word error correction method and device, and weighted edit distance calculation method and device
JP4331219B2 (ja) 二言語単語対応付けの方法および装置、二言語単語対応モデルを訓練する方法および装置
US9176952B2 (en) Computerized statistical machine translation with phrasal decoder
US20160092438A1 (en) Machine translation apparatus, machine translation method and program product for machine translation
WO2018120889A1 (zh) 输入语句的纠错方法、装置、电子设备及介质
CN109325229B (zh) 一种利用语义信息计算文本相似度的方法
CN110210043B (zh) 文本翻译方法、装置、电子设备及可读存储介质
CN111859987A (zh) 文本处理方法、目标任务模型的训练方法和装置
CN110232923B (zh) 一种语音控制指令生成方法、装置及电子设备
CN111753531A (zh) 一种基于人工智能的文本纠错方法、装置、计算机设备及存储介质
CN109635305B (zh) 语音翻译方法及装置、设备及存储介质
CN111402861A (zh) 一种语音识别方法、装置、设备及存储介质
WO2022242535A1 (zh) 一种翻译方法、翻译装置、翻译设备以及存储介质
CN112016271A (zh) 语言风格转换模型的训练方法、文本处理方法以及装置
CN111539199A (zh) 文本的纠错方法、装置、终端、及存储介质
CN111626065A (zh) 神经机器翻译模型的训练方法、装置及存储介质
CN113948066A (zh) 一种实时转译文本的纠错方法、系统、存储介质和装置
US11694041B2 (en) Chapter-level text translation method and device
CN112528598B (zh) 基于预训练语言模型和信息论的自动化文本摘要评测方法
CN110929514B (zh) 文本校对方法、装置、计算机可读存储介质及电子设备
CN112528628A (zh) 一种文本处理的方法、装置及电子设备
CN112632956A (zh) 文本匹配方法、装置、终端和存储介质
CN111178049A (zh) 一种文本修正方法、装置、可读介质及电子设备
CN115879480A (zh) 语义约束机器翻译方法、装置、电子设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22803857

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22803857

Country of ref document: EP

Kind code of ref document: A1