WO2020057001A1 - Machine translation engine recommendation method and apparatus - Google Patents

Machine translation engine recommendation method and apparatus

Info

Publication number
WO2020057001A1
Authority
WO
WIPO (PCT)
Prior art keywords
machine translation
translation
translator
score
machine
Application number
PCT/CN2018/124891
Other languages
French (fr)
Chinese (zh)
Inventor
宋安琪
Original Assignee
语联网(武汉)信息技术有限公司
Application filed by 语联网(武汉)信息技术有限公司
Publication of WO2020057001A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0631 Resource planning, allocation, distributing or scheduling for enterprises or organisations
    • G06Q10/06311 Scheduling, planning or task assignment for a person or group

Definitions

  • Embodiments of the present disclosure relate to the technical field of natural language processing, and more particularly, to a method and device for recommending a machine translation engine.
  • Embodiments of the present disclosure provide a method and device for recommending a machine translation engine that overcomes the above problems or at least partially solves the problems.
  • In a first aspect, an embodiment of the present disclosure provides a method for recommending a machine translation engine, including:
  • wherein the machine translation engine evaluation model is obtained by training on original samples, the multiple machine translation samples corresponding to each original sample, and predetermined scores of those machine translation samples.
  • An embodiment of the present disclosure provides a machine translation engine recommendation device, including:
  • a translation module, configured to input the original text to be translated by a translator into multiple different machine translation engines and obtain multiple machine translations;
  • a prediction module, configured to input the original text to be translated by the translator and the multiple machine translations into a pre-trained machine translation engine evaluation model, and obtain the scores of the multiple machine translations output by the machine translation engine evaluation model;
  • a recommendation module, configured to recommend the highest-scoring machine translation to the translator;
  • wherein the machine translation engine evaluation model is obtained by training on original samples, the multiple machine translation samples corresponding to each original sample, and predetermined scores of those machine translation samples.
  • An embodiment of the present disclosure provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the program, it implements the steps of the method described in the first aspect.
  • An embodiment of the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the steps of the machine translation engine recommendation method provided by the first aspect.
  • The method and device for recommending a machine translation engine provided by the embodiments of the present disclosure build an evaluation model by learning from a large number of human reference translations and machine translations, evaluate the multiple machine translations of a document to be translated, and recommend the better machine translation to the translator, thereby improving translation quality and efficiency.
  • FIG. 1 is a schematic flowchart of a machine translation engine recommendation method according to an embodiment of the present disclosure
  • FIG. 2 is a schematic flowchart of training an evaluation model of a machine translation engine according to an embodiment of the present disclosure
  • FIG. 3 is a schematic structural diagram of a machine translation engine recommendation device according to an embodiment of the present disclosure.
  • FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • FIG. 1 is a schematic flowchart of a machine translation engine recommendation method according to an embodiment of the present disclosure. As shown in the figure, the method includes:
  • Step 100: Input the original text to be translated by a translator into multiple different machine translation engines, and obtain multiple machine translations.
  • Existing machine translation engines include Google, Baidu, Netease Youdao, and others.
  • The original text to be translated is input into the different machine translation engines to obtain multiple machine translations.
  • Step 101: Input the original text to be translated by the translator and the multiple machine translations into a pre-trained machine translation engine evaluation model, and obtain the scores of the multiple machine translations output by the machine translation engine evaluation model.
  • The input of the machine translation engine evaluation model is the original text to be translated by the translator together with the multiple machine translations obtained for it; the output is the score of each machine translation.
  • The machine translation engine evaluation model thus predicts the scores of the machine translations from the original text and its corresponding machine translations.
  • The machine translation engine evaluation model is obtained by training on original samples, the multiple machine translation samples corresponding to each original sample, and predetermined scores of those machine translation samples.
  • The score of a machine translation reflects the similarity between the machine translation and the reference translation corresponding to the original text.
  • Step 102: Recommend the highest-scoring machine translation to the translator.
  • The scores of the multiple machine translations are compared.
  • The highest score indicates the highest similarity between the machine translation and the reference translation, that is, the machine translation with the highest score is taken to be of the highest quality.
  • The machine translation with the highest score is recommended to the translator.
  • The machine translation engine recommendation method provided by the embodiment of the present disclosure builds an evaluation model by learning from a large number of human reference translations and machine translations, evaluates the multiple machine translations of the document to be translated, and recommends the better machine translation to the translator, thereby improving the translator's quality and efficiency.
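Steps 100 to 102 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Engine` and `Scorer` types and the `recommend` function are hypothetical names, not part of the disclosure, and the engines and the evaluation model are passed in as plain callables rather than real engine APIs.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical types: an engine maps an original text to a translation, and the
# evaluation model maps (original text, candidate translations) to one score each.
Engine = Callable[[str], str]
Scorer = Callable[[str, Sequence[str]], Sequence[float]]


def recommend(source: str, engines: Dict[str, Engine], score: Scorer) -> Tuple[str, str, float]:
    """Return (engine name, translation, score) for the highest-scoring candidate."""
    if not engines:
        raise ValueError("at least one engine is required")
    names = list(engines)
    # Step 100: translate the original text with every engine.
    candidates = [engines[name](source) for name in names]
    # Step 101: obtain one score per candidate from the evaluation model.
    scores = list(score(source, candidates))
    if len(scores) != len(candidates):
        raise ValueError("scorer must return one score per candidate")
    # Step 102: pick the candidate with the highest score.
    best = max(range(len(names)), key=scores.__getitem__)
    return names[best], candidates[best], scores[best]
```

Returning the engine name alongside the translation keeps the "engine recommendation" and the "translation recommendation" readings of the method both available to the caller.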
  • FIG. 2 is a schematic flowchart of training a machine translation engine evaluation model according to an embodiment of the present disclosure. Building on the foregoing embodiment, the machine translation engine evaluation model is obtained by training as follows:
  • Step 200: Obtain an original sample and the reference translation corresponding to the original sample from a bilingual corpus, and call different machine translation engines to obtain multiple machine translation samples corresponding to the original sample.
  • The reference translation corresponding to the original sample is produced by a professional translator.
  • From the existing bilingual corpus, a large number of original texts and corresponding reference translations can be obtained.
  • Each original text is input into multiple different machine translation engines, and multiple machine translations are obtained.
  • Step 201: Calculate the scores of the multiple machine translation samples corresponding to the original sample according to the reference translation corresponding to the original sample.
  • The score of a machine translation sample measures the similarity between the machine translation and the professional human translation.
  • The higher the score, the closer the machine translation is to the professional human translation and the higher the quality of the machine translation. The score of a machine translation can therefore be obtained from the similarity between the machine translation and the reference translation.
  • A sample data set is then established, in which each sample takes the original text and its corresponding multiple machine translations as inputs and the scores of those machine translations as target outputs.
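Steps 200 and 201 amount to building a labelled data set. The sketch below shows one way to organise it; `Sample`, `build_samples`, and the `score_fn(reference, machine_translation)` callable are hypothetical names introduced for illustration, and the scoring function is supplied by the caller rather than fixed here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Sample:
    source: str                 # original text
    candidates: List[str]       # one machine translation per engine
    target_scores: List[float]  # score of each candidate against the reference


def build_samples(
    corpus: Iterable[Tuple[str, str]],
    engines: Dict[str, Callable[[str], str]],
    score_fn: Callable[[str, str], float],
) -> List[Sample]:
    """Turn (original, reference) pairs into training samples."""
    samples = []
    for source, reference in corpus:
        # Step 200: call every engine on the original text.
        candidates = [engine(source) for engine in engines.values()]
        # Step 201: score each candidate against the reference translation.
        targets = [score_fn(reference, cand) for cand in candidates]
        samples.append(Sample(source, candidates, targets))
    return samples
```

The reference translation is used only to compute the targets; it is not stored as a model input, which matches the inputs and outputs described above.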
  • Step 202: Construct a deep learning network model, input the original sample and the multiple machine translation samples corresponding to the original sample into the deep learning network model for training, calculate a loss function from the scores of the multiple machine translation samples output by the model and the calculated scores of those samples, update the parameters of the deep learning network model by a back-propagation algorithm until a preset training end condition is satisfied, and save the parameters of the deep learning network model at the end of training to obtain the machine translation engine evaluation model.
  • Specifically, a deep learning network model is constructed, the parameters of the deep learning network model are initialized, and the original sample and the corresponding multiple machine translation samples are used as inputs to the deep learning network model.
  • The scores of the machine translations are used as the target of the model output. The deep learning network is trained, and a loss function is calculated from the predicted scores of the multiple machine translation samples output by the model and the target scores of those machine translations.
  • The back-propagation algorithm is used to update the model parameters until the preset training end condition is met. After training, the machine translation engine evaluation model is obtained.
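The training loop of Step 202 (predict, compute a loss, update the parameters, stop on a preset condition) can be illustrated with a deliberately small stand-in. The sketch below is not the disclosed network: it assumes each (original text, machine translation) pair has already been turned into a numeric feature vector, replaces the deep learning network with a single linear layer, and uses mean squared error as the loss. For one linear layer the hand-written gradient is what back-propagation would compute. All names and hyperparameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def train_linear_scorer(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    lr: float = 0.1,
    max_epochs: int = 1000,
    tol: float = 1e-6,
) -> Tuple[List[float], float, float]:
    """Fit w, b so that w.x + b approximates the target score; returns (w, b, loss)."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0  # parameter initialisation
    loss = float("inf")
    for _ in range(max_epochs):  # end condition 1: epoch budget exhausted
        grad_w, grad_b, loss = [0.0] * dim, 0.0, 0.0
        for x, y in zip(features, targets):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b  # predicted score
            err = pred - y
            loss += err * err
            for k in range(dim):
                grad_w[k] += 2 * err * x[k]
            grad_b += 2 * err
        loss /= n  # mean squared error between predicted and target scores
        if loss < tol:  # end condition 2: loss small enough
            break
        # Gradient step on the parameters.
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b, loss
```

In a real system the linear layer would be replaced by a network that encodes the original text and each machine translation, but the loop structure stays the same.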
  • The step of calculating the scores of the multiple machine translation samples corresponding to the original sample according to the reference translation corresponding to the original sample is performed according to formula (1), in which:
  • BLEU represents the BLEU score between the machine translation and the reference translation;
  • Similarity represents the edit-distance similarity between the machine translation and the reference translation;
  • PosScore represents the content-word similarity between the machine translation and the reference translation.
  • BLEU (bilingual evaluation understudy) is a metric for evaluating a machine translation against reference translations.
  • BLEU alone is not very accurate. In the embodiment of the present disclosure, BLEU is therefore combined with the string edit-distance similarity and the content-word similarity to measure the quality of the machine translation.
  • In Equation (2), x represents the translator's reference translation, y represents the machine translation, Max(x, y) represents the larger of the lengths of x and y, and Levenshtein(x, y) represents the Levenshtein distance between x and y;
  • the value of λ0 is 0.3;
  • the value of λ1 is 0.3;
  • the value of λ2 is 0.25;
  • the value of λ3 is 0.15;
  • count_i represents the number of words of the i-th part of speech in the reference translation;
  • word_j represents a word belonging to that part of speech in the reference translation;
  • n = count_i - 1;
  • sim(word_j) represents the similarity between the words in the machine translation and the words of the same part of speech in the reference translation. If the number of words of a part of speech in the reference translation is zero, the corresponding term is taken as 1.
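The edit-distance similarity can be sketched directly from the quantities named for Equation (2). The code below assumes the common normalised form 1 - Levenshtein(x, y) / Max(x, y); that form is an assumption based on the definitions of Max and Levenshtein given above, and the function names are hypothetical.

```python
def levenshtein(x: str, y: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            cost = 0 if cx == cy else 1
            curr.append(min(prev[j] + 1,          # delete cx
                            curr[j - 1] + 1,      # insert cy
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(x: str, y: str) -> float:
    """Assumed form of Equation (2): 1 - Levenshtein(x, y) / Max(x, y)."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(x, y) / longest
```

Dividing by the longer length keeps the result in the range 0 to 1, so it can be combined with BLEU and PosScore on a comparable scale.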
  • the time-of-flight camera calculates the distance of the object based on the measured time.
  • Time-of-Flight cameras calculate the distance of an object based on the measured time.
  • Time-of-flight cameras calculate distances from objects based on measured time.
  • The word segmentation result of the reference translation is: flight (verb) time (noun) camera (noun) based on (preposition) measured (adjective) time (noun) to calculate (preposition) (verb) object (noun) distance (auxiliary) (noun)
  • The word segmentation result of the machine translation is: flight (verb) time (noun) camera (noun) calculates (verb) object (noun) distance (noun) from (adjective) time (noun) measured by (preposition)
  • The scores of the multiple machine translation samples calculated by formula (1) are more accurate.
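A part-of-speech-weighted score of the kind described for PosScore can be sketched as follows. Several points are assumptions and not taken from the text: the four weights (0.3, 0.3, 0.25, 0.15) are mapped to noun, verb, adjective, and adverb; sim is simplified to an exact-match test (1 if the reference word appears in the machine translation with the same part of speech, else 0); and the per-category terms are combined as a weighted sum of averages. Only the rule that an empty category contributes 1 comes from the description.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping of the four weights to part-of-speech categories.
WEIGHTS: Dict[str, float] = {"noun": 0.3, "verb": 0.3, "adjective": 0.25, "adverb": 0.15}

Tagged = List[Tuple[str, str]]  # (word, part of speech) pairs


def pos_score(reference: Tagged, machine: Tagged, weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted average, per part of speech, of reference words found in the machine translation."""
    total = 0.0
    for pos, weight in weights.items():
        ref_words = [w for w, p in reference if p == pos]
        mt_words = {w for w, p in machine if p == pos}
        if not ref_words:
            ratio = 1.0  # no reference words of this category: term is taken as 1
        else:
            # Simplified sim(word_j): exact match within the same category.
            ratio = sum(1.0 for w in ref_words if w in mt_words) / len(ref_words)
        total += weight * ratio
    return total
```

Because the weights sum to 1 and each ratio lies between 0 and 1, the result also lies between 0 and 1.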
  • The method further includes:
  • A loss function is calculated from the recalculated score and the score output by the machine translation engine evaluation model, and the machine translation engine evaluation model is updated by a back-propagation algorithm.
  • The translator edits the recommended machine translation to determine the final translation.
  • The final translation serves as the reference translation confirmed by a professional human translator.
  • Based on the final translation, formula (1) may be used to recalculate the score of the machine translation recommended to the translator. A loss function is then calculated from the recalculated score and the score of that machine translation output by the machine translation engine evaluation model, and the machine translation engine evaluation model is updated through a back-propagation algorithm.
  • The method for recommending a machine translation engine provided by the embodiment of the present disclosure can thus keep updating the machine translation engine evaluation model by continuously learning from the translator's final translations.
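The update after post-editing is a single training step on one new example. The sketch below reuses the simplifying assumption of a linear scorer over a precomputed feature vector in place of the deep network, with squared error as the loss; `update_step` and its parameters are hypothetical names.

```python
from typing import List, Sequence, Tuple


def update_step(
    w: Sequence[float],
    b: float,
    x: Sequence[float],
    recalculated_score: float,
    lr: float = 0.05,
) -> Tuple[List[float], float, float]:
    """One gradient step toward the score recalculated from the final translation.

    Returns (new weights, new bias, squared-error loss before the update).
    """
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b  # score the model gave earlier
    err = pred - recalculated_score                  # gap to the recalculated score
    new_w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
    new_b = b - lr * 2 * err
    return new_w, new_b, err * err
```

Applying small steps like this after each confirmed final translation lets the model drift toward the translators' actual preferences without retraining from scratch.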
  • FIG. 3 is a schematic structural diagram of a machine translation engine recommendation device according to an embodiment of the present disclosure, which includes a translation module 310, a prediction module 320, and a recommendation module 330.
  • The translation module 310 is configured to input the original text to be translated by a translator into multiple different machine translation engines, and obtain multiple machine translations.
  • Existing machine translation engines include Google, Baidu, Netease Youdao, and others.
  • The translation module 310 inputs the original text to be translated into the different machine translation engines to obtain multiple machine translations.
  • The prediction module 320 is configured to input the original text to be translated by the translator and the multiple machine translations into a pre-trained machine translation engine evaluation model, and obtain the scores of the multiple machine translations output by the machine translation engine evaluation model.
  • The input of the machine translation engine evaluation model is the original text to be translated by the translator together with the multiple machine translations obtained for it; the output is the score of each machine translation.
  • The machine translation engine evaluation model thus predicts the scores of the machine translations from the original text and its corresponding machine translations.
  • The machine translation engine evaluation model is obtained by training on original samples, the multiple machine translation samples corresponding to each original sample, and predetermined scores of those machine translation samples.
  • The score of a machine translation reflects the similarity between the machine translation and the reference translation corresponding to the original text.
  • The recommendation module 330 is configured to recommend the machine translation with the highest score to the translator.
  • The recommendation module 330 compares the scores of the multiple machine translations.
  • The highest score indicates the highest similarity between the machine translation and the reference translation, that is, the machine translation with the highest score is taken to be of the highest quality.
  • That machine translation is recommended to the translator, who post-edits it to form the final translation, which improves the translator's quality and efficiency.
  • The machine translation engine recommendation device provided by the embodiment of the present disclosure builds an evaluation model by learning from a large number of human reference translations and machine translations, evaluates the multiple machine translations of the document to be translated, and recommends the better machine translation to the translator, thereby improving the translator's quality and efficiency.
  • The device further includes a training module, which is specifically configured to:
  • Construct a deep learning network model; input the original sample and the multiple machine translation samples corresponding to the original sample into the deep learning network model for training; calculate a loss function from the scores of the multiple machine translation samples output by the model and the calculated scores of those samples; update the parameters of the deep learning network model by a back-propagation algorithm until a preset training end condition is met; and save the parameters of the deep learning network model at the end of training to obtain the machine translation engine evaluation model.
  • The reference translation corresponding to the original sample is produced by a professional translator.
  • From the existing bilingual corpus, a large number of original texts and corresponding reference translations can be obtained.
  • Each original text is input into multiple different machine translation engines, and multiple machine translations are obtained.
  • The score of a machine translation sample measures the similarity between the machine translation and the professional human translation. The higher the score, the closer the machine translation is to the professional human translation and the higher the quality of the machine translation. The score of a machine translation can therefore be obtained from the similarity between the machine translation and the reference translation.
  • A sample data set is then established, in which each sample takes the original text and its corresponding multiple machine translations as inputs and the scores of those machine translations as target outputs.
  • Based on the acquired sample set, the training module constructs a deep learning network model, initializes its parameters, and uses the original sample and the corresponding multiple machine translation samples as inputs to the deep learning network model.
  • The scores of the machine translations are used as the target of the model output.
  • The deep learning network is trained, and the loss function is calculated from the predicted scores of the multiple machine translation samples output by the model and the target scores of those machine translations.
  • The back-propagation algorithm updates the model parameters until the preset training end condition is met. After training, the machine translation engine evaluation model is obtained.
  • The training module is specifically configured to calculate the scores of the multiple machine translation samples according to formula (1), in which:
  • BLEU represents the BLEU score between the machine translation and the reference translation;
  • Similarity represents the edit-distance similarity between the machine translation and the reference translation;
  • PosScore represents the content-word similarity between the machine translation and the reference translation.
  • BLEU (bilingual evaluation understudy) is a metric for evaluating a machine translation against reference translations.
  • BLEU alone is not very accurate. In the embodiment of the present disclosure, BLEU is therefore combined with the string edit-distance similarity and the content-word similarity to measure the quality of the machine translation.
  • In Equation (2), x represents the translator's reference translation, y represents the machine translation, Max(x, y) represents the larger of the lengths of x and y, and Levenshtein(x, y) represents the Levenshtein distance between x and y;
  • the value of λ0 is 0.3;
  • the value of λ1 is 0.3;
  • the value of λ2 is 0.25;
  • the value of λ3 is 0.15;
  • count_i represents the number of words of the i-th part of speech in the reference translation;
  • word_j represents a word belonging to that part of speech in the reference translation;
  • n = count_i - 1;
  • sim(word_j) represents the similarity between the words in the machine translation and the words of the same part of speech in the reference translation. If the number of words of a part of speech in the reference translation is zero, the corresponding term is taken as 1.
  • The device further includes an update module, which is specifically configured to:
  • Calculate a loss function from the recalculated score and the score output by the machine translation engine evaluation model, and update the machine translation engine evaluation model by a back-propagation algorithm.
  • The translator edits the recommended machine translation to determine the final translation.
  • The final translation serves as the reference translation confirmed by a professional human translator.
  • Based on the final translation, the update module can use formula (1) to recalculate the score of the machine translation recommended to the translator, then calculate the loss function from the recalculated score and the score of that machine translation output by the machine translation engine evaluation model, and update the machine translation engine evaluation model with the back-propagation algorithm.
  • FIG. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present disclosure.
  • The electronic device may include a processor 410, a communication interface 420, a memory 430, and a communication bus 440, wherein the processor 410, the communication interface 420, and the memory 430 communicate with each other through the communication bus 440.
  • The processor 410 may call a computer program stored in the memory 430 and executable on the processor 410 to perform the machine translation engine recommendation method provided by the foregoing embodiments, which includes, for example: inputting the original text to be translated by a translator into multiple different machine translation engines to obtain multiple machine translations; inputting the original text to be translated by the translator and the multiple machine translations into a pre-trained machine translation engine evaluation model to obtain the scores of the multiple machine translations output by the machine translation engine evaluation model; and recommending the highest-scoring machine translation to the translator; wherein the machine translation engine evaluation model is obtained by training on original samples, the multiple machine translation samples corresponding to each original sample, and predetermined scores of those machine translation samples.
  • The logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium.
  • The technical solution of the embodiments of the present disclosure, in essence, or the part that contributes to the existing technology, or a part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes a plurality of instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method described in the various embodiments of the present disclosure.
  • The aforementioned storage media include USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, optical disks, and other media that can store program code.
  • An embodiment of the present disclosure also provides a non-transitory computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the method for recommending a machine translation engine provided by the foregoing embodiments is implemented, for example:
  • the original text to be translated by a translator is input into multiple different machine translation engines, and multiple machine translations are obtained;
  • the original text to be translated by the translator and the multiple machine translations are input into a pre-trained machine translation engine evaluation model to obtain the scores of the multiple machine translations output by the machine translation engine evaluation model;
  • the machine translation with the highest score is recommended to the translator; wherein the machine translation engine evaluation model is obtained by training on original samples, the multiple machine translation samples corresponding to each original sample, and predetermined scores of those machine translation samples.
  • The device embodiments described above are only schematic. The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objective of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.
  • Each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware.
  • The above technical solution, in essence, or the part that contributes to the existing technology, can be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments or in certain parts of the embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A machine translation engine recommendation method and apparatus. The method comprises: inputting an original text to be translated by a translator into a plurality of different machine translation engines for translation, so as to obtain a plurality of machine translation texts (100); inputting the original text to be translated by a translator and the plurality of machine translation texts into a pretrained machine translation engine evaluation model, so as to obtain scores of the plurality of machine translation texts output by the machine translation engine evaluation model (101); and recommending a machine translation text with the highest score to the translator (102), wherein the machine translation engine evaluation model is obtained after training based on original text samples and a plurality of corresponding machine translation text samples as well as predetermined scores of the plurality of machine translation text samples. By means of evaluating a plurality of machine translation texts of a document to be translated, a better machine translation text is recommended to a translator, such that the translation quality and translation efficiency of the translator are improved.

Description

Machine translation engine recommendation method and apparatus
Cross-Reference
This application refers to Chinese Patent Application No. 2018114261931, entitled "Machine Translation Engine Recommendation Method and Apparatus", filed on November 27, 2018, which is incorporated herein by reference in its entirety.
Technical Field
Embodiments of the present disclosure relate to the technical field of natural language processing, and more particularly, to a machine translation engine recommendation method and apparatus.
Background
With the development of artificial intelligence, the quality of machine translation has improved steadily, and post-editing of machine translation output has become a new trend in translators' work. However, mainstream machine translation engines do not produce the best translation in every domain, and some other machine translation engines have their own strengths in particular domains.
Several popular machine translation engines are currently available, and the engine a translator chooses is an important factor affecting the quality of the translator's work. It is therefore particularly important to provide a method that can recommend a suitable machine translation engine to the translator according to the original text to be translated, thereby improving translation quality.
Summary of the Invention
Embodiments of the present disclosure provide a machine translation engine recommendation method and apparatus that overcome the above problems or at least partially solve them.
In a first aspect, an embodiment of the present disclosure provides a machine translation engine recommendation method, comprising:
inputting an original text to be translated by a translator into a plurality of different machine translation engines for translation, so as to obtain a plurality of machine translation texts;
inputting the original text to be translated by the translator and the plurality of machine translation texts into a pretrained machine translation engine evaluation model, so as to obtain scores of the plurality of machine translation texts output by the machine translation engine evaluation model; and
recommending the machine translation text with the highest score to the translator;
wherein the machine translation engine evaluation model is obtained by training based on original text samples, a plurality of machine translation text samples corresponding to each original text sample, and predetermined scores of the plurality of machine translation text samples.
In a second aspect, an embodiment of the present disclosure provides a machine translation engine recommendation apparatus, comprising:
a translation module, configured to input an original text to be translated by a translator into a plurality of different machine translation engines for translation, so as to obtain a plurality of machine translation texts;
a prediction module, configured to input the original text to be translated by the translator and the plurality of machine translation texts into a pretrained machine translation engine evaluation model, so as to obtain scores of the plurality of machine translation texts output by the machine translation engine evaluation model; and
a recommendation module, configured to recommend the machine translation text with the highest score to the translator;
wherein the machine translation engine evaluation model is obtained by training based on original text samples, a plurality of machine translation text samples corresponding to each original text sample, and predetermined scores of the plurality of machine translation text samples.
In a third aspect, an embodiment of the present disclosure provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the machine translation engine recommendation method provided in the first aspect.
In a fourth aspect, an embodiment of the present disclosure provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the machine translation engine recommendation method provided in the first aspect.
The machine translation engine recommendation method and apparatus provided by the embodiments of the present disclosure can build an evaluation model by learning from a large number of human reference translations and machine translation texts, evaluate the plurality of machine translation texts of a document to be translated, and recommend the better machine translation text to the translator, thereby improving the translator's translation quality and efficiency.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show some embodiments of the present disclosure, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a machine translation engine recommendation method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of training a machine translation engine evaluation model according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a machine translation engine recommendation apparatus according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
FIG. 1 is a schematic flowchart of a machine translation engine recommendation method according to an embodiment of the present disclosure. As shown in the figure, the method comprises:
Step 100: input an original text to be translated by a translator into a plurality of different machine translation engines for translation, so as to obtain a plurality of machine translation texts.
Existing machine translation engines include Google, Baidu, NetEase Youdao, and others. The original text to be translated is input into the different machine translation engines to obtain a plurality of machine translation texts.
Step 101: input the original text to be translated by the translator and the plurality of machine translation texts into a pretrained machine translation engine evaluation model, so as to obtain scores of the plurality of machine translation texts output by the machine translation engine evaluation model.
Specifically, the input of the machine translation engine evaluation model is the original text to be translated by the translator together with the plurality of machine translation texts obtained, and the output is the scores of the plurality of machine translation texts.
The machine translation engine evaluation model predicts the score of each machine translation text based on the original text and the corresponding plurality of machine translation texts.
The machine translation engine evaluation model is obtained by training based on original text samples, a plurality of machine translation text samples corresponding to each original text sample, and predetermined scores of the plurality of machine translation text samples.
It should be noted that the score of a machine translation text reflects the similarity between the machine translation text and the reference translation corresponding to the original text.
Step 102: recommend the machine translation text with the highest score to the translator.
Specifically, the scores of the plurality of machine translation texts are compared. The highest score indicates the highest similarity to the reference translation, that is, the machine translation text with the highest score is of the highest quality. This machine translation text is recommended to the translator, who post-edits it to produce the final translation, thereby improving the translator's translation quality and efficiency.
The machine translation engine recommendation method provided by the embodiments of the present disclosure can build an evaluation model by learning from a large number of human reference translations and machine translation texts, evaluate the plurality of machine translation texts of a document to be translated, and recommend the better machine translation text to the translator, thereby improving the translator's translation quality and efficiency.
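Steps 100 to 102 can be sketched in a few lines of Python. This is only an illustrative sketch: the `Engine` and `EvaluationModel` callables, the function name `recommend_translation`, and the toy engines are assumptions introduced here, not interfaces defined in this disclosure.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: an engine maps an original text to a translation,
# and the evaluation model maps (original text, translations) to one score each.
Engine = Callable[[str], str]
EvaluationModel = Callable[[str, List[str]], List[float]]


def recommend_translation(
    source: str,
    engines: Dict[str, Engine],
    evaluation_model: EvaluationModel,
) -> Tuple[str, str, float]:
    """Return (engine name, translation, score) of the best-scoring candidate."""
    if not engines:
        raise ValueError("at least one engine is required")
    names = list(engines)
    # Step 100: translate the original text with every engine.
    translations = [engines[name](source) for name in names]
    # Step 101: score all candidates with the pretrained evaluation model.
    scores = evaluation_model(source, translations)
    # Step 102: pick the candidate with the highest score.
    best = max(range(len(names)), key=lambda i: scores[i])
    return names[best], translations[best], scores[best]


# Toy stand-ins for real engines and a trained model, for illustration only.
toy_engines = {
    "engine_a": lambda text: text.upper(),
    "engine_b": lambda text: text[::-1],
}
toy_model = lambda src, outs: [0.4, 0.9]
print(recommend_translation("hello", toy_engines, toy_model))
```

Returning the engine name alongside the text lets a caller record which engine was recommended, although the disclosure itself only requires recommending the text.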
FIG. 2 is a schematic flowchart of training the machine translation engine evaluation model according to an embodiment of the present disclosure. Based on the content of the foregoing embodiment, the machine translation engine evaluation model is trained as follows:
Step 200: obtain original text samples and the reference translation corresponding to each original text sample from a bilingual corpus, and call different machine translation engines to obtain a plurality of machine translation text samples corresponding to each original text sample.
Specifically, the reference translation corresponding to an original text sample is a translation produced by a professional translator. A large number of original texts and corresponding reference translations can be obtained from existing bilingual corpora. Each original text is input into a plurality of different machine translation engines for translation, so as to obtain a plurality of machine translation texts.
Step 201: calculate the scores of the plurality of machine translation text samples corresponding to the original text sample according to the reference translation corresponding to the original text sample.
Specifically, the score of a machine translation text sample measures the similarity between the machine translation text and the professional human translation. A higher score means the machine translation text is closer to the professional human translation and therefore of higher quality. The score of a machine translation text can thus be obtained by computing the similarity between the machine translation text and the reference translation.
After the scores of the plurality of machine translation text samples are calculated, a sample data set is built, in which each sample takes the original text and the corresponding plurality of machine translation texts as input and the scores of the plurality of machine translation texts as the target output.
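Steps 200 and 201 amount to building one training sample per corpus entry. The following minimal sketch assumes a corpus of (original text, reference translation) pairs; `TrainingSample`, `build_dataset`, and the toy `score_fn` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class TrainingSample:
    source: str                 # original text sample (model input)
    translations: List[str]     # one machine translation text per engine (model input)
    target_scores: List[float]  # score of each translation (target output)


def build_dataset(
    corpus: Sequence[Tuple[str, str]],
    engines: Sequence[Callable[[str], str]],
    score_fn: Callable[[str, str], float],
) -> List[TrainingSample]:
    """Translate each original text with every engine and score it against the reference."""
    dataset = []
    for source, reference in corpus:
        translations = [engine(source) for engine in engines]
        targets = [score_fn(mt, reference) for mt in translations]
        dataset.append(TrainingSample(source, translations, targets))
    return dataset


# Toy corpus, engines, and scoring function, for illustration only.
toy_corpus = [("a b", "A B")]
toy_engines = [str.upper, str.lower]
exact_match = lambda mt, ref: 1.0 if mt == ref else 0.0
samples = build_dataset(toy_corpus, toy_engines, exact_match)
print(samples[0])
```

In practice `score_fn` would be the combined score of formula (1) rather than the exact-match stand-in used here.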
Step 202: construct a deep learning network model; input the original text sample and the corresponding plurality of machine translation text samples into the deep learning network model for training; calculate a loss function from the scores of the plurality of machine translation text samples output by the model and the calculated scores of the plurality of machine translation text samples; update the parameters of the deep learning network model by a back-propagation algorithm until a preset training end condition is met; and save the parameters of the deep learning network model at the end of training to obtain the machine translation engine evaluation model.
Specifically, based on the obtained sample set, a deep learning network model is constructed and its parameters are initialized. The original text sample and the corresponding plurality of machine translation text samples are used as the input of the deep learning network model, and the calculated scores of the plurality of machine translation texts are used as the target output. The deep learning network model is then trained: a loss function is calculated from the predicted scores of the plurality of machine translation text samples output by the model and the target scores of the plurality of machine translation texts, and the model parameters are updated by the back-propagation algorithm until the preset training end condition is met. When training ends, the machine translation engine evaluation model is obtained.
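The loop described in step 202 (predict, compute loss, update parameters, stop on a preset condition) can be illustrated with a deliberately small stand-in. The sketch below is not the deep learning network of this disclosure: it assumes each (original text, machine translation text) pair has already been turned into a numeric feature vector by some hypothetical featurizer, and it fits a single linear layer with a mean-squared-error loss and plain gradient descent. The name `train_linear_scorer`, the learning rate, and the stopping thresholds are all assumptions.

```python
from typing import List, Sequence, Tuple


def train_linear_scorer(
    features: Sequence[Sequence[float]],
    targets: Sequence[float],
    lr: float = 0.1,
    max_epochs: int = 500,
    tol: float = 1e-6,
) -> Tuple[List[float], float, List[float]]:
    """Fit score = w . x + b by gradient descent on mean squared error."""
    n_features = len(features[0])
    weights = [0.0] * n_features  # parameter initialization
    bias = 0.0
    history = []                  # loss per epoch
    m = len(features)
    for _ in range(max_epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        loss = 0.0
        for x, y in zip(features, targets):
            pred = sum(w * xi for w, xi in zip(weights, x)) + bias
            err = pred - y        # predicted score minus target score
            loss += err * err
            for k, xi in enumerate(x):
                grad_w[k] += 2 * err * xi
            grad_b += 2 * err
        loss /= m
        history.append(loss)
        if loss < tol:            # preset training end condition
            break
        # Gradient step; for a one-layer model this is all back-propagation does.
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias, history


# Toy data generated from score = 0.6 * x + 0.2, for illustration only.
toy_features = [[0.0], [0.5], [1.0]]
toy_targets = [0.2, 0.5, 0.8]
weights, bias, history = train_linear_scorer(toy_features, toy_targets)
print(weights, bias, history[-1])
```

A real implementation would replace the linear layer with a text encoder and use an automatic-differentiation framework; only the control flow is meant to carry over.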
Based on the content of the foregoing embodiment, the step of calculating the scores of the plurality of machine translation text samples corresponding to the original text sample according to the reference translation corresponding to the original text sample is specifically as follows:
According to the reference translation corresponding to the original text sample, the scores of the plurality of machine translation text samples corresponding to the original text sample are calculated with the following formula:
Score=(BLEU+Similarity+PosScore)/3  (1)
where BLEU denotes the BLEU score between the machine translation text and the reference translation, Similarity denotes the edit-distance similarity between the machine translation text and the reference translation, and PosScore denotes the content-word similarity between the machine translation text and the reference translation.
Specifically, BLEU (bilingual evaluation understudy) is a commonly used metric for evaluating machine-translated text. However, BLEU on its own is not very accurate. In the embodiments of the present disclosure, BLEU is therefore combined with the string edit-distance similarity and the content-word similarity to jointly measure the quality of a machine translation text.
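Formula (1) is a plain arithmetic mean of the three component scores. The minimal sketch below assumes that all three components, including BLEU, are already expressed on a 0 to 1 scale; that scale and the function name `combined_score` are assumptions made for illustration.

```python
def combined_score(bleu: float, similarity: float, pos_score: float) -> float:
    """Formula (1): the unweighted mean of BLEU, Similarity, and PosScore."""
    components = {"bleu": bleu, "similarity": similarity, "pos_score": pos_score}
    for name, value in components.items():
        # Assumption: every component lies in [0, 1], so the mean does too.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (bleu + similarity + pos_score) / 3.0


# Illustrative component values only; they are not taken from this disclosure.
print(combined_score(0.5, 0.8, 0.86))
```

Because the mean is unweighted, a translation that scores poorly on BLEU can still rank well if its edit-distance and content-word similarities are high.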
The edit-distance similarity is calculated as:
Similarity=(Max(x,y)-Levenshtein(x,y))/Max(x,y)  (2)
In formula (2), x denotes the translator's reference translation, y denotes the machine translation text, Max(x,y) denotes the greater of the lengths of x and y, and Levenshtein(x,y) denotes the Levenshtein distance between x and y.
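Formula (2) can be sketched directly in Python. The sketch assumes that lengths and edits are counted in characters, which the text does not state explicitly, and that two empty strings have similarity 1; the function names are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def edit_similarity(reference: str, hypothesis: str) -> float:
    """Formula (2): (Max(x,y) - Levenshtein(x,y)) / Max(x,y)."""
    longest = max(len(reference), len(hypothesis))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (longest - levenshtein(reference, hypothesis)) / longest


print(edit_similarity("kitten", "sitting"))
print(edit_similarity("照相机", "相机"))
```

The first call evaluates (7 - 3) / 7 and the second (3 - 1) / 3, since "照相机" and "相机" differ by a single deleted character.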
The content-word similarity is calculated as:
PosScore = \sum_{i=0}^{3} \alpha_i \cdot \frac{1}{count_i} \sum_{j=0}^{n} sim(word_j)  (3)
In formula (3), i = 0, 1, 2, 3 denote nouns, verbs, adjectives, and adverbs, respectively; \alpha_0 = 0.3, \alpha_1 = 0.3, \alpha_2 = 0.25, and \alpha_3 = 0.15; count_i denotes the number of words of part of speech i in the reference translation; word_j denotes a word of that part of speech in the reference translation; n = count_i - 1; and sim(word_j) denotes the similarity between word_j and the word of the same type in the machine translation text. If the reference translation contains no words of a given part of speech (count_i = 0), the term \frac{1}{count_i} \sum_{j=0}^{n} sim(word_j) is taken as 1.
The calculation of the content-word similarity PosScore is illustrated by the following example:
Original text: The time-of-flight camera calculates the distance of the object based on the measured time.
Reference translation: 飞行时间照相机基于所测量的时间来计算对象的距离。
Machine translation text: 飞行时间相机根据测量的时间计算物体的距离。
Word segmentation of the reference translation: 飞行 (verb) 时间 (noun) 照相机 (noun) 基于 (preposition) 所测量的 (adjective) 时间 (noun) 来 (preposition) 计算 (verb) 对象 (noun) 的 (particle) 距离 (noun)
Word segmentation of the machine translation text: 飞行 (verb) 时间 (noun) 相机 (noun) 根据 (preposition) 测量的 (adjective) 时间 (noun) 计算 (verb) 物体 (noun) 的 (particle) 距离 (noun)
The reference translation contains five nouns and no adverbs. The noun score is (1+0.67+1+1)/5=0.734, the verb score is 1, the adjective score is 0.75, and the adverb score is taken as 1.
The PosScore is therefore 0.3*0.734+0.3*1+0.25*0.75+0.15*1=0.86.
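The arithmetic of this example can be reproduced with a short script. The weights and the noun similarities are taken from the text; the per-word verb and adjective values are inferred from the stated category scores, and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# Part-of-speech weights from formula (3).
WEIGHTS = {"noun": 0.3, "verb": 0.3, "adjective": 0.25, "adverb": 0.15}


def category_score(similarities: List[float], count: int) -> float:
    """Average similarity over the `count` reference words of one part of speech."""
    if count == 0:
        return 1.0  # the text takes the term as 1 when the category is empty
    return sum(similarities) / count


def pos_score(per_category: Dict[str, Tuple[List[float], int]]) -> float:
    """Weighted sum of the four category scores."""
    return sum(
        WEIGHTS[tag] * category_score(*per_category[tag]) for tag in WEIGHTS
    )


example = {
    # Noun similarities as listed in the text, divided by the five reference nouns.
    "noun": ([1, 0.67, 1, 1], 5),
    # Inferred from the stated verb score of 1 and the two reference verbs.
    "verb": ([1, 1], 2),
    # Inferred from the stated adjective score of 0.75.
    "adjective": ([0.75], 1),
    # No adverbs in the reference translation.
    "adverb": ([], 0),
}
print(round(pos_score(example), 2))  # 0.86
```

The unrounded value is about 0.8577, which matches the 0.86 stated in the example.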
Scoring the plurality of machine translation text samples with formula (1) gives a more accurate result than using BLEU alone.
Based on the content of the foregoing embodiment, after the step of recommending the machine translation text with the highest score to the translator, the method further comprises:
recalculating the score of the machine translation text recommended to the translator, using the translation finally confirmed by the translator as the reference translation; and
calculating a loss function from the recalculated score and the score output by the machine translation engine evaluation model, and updating the machine translation engine evaluation model by the back-propagation algorithm.
Specifically, after the machine translation text with the highest score is recommended to the translator, the translator edits it to produce the final translation. This final translation is a reference translation confirmed by a professional human translator. Based on it, the score of the machine translation text recommended to the translator is recalculated with formula (1). A loss function is then calculated from the recalculated score and the score that the machine translation engine evaluation model output for that machine translation text, and the machine translation engine evaluation model is updated by the back-propagation algorithm.
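The update step can be pictured as one additional gradient step on a single new sample. The sketch below reuses the idea of a linear stand-in model over a hypothetical feature vector instead of the actual deep network; `online_update`, the squared-error loss, and the learning rate are assumptions.

```python
from typing import List, Sequence, Tuple


def online_update(
    weights: Sequence[float],
    bias: float,
    features: Sequence[float],
    recomputed_score: float,
    lr: float = 0.05,
) -> Tuple[List[float], float, float]:
    """One gradient step toward the score recomputed from the translator's final text."""
    predicted = sum(w * x for w, x in zip(weights, features)) + bias
    error = predicted - recomputed_score
    loss = error * error  # squared-error loss between model output and new target
    new_weights = [w - lr * 2 * error * x for w, x in zip(weights, features)]
    new_bias = bias - lr * 2 * error
    return new_weights, new_bias, loss


# Illustrative numbers only: the model predicted 0.6, the recomputed score is 0.9.
w, b, loss_before = online_update([0.5], 0.1, [1.0], 0.9)
_, _, loss_after = online_update(w, b, [1.0], 0.9)
print(loss_before, loss_after)
```

Each confirmed translation contributes one such step, so the second call on the same sample reports a smaller loss than the first.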
The machine translation engine recommendation method provided by the embodiments of the present disclosure can continuously update the machine translation engine evaluation model by continuously learning from translators' final translations.
FIG. 3 is a schematic structural diagram of a machine translation engine recommendation apparatus according to an embodiment of the present disclosure. The apparatus comprises a translation module 310, a prediction module 320, and a recommendation module 330.
The translation module 310 is configured to input an original text to be translated by a translator into a plurality of different machine translation engines for translation, so as to obtain a plurality of machine translation texts.
Specifically, existing machine translation engines include Google, Baidu, NetEase Youdao, and others. The translation module 310 inputs the original text to be translated into the different machine translation engines to obtain a plurality of machine translation texts.
The prediction module 320 is configured to input the original text to be translated by the translator and the plurality of machine translation texts into a pretrained machine translation engine evaluation model, so as to obtain scores of the plurality of machine translation texts output by the machine translation engine evaluation model.
Specifically, the input of the machine translation engine evaluation model is the original text to be translated by the translator together with the plurality of machine translation texts obtained, and the output is the scores of the plurality of machine translation texts.
The machine translation engine evaluation model predicts the score of each machine translation text based on the original text and the corresponding plurality of machine translation texts.
The machine translation engine evaluation model is obtained by training based on original text samples, a plurality of machine translation text samples corresponding to each original text sample, and predetermined scores of the plurality of machine translation text samples.
It should be noted that the score of a machine translation text reflects the similarity between the machine translation text and the reference translation corresponding to the original text.
The recommendation module 330 is configured to recommend the machine translation text with the highest score to the translator.
Specifically, the recommendation module 330 compares the scores of the plurality of machine translation texts. The highest score indicates the highest similarity to the reference translation, that is, the machine translation text with the highest score is of the highest quality. This machine translation text is recommended to the translator, who post-edits it to produce the final translation, thereby improving the translator's translation quality and efficiency.
The machine translation engine recommendation apparatus provided by the embodiments of the present disclosure can build an evaluation model by learning from a large number of human reference translations and machine translation texts, evaluate the plurality of machine translation texts of a document to be translated, and recommend the better machine translation text to the translator, thereby improving the translator's translation quality and efficiency.
Based on the content of the foregoing embodiment, the apparatus further comprises a training module, which is specifically configured to:
obtain original text samples and the reference translation corresponding to each original text sample from a bilingual corpus, and call different machine translation engines to obtain a plurality of machine translation text samples corresponding to each original text sample;
calculate the scores of the plurality of machine translation text samples corresponding to the original text sample according to the reference translation corresponding to the original text sample; and
construct a deep learning network model; input the original text sample and the corresponding plurality of machine translation text samples into the deep learning network model for training; calculate a loss function from the scores of the plurality of machine translation text samples output by the model and the calculated scores of the plurality of machine translation text samples; update the parameters of the deep learning network model by a back-propagation algorithm until a preset training end condition is met; and save the parameters of the deep learning network model at the end of training to obtain the machine translation engine evaluation model.
Specifically, the reference translation corresponding to an original text sample is a translation produced by a professional translator. A large number of original texts and corresponding reference translations can be obtained from existing bilingual corpora. Each original text is input into a plurality of different machine translation engines for translation, so as to obtain a plurality of machine translation texts.
The score of a machine translation text sample measures the similarity between the machine translation text and the professional human translation. A higher score means the machine translation text is closer to the professional human translation and therefore of higher quality. The score of a machine translation text can thus be obtained by computing the similarity between the machine translation text and the reference translation.
After the scores of the plurality of machine translation text samples are calculated, a sample data set is built, in which each sample takes the original text and the corresponding plurality of machine translation texts as input and the scores of the plurality of machine translation texts as the target output.
Based on the obtained sample set, the training module constructs a deep learning network model and initializes its parameters. The original text sample and the corresponding plurality of machine translation text samples are used as the input of the deep learning network model, and the calculated scores of the plurality of machine translation texts are used as the target output. The deep learning network model is then trained: a loss function is calculated from the predicted scores of the plurality of machine translation text samples output by the model and the target scores of the plurality of machine translation texts, and the model parameters are updated by the back-propagation algorithm until the preset training end condition is met. When training ends, the machine translation engine evaluation model is obtained.
Based on the content of the foregoing embodiment, the training module is specifically configured to:
calculate, according to the reference translation corresponding to the original text sample, the scores of the plurality of machine translation text samples corresponding to the original text sample with the following formula:
Score=(BLEU+Similarity+PosScore)/3  (1)
where BLEU denotes the BLEU score between the machine translation text and the reference translation, Similarity denotes the edit-distance similarity between the machine translation text and the reference translation, and PosScore denotes the content-word similarity between the machine translation text and the reference translation.
Specifically, BLEU (bilingual evaluation understudy) is a commonly used metric for evaluating machine-translated text. However, BLEU on its own is not very accurate. In the embodiments of the present disclosure, BLEU is therefore combined with the string edit-distance similarity and the content-word similarity to jointly measure the quality of a machine translation text.
The edit-distance similarity is calculated as:
Similarity=(Max(x,y)-Levenshtein(x,y))/Max(x,y)  (2)
In formula (2), x denotes the translator's reference translation, y denotes the machine translation text, Max(x,y) denotes the greater of the lengths of x and y, and Levenshtein(x,y) denotes the Levenshtein distance between x and y.
实意词相似度计算公式为:The calculation formula for similarity of intentional words is:
Figure PCTCN2018124891-appb-000003
In formula (3), i = 0, 1, 2, 3 denote nouns, verbs, adjectives, and adverbs, respectively; α_0 = 0.3, α_1 = 0.3, α_2 = 0.25, and α_3 = 0.15; count_i is the number of words of part of speech i in the reference translation; word_j is a word in the reference translation that belongs to that part of speech; n = count_i - 1; and sim(word_j) is the similarity between that word and the words of the same part of speech in the machine translation. If the number of words of some part of speech in the reference translation is zero, then
Figure PCTCN2018124891-appb-000004
is taken as 1.
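The image of formula (3) is not reproduced in this text, so the following Python sketch is only one reading that is consistent with the symbol definitions above. It assumes that PosScore is the α-weighted sum, over the four parts of speech, of the mean of sim(word_j) over the reference words of that part of speech, and that this mean is the term taken as 1 when the reference contains no such words. The inputs are assumed to be already tagged as (word, part of speech) pairs, and `exact_match_sim` is a hypothetical stand-in for sim, which the disclosure does not define.

```python
# Weights alpha_0..alpha_3 from the description (noun, verb, adjective, adverb).
ALPHA = {"noun": 0.3, "verb": 0.3, "adj": 0.25, "adv": 0.15}


def exact_match_sim(word, candidates):
    """Hypothetical stand-in for sim(word_j): 1.0 on exact match, else 0.0."""
    return 1.0 if word in candidates else 0.0


def pos_score(reference, hypothesis, sim=exact_match_sim):
    """Assumed form of formula (3): sum_i alpha_i * mean_j sim(word_j).

    `reference` and `hypothesis` are lists of (word, pos) pairs with pos
    drawn from the keys of ALPHA; other tags are ignored.
    """
    total = 0.0
    for pos, weight in ALPHA.items():
        ref_words = [w for w, p in reference if p == pos]
        hyp_words = [w for w, p in hypothesis if p == pos]
        if not ref_words:
            mean_sim = 1.0  # count_i == 0: the term is taken as 1
        else:
            mean_sim = sum(sim(w, hyp_words) for w in ref_words) / len(ref_words)
        total += weight * mean_sim
    return total
```

With a reference containing two nouns and one verb, and a machine translation that recovers one noun and the verb, this reading gives 0.3 × 0.5 + 0.3 × 1 + 0.25 × 1 + 0.15 × 1 = 0.85. A graded word similarity, such as one based on word embeddings, could be passed in through the `sim` parameter.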
Based on the foregoing embodiments, the apparatus further includes an update module, which is specifically configured to:
recalculate the score of the machine translation recommended to the translator, using the translation finally confirmed by the translator as the reference translation; and
calculate a loss function from the recalculated score and the score output by the machine translation engine evaluation model, and update the machine translation engine evaluation model by the back-propagation algorithm.
Specifically, after the highest-scoring machine translation is recommended to the translator, the translator edits it to produce the final translation, which then serves as a reference translation confirmed by a professional human translator. Using this final translation, the update module recalculates the score of the recommended machine translation with formula (1), calculates a loss function from the recalculated score and the score that the machine translation engine evaluation model output for that machine translation, and updates the machine translation engine evaluation model by the back-propagation algorithm.
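The update step can be illustrated with a deliberately small stand-in model. The disclosure specifies neither the network architecture nor the loss function, so the sketch below assumes a single linear layer and a squared-error loss; the feature vector, the learning rate, and the names `predict` and `update_step` are all hypothetical. For one linear layer, back-propagation reduces to the single gradient step shown.

```python
def predict(weights, bias, features):
    """Stand-in evaluation model: a single linear layer (assumption)."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def update_step(weights, bias, features, target, lr=0.1):
    """One gradient step on the squared error between model score and target.

    `target` is the score recalculated with formula (1) against the
    translator-confirmed final translation.
    """
    error = predict(weights, bias, features) - target
    loss = error ** 2
    # d(loss)/d(w_k) = 2 * error * f_k ; d(loss)/d(bias) = 2 * error
    new_weights = [w - lr * 2 * error * f for w, f in zip(weights, features)]
    new_bias = bias - lr * 2 * error
    return new_weights, new_bias, loss


# Model scored the recommended translation 0.5; the recalculated score is 0.8.
w, b, loss_before = update_step([0.5, 0.5], 0.0, [1.0, 0.0], target=0.8)
loss_after = (predict(w, b, [1.0, 0.0]) - 0.8) ** 2
```

After the step the model's score for the same input moves from 0.5 toward the recalculated 0.8, so the loss decreases. In a deep network the same idea applies, with the gradients for all layers obtained by back-propagation.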
FIG. 4 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present disclosure. As shown in FIG. 4, the electronic device may include a processor 410, a communications interface 420, a memory 430, and a communication bus 440, where the processor 410, the communications interface 420, and the memory 430 communicate with one another through the communication bus 440. The processor 410 may call a computer program that is stored in the memory 430 and executable on the processor 410 to perform the machine translation engine recommendation method provided by the foregoing embodiments, which includes, for example: inputting the source text to be translated by a translator into multiple different machine translation engines for translation to obtain multiple machine translations; inputting the source text and the multiple machine translations into a pre-trained machine translation engine evaluation model and obtaining the scores that the model outputs for the multiple machine translations; and recommending the highest-scoring machine translation to the translator; where the machine translation engine evaluation model is obtained by training on source samples, the corresponding multiple machine translation samples, and predetermined scores of those machine translation samples.
In addition, when the logic instructions in the memory 430 are implemented as software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present disclosure, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present disclosure. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the machine translation engine recommendation method provided by the foregoing embodiments, which includes, for example: inputting the source text to be translated by a translator into multiple different machine translation engines for translation to obtain multiple machine translations; inputting the source text and the multiple machine translations into a pre-trained machine translation engine evaluation model and obtaining the scores that the model outputs for the multiple machine translations; and recommending the highest-scoring machine translation to the translator; where the machine translation engine evaluation model is obtained by training on source samples, the corresponding multiple machine translation samples, and predetermined scores of those machine translation samples.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the description of the above embodiments, a person skilled in the art can clearly understand that each embodiment may be implemented by software together with a necessary general-purpose hardware platform, or alternatively by hardware. Based on this understanding, the above technical solution, in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure.

Claims (10)

  1. A machine translation engine recommendation method, comprising:
    inputting a source text to be translated by a translator into multiple different machine translation engines for translation to obtain multiple machine translations;
    inputting the source text to be translated by the translator and the multiple machine translations into a pre-trained machine translation engine evaluation model, and obtaining scores of the multiple machine translations output by the machine translation engine evaluation model; and
    recommending the highest-scoring machine translation to the translator;
    wherein the machine translation engine evaluation model is obtained by training based on source samples and corresponding multiple machine translation samples, and on predetermined scores of the multiple machine translation samples.
  2. The method according to claim 1, wherein the machine translation engine evaluation model is trained as follows:
    obtaining a source sample and a reference translation corresponding to the source sample from a bilingual corpus, and calling different machine translation engines to obtain multiple machine translation samples corresponding to the source sample;
    calculating scores of the multiple machine translation samples corresponding to the source sample according to the reference translation corresponding to the source sample; and
    constructing a deep learning network model, inputting the source sample and the multiple machine translation samples corresponding to the source sample into the deep learning network model for training, calculating a loss function from the scores of the multiple machine translation samples output by the model and the calculated scores of the multiple machine translation samples, updating parameters of the deep learning network model by a back-propagation algorithm until a preset training end condition is met, and saving the parameters of the deep learning network model at the end of training to obtain the machine translation engine evaluation model.
  3. The method according to claim 2, wherein the step of calculating the scores of the multiple machine translation samples corresponding to the source sample according to the reference translation corresponding to the source sample is specifically:
    calculating, according to the reference translation corresponding to the source sample, the scores of the multiple machine translation samples corresponding to the source sample using the following formula:
    Score = (BLEU + Similarity + PosScore) / 3,
    where BLEU is the BLEU score between the machine translation and the reference translation, Similarity is the edit-distance similarity between the machine translation and the reference translation, and PosScore is the content-word similarity between the machine translation and the reference translation.
  4. The method according to claim 1, further comprising, after the step of recommending the highest-scoring machine translation to the translator:
    recalculating the score of the machine translation recommended to the translator, using the translation finally confirmed by the translator as a reference translation; and
    calculating a loss function from the recalculated score and the score output by the machine translation engine evaluation model, and updating the machine translation engine evaluation model by a back-propagation algorithm.
  5. A machine translation engine recommendation apparatus, comprising:
    a translation module, configured to input a source text to be translated by a translator into multiple different machine translation engines for translation to obtain multiple machine translations;
    a prediction module, configured to input the source text to be translated by the translator and the multiple machine translations into a pre-trained machine translation engine evaluation model, and obtain scores of the multiple machine translations output by the machine translation engine evaluation model; and
    a recommendation module, configured to recommend the highest-scoring machine translation to the translator;
    wherein the machine translation engine evaluation model is obtained by training based on source samples and corresponding multiple machine translation samples, and on predetermined scores of the multiple machine translation samples.
  6. The apparatus according to claim 5, further comprising a training module, wherein the training module is specifically configured to:
    obtain a source sample and a reference translation corresponding to the source sample from a bilingual corpus, and call different machine translation engines to obtain multiple machine translation samples corresponding to the source sample;
    calculate scores of the multiple machine translation samples corresponding to the source sample according to the reference translation corresponding to the source sample; and
    construct a deep learning network model, input the source sample and the multiple machine translation samples corresponding to the source sample into the deep learning network model for training, calculate a loss function from the scores of the multiple machine translation samples output by the model and the calculated scores of the multiple machine translation samples, update parameters of the deep learning network model by a back-propagation algorithm until a preset training end condition is met, and save the parameters of the deep learning network model at the end of training to obtain the machine translation engine evaluation model.
  7. The apparatus according to claim 5, wherein the training module is specifically configured to:
    calculate, according to the reference translation corresponding to the source sample, the scores of the multiple machine translation samples corresponding to the source sample using the following formula:
    Score = (BLEU + Similarity + PosScore) / 3,
    where BLEU is the BLEU score between the machine translation and the reference translation, Similarity is the edit-distance similarity between the machine translation and the reference translation, and PosScore is the content-word similarity between the machine translation and the reference translation.
  8. The apparatus according to claim 5, further comprising an update module, wherein the update module is specifically configured to:
    recalculate the score of the machine translation recommended to the translator, using the translation finally confirmed by the translator as a reference translation; and
    calculate a loss function from the recalculated score and the score output by the machine translation engine evaluation model, and update the machine translation engine evaluation model by a back-propagation algorithm.
  9. An electronic device, comprising:
    at least one processor; and
    at least one memory communicatively connected to the processor, wherein:
    the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to perform the method according to any one of claims 1 to 4.
  10. A non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to perform the method according to any one of claims 1 to 4.
PCT/CN2018/124891 2018-09-19 2018-12-28 Machine translation engine recommendation method and apparatus WO2020057001A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811095799.1 2018-09-19
CN201811095799.1A CN109299737B (en) 2018-09-19 2018-09-19 Translator gene selection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
WO2020057001A1 true WO2020057001A1 (en) 2020-03-26

Family

ID=65163510

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2018/124891 WO2020057001A1 (en) 2018-09-19 2018-12-28 Machine translation engine recommendation method and apparatus
PCT/CN2018/124951 WO2020057003A1 (en) 2018-09-19 2018-12-28 Translator gene selection method and apparatus, and electronic device

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124951 WO2020057003A1 (en) 2018-09-19 2018-12-28 Translator gene selection method and apparatus, and electronic device

Country Status (2)

Country Link
CN (1) CN109299737B (en)
WO (2) WO2020057001A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8775155B2 (en) * 2010-10-25 2014-07-08 Xerox Corporation Machine translation using overlapping biphrase alignments and sampling
US20160188576A1 (en) * 2014-12-30 2016-06-30 Facebook, Inc. Machine translation output reranking
CN106021239A (en) * 2016-04-29 2016-10-12 北京创鑫旅程网络技术有限公司 Method for real-time evaluation of translation quality
CN106776583A (en) * 2015-11-24 2017-05-31 株式会社Ntt都科摩 Machine translation evaluation method and apparatus and machine translation method and equipment
CN107480147A (en) * 2017-08-15 2017-12-15 中译语通科技(北京)有限公司 A kind of method and system of comparative evaluation's machine translation system

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100793990B1 (en) * 2006-09-18 2008-01-16 삼성전자주식회사 Method and system for early z test for tile-based 3d rendering
CN102103612A (en) * 2009-12-22 2011-06-22 北大方正集团有限公司 Information extraction method and device
US9619463B2 (en) * 2012-11-14 2017-04-11 International Business Machines Corporation Document decomposition into parts based upon translation complexity for translation assignment and execution
CN103064970B (en) * 2012-12-31 2016-04-20 武汉传神信息技术有限公司 Optimize the search method of interpreter
CN103092827B (en) * 2012-12-31 2016-08-17 武汉传神信息技术有限公司 The method of many strategy interpreter's contribution Auto-matchings
CN103729349A (en) * 2013-12-23 2014-04-16 武汉传神信息技术有限公司 Analyzing method for affecting factors on translation quality
CN104537009B (en) * 2014-12-17 2017-09-29 武汉传神信息技术有限公司 Interpreter recommends method and device
CN105138521B (en) * 2015-08-27 2017-12-22 武汉传神信息技术有限公司 A kind of translation industry risk project general recommendations interpreter's method
CN105279147B (en) * 2015-09-29 2018-02-23 语联网(武汉)信息技术有限公司 A kind of interpreter's contribution fast matching method
CN106844303A (en) * 2016-12-23 2017-06-13 语联网(武汉)信息技术有限公司 A kind of is to treat the method that manuscript of a translation part matches interpreter based on similarity mode algorithm
CN106844304A (en) * 2016-12-26 2017-06-13 语联网(武汉)信息技术有限公司 It is a kind of to be categorized as treating the method that manuscript of a translation part matches interpreter based on the manuscript of a translation
CN108538284A (en) * 2017-03-06 2018-09-14 北京搜狗科技发展有限公司 Simultaneous interpretation result shows method and device, simultaneous interpreting method and device
CN107016131A (en) * 2017-05-19 2017-08-04 北方工业大学 Machine learning algorithm based on enhanced clustering and application of algorithm
CN107357783B (en) * 2017-07-04 2020-06-12 桂林电子科技大学 English translation quality analysis method for translating Chinese into English

Also Published As

Publication number Publication date
CN109299737A (en) 2019-02-01
CN109299737B (en) 2021-10-26
WO2020057003A1 (en) 2020-03-26

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18933907

Country of ref document: EP

Kind code of ref document: A1