WO2024032691A1 - Machine translation quality assessment method and apparatus, device, and storage medium

Info

Publication number: WO2024032691A1
Application number: PCT/CN2023/112135
Authority: WO
Inventors: 陶大程; 丁亮; 陆清屿
Original assignee: 京东科技信息技术有限公司
Priority date: 2022-08-12
Filing date: 2023-08-10
Publication date: 2024-02-15
Also published as: CN115310460A

Abstract

Disclosed in embodiments of the present application are a machine translation quality assessment method and apparatus, a device, and a storage medium, which are applied to the technical field of natural language processing. The method comprises: obtaining a translation text pair to be assessed, the translation text pair comprising a source text corresponding to a source language and a translated target text corresponding to a target language; on the basis of at least two quality assessment indexes and the source text, performing quality assessment on the target text, and determining an assessment result corresponding to each quality assessment index; on the basis of the language similarity between the source language and the target language, determining an assessment weight corresponding to each quality assessment index; and on the basis of at least two assessment weights, fusing at least two assessment results, and determining a target assessment result of the translation text pair.

Description

A machine translation quality assessment method, device, equipment and storage medium

This application claims priority to the Chinese patent application with application number 202210970061.5, which was submitted to the China Patent Office on August 12, 2022. The entire content of this application is incorporated into this application by reference.

Technical field

The embodiments of this application relate to the technical field of natural language processing, for example, to a machine translation quality assessment method, device, equipment and storage medium.

Background

With the rapid development of computer technology, it is often necessary to evaluate the quality of texts translated using machine translation models.

Currently, the quality of the translated text can be assessed based on the translation quality assessment model. For example, an evaluation model trained based on sentence-level annotation data evaluates the quality of translated text, and the obtained index evaluation results tend to characterize the overall fluency of the translated text. Alternatively, the evaluation model obtained by training based on word-level annotation data is used to evaluate the quality of the translated text, and the obtained index evaluation results are biased towards characterizing the fidelity of the translated text.

However, there are at least the following problems in related technologies:

The index evaluation result obtained from any single translation quality evaluation model is biased toward one level of the translated text, such as its overall fluency or its fidelity, so it cannot evaluate translation quality comprehensively. In addition, translated texts of different language pairs are all evaluated in the same way, which leads to large differences in the accuracy of translation evaluation across language pairs.

Summary of the invention

Embodiments of the present application provide a machine translation quality assessment method, device, equipment and storage medium to comprehensively assess translation quality and ensure the accuracy of translation assessment for different language pairs.

In the first aspect, embodiments of the present application provide a machine translation quality assessment method, including:

Obtain a translation text pair to be evaluated, the translation text pair includes a source text corresponding to the source language and a translated target text corresponding to the target language;

Perform a quality assessment on the target text based on at least two quality assessment indicators and the source text, and determine the assessment results corresponding to each of the quality assessment indicators;

Based on the language similarity between the source language and the target language, determine the evaluation weight corresponding to each of the quality evaluation indicators;

Based on at least two evaluation weights, at least two evaluation results are fused to determine the target evaluation result of the translated text pair.

In a second aspect, embodiments of the present application also provide a machine translation quality assessment device, including:

The translation text pair acquisition module is configured to acquire a translation text pair to be evaluated, where the translation text pair includes a source text corresponding to the source language and a translated target text corresponding to the target language;

An evaluation result determination module is configured to perform a quality evaluation on the target text based on at least two quality evaluation indicators and the source text, and determine the evaluation results corresponding to each of the quality evaluation indicators;

An evaluation weight determination module is configured to determine the evaluation weight corresponding to each of the quality evaluation indicators based on the language similarity between the source language and the target language;

The evaluation result fusion module is configured to perform fusion processing on at least two evaluation results based on at least two evaluation weights, and determine the target evaluation result of the translated text pair.

In a third aspect, embodiments of the present application further provide an electronic device, where the electronic device includes:

at least one processor;

a memory configured to store at least one program;

When the at least one program is executed by the at least one processor, the at least one processor is caused to implement the machine translation quality evaluation method as provided in any embodiment of the present application.

In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the machine translation quality assessment method as provided in any embodiment of the present application is implemented.

Description of drawings

Figure 1 is a flow chart of a machine translation quality assessment method provided by an embodiment of the present application;

Figure 2 is a flow chart of another machine translation quality assessment method provided by an embodiment of the present application;

Figure 3 is a schematic structural diagram of a machine translation quality assessment device provided by an embodiment of the present application;

Figure 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

The present application will be described in detail below with reference to the accompanying drawings and examples.

Figure 1 is a flow chart of a machine translation quality assessment method provided by an embodiment of the present application. This embodiment can be applied to the situation of evaluating the quality of text translated by a machine translation model. The method can be performed by a machine translation quality assessment device, which can be implemented in software and/or hardware and integrated into an electronic device. As shown in Figure 1, the method includes the following steps:

S110. Obtain a translation text pair to be evaluated. The translation text pair includes a source text corresponding to the source language and a translated target text corresponding to the target language.

Among them, the source language may refer to the language to be translated. The target language refers to the translated language. The source text may refer to the original text expressed in the source language, that is, the sentence to be translated. The target text can refer to the translation that expresses the same meaning as the source text in the target language, that is, the translated sentence.

For example, the source text can be input into the machine translation model for translation, and the target text output by the machine translation model can be obtained, thereby obtaining the translated text pair to be evaluated.

S120. Based on at least two quality assessment indicators and the source text, perform a quality assessment on the target text, and determine the assessment results corresponding to each quality assessment indicator.

Among them, a quality evaluation indicator may be an indicator used to evaluate the translation quality of the target text. Different types of quality evaluation indicators tend to evaluate different levels of the translation. For example, the quality evaluation indicator may be, but is not limited to, a fluency evaluation indicator that is biased toward evaluating the overall fluency of the target text, or a fidelity evaluation indicator that is biased toward evaluating the fidelity of the target text. The fluency evaluation indicator can be used to characterize the overall fluency of the translated text and whether it conforms to expression habits, among other information. The fidelity evaluation indicator can be used to characterize whether the details in the translated text faithfully reflect the meaning of the original text, that is, it evaluates details such as mistranslations, omissions, and emotional errors in the translation. Each quality evaluation indicator can correspond to at least one quality evaluation model, so that the evaluation result corresponding to the indicator is determined using the at least one quality evaluation model. This embodiment can express the evaluation results as scores. For example, the larger the score in an evaluation result, the higher the quality level corresponding to that quality evaluation indicator, for example, the higher the fluency or the higher the fidelity.

For example, at least two different quality assessment indicators can be selected based on business needs. For each quality assessment indicator, the quality of the target text can be assessed based on at least one quality assessment model corresponding to that indicator, and the assessment result corresponding to that indicator can be determined. For example, if a quality assessment indicator corresponds to multiple quality assessment models, one model can be randomly selected from them, the target text can be assessed based on the selected model and the source text, and the result obtained can be used as the assessment result corresponding to the indicator. Alternatively, the target text can be assessed based on each quality assessment model and the source text, and the multiple results obtained can be averaged to give the average assessment result corresponding to the indicator, which improves the accuracy of the quality assessment.
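The two strategies described above (random selection of one model, or averaging over all models of an indicator) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Scorer` signature, the function name `score_indicator`, and the `strategy` parameter are assumptions introduced here.

```python
import random
from statistics import mean
from typing import Callable, Optional, Sequence

# Hypothetical scorer signature: (source_text, target_text) -> score.
Scorer = Callable[[str, str], float]


def score_indicator(
    source: str,
    target: str,
    models: Sequence[Scorer],
    strategy: str = "average",
    rng: Optional[random.Random] = None,
) -> float:
    """Return one assessment result for a single quality indicator."""
    if not models:
        raise ValueError("an indicator needs at least one assessment model")
    if strategy == "random":
        # Pick one model at random and use its result directly.
        chooser = rng if rng is not None else random
        return chooser.choice(list(models))(source, target)
    if strategy == "average":
        # Score with every model and average the results.
        return mean(model(source, target) for model in models)
    raise ValueError(f"unknown strategy: {strategy}")
```

Averaging assumes the models of one indicator report scores on a comparable scale; if they do not, some normalization would be needed first, which the text above does not specify.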

S130. Based on the language similarity between the source language and the target language, determine the evaluation weight corresponding to each quality evaluation index.

Among them, language similarity can refer to the linguistic similarity between the source language and the target language in terms of language family, vocabulary and grammatical structure.

For example, the language similarity between every two languages can be determined in advance, so that the language similarity between the source language and the target language can be obtained directly, or the language similarity between the source language and the target language can be determined in real time. Based on the language similarity between the source language and the target language, the optimal evaluation weights corresponding to the different quality evaluation indicators can be determined. That is, different evaluation weights can be determined for different language pairs, so that the differences between language pairs are taken into account. This effectively avoids large differences in the accuracy of translation evaluation across language pairs, thereby ensuring the accuracy of translation evaluation for different language pairs and improving the generality of translation quality evaluation.

S140. Based on at least two evaluation weights, perform fusion processing on at least two evaluation results to determine the target evaluation result of the translated text pair.

For example, the evaluation result corresponding to each quality evaluation indicator can be multiplied by its evaluation weight, the products can be added together, and the weighted average obtained can be used as the target evaluation result. In this way, the evaluation results corresponding to at least two quality evaluation indicators are fused to evaluate the translation quality comprehensively, avoiding the bias that a single evaluation indicator introduces when evaluating the translated text, and thereby improving the accuracy and robustness of the quality assessment.
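The fusion in S140 amounts to a weighted average. A minimal sketch follows, assuming results and weights are keyed by indicator name; the function name `fuse` and the division by the weight sum (a no-op when the weights already sum to 1) are assumptions added for illustration.

```python
from typing import Dict


def fuse(results: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-indicator results (S140)."""
    if results.keys() != weights.keys():
        raise ValueError("results and weights must cover the same indicators")
    if len(results) < 2:
        raise ValueError("at least two indicators are required")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Multiply each result by its weight, add the products, then normalize.
    weighted_sum = sum(results[name] * weights[name] for name in results)
    return weighted_sum / total_weight
```

For instance, a fluency result of 0.8 with weight 0.25 and a fidelity result of 0.6 with weight 0.75 fuse to 0.8 × 0.25 + 0.6 × 0.75 = 0.65.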

The technical solution of this embodiment performs quality assessment on the target text based on at least two quality assessment indicators and the source text in the translated text pair to be assessed, determines the assessment result corresponding to each quality assessment indicator, determines the evaluation weight corresponding to each indicator based on the language similarity between the source language and the target language, and fuses the at least two evaluation results based on the at least two evaluation weights to determine the target evaluation result of the translated text pair. The evaluation results corresponding to at least two different quality evaluation indicators are thus fused to evaluate the translation quality comprehensively and avoid biased evaluation results. Because each evaluation weight is determined based on the language similarity between the source language and the target language, the differences between language pairs are taken into account, which effectively avoids large differences in the accuracy of translation evaluation across language pairs and thereby ensures the accuracy of translation evaluation for different language pairs.

Based on the above technical solution, S130 may include: inputting the language similarity between the source language and the target language into a preset network model, where the preset network model is obtained by pre-training based on translation sample pair data and label evaluation results; and determining, based on the output of the preset network model, the evaluation weight corresponding to each quality evaluation indicator.

Among them, the preset network model can be set to represent the mapping relationship between the language similarity and the optimal evaluation weight corresponding to each quality evaluation indicator. This mapping relationship can be learned from translation sample pair data and label evaluation results. For example, quality assessment can be performed on the target sample text in each piece of translation sample pair data based on at least two quality assessment indicators and the source sample text in that data, to obtain at least two sample evaluation results for each piece of translation sample pair data. The language similarity of the sample language pair is input into the preset network model to be trained, and the sample evaluation weight corresponding to each quality evaluation indicator is determined based on the output of the model. The at least two sample evaluation results are fused based on the at least two sample evaluation weights to obtain the target sample evaluation result, and the training error is determined based on the target sample evaluation result and the label evaluation result. The training error is back-propagated to the preset network model to be trained, and the model parameters are adjusted until a preset convergence condition is met, for example, the number of iterations reaches a preset number or the training error converges, at which point training of the preset network model is complete.
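The training procedure above can be illustrated with a deliberately small stand-in for the preset network model. The sketch below assumes two indicators (fluency and fidelity), a one-feature logistic model that maps language similarity to the fluency weight, a squared-error training loss, and plain gradient descent with hand-derived gradients. The source does not specify the architecture, loss, or optimizer, so all of these, along with the names `predict_weight`, `mean_squared_error`, and `train`, are assumptions.

```python
import math
from typing import List, Tuple

# One sample: (language_similarity, fluency_score, fidelity_score, label_score).
Sample = Tuple[float, float, float, float]


def predict_weight(a: float, b: float, similarity: float) -> float:
    """Fluency weight in (0, 1) predicted from the language similarity."""
    return 1.0 / (1.0 + math.exp(-(a * similarity + b)))


def mean_squared_error(a: float, b: float, samples: List[Sample]) -> float:
    """Training error between fused sample results and label results."""
    total = 0.0
    for similarity, fluency, fidelity, label in samples:
        w = predict_weight(a, b, similarity)
        fused = w * fluency + (1.0 - w) * fidelity
        total += (fused - label) ** 2
    return total / len(samples)


def train(samples: List[Sample], lr: float = 0.5,
          max_iters: int = 2000, tol: float = 1e-12) -> Tuple[float, float]:
    """Fit (a, b) until the error converges or max_iters is reached."""
    a, b = 0.0, 0.0
    prev = mean_squared_error(a, b, samples)
    for _ in range(max_iters):
        grad_a = grad_b = 0.0
        for similarity, fluency, fidelity, label in samples:
            w = predict_weight(a, b, similarity)
            err = w * fluency + (1.0 - w) * fidelity - label
            # Chain rule: d(err^2)/dz = 2*err * (fluency - fidelity) * w*(1-w).
            g = 2.0 * err * (fluency - fidelity) * w * (1.0 - w)
            grad_a += g * similarity
            grad_b += g
        a -= lr * grad_a / len(samples)
        b -= lr * grad_b / len(samples)
        loss = mean_squared_error(a, b, samples)
        if abs(prev - loss) < tol:  # preset convergence condition
            break
        prev = loss
    return a, b
```

A real system would more likely use a neural-network library with automatic differentiation; the hand-written gradient here only makes the back-propagation step visible.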

It should be noted that the network architecture of the preset network model can be set based on business requirements. For example, the preset network model can directly output the evaluation weight corresponding to each quality evaluation indicator, or it can output only the evaluation weight corresponding to one quality evaluation indicator, with the evaluation weights of the other indicators determined from that output. For example, if there are two quality evaluation indicators A and B, and the preset network model is set to output the evaluation weight corresponding to indicator A, then, since the evaluation weights corresponding to A and B sum to 1, the difference between 1 and the evaluation weight corresponding to indicator A is determined as the evaluation weight corresponding to indicator B.
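The two-indicator case described above reduces to a one-line computation. The sketch below assumes the model output for indicator A is already a value in [0, 1]; the function name `complementary_weights` is hypothetical.

```python
from typing import Tuple


def complementary_weights(weight_a: float) -> Tuple[float, float]:
    """Given the weight for indicator A, derive the weight for indicator B."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    # The two weights sum to 1, so B receives the remainder.
    return weight_a, 1.0 - weight_a
```

Having the model output a single weight halves the number of outputs and guarantees that the two weights sum to 1 without a separate normalization step.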

On the basis of the above technical solution, before S130, the method may also include: based on a preset multilingual model, determining a source language representation vector corresponding to the source language and a target language representation vector corresponding to the target language according to a source corpus corresponding to the source language and a target corpus corresponding to the target language; and determining the language similarity between the source language and the target language based on the source language representation vector and the target language representation vector.

The preset multilingual model may be a model that performs language processing on texts in different languages. For example, the preset multilingual model may be but is not limited to the XLM-RoBERTa model.

For example, the source language representation vector $v_i$, which represents the linguistic characteristics of the source language, can be determined based on the preset multilingual model and the source corpus. The target language representation vector $v_j$, which represents the linguistic characteristics of the target language, can be determined based on the preset multilingual model and the target corpus. In this embodiment, the cosine distance $\cos(v_i, v_j)$ between the source language representation vector $v_i$ and the target language representation vector $v_j$ can be determined as the language similarity between the source language and the target language.
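A minimal sketch of this similarity computation follows. It reads cos(v_i, v_j) as the cosine of the angle between the two vectors (what is usually called cosine similarity); that reading, and the function name `cosine_similarity`, are assumptions.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """cos(u, v) = (u . v) / (|u| * |v|)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Under this reading, vectors pointing in the same direction give a value near 1, and orthogonal vectors give 0.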

Exemplarily, determining, based on the preset multilingual model and according to the source corpus corresponding to the source language and the target corpus corresponding to the target language, the source language representation vector corresponding to the source language and the target language representation vector corresponding to the target language can include:

Inputting each source text in the source corpus corresponding to the source language into the preset multilingual model, determining the source language representation vector corresponding to each source text, and determining the source language representation vector corresponding to the source language based on the multiple source language representation vectors; and inputting each target text in the target corpus corresponding to the target language into the preset multilingual model, determining the target language representation vector corresponding to each target text, and determining the target language representation vector corresponding to the target language based on the multiple target language representation vectors.

Exemplarily, determining the source language representation vector corresponding to the source language based on multiple source language representation vectors may include: averaging the multiple source language representation vectors, and determining the obtained average vector as the source language representation vector corresponding to the source language.

Exemplarily, determining the target language representation vector corresponding to the target language based on multiple target language representation vectors may include: averaging the multiple target language representation vectors, and determining the obtained average vector as the target language representation vector corresponding to the target language.

For example, each source text in the source corpus can be input into a preset multilingual model obtained by pre-training, and based on the output of the preset multilingual model, the source language representation vector $R(x_{im})$ corresponding to each source text can be determined, where $i$ represents the source language and $m$ represents the $m$-th source text. The multiple source language representation vectors $R(x_{im})$ are averaged, and the average vector obtained is determined as the source language representation vector $v_i$, that is, $v_i = \frac{1}{n_i}\sum_{m=1}^{n_i} R(x_{im})$, where $n_i$ represents the number of source texts. In the same way, the target language representation vector $R(x_{jm})$ corresponding to each target text can be determined based on the preset multilingual model, where $j$ represents the target language. The multiple target language representation vectors $R(x_{jm})$ are averaged, and the average vector obtained is determined as the target language representation vector $v_j$, that is, $v_j = \frac{1}{n_j}\sum_{m=1}^{n_j} R(x_{jm})$, where $n_j$ represents the number of target texts.
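The averaging step can be sketched as follows. The `encode` callable stands in for the preset multilingual model (for example, a sentence representation taken from XLM-RoBERTa); how that representation is extracted is not specified in the text, so `encode` and the function name `language_vector` are hypothetical placeholders.

```python
from typing import Callable, List, Sequence

# Hypothetical encoder: maps one text to its representation vector R(x).
Encoder = Callable[[str], Sequence[float]]


def language_vector(corpus: Sequence[str], encode: Encoder) -> List[float]:
    """Average the per-text vectors R(x_m) into one language vector v."""
    vectors = [encode(text) for text in corpus]
    if not vectors:
        raise ValueError("the corpus must contain at least one text")
    dim = len(vectors[0])
    if any(len(vec) != dim for vec in vectors):
        raise ValueError("all representation vectors must share one dimension")
    n = len(vectors)
    # Component-wise mean: v[k] = (1/n) * sum over m of R(x_m)[k].
    return [sum(vec[k] for vec in vectors) / n for k in range(dim)]
```

Because the language vector depends only on the corpus, it can be computed once per language and cached, which matches the option of determining language similarities in advance.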

Figure 2 is a flow chart of another machine translation quality evaluation method provided by an embodiment of the present application. Building on the above embodiments, this embodiment describes the entire translation quality evaluation process in detail for the case where the quality evaluation indicators include a fluency evaluation indicator and a fidelity evaluation indicator. Explanations of terms that are the same as or correspond to those in the above embodiments are not repeated here.

Referring to Figure 2, another machine translation quality assessment method provided in this embodiment includes the following steps:

S210. Obtain a translation text pair to be evaluated. The translation text pair includes a source text corresponding to the source language and a translated target text corresponding to the target language.

S220. Based on at least one preset fluency evaluation model and the source text, perform a fluency evaluation on the target text, and determine the evaluation results corresponding to the fluency evaluation indicators.

Among them, the fluency evaluation indicator can be used to characterize the overall fluency of the translated text and whether it conforms to expression habits, among other information. The preset fluency evaluation model may be an evaluation model that is biased toward evaluating the overall fluency of the target text, so as to obtain the evaluation result corresponding to the fluency evaluation indicator. Preset fluency evaluation models may include, but are not limited to, at least one of: the COMET-MQM (Multidimensional Quality Metrics) cross-language multidimensional quality model, the COMET-QE cross-language quality estimation model, and the BLEURT (Bilingual Evaluation Understudy with Representations from Transformers) model. COMET (Crosslingual Optimized Metric for Evaluation of Translation) is the general name for a series of translation evaluation models. COMET is a model framework whose models are trained on human evaluation data. MQM is a multi-dimensional, multi-level human evaluation method, and the COMET-MQM model is obtained by training the COMET model on MQM data. QE (Quality Estimation) is a specific task in the field of translation evaluation in which reference translations may not be used and evaluation relies on the source text only. The COMET-QE model is obtained by training the COMET model on QE data. The BLEURT model is a bilingual evaluation understudy model built on Transformer representations.

For example, the target text can be evaluated for fluency based on at least one preset fluency evaluation model, and the evaluation result corresponding to the fluency evaluation indicator can be determined. For example, if there are multiple preset fluency evaluation models, one of them can be randomly selected, the target text can be assessed based on the selected model and the source text, and the result obtained can be used as the evaluation result corresponding to the fluency evaluation indicator. Alternatively, the target text can be assessed based on each preset fluency evaluation model and the source text, the multiple results obtained can be averaged, and the average can be used as the evaluation result corresponding to the fluency evaluation indicator, which improves the accuracy of the quality assessment.

It should be noted that different preset fluency evaluation models correspond to different evaluation methods, so when evaluating the quality of the target text, the reference texts required may be different. For example, the COMET-MQM cross-language multi-dimensional quality model needs to be evaluated based on the source text and the reference translation corresponding to the source text to obtain the evaluation results corresponding to the COMET-MQM fluency evaluation index. The COMET-QE cross-language quality evaluation model needs to be evaluated based on the source text to obtain evaluation results corresponding to the COMET-QE fluency evaluation indicators. The BLEURT bilingual evaluation alternative model needs to be evaluated based on the reference translation corresponding to the source text to obtain the evaluation results corresponding to the BLEURT fluency evaluation index.
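The input requirements listed above can be written down as a small lookup, which makes it easy to decide which fluency models are usable when, say, no reference translation is available. This is only a sketch: the dictionary `REQUIRED_INPUTS` restates the paragraph above, and the function name `usable_models` is an assumption. Every model also consumes the target text being evaluated, which is left implicit here.

```python
from typing import Dict, FrozenSet, Iterable, List

# Inputs each model needs in addition to the target text, per the text above.
REQUIRED_INPUTS: Dict[str, FrozenSet[str]] = {
    "COMET-MQM": frozenset({"source", "reference"}),
    "COMET-QE": frozenset({"source"}),
    "BLEURT": frozenset({"reference"}),
}


def usable_models(available: Iterable[str]) -> List[str]:
    """Return the models whose required inputs are all available."""
    have = set(available)
    return sorted(
        name for name, needed in REQUIRED_INPUTS.items() if needed <= have
    )
```

With only the source text available, this selection leaves COMET-QE; adding a reference translation makes all three models usable.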

S230. Based on at least one preset fidelity evaluation model and the source text, perform a fidelity evaluation on the target text and determine the evaluation result corresponding to the fidelity evaluation indicator.

Among them, the fidelity evaluation indicator can be used to characterize whether the details in the translated text faithfully reflect the meaning of the original text, that is, to evaluate detailed issues such as mistranslations, omissions, and emotional errors in the translation. The preset fidelity evaluation model may be an evaluation model that is biased toward evaluating the fidelity of the target text, so as to obtain the evaluation result corresponding to the fidelity evaluation indicator. The preset fidelity evaluation model may include, but is not limited to, at least one of the OpenKiwi (Open-Source Machine Translation Quality Estimation in PyTorch) evaluation model and the YiSi-2 semantic evaluation model. Both the OpenKiwi evaluation model and the YiSi-2 semantic evaluation model evaluate based on the source text, yielding the evaluation results corresponding to the two fidelity evaluation indicators OpenKiwi and YiSi-2.

For example, the target text can be evaluated for fidelity based on at least one preset fidelity evaluation model, and the evaluation result corresponding to the fidelity evaluation indicator can be determined. For example, if there are multiple preset fidelity evaluation models, one of them can be randomly selected, the target text can be assessed based on the selected model and the source text, and the result obtained can be used as the evaluation result corresponding to the fidelity evaluation indicator. Alternatively, the target text can be assessed based on each preset fidelity evaluation model and the source text, the multiple results obtained can be averaged, and the average can be used as the evaluation result corresponding to the fidelity evaluation indicator, which improves the accuracy of the quality assessment.

S240. Input the language similarity between the source language and the target language into the preset network model, and determine the evaluation weight corresponding to the fluency evaluation index and the evaluation weight corresponding to the fidelity evaluation index based on the output of the preset network model.

For example, the preset network model can directly output both the evaluation weight corresponding to the fluency evaluation indicator and the evaluation weight corresponding to the fidelity evaluation indicator, or it can output only one of the two, with the evaluation weight corresponding to the other indicator determined from that output. This embodiment determines the optimal evaluation weights corresponding to the fluency evaluation indicator and the fidelity evaluation indicator based on linguistic similarity, which effectively addresses the fact that the appropriate emphasis on fluency versus fidelity differs for translated texts in different languages, and improves the robustness of the quality assessment.

For example, S240 may include: determining the evaluation weight corresponding to the fluency evaluation index according to the output of the preset network model; determining the evaluation weight corresponding to the fidelity evaluation index based on the evaluation weight corresponding to the fluency evaluation index.

For example, when the preset network model is a model used to predict the evaluation weight corresponding to the fluency evaluation indicator, the weight output by the preset network model can be used as the evaluation weight corresponding to the fluency evaluation indicator. Since the two evaluation weights corresponding to the fluency evaluation indicator and the fidelity evaluation indicator sum to 1, the difference between 1 and the evaluation weight corresponding to the fluency evaluation indicator can be determined as the evaluation weight corresponding to the fidelity evaluation indicator.

S250. Based on at least two evaluation weights, perform fusion processing on at least two evaluation results to determine the target evaluation result of the translated text pair.

For example, the evaluation result corresponding to the fluency evaluation indicator can be multiplied by its evaluation weight, the evaluation result corresponding to the fidelity evaluation indicator can be multiplied by its evaluation weight, and the two products can be added together, with the weighted average obtained used as the target evaluation result. Fidelity and fluency are thus combined for a comprehensive evaluation, avoiding the bias toward fidelity or fluency that a single evaluation indicator introduces when evaluating translated texts, and thereby improving the accuracy and robustness of the quality evaluation.

The technical solution of this embodiment determines the optimal evaluation weights corresponding to the fluency evaluation indicator and the fidelity evaluation indicator based on linguistic similarity and performs fusion processing based on each evaluation weight. This effectively addresses the fact that the appropriate emphasis on fluency versus fidelity differs for translated texts in different languages, and improves the accuracy and robustness of the quality assessment.

The following is an embodiment of a machine translation quality assessment device provided by the embodiments of the present application. This device belongs to the same inventive concept as the machine translation quality assessment method of the above embodiments. For details not described in the embodiment of the machine translation quality assessment device, please refer to the above embodiments of the machine translation quality assessment method.

Figure 3 is a schematic structural diagram of a machine translation quality assessment device provided by an embodiment of the present application. This embodiment can be applied to performing machine translation quality assessment on a pre-trained model, especially in the fine-tuning scenario where the downstream task is a cross-lingual task such as a translation task. As shown in Figure 3, the device includes: a translation text pair acquisition module 310, an evaluation result determination module 320, an evaluation weight determination module 330, and an evaluation result fusion module 340.

The translation text pair acquisition module 310 is configured to acquire a translation text pair to be evaluated, the translation text pair including a source text corresponding to the source language and a translated target text corresponding to the target language; the evaluation result determination module 320 is configured to perform a quality evaluation on the target text based on at least two quality evaluation indicators and the source text, and determine the evaluation result corresponding to each quality evaluation indicator; the evaluation weight determination module 330 is configured to determine the evaluation weight corresponding to each quality evaluation indicator based on the language similarity between the source language and the target language; the evaluation result fusion module 340 is configured to perform fusion processing on at least two evaluation results based on at least two evaluation weights to determine the target evaluation result of the translated text pair.
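The division of work among the four modules can be sketched as a small pipeline. This is only an illustrative sketch: the class names `TranslationPair` and `QualityAssessor`, the callable signatures, and the idea of passing the scorers in as plain functions are all assumptions made for the example, not part of the described device.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical signature: a scorer maps (source_text, target_text) to one evaluation result.
Scorer = Callable[[str, str], float]


@dataclass
class TranslationPair:
    """Translation text pair to be evaluated (what module 310 acquires)."""
    source_text: str
    target_text: str
    source_lang: str
    target_lang: str


@dataclass
class QualityAssessor:
    scorers: Dict[str, Scorer]                    # module 320: one scorer per indicator
    similarity: Callable[[str, str], float]       # language similarity of a language pair
    weigher: Callable[[float], Dict[str, float]]  # module 330: similarity -> weights

    def assess(self, pair: TranslationPair) -> float:
        # Module 320: one evaluation result per quality evaluation indicator.
        results = {name: score(pair.source_text, pair.target_text)
                   for name, score in self.scorers.items()}
        # Module 330: evaluation weights derived from the language similarity.
        weights = self.weigher(self.similarity(pair.source_lang, pair.target_lang))
        # Module 340: weighted fusion into the target evaluation result.
        return sum(weights[name] * results[name] for name in results)
```

Passing the scorers and the weight function in as callables keeps the sketch independent of any particular evaluation model.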

The technical solution of this embodiment performs a quality assessment on the target text based on at least two quality assessment indicators and the source text in the translated text pair to be assessed, determines the assessment result corresponding to each quality assessment indicator, and determines the evaluation weight corresponding to each quality assessment indicator based on the language similarity between the source language and the target language, so that the evaluation results are fused based on the evaluation weights to determine the target evaluation result of the translated text pair. Fusing the evaluation results corresponding to at least two different quality evaluation indicators allows the translation quality to be evaluated comprehensively and avoids biased evaluation results. Because each evaluation weight is determined based on the language similarity between the source language and the target language, the language differences between different language pairs are taken into account when evaluating translation quality. This effectively avoids large differences in the accuracy of translation evaluation across language pairs, thereby ensuring the accuracy of translation evaluation for different language pairs.

Optionally, the quality evaluation indicators include: fluency evaluation indicators and fidelity evaluation indicators; the evaluation result determination module 320 is set to:

Based on at least one preset fluency evaluation model and the source text, perform a fluency evaluation on the target text and determine the evaluation result corresponding to the fluency evaluation indicator; based on at least one preset fidelity evaluation model and the source text, perform a fidelity evaluation on the target text and determine the evaluation result corresponding to the fidelity evaluation indicator.

Optionally, the preset fluency assessment model includes: at least one of the COMET-MQM cross-language multi-dimensional quality model, the COMET-QE cross-language quality assessment model and the BLEURT bilingual assessment alternative model;

The preset fidelity evaluation model includes: at least one of the OpenKiwi evaluation model and the YiSi-2 semantic evaluation model.

Optionally, the evaluation weight determination module 330 includes:

The language similarity input unit is configured to input the language similarity between the source language and the target language into a preset network model, where the preset network model is obtained in advance by training on translation sample data and labeled evaluation results;

The evaluation weight determination unit is configured to determine the evaluation weight corresponding to each quality evaluation index based on the output of the preset network model.

Optionally, when the quality evaluation index includes a fluency evaluation index and a fidelity evaluation index, the evaluation weight determination unit is set to:

According to the output of the preset network model, the evaluation weight corresponding to the fluency evaluation index is determined; based on the evaluation weight corresponding to the fluency evaluation index, the evaluation weight corresponding to the fidelity evaluation index is determined.
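A minimal sketch of deriving both weights from a single model output is shown below. The architecture of the preset network model is not specified here, so a one-input logistic function stands in for it; the function name `predict_weights` and the parameters `slope` and `bias` are hypothetical placeholders for trained parameters.

```python
import math
from typing import Dict


def predict_weights(similarity: float, slope: float = 1.0, bias: float = 0.0) -> Dict[str, float]:
    """Map a language similarity to two complementary evaluation weights.

    The logistic function is an assumed stand-in for the preset network model;
    slope and bias are hypothetical, not values from the source.
    """
    # Stand-in model output: the fluency weight, squashed into (0, 1).
    fluency_weight = 1.0 / (1.0 + math.exp(-(slope * similarity + bias)))
    # The fidelity weight follows from the fluency weight because the two sum to 1.
    return {"fluency": fluency_weight, "fidelity": 1.0 - fluency_weight}
```

Only the fluency weight is predicted; the fidelity weight is obtained by subtraction, so the two weights always sum to 1 regardless of the model parameters.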

Optionally, the device also includes:

The language similarity determination module is set to: before the evaluation weight corresponding to each quality evaluation indicator is determined based on the language similarity between the source language and the target language, determine, based on the preset multilingual model and according to the source corpus corresponding to the source language and the target corpus corresponding to the target language, the source language representation vector corresponding to the source language and the target language representation vector corresponding to the target language; and determine the language similarity between the source language and the target language based on the source language representation vector and the target language representation vector.

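Optionally, the language similarity determination module is set to: input each source text in the source corpus corresponding to the source language into the preset multilingual model, determine the source language representation vector corresponding to each source text, and determine the source language representation vector corresponding to the source language based on the multiple source language representation vectors; and input each target text in the target corpus corresponding to the target language into the preset multilingual model, determine the target language representation vector corresponding to each target text, and determine the target language representation vector corresponding to the target language based on the multiple target language representation vectors.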

Optionally, the language similarity determination module is also configured to average multiple source language representation vectors, and the obtained average vector is determined to be the source language representation vector corresponding to the source language.
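The averaging step and a subsequent similarity computation can be sketched as follows. The per-text vectors are assumed to come from the preset multilingual model and are passed in as plain lists of floats; the use of cosine similarity as the similarity measure is an assumption made for this example, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-text representation vectors into one language-level vector."""
    if not vectors:
        raise ValueError("at least one vector is required")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    return dot / norm


def language_similarity(source_vectors: Sequence[Sequence[float]],
                        target_vectors: Sequence[Sequence[float]]) -> float:
    # One representation vector per language, then one similarity per language pair.
    return cosine_similarity(mean_vector(source_vectors), mean_vector(target_vectors))
```

Because the similarity is computed once per language pair rather than per sentence, it can be cached and reused for every translation text pair in that language direction.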

The machine translation quality assessment device provided by the embodiments of this application can execute the machine translation quality assessment method provided by any embodiment of this application, and has corresponding functional modules for executing the machine translation quality assessment method.

It is worth noting that in the above embodiments of the machine translation quality assessment device, the various units and modules included are only divided according to functional logic, but are not limited to the above divisions, as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for the convenience of distinguishing them from each other, and are not used to limit the scope of protection of this application.

FIG. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present application. FIG. 4 illustrates a block diagram of an exemplary electronic device 12 suitable for implementing embodiments of the present application. The electronic device 12 shown in FIG. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present application.

As shown in Figure 4, electronic device 12 is embodied in the form of a general-purpose computing device. The components of electronic device 12 may include, but are not limited to: at least one processor or processing unit 16, system memory 28, and a bus 18 connecting various system components (including system memory 28 and processing unit 16).

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.

Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12, including volatile and nonvolatile media, removable and non-removable media.

System memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Figure 4, commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive may be provided for reading from and writing to removable non-volatile magnetic disks (e.g., "floppy disks"), as well as an optical disc drive for reading from and writing to removable non-volatile optical discs (e.g., Compact Disc Read-Only Memory (CD-ROM), Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to bus 18 via at least one data media interface. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of various embodiments of the present application.

A program/utility 40 having a set of (at least one) program modules 42, including but not limited to an operating system, at least one application program, other program modules, and program data, may be stored, for example, in system memory 28. Each of these examples, or some combination, may include the implementation of a network environment. Program modules 42 generally perform functions and/or methods in the embodiments described herein.

Electronic device 12 may also communicate with at least one external device 14 (e.g., keyboard, pointing device, display 24, etc.), with at least one device that enables a user to interact with electronic device 12, and/or with any device (e.g., network card, modem, etc.) that enables the electronic device 12 to communicate with at least one other computing device. This communication may occur through an input/output (I/O) interface 22. Moreover, the electronic device 12 can also communicate with at least one network (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through the network adapter 20. As shown, network adapter 20 communicates with other modules of electronic device 12 via bus 18. It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.

The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing the steps of a machine translation quality assessment method provided by the embodiments of the present application. The method includes:

Obtain a translation text pair to be evaluated. The translation text pair includes a source text corresponding to the source language and a translated target text corresponding to the target language;

Based on at least two quality assessment indicators and the source text, perform a quality assessment on the target text and determine the assessment results corresponding to each quality assessment indicator;

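Based on the language similarity between the source language and the target language, determine the evaluation weight corresponding to each quality evaluation indicator;

Based on at least two evaluation weights, perform fusion processing on at least two evaluation results to determine the target evaluation result of the translated text pair.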

Of course, those skilled in the art can understand that the processor can also implement the technical solution of the machine translation quality assessment method provided by any embodiment of the present application.

This embodiment provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the steps of the machine translation quality assessment method provided in any embodiment of the present application are implemented. The method includes:

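Obtain a translation text pair to be evaluated, the translation text pair including a source text corresponding to the source language and a translated target text corresponding to the target language;

Based on at least two quality evaluation indexes and the source text, perform a quality evaluation on the target text and determine the evaluation result corresponding to each quality evaluation index;

Based on the language similarity between the source language and the target language, determine the evaluation weight corresponding to each quality evaluation index;

Based on at least two evaluation weights, perform fusion processing on at least two evaluation results to determine the target evaluation result of the translated text pair.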

The computer storage medium in the embodiments of the present application may be any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium may be, for example, but is not limited to: an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination thereof. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having at least one conductor, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. As used herein, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, optical cable, radio frequency (RF), etc., or any suitable combination of the above.

Computer program code for performing operations of the present application may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as "C" or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In situations involving remote computers, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

Those of ordinary skill in the art should understand that the above-mentioned modules or steps of the present application can be implemented using general-purpose computing devices. They can be concentrated on a single computing device, or distributed over a network composed of multiple computing devices. Alternatively, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; or they can each be made into an individual integrated circuit module, or multiple modules or steps among them can be made into a single integrated circuit module. As such, the application is not limited to any specific combination of hardware and software.

Note that the above are only optional embodiments and applied technical principles of the present application. Those skilled in the art will understand that the present application is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the scope of the present application. Therefore, although the present application has been described in detail through the above embodiments, the present application is not limited to the above embodiments, and may also include other equivalent embodiments without departing from the concept of the present application. The scope of the present application is determined by the scope of the appended claims.

Claims

A machine translation quality assessment method, including:

Obtain a translation text pair to be evaluated, the translation text pair including a source text corresponding to the source language and a translated target text corresponding to the target language;

Perform a quality assessment on the target text based on at least two quality assessment indicators and the source text, and determine the assessment results corresponding to each of the quality assessment indicators;

Based on the language similarity between the source language and the target language, determine the evaluation weight corresponding to each of the quality evaluation indicators;

Based on at least two evaluation weights, at least two evaluation results are fused to determine the target evaluation result of the translated text pair.
The method according to claim 1, wherein the at least two quality assessment indicators include: a fluency assessment indicator and a fidelity assessment indicator;

Performing a quality assessment on the target text based on at least two quality assessment indicators and the source text, and determining an assessment result corresponding to each of the quality assessment indicators, including:

Based on at least one preset fluency evaluation model and the source text, perform a fluency evaluation on the target text, and determine the evaluation results corresponding to the fluency evaluation indicators;

Based on at least one preset fidelity evaluation model and the source text, a fidelity evaluation is performed on the target text and an evaluation result corresponding to the fidelity evaluation index is determined.
The method according to claim 1, wherein determining the evaluation weight corresponding to each of the quality evaluation indicators based on the language similarity between the source language and the target language includes:

Input the language similarity between the source language and the target language into a preset network model, where the preset network model is obtained in advance by training on translation sample data and labeled evaluation results;

According to the output of the preset network model, the evaluation weight corresponding to each of the quality evaluation indicators is determined.
The method according to claim 3, wherein, when the at least two quality evaluation indicators include a fluency evaluation index and a fidelity evaluation index, determining the evaluation weight corresponding to each of the quality evaluation indicators according to the output of the preset network model includes:

Determine the evaluation weight corresponding to the fluency evaluation index according to the output of the preset network model;

Based on the evaluation weight corresponding to the fluency evaluation index, the evaluation weight corresponding to the fidelity evaluation index is determined.
The method according to any one of claims 1 to 4, before determining the evaluation weight corresponding to each of the quality evaluation indicators based on the language similarity between the source language and the target language, further comprising:

Based on the preset multilingual model, according to the source corpus corresponding to the source language and the target corpus corresponding to the target language, determine the source language representation vector corresponding to the source language and the target language representation vector corresponding to the target language;

Based on the source language representation vector and the target language representation vector, the language similarity between the source language and the target language is determined.
The method according to claim 5, wherein determining, based on a preset multilingual model and according to the source corpus corresponding to the source language and the target corpus corresponding to the target language, the source language representation vector corresponding to the source language and the target language representation vector corresponding to the target language includes:

Input each source text in the source corpus corresponding to the source language into the preset multilingual model, determine the source language representation vector corresponding to each source text, and determine the source language representation vector corresponding to the source language based on the multiple source language representation vectors;

Input each target text in the target corpus corresponding to the target language into the preset multilingual model, determine the target language representation vector corresponding to each target text, and determine the target language representation vector corresponding to the target language based on the multiple target language representation vectors.
The method according to claim 6, wherein determining the source language representation vector corresponding to the source language based on a plurality of source language representation vectors includes:

The multiple source language representation vectors are averaged, and the obtained average vector is determined as the source language representation vector corresponding to the source language.
A machine translation quality assessment device, including:

The translation text pair acquisition module is configured to acquire the translation text pair to be evaluated, the translation text pair including the source text corresponding to the source language and the translated target text corresponding to the target language;

An evaluation result determination module is configured to perform a quality evaluation on the target text based on at least two quality evaluation indicators and the source text, and determine the evaluation results corresponding to each of the quality evaluation indicators;

An evaluation weight determination module is configured to determine the evaluation weight corresponding to each of the quality evaluation indicators based on the language similarity between the source language and the target language;

The evaluation result fusion module is configured to perform fusion processing on at least two evaluation results based on at least two evaluation weights, and determine the target evaluation result of the translated text pair.
An electronic device including:

at least one processor;

a memory configured to store at least one program;

When the at least one program is executed by the at least one processor, the at least one processor is caused to implement the machine translation quality evaluation method according to any one of claims 1-7.
A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by a processor, the machine translation quality assessment method according to any one of claims 1-7 is implemented.