CN115310460A - Machine translation quality evaluation method, device, equipment and storage medium - Google Patents

Machine translation quality evaluation method, device, equipment and storage medium

Info

Publication number
CN115310460A
Authority
CN
China
Prior art keywords
language
evaluation
target
text
source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210970061.5A
Other languages
Chinese (zh)
Inventor
陶大程
丁亮
陆清屿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd
Priority to CN202210970061.5A
Publication of CN115310460A
Priority to PCT/CN2023/112135 (WO2024032691A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/51 Translation evaluation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for evaluating machine translation quality, which are applied to the technical field of natural language processing. The method comprises the following steps: acquiring a translation text pair to be evaluated, wherein the translation text pair comprises a source text corresponding to a source language and a target text corresponding to a translated target language; performing quality evaluation on the target text based on at least two quality evaluation indexes and the source text, and determining an evaluation result corresponding to each quality evaluation index; determining an evaluation weight corresponding to each quality evaluation index based on the language similarity between the source language and the target language; and performing fusion processing on each evaluation result based on each evaluation weight, and determining a target evaluation result of the translation text pair. By the technical scheme of the embodiment of the invention, the translation quality can be comprehensively evaluated, and the translation evaluation accuracy of different language pairs is ensured.

Description

Machine translation quality evaluation method, device, equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of natural language processing, in particular to a method, a device, equipment and a storage medium for evaluating machine translation quality.
Background
With the rapid development of computer technology, the quality of text translated by a machine translation model is often required to be evaluated.
Currently, the quality of translated text can be evaluated with a translation quality evaluation model. For example, the translated text may be evaluated by an evaluation model trained on sentence-level annotation data, in which case the resulting index evaluation result tends to reflect the overall fluency of the translated text. Alternatively, the translated text may be evaluated by an evaluation model trained on word-level annotation data, in which case the resulting index evaluation result tends to reflect the fidelity of the translated text.
However, in the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
The index evaluation result obtained by each translation quality evaluation model tends to reflect only a single aspect of the quality of the translated text, such as its overall fluency or its fidelity, so the translation quality cannot be evaluated comprehensively. In addition, translated texts of different language pairs are evaluated in the same way, so the translation evaluation accuracy differs greatly between language pairs.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a storage medium for evaluating machine translation quality, which are used for comprehensively evaluating translation quality and ensuring the accuracy of translation evaluation of different language pairs.
In a first aspect, an embodiment of the present invention provides a method for evaluating machine translation quality, including:
acquiring a translation text pair to be evaluated, wherein the translation text pair comprises a source text corresponding to a source language and a target text corresponding to a translated target language;
performing quality evaluation on the target text based on at least two quality evaluation indexes and the source text, and determining an evaluation result corresponding to each quality evaluation index;
determining an evaluation weight corresponding to each quality evaluation index based on language similarity between the source language and the target language;
and performing fusion processing on each evaluation result based on each evaluation weight to determine a target evaluation result of the translation text pair.
In a second aspect, an embodiment of the present invention further provides a device for evaluating machine translation quality, including:
the translation text pair obtaining module is used for obtaining a translation text pair to be evaluated, and the translation text pair comprises a source text corresponding to a source language and a target text corresponding to a translated target language;
the evaluation result determining module is used for carrying out quality evaluation on the target text based on at least two quality evaluation indexes and the source text and determining an evaluation result corresponding to each quality evaluation index;
an evaluation weight determining module, configured to determine an evaluation weight corresponding to each quality evaluation indicator based on a language similarity between the source language and the target language;
and the evaluation result fusion module is used for performing fusion processing on each evaluation result based on each evaluation weight to determine a target evaluation result of the translation text pair.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the machine translation quality evaluation method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for evaluating machine translation quality according to any embodiment of the present invention.
The embodiment of the invention has the following advantages or beneficial effects:
The target text is subjected to quality evaluation based on at least two quality evaluation indexes and the source text in the translation text pair to be evaluated, and an evaluation result corresponding to each quality evaluation index is determined. An evaluation weight corresponding to each quality evaluation index is determined based on the language similarity between the source language and the target language, and the evaluation results are fused based on the evaluation weights to determine a target evaluation result of the translation text pair. Because the evaluation results corresponding to at least two different quality evaluation indexes are fused, the translation quality is evaluated comprehensively and the bias of a single evaluation result is avoided. Because each evaluation weight is determined based on the language similarity between the source language and the target language, language differences between language pairs are taken into account, which effectively avoids large differences in translation evaluation accuracy across language pairs and thereby ensures the translation evaluation accuracy for different language pairs.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the technical solutions in the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a method for evaluating machine translation quality according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for evaluating machine translation quality according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for evaluating machine translation quality according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Fig. 1 is a flowchart of a method for evaluating machine translation quality according to an embodiment of the present invention, which is applicable to a case of evaluating quality of a text translated by a machine translation model. The method can be executed by a machine translation quality evaluation device, which can be implemented by software and/or hardware and is integrated in an electronic device. As shown in fig. 1, the method specifically includes the following steps:
S110, obtaining a translation text pair to be evaluated, wherein the translation text pair comprises a source text corresponding to a source language and a target text corresponding to a translated target language.
The source language is the language being translated from, and the target language is the language being translated into. The source text is the original expressed in the source language, i.e., the sentence to be translated. The target text is the translation expressed in the target language with the same meaning as the source text, i.e., the translated sentence.
Specifically, the source text may be input into a machine translation model for translation, and a target text output by the machine translation model is obtained, so as to obtain a translation text pair to be evaluated.
S120, performing quality evaluation on the target text based on at least two quality evaluation indexes and the source text, and determining an evaluation result corresponding to each quality evaluation index.
The quality evaluation index may be an index for evaluating the translation quality of the target text. Different kinds of quality evaluation indexes are biased towards evaluating translation quality at different levels. For example, the quality evaluation index may be, but is not limited to: a fluency evaluation index biased towards evaluating the overall fluency of the target text, or a loyalty (fidelity) evaluation index biased towards evaluating how faithful the target text is to the source text. The fluency evaluation index can be used to characterize information such as the overall fluency of the translated text and whether it conforms to expression habits. The loyalty evaluation index can be used to characterize whether the details of the translated text faithfully reflect the meaning of the original text, that is, to identify detail problems such as mistranslations, omissions, and sentiment errors. Each quality evaluation index may correspond to one or more quality evaluation models, which are used to determine the evaluation result corresponding to that index. In this embodiment, the evaluation result may be expressed as a score. For example, a larger score in the evaluation result indicates higher quality with respect to the corresponding index, such as higher fluency or higher fidelity.
Specifically, at least two different quality evaluation indexes can be selected based on service requirements. For each quality evaluation index, quality evaluation can be performed on the target text based on at least one quality evaluation model corresponding to that index, and the evaluation result corresponding to the index is determined. For example, if the quality evaluation index corresponds to multiple quality evaluation models, one model may be randomly selected from them, the target text may be evaluated based on the selected model and the source text, and the obtained evaluation result may be used as the evaluation result corresponding to the index. Alternatively, the target text can be evaluated based on each quality evaluation model and the source text, and the obtained evaluation results can be averaged to obtain an average evaluation result corresponding to the index, which further improves the accuracy of quality evaluation.
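The two options described above (random selection of one model, or averaging over all models of an index) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Scorer` type, the `evaluate_index` function, and the `strategy` parameter are hypothetical names, and each quality evaluation model is assumed to be a callable that maps a source text and a target text to a numeric score.

```python
import random
from statistics import mean
from typing import Callable, Optional, Sequence

# Assumption: a quality evaluation model is any callable (source, target) -> score.
Scorer = Callable[[str, str], float]


def evaluate_index(
    source: str,
    target: str,
    scorers: Sequence[Scorer],
    strategy: str = "average",
    rng: Optional[random.Random] = None,
) -> float:
    """Return the evaluation result of one quality evaluation index."""
    if not scorers:
        raise ValueError("at least one quality evaluation model is required")
    if strategy == "random":
        # Option 1: pick one model at random and use its result directly.
        chosen = (rng or random).choice(list(scorers))
        return chosen(source, target)
    if strategy == "average":
        # Option 2: score with every model and average the results.
        return mean(scorer(source, target) for scorer in scorers)
    raise ValueError(f"unknown strategy: {strategy}")
```

Averaging uses every available model at the cost of more computation, while random selection needs only one model call per index.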
S130, determining an evaluation weight corresponding to each quality evaluation index based on the language similarity between the source language and the target language.
The language similarity may refer to the linguistic similarity between the source language and the target language in the language family, vocabulary, and grammar structure.
Specifically, the language similarity between every two languages may be determined in advance, so that the language similarity between the source language and the target language can be obtained directly; alternatively, the language similarity between the source language and the target language may be determined in real time. Based on the language similarity between the source language and the target language, the optimal evaluation weights corresponding to the different quality evaluation indexes can be determined; that is, different evaluation weights can be determined for different language pairs. Language differences between language pairs are thereby taken into account, which effectively avoids large differences in translation evaluation accuracy across language pairs, ensures the translation evaluation accuracy for different language pairs, and improves the generality of translation quality evaluation.
S140, performing fusion processing on the evaluation results based on the evaluation weights, and determining a target evaluation result of the translation text pair.
Specifically, the evaluation result corresponding to each quality evaluation index can be multiplied by its corresponding evaluation weight, the products can be added, and the resulting weighted average can be used as the target evaluation result. In this way, the evaluation results corresponding to at least two quality evaluation indexes are fused, the translation quality is evaluated comprehensively, the bias that arises when a single evaluation index is used to evaluate the translated text is avoided, and the accuracy and robustness of quality evaluation are further improved.
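The multiply-and-add fusion of S140 can be sketched in a few lines. The function name `fuse_results` and the dictionary keys are hypothetical. Dividing by the sum of the weights is an added assumption so that the result remains a weighted average even when the supplied weights do not sum exactly to 1; when they do sum to 1, the result equals the plain sum of products described above.

```python
from typing import Mapping


def fuse_results(results: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Fuse per-index evaluation results into one target evaluation result."""
    if results.keys() != weights.keys():
        raise ValueError("each quality evaluation index needs exactly one weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("the evaluation weights must have a positive sum")
    # Multiply each result by its weight, add the products, and normalize.
    weighted_sum = sum(results[name] * weights[name] for name in results)
    return weighted_sum / total_weight
```

For example, with a fluency result of 0.9 weighted 0.25 and a loyalty result of 0.5 weighted 0.75, the products 0.225 and 0.375 add up to a target evaluation result of 0.6.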
According to the technical scheme of this embodiment, the target text is subjected to quality evaluation based on at least two quality evaluation indexes and the source text in the translation text pair to be evaluated, and an evaluation result corresponding to each quality evaluation index is determined. An evaluation weight corresponding to each quality evaluation index is determined based on the language similarity between the source language and the target language, and the evaluation results are fused based on the evaluation weights to determine the target evaluation result of the translation text pair. Because the evaluation results corresponding to at least two different quality evaluation indexes are fused, the translation quality is evaluated comprehensively and the bias of a single evaluation result is avoided. Because each evaluation weight is determined based on the language similarity between the source language and the target language, language differences between language pairs are taken into account, which effectively avoids large differences in translation evaluation accuracy across language pairs and thereby ensures the translation evaluation accuracy for different language pairs.
On the basis of the above technical solution, S130 may include: inputting the language similarity between the source language and the target language into a preset network model, wherein the preset network model is trained in advance based on translation sample pair data and label evaluation results; and determining the evaluation weight corresponding to each quality evaluation index according to the output of the preset network model.
The preset network model can be used to characterize the mapping between the language similarity and the optimal evaluation weight corresponding to each quality evaluation index, and this mapping can be learned from the translation sample pair data and the label evaluation results. For example, training may proceed as follows: performing quality evaluation on the target sample text in each piece of translation sample pair data based on at least two quality evaluation indexes and the source sample text in that data, to obtain the sample evaluation results corresponding to each piece of translation sample pair data; inputting the language similarity of the sample language pair into the preset network model to be trained, and determining a sample evaluation weight corresponding to each quality evaluation index based on the output of the model; fusing the sample evaluation results based on the sample evaluation weights to obtain a target sample evaluation result; determining a training error based on the target sample evaluation result and the label evaluation result; and back-propagating the training error to the preset network model to adjust its model parameters, until a preset convergence condition is satisfied, for example the number of iterations reaches a preset number or the training error converges, at which point training of the preset network model is determined to be complete.
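The training loop described above can be illustrated with a deliberately tiny model. The source does not specify the network architecture, loss, or optimizer, so everything here is an assumption: the "network" is a single logistic unit that maps the language similarity to a fluency weight, the training error is a mean squared error, the update is plain gradient descent, and the sample values in the usage note are made up. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (language similarity, fluency result, loyalty result, label evaluation result)
Sample = Tuple[float, float, float, float]
Params = Tuple[float, float]


def predict_weight(params: Params, similarity: float) -> float:
    """Map a language similarity to a fluency weight in (0, 1)."""
    a, b = params
    return 1.0 / (1.0 + math.exp(-(a * similarity + b)))


def training_error(params: Params, samples: Sequence[Sample]) -> float:
    """Mean squared error between fused sample results and label results."""
    total = 0.0
    for sim, fluency, loyalty, label in samples:
        w = predict_weight(params, sim)
        total += (w * fluency + (1.0 - w) * loyalty - label) ** 2
    return total / len(samples)


def train(samples: Sequence[Sample], max_iters: int = 2000,
          lr: float = 0.5, tol: float = 1e-12) -> Params:
    """Gradient descent until the iteration limit or until the error converges."""
    a, b = 0.0, 0.0
    previous = training_error((a, b), samples)
    for _ in range(max_iters):
        grad_a = grad_b = 0.0
        for sim, fluency, loyalty, label in samples:
            w = predict_weight((a, b), sim)
            err = w * fluency + (1.0 - w) * loyalty - label
            # Chain rule through the fusion step and the logistic unit.
            dz = 2.0 * err * (fluency - loyalty) * w * (1.0 - w)
            grad_a += dz * sim
            grad_b += dz
        a -= lr * grad_a / len(samples)
        b -= lr * grad_b / len(samples)
        current = training_error((a, b), samples)
        if abs(previous - current) < tol:  # training error has converged
            break
        previous = current
    return a, b
```

A call such as `train([(0.9, 0.8, 0.4, 0.72), (0.2, 0.6, 0.9, 0.81)])` returns the two learned parameters, and `predict_weight` then yields the fluency weight for a given language similarity. A real system would likely replace the logistic unit with a larger network and an automatic differentiation framework.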
It should be noted that the network architecture of the preset network model may be set based on the service requirement. For example, the preset network model may directly output the evaluation weight corresponding to each quality evaluation index, or may output only the evaluation weight corresponding to one quality evaluation index, with the evaluation weights corresponding to the other quality evaluation indexes determined based on the output evaluation weight. For example, if there are two quality evaluation indexes A and B and the preset network model outputs the evaluation weight corresponding to index A, then, since the evaluation weights corresponding to A and B sum to 1, the difference between 1 and the evaluation weight corresponding to index A can be determined as the evaluation weight corresponding to index B.
On the basis of the above technical solution, before S130, the method may further include: determining a source language characterization vector corresponding to the source language and a target language characterization vector corresponding to the target language based on a preset multilingual model, according to a source corpus corresponding to the source language and a target corpus corresponding to the target language; and determining the language similarity between the source language and the target language based on the source language characterization vector and the target language characterization vector.
The preset multilingual model may be a model for performing language processing on texts in different languages. For example, the preset multilingual model may be, but is not limited to, the XLM-RoBERTa model.
Specifically, a source language characterization vector $v_i$ characterizing the linguistic features of the source language can be determined based on the preset multilingual model and the source corpus, and a target language characterization vector $v_j$ characterizing the linguistic features of the target language can be determined based on the preset multilingual model and the target corpus. In this embodiment, the cosine of the angle between the source language characterization vector $v_i$ and the target language characterization vector $v_j$, denoted $\cos(v_i, v_j)$, may be determined as the language similarity between the source language and the target language.
Exemplarily, determining the source language characterization vector corresponding to the source language and the target language characterization vector corresponding to the target language based on the preset multilingual model, according to the source corpus corresponding to the source language and the target corpus corresponding to the target language, may include:
inputting each source text in the source corpus corresponding to the source language into the preset multilingual model, determining a source language representation vector corresponding to each source text, and determining the source language characterization vector corresponding to the source language based on these source language representation vectors; and inputting each target text in the target corpus corresponding to the target language into the preset multilingual model, determining a target language representation vector corresponding to each target text, and determining the target language characterization vector corresponding to the target language based on these target language representation vectors.
Illustratively, determining the source language characterization vector corresponding to the source language based on the source language representation vectors may include: averaging all the source language representation vectors, and determining the resulting average vector as the source language characterization vector corresponding to the source language.
Illustratively, determining the target language characterization vector corresponding to the target language based on the target language representation vectors may include: averaging all the target language representation vectors, and determining the resulting average vector as the target language characterization vector corresponding to the target language.
Specifically, each source text in the source corpus may be input into the pre-trained preset multilingual model, and a source language representation vector $R(x_{im})$ corresponding to each source text is determined based on the output of the preset multilingual model, where $i$ denotes the source language and $m$ denotes the $m$-th source text. The source language representation vectors $R(x_{im})$ are averaged, and the resulting average vector is determined as the source language characterization vector $v_i$, i.e.
$$v_i = \frac{1}{n_i} \sum_{m=1}^{n_i} R(x_{im})$$
where $n_i$ is the number of source texts. Similarly, a target language representation vector $R(x_{jm})$ corresponding to each target text can be determined based on the preset multilingual model, where $j$ denotes the target language. The target language representation vectors $R(x_{jm})$ are averaged, and the resulting average vector is determined as the target language characterization vector $v_j$, i.e.
$$v_j = \frac{1}{n_j} \sum_{m=1}^{n_j} R(x_{jm})$$
where $n_j$ is the number of target texts.
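The averaging and cosine comparison described above can be sketched without any model dependency. In this illustration the per-text representation vectors are plain lists of floats standing in for the outputs of the preset multilingual model; no multilingual model is actually called, and the function names `language_vector` and `cosine` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def language_vector(text_vectors: Sequence[Vector]) -> List[float]:
    """Average the per-text representation vectors of one corpus."""
    count = len(text_vectors)
    if count == 0:
        raise ValueError("the corpus must contain at least one text")
    dim = len(text_vectors[0])
    return [sum(vec[d] for vec in text_vectors) / count for d in range(dim)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two language characterization vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / norm
```

Computing `cosine(language_vector(source_vectors), language_vector(target_vectors))` then gives the language similarity for one language pair, which could be cached per pair since it depends only on the corpora.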
FIG. 2 is a flowchart of another machine translation quality evaluation method according to an embodiment of the present invention. This embodiment describes in detail the whole translation quality evaluation process for the case where the quality evaluation indexes include a fluency evaluation index and a fidelity evaluation index. Explanations of terms that are the same as or correspond to those of the above embodiments are omitted here.
Referring to fig. 2, another method for evaluating machine translation quality provided by this embodiment specifically includes the following steps:
S210, obtaining a translation text pair to be evaluated, wherein the translation text pair comprises a source text corresponding to a source language and a target text corresponding to a translated target language.
S220, performing fluency assessment on the target text based on at least one preset fluency assessment model and the source text, and determining an assessment result corresponding to a fluency assessment index.
The fluency evaluation index can be used to characterize information such as the overall fluency of the translated text and whether it conforms to expression habits. The preset fluency evaluation model can be an evaluation model biased towards evaluating the overall fluency of the target text, so as to obtain an evaluation result corresponding to the fluency evaluation index. The preset fluency evaluation model may include, but is not limited to, at least one of: a COMET-MQM (Multidimensional Quality Metrics) model, a COMET-QE (Quality Estimation) model, and a BLEURT (Bilingual Evaluation Understudy with Representations from Transformers) model. COMET (Crosslingual Optimized Metric for Evaluation of Translation) is the general name of a family of translation evaluation models; COMET is a model framework whose metrics are trained on human evaluation data. MQM is a multi-dimensional, multi-level human evaluation method, and the COMET-MQM model is obtained by training the COMET model on MQM data. QE (Quality Estimation) is a specific task in the field of translation evaluation in which reference translations may not be used and the translation is evaluated based only on the source text. The COMET-QE model is obtained by training the COMET model on QE data. The BLEURT model is a translation evaluation model built on Transformer representations.
Specifically, the fluency of the target text can be evaluated based on one or more preset fluency evaluation models, and an evaluation result corresponding to the fluency evaluation index can be determined. For example, if a plurality of preset fluency evaluation models exist, one preset fluency evaluation model can be randomly selected from each preset fluency evaluation model, quality evaluation is performed on the target text based on the selected preset fluency evaluation model and the source text, and the obtained evaluation result is used as the evaluation result corresponding to the fluency evaluation index. The target text can be subjected to quality evaluation based on each preset fluency evaluation model and the source text, the obtained evaluation results are subjected to average processing, and the average evaluation result is used as an evaluation result corresponding to the fluency evaluation index, so that the accuracy of quality evaluation is further improved.
It should be noted that different preset fluency evaluation models evaluate in different ways, so the reference text required when evaluating the quality of the target text may differ. For example, the COMET-MQM model performs evaluation based on the source text and a reference translation corresponding to the source text, to obtain an evaluation result corresponding to the COMET-MQM fluency evaluation index. The COMET-QE model performs evaluation based on the source text only, to obtain an evaluation result corresponding to the COMET-QE fluency evaluation index. The BLEURT model performs evaluation based on a reference translation corresponding to the source text, to obtain an evaluation result corresponding to the BLEURT fluency evaluation index.
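Because each fluency evaluation model expects different inputs, an implementation might keep a small table of input requirements and validate them before scoring. The sketch below only encodes the requirements stated above; the `REQUIRED_INPUTS` table and the `build_inputs` function are hypothetical names, and no real COMET or BLEURT API is called.

```python
from typing import Dict, Optional, Tuple

# Inputs each model needs in addition to the target text, as stated in the text.
REQUIRED_INPUTS: Dict[str, Tuple[str, ...]] = {
    "COMET-MQM": ("source", "reference"),
    "COMET-QE": ("source",),
    "BLEURT": ("reference",),
}


def build_inputs(
    model_name: str,
    target: str,
    source: Optional[str] = None,
    reference: Optional[str] = None,
) -> Dict[str, str]:
    """Collect only the texts that the named model needs, or fail early."""
    available = {"source": source, "reference": reference}
    inputs = {"target": target}
    for name in REQUIRED_INPUTS[model_name]:
        text = available[name]
        if text is None:
            raise ValueError(f"{model_name} requires a {name} text")
        inputs[name] = text
    return inputs
```

Failing early when a reference translation is missing makes it explicit that only the COMET-QE style of evaluation is usable in a reference-free setting.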
S230, performing loyalty evaluation on the target text based on at least one preset loyalty evaluation model and the source text, and determining an evaluation result corresponding to the loyalty evaluation index.
The loyalty evaluation index can be used to characterize whether the details of the translated text faithfully reflect the meaning of the original text, that is, to identify detail problems such as mistranslations, omissions, and sentiment errors. The preset loyalty evaluation model may be an evaluation model biased towards evaluating the loyalty of the target text, so as to obtain an evaluation result corresponding to the loyalty evaluation index. The preset loyalty evaluation model may include, but is not limited to: the OpenKiwi (Open-Source Machine Translation Quality Estimation in PyTorch) evaluation model and the YiSi-2 semantic evaluation model. The OpenKiwi evaluation model and the YiSi-2 semantic evaluation model perform evaluation based on the source text, to obtain evaluation results corresponding to the two loyalty evaluation indexes OpenKiwi and YiSi-2.
Specifically, based on one or more preset loyalty evaluation models, loyalty evaluation can be performed on the target text, and an evaluation result corresponding to the loyalty evaluation index can be determined. For example, if there are a plurality of preset loyalty evaluation models, one preset loyalty evaluation model may be randomly selected from the preset loyalty evaluation models, the target text may be evaluated for quality based on the selected preset loyalty evaluation model and the source text, and the obtained evaluation result may be used as an evaluation result corresponding to the loyalty evaluation index. The target text can be subjected to quality evaluation based on each preset loyalty evaluation model and the source text, the obtained evaluation results are subjected to average processing, and the average evaluation results are used as evaluation results corresponding to the loyalty evaluation indexes, so that the accuracy of quality evaluation is further improved.
S240, inputting the language similarity between the source language and the target language into a preset network model, and determining an evaluation weight corresponding to the fluency evaluation index and an evaluation weight corresponding to the loyalty evaluation index according to the output of the preset network model.
Specifically, the preset network model may directly output both the evaluation weight corresponding to the fluency evaluation index and the evaluation weight corresponding to the loyalty evaluation index, or it may output only the evaluation weight corresponding to one of the two indexes, in which case the evaluation weight corresponding to the other index is determined from the output weight. By determining the optimal evaluation weights corresponding to the fluency evaluation index and the loyalty evaluation index based on the language similarity, this embodiment effectively mitigates the problem that fluency and loyalty evaluations are biased differently for translated texts in different languages, further improving the robustness of quality evaluation.
Exemplarily, S240 may include: determining an evaluation weight corresponding to the fluency evaluation index according to the output of a preset network model; and determining the evaluation weight corresponding to the loyalty evaluation index based on the evaluation weight corresponding to the fluency evaluation index.
Specifically, when the preset network model is a model for predicting an evaluation weight corresponding to the fluency evaluation index, the weight output by the preset network model may be used as the evaluation weight corresponding to the fluency evaluation index. Since the sum of the two evaluation weights corresponding to the fluency evaluation index and the loyalty evaluation index is 1, the difference between 1 and the evaluation weight corresponding to the fluency evaluation index can be determined as the evaluation weight corresponding to the loyalty evaluation index.
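The complement relationship between the two weights can be sketched as follows. The text does not specify the architecture of the preset network model, so the single logistic unit below, together with its parameters `a` and `b` and the function names, is a hypothetical stand-in for illustration only; a trained model would take its place.

```python
import math
from typing import Tuple


def fluency_weight(similarity: float, a: float = 1.0, b: float = 0.0) -> float:
    """Hypothetical stand-in for the preset network model.

    Maps a language similarity to a weight in (0, 1) using a single logistic
    unit. The parameters a and b are illustrative, not learned values.
    """
    return 1.0 / (1.0 + math.exp(-(a * similarity + b)))


def evaluation_weights(
    similarity: float, a: float = 1.0, b: float = 0.0
) -> Tuple[float, float]:
    """Return (fluency weight, loyalty weight); the two always sum to 1."""
    w_fluency = fluency_weight(similarity, a, b)
    # The loyalty weight is the difference between 1 and the fluency weight.
    w_loyalty = 1.0 - w_fluency
    return w_fluency, w_loyalty
```

Because the model only needs to predict one number, the sum-to-one constraint holds by construction rather than having to be learned.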
S250, performing fusion processing on each evaluation result based on each evaluation weight, and determining a target evaluation result of the translation text pair.
Specifically, the evaluation result corresponding to the fluency evaluation index is multiplied by its evaluation weight, the evaluation result corresponding to the loyalty evaluation index is multiplied by its evaluation weight, and the two products are added; the resulting weighted average is used as the target evaluation result. In this way, loyalty and fluency are fused into a comprehensive evaluation, which avoids the bias toward either loyalty or fluency that arises when a translated text is evaluated with a single evaluation index, and further improves the accuracy and robustness of quality evaluation.
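The weighted fusion of the fluency and loyalty evaluation results can be sketched as follows. This is a minimal sketch: the function name `fuse_results`, the dictionary keys, and the numbers in the usage note are illustrative assumptions, not values from this embodiment.

```python
import math
from typing import Dict


def fuse_results(results: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-index evaluation results.

    results and weights are keyed by index name (for example "fluency" and
    "loyalty"); the weights are expected to sum to 1.
    """
    if set(results) != set(weights):
        raise ValueError("results and weights must cover the same indexes")
    if not math.isclose(sum(weights.values()), 1.0, abs_tol=1e-9):
        raise ValueError("evaluation weights must sum to 1")
    # Multiply each evaluation result by its weight and add the products.
    return sum(weights[name] * results[name] for name in results)
```

For example, with a fluency result of 0.8 weighted 0.6 and a loyalty result of 0.5 weighted 0.4, the fused result is 0.6 × 0.8 + 0.4 × 0.5 = 0.68.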
According to the technical scheme of this embodiment, the optimal evaluation weights corresponding to the fluency evaluation index and the loyalty evaluation index are determined based on the language similarity, and the evaluation results are fused based on these weights. This effectively mitigates the problem that fluency and loyalty evaluations are biased differently for translated texts in different languages, and further improves the accuracy and robustness of quality evaluation.
The following is an embodiment of the machine translation quality evaluation apparatus provided in the embodiment of the present invention, which belongs to the same inventive concept as the machine translation quality evaluation methods in the embodiments described above, and reference may be made to the embodiments of the machine translation quality evaluation method for details that are not described in detail in the embodiments of the machine translation quality evaluation apparatus.
Fig. 3 is a schematic structural diagram of a machine translation quality evaluation apparatus according to an embodiment of the present invention, which is applicable to a situation of performing machine translation quality evaluation on a pre-training model, and is particularly applicable to a fine tuning scenario when a downstream task is a cross-language task such as a translation task. As shown in fig. 3, the apparatus specifically includes: a translation text pair obtaining module 310, an evaluation result determining module 320, an evaluation weight determining module 330, and an evaluation result fusing module 340.
The translation text pair obtaining module 310 is configured to obtain a translation text pair to be evaluated, where the translation text pair includes a source text corresponding to a source language and a target text corresponding to a translated target language; an evaluation result determining module 320, configured to perform quality evaluation on the target text based on at least two quality evaluation indexes and the source text, and determine an evaluation result corresponding to each quality evaluation index; an evaluation weight determining module 330, configured to determine an evaluation weight corresponding to each quality evaluation index based on the language similarity between the source language and the target language; and the evaluation result fusion module 340 is configured to perform fusion processing on each evaluation result based on each evaluation weight, and determine a target evaluation result of the translation text pair.
According to the technical scheme of this embodiment, quality evaluation is performed on the target text based on at least two quality evaluation indexes and the source text in the translation text pair to be evaluated, and an evaluation result corresponding to each quality evaluation index is determined. An evaluation weight corresponding to each quality evaluation index is determined based on the language similarity between the source language and the target language, and the evaluation results are fused based on the evaluation weights to determine the target evaluation result of the translation text pair. Because the evaluation results corresponding to at least two different quality evaluation indexes are fused, translation quality is evaluated comprehensively and a biased evaluation result is avoided. Because each evaluation weight is determined based on the language similarity between the source language and the target language, the linguistic differences between language pairs are taken into account, which effectively avoids large differences in evaluation accuracy across language pairs and thus ensures the accuracy of translation evaluation for different language pairs.
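The overall flow of the four modules can be sketched as a single function. This is a hedged outline only: `TranslationPair`, `assess`, and the three injected callables (`metrics`, `similarity_fn`, `weight_fn`) are hypothetical names introduced here, and the actual evaluation models, similarity computation, and preset network model are assumed to be supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# A metric maps (source_text, target_text) to an evaluation result.
Metric = Callable[[str, str], float]


@dataclass(frozen=True)
class TranslationPair:
    """Translation text pair to be evaluated (module 310)."""
    source_text: str
    target_text: str
    source_lang: str
    target_lang: str


def assess(
    pair: TranslationPair,
    metrics: Dict[str, Metric],
    similarity_fn: Callable[[str, str], float],
    weight_fn: Callable[[float], Dict[str, float]],
) -> float:
    """Return the target evaluation result for one translation text pair."""
    if len(metrics) < 2:
        raise ValueError("at least two quality evaluation indexes are required")
    # Module 320: one evaluation result per quality evaluation index.
    results = {
        name: metric(pair.source_text, pair.target_text)
        for name, metric in metrics.items()
    }
    # Module 330: evaluation weights derived from the language similarity.
    similarity = similarity_fn(pair.source_lang, pair.target_lang)
    weights = weight_fn(similarity)
    if set(weights) != set(results):
        raise ValueError("weights and results must cover the same indexes")
    # Module 340: weighted fusion of the evaluation results.
    return sum(weights[name] * results[name] for name in results)
```

Passing the models in as callables keeps the sketch independent of any particular evaluation library.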
Optionally, the quality assessment indicator comprises: fluency assessment indexes and loyalty assessment indexes; the evaluation result determining module 320 is specifically configured to:
performing fluency evaluation on the target text based on at least one preset fluency evaluation model and the source text, and determining an evaluation result corresponding to a fluency evaluation index; and performing loyalty evaluation on the target text based on at least one preset loyalty evaluation model and the source text, and determining an evaluation result corresponding to the loyalty evaluation index.
Optionally, the preset fluency evaluation model comprises: at least one of a COMET-MQM cross language multi-dimensional quality model, a COMET-QE cross language quality evaluation model and a BLEURT bilingual evaluation substitution model;
the preset loyalty evaluation model comprises: an OpenKiwi evaluation model and a YiSi-2 semantic evaluation model.
Optionally, the evaluation weight determining module 330 includes:
a language similarity input unit, configured to input the language similarity between the source language and the target language into a preset network model, where the preset network model is trained in advance based on translation sample data and labeled evaluation results;
and an evaluation weight determining unit, configured to determine the evaluation weight corresponding to each quality evaluation index according to the output of the preset network model.
Optionally, when the quality evaluation index includes a fluency evaluation index and a loyalty evaluation index, the evaluation weight determining unit is specifically configured to:
determining an evaluation weight corresponding to the fluency evaluation index according to the output of a preset network model; and determining the evaluation weight corresponding to the loyalty evaluation index based on the evaluation weight corresponding to the fluency evaluation index.
Optionally, the apparatus further comprises:
a language similarity determining module, configured to: before the evaluation weight corresponding to each quality evaluation index is determined based on the language similarity between the source language and the target language, determine, based on a preset multi-language model, a source language characterization vector corresponding to the source language and a target language characterization vector corresponding to the target language according to a source-language corpus corresponding to the source language and a target-language corpus corresponding to the target language; and determine the language similarity between the source language and the target language based on the source language characterization vector and the target language characterization vector.
Optionally, the language similarity determining module is specifically configured to:
inputting each source text in the source-language corpus corresponding to the source language into the preset multi-language model, determining a characterization vector corresponding to each source text, and determining the source language characterization vector corresponding to the source language based on the characterization vectors of the source texts; and inputting each target text in the target-language corpus corresponding to the target language into the preset multi-language model, determining a characterization vector corresponding to each target text, and determining the target language characterization vector corresponding to the target language based on the characterization vectors of the target texts.
Optionally, the language similarity determining module is further specifically configured to: average the characterization vectors corresponding to the source texts, and determine the resulting average vector as the source language characterization vector corresponding to the source language.
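The averaging step and a similarity computation can be sketched as follows. The text states only that the language similarity is determined based on the two language-level characterization vectors; using cosine similarity here is an assumption made for illustration, as are the function names. The per-text vectors are taken as given inputs (in practice they would come from the preset multi-language model).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Average per-text vectors into one language-level vector."""
    if not vectors:
        raise ValueError("corpus must contain at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (assumed similarity measure)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def language_similarity(
    source_text_vectors: Sequence[Vector],
    target_text_vectors: Sequence[Vector],
) -> float:
    """Similarity between the averaged source and target language vectors."""
    source_language_vector = mean_vector(source_text_vectors)
    target_language_vector = mean_vector(target_text_vectors)
    return cosine_similarity(source_language_vector, target_language_vector)
```

Since the language-level vectors depend only on the corpora, they can be computed once per language and cached rather than recomputed for every translation text pair.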
The machine translation quality evaluation device provided by the embodiment of the invention can execute the machine translation quality evaluation method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects for executing the machine translation quality evaluation method.
It should be noted that, in the embodiment of the machine translation quality evaluation apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present invention.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in Fig. 4 is only an example and should not impose any limitation on the functions or scope of use of embodiments of the present invention.
As shown in FIG. 4, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, to name a few.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, to implement the steps of a machine translation quality assessment method provided by the embodiment of the present invention, the method including:
acquiring a translation text pair to be evaluated, wherein the translation text pair comprises a source text corresponding to a source language and a target text corresponding to a translated target language;
performing quality evaluation on the target text based on at least two quality evaluation indexes and the source text, and determining an evaluation result corresponding to each quality evaluation index;
determining an evaluation weight corresponding to each quality evaluation index based on the language similarity between the source language and the target language;
and performing fusion processing on each evaluation result based on each evaluation weight, and determining a target evaluation result of the translation text pair.
Of course, those skilled in the art can understand that the processor may also implement the technical solution of the machine translation quality evaluation method provided in any embodiment of the present invention.
The present embodiment provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the steps of a machine translation quality assessment method provided by any embodiment of the present invention, the method including:
acquiring a translation text pair to be evaluated, wherein the translation text pair comprises a source text corresponding to a source language and a target text corresponding to a translated target language;
performing quality evaluation on the target text based on at least two quality evaluation indexes and the source text, and determining an evaluation result corresponding to each quality evaluation index;
determining an evaluation weight corresponding to each quality evaluation index based on the language similarity between the source language and the target language;
and performing fusion processing on each evaluation result based on each evaluation weight, and determining a target evaluation result of the translation text pair.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer-readable storage medium may be, for example but not limited to: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It will be understood by those skilled in the art that the modules or steps of the invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of computing devices, and optionally they may be implemented by program code executable by a computing device, such that it may be stored in a memory device and executed by a computing device, or it may be separately fabricated into various integrated circuit modules, or it may be fabricated by fabricating a plurality of modules or steps thereof into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. A machine translation quality assessment method is characterized by comprising the following steps:
acquiring a translation text pair to be evaluated, wherein the translation text pair comprises a source text corresponding to a source language and a target text corresponding to a translated target language;
performing quality evaluation on the target text based on at least two quality evaluation indexes and the source text, and determining an evaluation result corresponding to each quality evaluation index;
determining an evaluation weight corresponding to each quality evaluation index based on the language similarity between the source language and the target language;
and performing fusion processing on each evaluation result based on each evaluation weight to determine a target evaluation result of the translation text pair.
2. The method of claim 1, wherein the quality assessment indicators comprise: fluency assessment indexes and loyalty assessment indexes;
the quality evaluation of the target text based on at least two quality evaluation indexes and the source text, and determining an evaluation result corresponding to each quality evaluation index, comprises:
performing fluency assessment on the target text based on at least one preset fluency assessment model and the source text, and determining an assessment result corresponding to the fluency assessment index;
and performing loyalty evaluation on the target text based on at least one preset loyalty evaluation model and the source text, and determining an evaluation result corresponding to the loyalty evaluation index.
3. The method according to claim 1, wherein said determining an evaluation weight corresponding to each of said quality evaluation indicators based on said language similarity between said source language and said target language comprises:
inputting the language similarity between the source language and the target language into a preset network model, wherein the preset network model is trained in advance based on translation sample data and labeled evaluation results;
and determining the evaluation weight corresponding to each quality evaluation index according to the output of the preset network model.
4. The method of claim 3, wherein when the quality assessment indicators include a fluency assessment indicator and a loyalty assessment indicator, determining an assessment weight corresponding to each of the quality assessment indicators according to an output of the preset network model comprises:
determining an evaluation weight corresponding to the fluency evaluation index according to the output of the preset network model;
and determining the evaluation weight corresponding to the loyalty evaluation index based on the evaluation weight corresponding to the fluency evaluation index.
5. The method according to any one of claims 1 to 4, wherein before determining the evaluation weight corresponding to each of the quality evaluation indicators based on the language similarity between the source language and the target language, the method further comprises:
determining, based on a preset multi-language model, a source language characterization vector corresponding to the source language and a target language characterization vector corresponding to the target language according to a source-language corpus corresponding to the source language and a target-language corpus corresponding to the target language;
and determining the language similarity between the source language and the target language based on the source language characterization vector and the target language characterization vector.
6. The method according to claim 5, wherein the determining, based on the preset multi-language model, the source language characterization vector corresponding to the source language and the target language characterization vector corresponding to the target language according to the source-language corpus corresponding to the source language and the target-language corpus corresponding to the target language comprises:
inputting each source text in the source-language corpus corresponding to the source language into the preset multi-language model, determining a characterization vector corresponding to each source text, and determining the source language characterization vector corresponding to the source language based on the characterization vectors of the source texts;
inputting each target text in the target-language corpus corresponding to the target language into the preset multi-language model, determining a characterization vector corresponding to each target text, and determining the target language characterization vector corresponding to the target language based on the characterization vectors of the target texts.
7. The method of claim 6, wherein determining the source language characterization vector corresponding to the source language based on the characterization vectors of the source texts comprises:
averaging the characterization vectors of the source texts, and determining the obtained average vector as the source language characterization vector corresponding to the source language.
8. A machine translation quality evaluation apparatus, comprising:
a translation text pair obtaining module, configured to obtain a translation text pair to be evaluated, wherein the translation text pair comprises a source text corresponding to a source language and a target text corresponding to a translated target language;
an evaluation result determining module, configured to perform quality evaluation on the target text based on at least two quality evaluation indexes and the source text, and determine an evaluation result corresponding to each quality evaluation index;
an evaluation weight determining module, configured to determine an evaluation weight corresponding to each quality evaluation index based on a language similarity between the source language and the target language;
and an evaluation result fusion module, configured to perform fusion processing on each evaluation result based on each evaluation weight to determine a target evaluation result of the translation text pair.
9. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the machine translation quality assessment method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method for machine translation quality assessment according to any one of claims 1-7.
CN202210970061.5A 2022-08-12 2022-08-12 Machine translation quality evaluation method, device, equipment and storage medium Pending CN115310460A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210970061.5A CN115310460A (en) 2022-08-12 2022-08-12 Machine translation quality evaluation method, device, equipment and storage medium
PCT/CN2023/112135 WO2024032691A1 (en) 2022-08-12 2023-08-10 Machine translation quality assessment method and apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210970061.5A CN115310460A (en) 2022-08-12 2022-08-12 Machine translation quality evaluation method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115310460A (en) 2022-11-08

Family

ID=83862779

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210970061.5A Pending CN115310460A (en) 2022-08-12 2022-08-12 Machine translation quality evaluation method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN115310460A (en)
WO (1) WO2024032691A1 (en)


Also Published As

Publication number Publication date
WO2024032691A1 (en) 2024-02-15

Similar Documents

Publication Publication Date Title
CN107908635B (en) Method and device for establishing text classification model and text classification
US20180165278A1 (en) Method and apparatus for translating based on artificial intelligence
CN115310460A (en) Machine translation quality evaluation method, device, equipment and storage medium
US10387568B1 (en) Extracting keywords from a document
CN109599095B (en) Method, device and equipment for marking voice data and computer storage medium
US11157699B2 (en) Interactive method and apparatus based on test-type application
US10977155B1 (en) System for providing autonomous discovery of field or navigation constraints
CN109408829B (en) Method, device, equipment and medium for determining readability of article
US10032448B1 (en) Domain terminology expansion by sensitivity
CN111143556B (en) Automatic counting method and device for software function points, medium and electronic equipment
US11386270B2 (en) Automatically identifying multi-word expressions
KR20150007647A (en) Method and system for statistical context-sensitive spelling correction using confusion set
US20220351634A1 (en) Question answering systems
CN111597800B (en) Method, device, equipment and storage medium for obtaining synonyms
CN110688111A (en) Configuration method, device, server and storage medium of business process
CN114792089A (en) Method, apparatus and program product for managing computer system
CN107153694B (en) Method, device, equipment and storage medium for automatically modifying question errors
US10043511B2 (en) Domain terminology expansion by relevancy
Rozovskaya et al. Adapting to learner errors with minimal supervision
CN113705207A (en) Grammar error recognition method and device
CN110807334A (en) Text processing method, device, medium and computing equipment
US11922129B2 (en) Causal knowledge identification and extraction
US20220067102A1 (en) Reasoning based natural language interpretation
US20220269860A1 (en) Evaluation apparatus and evaluation method
CN112528651A (en) Intelligent correction method, system, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination