CN112052693A - Method, device and equipment for evaluating machine translation effect and storage medium - Google Patents

Publication number
CN112052693A
Authority
CN
China
Prior art keywords
clause
translation
machine translation
text
translated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010965988.0A
Other languages
Chinese (zh)
Other versions
CN112052693B (en)
Inventor
罗佩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN202010965988.0A priority Critical patent/CN112052693B/en
Publication of CN112052693A publication Critical patent/CN112052693A/en
Application granted granted Critical
Publication of CN112052693B publication Critical patent/CN112052693B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/40Processing or translation of natural language
    • G06F40/51Translation evaluation
    • G06F40/58Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a method, an apparatus, a device and a storage medium for evaluating machine translation effect. For each clause contained in a text to be translated, the clause and the clauses before it are input into a machine translation tool, and the translation result is taken as the machine translation of that clause. A length constraint is applied to the machine translation of each clause: compared with the machine translation, the constrained machine translation has several translated words at its end deleted. The constrained machine translation of each clause is then matched against the machine translation of the next adjacent clause to obtain the translation constraint loss of the machine translation of the text to be translated. This loss measures how well the machine translation tool preserves context coherence across the whole text when translating it. The machine translation score of the text to be translated is determined based on the translation constraint loss and a reference translation. Because the score takes finer-grained, sentence-level coherence into account, it greatly improves the accuracy and reliability of machine translation scoring.

Description

Method, device and equipment for evaluating machine translation effect and storage medium
Technical Field
The present application relates to the field of machine translation technologies, and in particular, to a method, an apparatus, a device, and a storage medium for evaluating a machine translation effect.
Background
As machine translation improves, evaluating its effect becomes more important: while continuously developing and improving a translation system, researchers need to evaluate translation results frequently and feed the findings back into the system.
Human evaluation is very time-consuming, labor-intensive and complex. It is therefore important to design an automatic method for evaluating machine translation effect, and a good evaluation scheme matters to practitioners: for example, to judge whether a new research technique brings good results, whether the metrics of a translation product leave a strong impression on customers, and how users should weigh which translation tool is better. In today's fiercely competitive market, research on performance evaluation can keep track of translation characteristics and help potential users understand them, while establishing a sound basis for machine translation evaluation.
Disclosure of Invention
In view of the above problems, the present application provides a method, an apparatus, a device and a storage medium for evaluating a machine translation effect, so as to solve the problem of time and labor consumption of manual evaluation. The specific scheme is as follows:
A machine translation effect evaluation method comprises the following steps:
acquiring a reference translation of a text to be translated and a machine translation of each clause contained in the text to be translated, wherein the machine translation of each clause is the result of a machine translation tool translating the clause together with the clauses before it;
performing length constraint on the machine translation of each clause to obtain a constrained machine translation, wherein, compared with the machine translation, the constrained machine translation has several translated words at its end deleted;
matching the constrained machine translation of each clause with the machine translation of the next adjacent clause to obtain the translation constraint loss of the machine translation of the text to be translated;
determining a machine translation score for the text to be translated based on the translation constraint loss and the reference translation.
Preferably, before acquiring the reference translation of the text to be translated and the machine translation of each clause contained in the text to be translated, the method further includes:
dividing the text to be translated into clauses to obtain each clause contained in the text to be translated.
Preferably, dividing the text to be translated into clauses includes:
dividing the text to be translated into a plurality of clauses by taking the punctuation marks contained in the text as clause boundaries;
or, alternatively,
traversing the text to be translated and judging whether a punctuation mark occurs before the traversed text reaches a set length; if so, dividing the text before the punctuation mark into a clause and continuing to traverse the text; if not, dividing the traversed text of the set length into a clause once the set length is reached and continuing to traverse the text.
Preferably, performing length constraint on the machine translation of each clause includes:
determining the length constraint amount of each clause, wherein the length constraint amount indicates the length of translation to be deleted from the end of the machine translation when the length constraint is applied;
performing length constraint on the machine translation of each clause based on the length constraint amount of that clause.
Preferably, determining the length constraint amount of each clause includes:
determining, according to the way each clause was divided, the length constraint amount corresponding to that division mode, to obtain the length constraint amount of each clause.
Preferably,
if a clause is obtained by dividing at a punctuation mark, the corresponding length constraint amount is a first set length;
if a clause is obtained by dividing after the traversed text reaches the set length, the corresponding length constraint amount is a second set length, and the second set length is greater than the first set length.
Preferably, performing length constraint on the machine translation of each clause based on the length constraint amount of each clause includes:
based on the length constraint amount of each clause, deleting a span of translation at the end of the machine translation of the clause, wherein the deleted span is the translation result of the last tokens of the clause, their number being given by the length constraint amount;
or, alternatively,
based on the length constraint amount of each clause, deleting that number of tokens from the end of the machine translation of the clause.
Preferably, matching the constrained machine translation of each clause with the machine translation of the next adjacent clause to obtain the translation constraint loss of the machine translation of the text to be translated includes:
judging whether the constrained machine translation of each clause exists in the machine translation of the next adjacent clause; if so, determining that the constraint of the clause succeeds, and otherwise determining that it fails;
determining the translation constraint loss of the machine translation of the text to be translated based on the constraint successes and failures of the clauses in the text to be translated.
Preferably, determining a machine translation score of the text to be translated based on the translation constraint loss and the reference translation includes:
determining a translation constraint loss coefficient of the machine translation of the text to be translated based on the translation constraint loss and the number of clauses contained in the reference translation;
matching the machine translation of the text to be translated with the reference translation to determine a matching accuracy;
penalizing the matching accuracy with the translation constraint loss coefficient to obtain the machine translation score.
Preferably, matching the machine translation of the text to be translated with the reference translation to determine the matching accuracy includes:
determining the n-gram matching accuracy between the machine translation of the text to be translated and the reference translation.
A machine translation effect evaluation apparatus includes:
a translation acquiring unit, configured to acquire a reference translation of a text to be translated and a machine translation of each clause contained in the text to be translated, wherein the machine translation of each clause is the result of a machine translation tool translating the clause together with the clauses before it;
a constraint processing unit, configured to perform length constraint on the machine translation of each clause to obtain a constrained machine translation, wherein, compared with the machine translation, the constrained machine translation has several translated words at its end deleted;
a translation constraint loss determining unit, configured to match the constrained machine translation of each clause with the machine translation of the next adjacent clause to obtain the translation constraint loss of the machine translation of the text to be translated;
a scoring unit, configured to determine the machine translation score of the text to be translated based on the translation constraint loss and the reference translation.
A machine translation effect evaluation device includes a memory and a processor;
the memory is used for storing a program;
the processor is used for executing the program to implement the steps of the machine translation effect evaluation method described above.
A storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the machine translation effect evaluation method as described above.
With the above technical scheme, for each clause contained in a text to be translated, the clause and the clauses before it are input into a machine translation tool, and the translation result is taken as the machine translation of that clause. When a clause is machine-translated, the tokens at its end cannot draw on the content of the following clause, so their translation may be inaccurate. The application therefore applies a length constraint to the machine translation of each clause: compared with the machine translation, the constrained machine translation has several translated words at its end deleted, that is, the part of the machine translation whose result may be inaccurate is removed. The machine translation of the next adjacent clause, by contrast, translates the tokens at the end of the current clause with reference to the content of the following clause, so that result is more accurate. Matching the constrained machine translation of each clause against the machine translation of the next adjacent clause then yields the translation constraint loss, which measures how well the machine translation tool preserves the context coherence of the whole text. On this basis, the machine translation score of the text to be translated is determined based on the translation constraint loss and the reference translation. The score takes finer-grained, sentence-level coherence into account, can properly penalize over-translation and omissions, and greatly improves the accuracy and reliability of machine translation scoring.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic flow chart of a method for evaluating machine translation effect according to an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a clause division of a text to be translated and its reference translation;
FIG. 3 is a schematic structural diagram of a machine translation effect evaluation apparatus according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a machine translation effect evaluation device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application provides a method for evaluating machine translation effect, which is suitable for evaluating how well a machine translation tool translates in various scenarios. Take simultaneous interpretation as an example: there, the translation of each clause is produced incrementally and the clauses are closely tied by context. With the evaluation method of the present application, the translation effect of a machine translation tool in a simultaneous interpretation scenario can be evaluated accurately and reliably.
The scheme can be realized based on a terminal with data processing capacity, and the terminal can be a mobile phone, a computer, a server, a cloud terminal and the like.
To evaluate the machine translation effect, the applicant first considered similarity matching between the machine translation and the reference translation to determine a matching accuracy. However, when the machine translation is shorter than the reference translation, the matching accuracy tends to rise as the translation gets shorter, so an incomplete translation can still obtain a high matching accuracy. The length of the machine translation is therefore penalized when it is shorter than the reference translation: the matching accuracy is penalized with a length penalty coefficient to obtain the final score of the machine translation.
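The baseline described above can be sketched as follows. The text does not give a formula for the length penalty coefficient, so this sketch assumes the exponential brevity penalty used by BLEU; the function names `length_penalty` and `penalized_accuracy` are hypothetical.

```python
import math


def length_penalty(candidate_len: int, reference_len: int) -> float:
    """Assumed BLEU-style brevity penalty (not specified in the source text).

    Returns 1.0 when the machine translation is at least as long as the
    reference, and a value below 1.0 when it is shorter.
    """
    if candidate_len <= 0:
        return 0.0
    if candidate_len >= reference_len:
        return 1.0
    return math.exp(1.0 - reference_len / candidate_len)


def penalized_accuracy(matching_accuracy: float,
                       candidate_len: int,
                       reference_len: int) -> float:
    """Scale the matching accuracy by the length penalty coefficient."""
    return matching_accuracy * length_penalty(candidate_len, reference_len)
```

Under this assumption a translation half as long as its reference keeps only about 37% of its matching accuracy, while a translation longer than the reference is not penalized at all.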
Further analysis shows that this approach computes the similarity between the machine translation and the reference translation directly and ignores the correlation between clauses. When a sentence is too long, it is hard to avoid missing important words or matching unnecessary ones, and equal treatment of every word cannot be guaranteed. Moreover, the matching accuracy only reflects translation accuracy at the document level, and the length penalty only considers the overall length of the translation; neither considers the context coherence of the whole translation, so evaluation at the sentence level is inadequate.
Therefore, the applicant further provides the following scheme to overcome the problems and realize more accurate and reliable evaluation of the machine translation effect.
Next, as described with reference to fig. 1, the method for evaluating machine translation effect of the present application may include the following steps:
Step S100: obtaining a reference translation of the text to be translated and a machine translation of each clause contained in the text to be translated.
Specifically, the reference translation of the text to be translated may be understood as a standard translation result, which may be a result of a professional manual translation.
The text to be translated may include a plurality of clauses, and in this step the machine translation is obtained clause by clause. The machine translation of each clause is the result of the machine translation tool translating the clause together with the clauses before it. That is, for a target clause, the clauses before it in the text to be translated are input into the machine translation tool together with the target clause, and the result is taken as the machine translation of the target clause. The machine translation of the target clause therefore also carries the translation of the preceding clauses, and because the preceding clauses are taken into account, the target clause is translated more accurately.
It can be understood that the text to be translated may be a complete sentence, and the machine translation of each target clause may be obtained by inputting each clause preceding the target clause in the text to be translated and the target clause into a machine translation tool, and using the translation result as the machine translation of the target clause. In addition, the text to be translated may be a text paragraph or chapter composed of a plurality of complete sentences, wherein each complete sentence may include a plurality of clauses. Then, for the machine translation of each target clause, the clauses before the target clause in the complete sentence in which the target clause is located and the target clause may be input into a machine translation tool, and the translation result obtained is used as the machine translation of the target clause. Alternatively, all clauses before the target clause in the text to be translated and the target clause may be input into a machine translation tool, and the obtained translation result is used as a machine translation of the target clause.
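A minimal sketch of how the per-clause inputs of step S100 might be assembled is given below. The `translate` argument stands in for the machine translation tool and is a hypothetical callable; the source does not name a specific tool or interface, and the separator used to join clauses is likewise an assumption.

```python
from typing import Callable, List


def incremental_inputs(clauses: List[str], sep: str = "") -> List[str]:
    """For clause i, return clauses 0..i joined together.

    This corresponds to the option of feeding all preceding clauses plus the
    target clause to the translation tool.
    """
    return [sep.join(clauses[: i + 1]) for i in range(len(clauses))]


def translate_clauses(clauses: List[str],
                      translate: Callable[[str], str],
                      sep: str = "") -> List[str]:
    """Machine translation of each clause = translation of its growing prefix.

    `translate` is a placeholder for the machine translation tool.
    """
    return [translate(text) for text in incremental_inputs(clauses, sep)]
```

To restrict the context to the sentence that contains the target clause, as the first option above describes, the same helper could be called once per complete sentence.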
Step S110: performing length constraint on the machine translation of each clause to obtain a constrained machine translation.
Compared with the machine translation, the constrained machine translation has several translated words at its end deleted. When a clause is machine-translated, the tokens at its end cannot draw on the content of the following clause, so their translation may be inaccurate. For this reason, the application applies a length constraint to the machine translation of each clause, deleting the translation at the end that may be inaccurate; the resulting constrained machine translation is a comparatively more accurate translation result.
When the clause following a target clause is machine-translated, the target clause and the following clause are input into the machine translation tool together. The tokens at the end of the target clause are thus carried along and translated again, this time with reference to the content of the following clause, which preserves coherence between adjacent clauses and makes the translation result more accurate.
Step S120: matching the constrained machine translation of each clause with the machine translation of the next adjacent clause to obtain the translation constraint loss of the machine translation of the text to be translated.
Specifically, once the constrained machine translation of each clause has been obtained, it can be matched against the machine translation of the next adjacent clause. Because the machine translation of the next adjacent clause contains a translation of the tokens at the end of the previous clause, this matching shows whether the constrained machine translation of the previous clause and the machine translation of the next clause are coherent. The translation constraint loss finally obtained for the machine translation of the text to be translated thus measures how well the machine translation tool preserves context coherence across the whole text.
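The matching in step S120 can be sketched as below. Two points are assumptions rather than statements of the source: "exists in" is read here as "occurs as a contiguous run of tokens", and the translation constraint loss is taken to be the number of clauses whose constraint fails. The function names are hypothetical.

```python
from typing import List


def contains_tokens(haystack: List[str], needle: List[str]) -> bool:
    """True if `needle` occurs as a contiguous token run inside `haystack`."""
    n = len(needle)
    if n == 0:
        return True
    return any(haystack[i:i + n] == needle
               for i in range(len(haystack) - n + 1))


def translation_constraint_loss(constrained: List[List[str]],
                                machine: List[List[str]]) -> int:
    """Count constraint failures over adjacent clause pairs.

    constrained[i] is the constrained machine translation of clause i and
    machine[i + 1] is the full machine translation of the next clause.
    The last clause has no successor and is not checked.
    """
    failures = 0
    for i in range(len(constrained) - 1):
        if not contains_tokens(machine[i + 1], constrained[i]):
            failures += 1  # constraint of clause i failed
    return failures
```

A failure means the tool rewrote the already constrained part of the previous clause when it saw more context, which is the kind of incoherence the loss is meant to capture.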
Step S130: determining a machine translation score of the text to be translated based on the translation constraint loss and the reference translation.
Specifically, after the translation constraint loss of the machine translation of the text to be translated is obtained, the machine translation score of the text to be translated can be determined in combination with the reference translation. The score takes finer-grained, sentence-level coherence into account, can properly penalize over-translation and omissions, and greatly improves the accuracy and reliability of machine translation scoring.
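One way step S130 might be realized is sketched below. The application states that a translation constraint loss coefficient is derived from the loss and the number of clauses in the reference translation and is used to penalize an n-gram matching accuracy, but this passage gives no formulas. The coefficient `1 - loss / num_clauses` and the clipped n-gram precision used here are therefore illustrative assumptions, and all function names are hypothetical.

```python
from collections import Counter
from typing import List


def ngram_precision(candidate: List[str], reference: List[str],
                    n: int = 1) -> float:
    """Clipped n-gram precision of the candidate against one reference."""
    cand = Counter(tuple(candidate[i:i + n])
                   for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n])
                  for i in range(len(reference) - n + 1))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def constraint_loss_coefficient(loss: int, num_clauses: int) -> float:
    """Assumed form: fraction of clauses whose constraint succeeded."""
    if num_clauses <= 0:
        return 1.0
    return max(0.0, 1.0 - loss / num_clauses)


def machine_translation_score(candidate: List[str], reference: List[str],
                              loss: int, num_clauses: int,
                              n: int = 1) -> float:
    """Penalize the matching accuracy with the constraint loss coefficient."""
    coefficient = constraint_loss_coefficient(loss, num_clauses)
    return coefficient * ngram_precision(candidate, reference, n)
```

With this choice a translation whose clauses all pass the constraint keeps its full matching accuracy, and each failed clause lowers the score proportionally.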
In the method for evaluating machine translation effect provided by the embodiments of the present application, for each clause contained in a text to be translated, the clause and the clauses before it are input into a machine translation tool, and the translation result is taken as the machine translation of that clause. Because the tokens at the end of a clause cannot draw on the content of the following clause during machine translation, their translation may be inaccurate. The method therefore applies a length constraint to the machine translation of each clause: compared with the machine translation, the constrained machine translation has several translated words at its end deleted, that is, the part at the end that may be inaccurate is removed. The machine translation of the next adjacent clause, by contrast, translates the tokens at the end of the current clause with reference to the content of the following clause, so that result is more accurate. Matching the constrained machine translation of each clause against the machine translation of the next adjacent clause yields the translation constraint loss, which measures how well the machine translation tool preserves the context coherence of the whole text. On this basis, the machine translation score of the text to be translated is determined based on the translation constraint loss and the reference translation. The score takes finer-grained, sentence-level coherence into account, can properly penalize over-translation and omissions, and greatly improves the accuracy and reliability of machine translation scoring.
In some embodiments of the present application, before the step S100 is executed to obtain the reference translation of the text to be translated and the machine translation of each clause, the present application may further include an operation of performing clause division on the text to be translated, and each clause included in the text to be translated may be obtained by performing clause division on the text to be translated.
The text to be translated may be divided into clauses in several ways, for example:
First clause division mode:
The punctuation marks contained in the text to be translated are used as clause boundaries, and the text is divided into a plurality of clauses.
Specifically, the text to be translated contains punctuation marks such as commas, periods and question marks. Taking these as clause boundaries, the text can be split at each punctuation mark to obtain a plurality of clauses.
Second clause division mode:
The text to be translated is traversed starting from its first token, and it is judged whether a punctuation mark occurs before the traversed text reaches a set length. If so, the text before the punctuation mark is divided into a clause and the traversal continues; if not, the traversed text of the set length is divided into a clause once the set length is reached, and the traversal continues.
That is, a length K may be preset, for example K = 8 tokens. If a punctuation mark occurs before the traversed text reaches K tokens, the clause is split at that punctuation mark; if no punctuation mark has occurred by the time K tokens are reached, the K traversed tokens are divided into one clause and the traversal continues.
Of course, these are only two optional clause division modes; those skilled in the art may use other modes as needed, and the present application places no strict limitation on this.
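The second clause division mode can be sketched as follows, assuming the text has already been tokenized and that punctuation marks appear as separate tokens. The punctuation set, the mode labels and the choice to attach a punctuation mark to the clause it closes are assumptions of this sketch, not details given in the application.

```python
from typing import List, Tuple

# Assumed punctuation set covering common ASCII and full-width marks.
PUNCTUATION = frozenset(",.?!;:，。？！；：")


def split_clauses(tokens: List[str], max_len: int = 8,
                  punctuation=PUNCTUATION) -> List[Tuple[List[str], str]]:
    """Split tokens into clauses, recording how each clause was obtained.

    Each result is (clause_tokens, mode) where mode is "punct" (split at a
    punctuation mark), "length" (split after max_len tokens without any
    punctuation) or "end" (remainder at the end of the text).
    """
    clauses: List[Tuple[List[str], str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok in punctuation:
            if current:
                clauses.append((current + [tok], "punct"))
                current = []
            elif clauses:
                # Punctuation directly after a split: keep it with the
                # previous clause instead of opening an empty one.
                prev_tokens, prev_mode = clauses[-1]
                clauses[-1] = (prev_tokens + [tok], prev_mode)
            # Leading punctuation with nothing before it is skipped.
            continue
        current.append(tok)
        if len(current) == max_len:
            clauses.append((current, "length"))
            current = []
    if current:
        clauses.append((current, "end"))
    return clauses
```

Recording the mode alongside each clause is convenient because the length constraint amount chosen later depends on how the clause was divided. Using a very large `max_len` reduces this to the first, punctuation-only division mode.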
Some embodiments of the present application describe the process in step S110 of applying the length constraint to the machine translation of each clause to obtain the constrained machine translation.
It will be appreciated that applying the length constraint to the machine translation of a clause means deleting a certain length of translation from the end of that machine translation. Accordingly, in the embodiments of the present application, the length constraint amount of each clause may be determined first, and the machine translation of the clause is then constrained based on that amount. The specific process is as follows:
S1: determining the length constraint amount of each clause.
The length constraint amount indicates the length of translation to be deleted from the end of the machine translation when the length constraint is applied.
The length constraint amount of each clause may be determined from the way the clause was divided: each division mode has a corresponding length constraint amount.
That is, a length constraint amount may be set in advance for each clause division mode. Taking the two division modes in the above embodiments as examples:
For the first clause division mode, in which clauses are obtained by dividing at punctuation marks, the corresponding length constraint amount may be set to a first set length.
For the second clause division mode, in which a clause is divided after the traversed text reaches the set length, the corresponding length constraint amount may be set to a second set length.
Because a clause produced by the second division mode is cut at the maximum clause length the application tolerates, the length constraint amount for such a clause may be larger than that for a clause from the first division mode. That is, in the embodiments of the present application, the second set length may be greater than the first set length.
For example, the first set length may be 3 tokens, and the second set length may be twice the first set length, such as 6 tokens, or some other value.
S2: performing length constraint on the machine translation of each clause based on the length constraint amount of that clause.
As described above, the length constraint amount indicates the length of translation to be deleted from the end of the machine translation. This embodiment provides two alternative ways of applying the length constraint to the machine translation of a clause:
First way:
The length constraint amount specifies how many tokens at the end of the clause are to have their translation deleted. The specific length constraint procedure may then be:
based on the length constraint amount of each clause, deleting a span of translation at the end of the machine translation of the clause, wherein the deleted span is the translation result of that number of tokens at the end of the clause.
That is, the length of translation finally deleted from the end of the machine translation is not fixed; it is whatever span translates the specified number of tokens at the end of the clause.
For example, if the length constraint amount is 3, the span deleted from the end of the machine translation of the clause is the machine translation of 3 tokens at the end of the clause.
For example as follows:
The clause is: when the situation is somehow unpaired. The word segmentation result of the clause is: at/case/all/none/time.
The machine translation corresponding to the clause is: when something is wrong.
Assuming that the length constraint quantity is 3, analysis determines that the translation at the end of the machine translation corresponds to the following 3 word segments in the clause: case/all/none. The machine translation corresponding to these three word segments is therefore deleted, namely: something is wrong, and the resulting constrained machine translation is: when.
Second manner:
The length constraint quantity specifies how many word segments are to be deleted from the end of the machine translation of the clause. The specific length constraint procedure may then be:
Based on the length constraint quantity of each clause, delete the word segments at the end of the machine translation of the clause, as many as the length constraint quantity.
That is, the length of the translation finally deleted from the end of the machine translation is a fixed value, namely the length constraint quantity.
For example, if the length constraint quantity is 3, in this embodiment the last 3 word segments of the machine translation of the clause are deleted.
The above example is still used for illustration:
The last 3 word segments of the machine translation are deleted, namely: something is wrong, and the resulting constrained machine translation is: when.
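The two length constraint manners described above can be sketched as follows. This is an illustrative sketch, not the implementation of the application: the function names are hypothetical, and for the first manner the correspondence between clause tokens and translation tokens is passed in as a hypothetical `alignment` mapping, because the text does not state how that correspondence is obtained.

```python
from typing import Dict, List


def constrain_by_target_tokens(mt_tokens: List[str], k: int) -> List[str]:
    """Second manner: drop the last k tokens of the machine translation."""
    if k <= 0:
        return list(mt_tokens)
    return mt_tokens[:-k] if k < len(mt_tokens) else []


def constrain_by_source_tokens(
    src_tokens: List[str],
    mt_tokens: List[str],
    alignment: Dict[int, List[int]],
    k: int,
) -> List[str]:
    """First manner: drop the translation of the last k tokens of the clause.

    `alignment` maps a clause token index to the indices of the machine
    translation tokens that translate it (assumed to be supplied by the caller).
    """
    start = max(len(src_tokens) - k, 0)
    drop = {j for i in range(start, len(src_tokens)) for j in alignment.get(i, [])}
    return [tok for j, tok in enumerate(mt_tokens) if j not in drop]


# Second manner on the example sentence: three tokens are removed.
print(constrain_by_target_tokens("when something is wrong".split(), 3))
```

In the first manner the number of deleted translation tokens depends on the alignment, so it can differ from k; in the second manner it is always k (or the whole translation when it is shorter than k).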
In the embodiment of the application, the process of clause division and clause length constraint of the text to be translated is introduced through the following specific example.
Referring to fig. 2, fig. 2 illustrates a clause dividing manner of a text to be translated and a reference translation.
It is assumed that the clause division of the text to be translated is performed according to the second division mode, and that the set length is 8 word segments.
Analyzing the text to be translated according to the second division mode yields the division result shown in fig. 2, in which vertical lines separate the different clauses. Because the number of word segments between consecutive punctuation marks in the text to be translated does not exceed 8, the text to be translated is divided at the punctuation marks into clauses 1-4.
For each clause, the clause together with the clauses before it is used as the input of a machine translation tool, and the output is taken as the machine translation of the clause. Length constraint is then performed on this machine translation. Assuming that the length constraint quantity is 3, the corresponding length constraint process deletes the machine translation corresponding to the 3 word segments at the end of the clause. The machine translation of each clause and the constrained machine translation are shown in Table 1 below:
Figure BDA0002682334040000121
TABLE 1
In some embodiments of the present application, a process of matching, in the step S120, the constrained machine translation of each clause with the machine translation of the next adjacent clause to obtain a translation constraint loss of the machine translation of the text to be translated is further described.
The constrained machine translation of a clause is matched with the machine translation of the next adjacent clause. Because the machine translation of the next adjacent clause also contains the translation result of the word segments at the end of the previous clause, the matching shows whether the constrained machine translation of the previous clause is continuous with the machine translation of the next adjacent clause. The translation constraint loss finally obtained for the machine translation of the text to be translated therefore measures how well the machine translation tool preserves contextual continuity across the whole text to be translated. The specific matching process may include:
judging whether the constrained machine translation of each clause exists in the machine translation of the next adjacent clause; if yes, determining that the clause constraint is successful, otherwise, determining that the clause constraint is failed.
And determining the translation constraint loss of the machine translation of the text to be translated based on the constraint success and failure conditions of each clause in the text to be translated.
Specifically, the number of clauses successfully constrained can be used as the translation constraint loss of the machine translation of the text to be translated.
The following description will be given by taking the example shown in table 1 as an example:
the machine translation after the constraint of clause 1 is in the machine translation of clause 2, the machine translation after the constraint of clause 2 is not in the machine translation of clause 3, and the machine translation after the constraint of clause 3 is in the machine translation of clause 4, so that the number of clauses successfully constrained is 2, and the translation constraint loss of the machine translation of the text to be translated can be set to be 2.
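The matching and counting described above can be sketched as follows. This is a minimal sketch under stated assumptions: "exists in" is interpreted here as the constrained translation occurring as a contiguous run of tokens inside the next clause's translation, an empty constrained translation is treated as a success, and the function names and the toy English sentences in the usage example are invented for illustration (they are not the contents of Table 1).

```python
from typing import List, Sequence


def contains(haystack: Sequence[str], needle: Sequence[str]) -> bool:
    """True if `needle` occurs as a contiguous token run inside `haystack`."""
    n = len(needle)
    if n == 0:
        return True  # assumption: nothing left to match counts as a success
    return any(
        list(haystack[i:i + n]) == list(needle)
        for i in range(len(haystack) - n + 1)
    )


def translation_constraint_loss(
    mt_per_clause: List[List[str]],
    constrained_per_clause: List[List[str]],
) -> int:
    """Count clauses whose constrained translation appears in the next clause's translation."""
    successes = 0
    for i in range(len(mt_per_clause) - 1):
        if contains(mt_per_clause[i + 1], constrained_per_clause[i]):
            successes += 1
    return successes


# Toy data: four clause translations, each constrained by dropping 3 tokens.
mt = [
    "I went to the store".split(),
    "I went to the store and bought some milk".split(),
    "I visited the shop and bought some milk yesterday".split(),
    "I visited the shop and bought some milk yesterday before it closed".split(),
]
constrained = [tokens[:-3] for tokens in mt]
print(translation_constraint_loss(mt, constrained))
```

In this toy data clauses 1 and 3 are successfully constrained and clause 2 is not, which mirrors the success/failure pattern of the example above and gives a translation constraint loss of 2.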
On this basis, the embodiment of the present application further introduces the process of determining the machine translation score of the text to be translated based on the translation constraint loss and the reference translation in the aforementioned step S130. The embodiment of the application provides an optional implementation mode, which comprises the following steps:
S1: Determine a translation constraint loss coefficient of the machine translation of the text to be translated based on the translation constraint loss and the number of clauses contained in the reference translation.
Specifically, the foregoing steps have determined the translation constraint loss of the machine translation of the text to be translated. For the reference translation, every clause it contains may by default be regarded as successfully constrained, so the number of clauses contained in the reference translation may be used as the translation constraint loss of the reference translation. The translation constraint loss coefficient of the machine translation of the text to be translated may then be determined based on the translation constraint loss of the machine translation and the translation constraint loss of the reference translation, where the translation constraint loss coefficient increases as the translation constraint loss of the machine translation increases.
Optionally, relative to the translation constraint loss of the reference translation, the more clauses of the machine translation that are successfully constrained, the higher its accuracy. The translation constraint loss coefficient may therefore be set to an exponential function of the ratio of the translation constraint loss of the machine translation to the translation constraint loss of the reference translation, as shown in the following formula:
Figure BDA0002682334040000141
where CF is the translation constraint loss coefficient, $l_c$ is the translation constraint loss of the machine translation, and $l_r$ is the translation constraint loss of the reference translation.
Taking the case exemplified in Table 1 above as an example, where $l_c$ equals 2 and $l_r$ equals 3, CF is:
Figure BDA0002682334040000142
S2: Match the machine translation of the text to be translated with the reference translation and determine the matching accuracy.
Specifically, the n-gram matching accuracy between the machine translation of the text to be translated and the reference translation can be determined.
The calculation process of the n-gram matching accuracy may include the following:
Step 1) Calculate the maximum number of matches of each n-gram in the reference translation.
In some cases, for example when repeated words in the machine translation of the text to be translated would make the matching inaccurate, the n-gram matching rule can be corrected according to the following formula:
$\mathrm{Count}_{clip} = \min(\mathrm{COUNT}, \mathrm{Max\_Ref\_Count})$
where $\mathrm{Count}_{clip}$ is the maximum number of matches of the n-gram in the reference translation, COUNT is the number of occurrences of the n-gram in the machine translation, and Max_Ref_Count is the maximum number of occurrences of the n-gram in the reference translation.
Using this formula, the matching calculation can be carried out for every order from 1-gram to n-gram. The matching principle is to compute the proportion of n-word groups that the machine translation shares with the reference translation. In practice, the 1-gram result reflects how well individual words are translated and mainly represents the fidelity of the translation; larger values of n represent fluency, and the higher the value, the better the readability. To balance the fidelity and the fluency of the translation, n may be set to 4, that is, 4-gram is selected for evaluation.
Step 2) Obtain the matching accuracy of the n-gram from the maximum number of matches of the n-gram in the reference translation.
The matching accuracy of the n-gram can be calculated as follows:
Figure BDA0002682334040000151
where $P_n$ is the matching accuracy of the n-gram, $C$ is the machine translation, $\mathrm{Count}_{clip}(\text{n-gram})$ is the maximum number of matches of the n-gram in the reference translation, and $\mathrm{COUNT}(\text{n-gram})$ is the number of times the n-gram occurs in the machine translation.
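The clipped counting of steps 1) and 2) can be sketched as follows. The sketch assumes that the matching accuracy is the sum of the clipped counts over all n-grams of the machine translation divided by the sum of their raw counts, which is what the symbol definitions above suggest; the formula itself is given only as a figure. A single reference translation is assumed, and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count every contiguous n-gram in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate: Sequence[str], reference: Sequence[str], n: int) -> float:
    """n-gram matching accuracy with each count clipped by the reference count."""
    cand_counts = ngrams(candidate, n)   # COUNT for each n-gram
    ref_counts = ngrams(reference, n)    # Max_Ref_Count for each n-gram
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0  # the candidate is shorter than n tokens
    # Count_clip = min(COUNT, Max_Ref_Count), summed over the candidate's n-grams
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


candidate = "the the the the the the the".split()
reference = "the cat is on the mat".split()
print(clipped_precision(candidate, reference, 1))
```

For this candidate the word "the" occurs 7 times but only twice in the reference, so the clipped 1-gram accuracy is 2/7 rather than 7/7, which is the effect the correction for repeated words is meant to achieve.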
S3: Penalize the matching accuracy with the translation constraint loss coefficient to obtain the machine translation score.
Specifically, the previous step computes the matching accuracy from n-grams alone and does not account for omitted translations, redundant translations, or consistency between clauses. If part of a sentence is left untranslated but the words that are translated are accurate, the calculation above still yields a high matching accuracy, and it cannot tell whether the clauses are related to one another. Considering the n-gram matching accuracy alone is therefore insufficient.
In this step, the matching accuracy is penalized with the translation constraint loss coefficient obtained above to produce the machine translation score:
Figure BDA0002682334040000152
where Score is the score of the machine translation, with a value between 0 and 1; the larger the value, the better the translation effect, that is, the higher the translation quality. $N$ is the maximum order of the n-gram, and $w_n$ is a weight coefficient, $w_n = 1/n$.
In this embodiment, the matching accuracy is penalized with the translation constraint loss coefficient. Because the translation constraint loss coefficient is obtained by counting the clauses of the text to be translated that are successfully constrained, it penalizes at a finer, sentence-level granularity, covering omitted translations, redundant translations and consistency among clauses, so that the final score is more accurate and reliable.
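How the penalty could be applied is sketched below. The score formula is given only as a figure, so this sketch rests on assumptions that should be read as such: the matching accuracies are combined as a weighted geometric mean in the manner of BLEU and then multiplied by CF; CF is passed in as a number because its formula is likewise not reproduced in the text; and the default weights are uniform (1/N), whereas the text writes the weight as 1/n, so other weights can be supplied. The function name is hypothetical.

```python
import math
from typing import Optional, Sequence


def machine_translation_score(
    precisions: Sequence[float],
    cf: float,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Scale a weighted geometric mean of P_1..P_N by the coefficient CF (assumed form)."""
    n_max = len(precisions)
    if n_max == 0:
        return 0.0
    if weights is None:
        weights = [1.0 / n_max] * n_max  # assumption: uniform weights
    if any(p <= 0.0 for p in precisions):
        return 0.0  # the logarithm is undefined for a zero accuracy
    log_mean = sum(w * math.log(p) for w, p in zip(weights, precisions))
    return cf * math.exp(log_mean)
```

With all four accuracies equal to 0.5 and CF equal to 1, this form returns approximately 0.5; a smaller CF lowers the score proportionally, which is the penalizing behaviour described above.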
The following describes the machine translation effect evaluation device provided in the embodiment of the present application, and the machine translation effect evaluation device described below and the machine translation effect evaluation method described above may be referred to in correspondence with each other.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a device for evaluating a machine translation effect disclosed in the embodiment of the present application.
As shown in fig. 3, the apparatus may include:
a translation obtaining unit 11, configured to obtain a reference translation of a text to be translated and a machine translation of each clause included in the text to be translated, where the machine translation of each clause is a translation result of the clause and a previous clause by a machine translation tool;
a constraint processing unit 12, configured to perform length constraint on the machine translation of each clause to obtain a constrained machine translation, where the constrained machine translation deletes a plurality of translations at the end compared to the machine translation;
a translation constraint loss determining unit 13, configured to match the constrained machine translation of each clause with the machine translation of the next adjacent clause, to obtain a translation constraint loss of the machine translation of the text to be translated;
and the scoring unit 14 is used for determining the machine translation score of the text to be translated based on the translation constraint loss and the reference translation.
Optionally, the apparatus of the present application may further include:
and the clause dividing unit is used for carrying out clause division on the text to be translated before the translation obtaining unit is executed to obtain each clause contained in the text to be translated.
Optionally, the process of clause division of the text to be translated by the clause dividing unit may include:
dividing the text to be translated into a plurality of clauses by taking punctuations contained in the text to be translated as clause dividing boundaries;
or,
traversing the text to be translated and judging whether a punctuation mark is encountered before the traversed text reaches the set length; if so, dividing the text before the punctuation mark into a clause and continuing to traverse the text to be translated; if not, dividing the traversed text of the set length into a clause when the set length is reached and continuing to traverse the text to be translated.
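The traversal-based clause division can be sketched as follows. This is a minimal sketch under assumptions: the text is already segmented into tokens, the punctuation set is illustrative, punctuation marks are not kept inside clauses, a clause of exactly the set length that is followed by a punctuation mark is treated as cut at the punctuation, and any text left over at the end is returned with the hypothetical tag "end_of_text". The function name and the tags are likewise hypothetical.

```python
from typing import List, Tuple

# Illustrative punctuation set (assumption); extend as needed.
PUNCTUATION = {",", ".", ";", "!", "?", "，", "。", "；", "！", "？"}


def split_clauses(tokens: List[str], set_length: int) -> List[Tuple[List[str], str]]:
    """Split tokens into clauses, tagging each clause with how it was cut."""
    if set_length < 1:
        raise ValueError("set_length must be at least 1")
    clauses: List[Tuple[List[str], str]] = []
    current: List[str] = []
    for tok in tokens:
        if tok in PUNCTUATION:
            # Punctuation met before the set length was exceeded: cut here.
            if current:
                clauses.append((current, "punctuation"))
                current = []
        else:
            # Set length reached without punctuation: cut before this token.
            if len(current) == set_length:
                clauses.append((current, "length"))
                current = []
            current.append(tok)
    if current:
        clauses.append((current, "end_of_text"))
    return clauses


print(split_clauses(["a", "b", "c", ",", "d", "e", "f", "g", "h"], 3))
```

The tag attached to each clause records whether it was cut at a punctuation mark or at the set length, which is the information needed to choose between the first and the second set length when the length constraint quantity is determined.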
Optionally, the process of performing length constraint on the machine translation of each clause by the constraint processing unit may include:
determining the length constraint quantity of each clause, wherein the length constraint quantity is used for indicating the translation length required to be deleted at the end of the machine translation when length constraint is carried out;
and length constraint is carried out on the machine translation of each clause based on the length constraint quantity of each clause.
Optionally, the process of determining the length constraint quantity of each clause by the constraint processing unit may include:
and determining the length constraint quantity corresponding to the division mode according to the division mode of each clause to obtain the length constraint quantity of each clause.
Optionally, if the clauses are obtained by dividing according to punctuations, the corresponding length constraint quantity may be a first set length;
if the clauses are obtained by dividing after the traversal text reaches the set length, the corresponding length constraint quantity can be a second set length, and the second set length is larger than the first set length.
Optionally, the constraint processing unit may perform a length constraint process on the machine translation of each clause based on the length constraint amount of each clause, where the length constraint process includes:
deleting a translation of some length from the end of the machine translation of each clause based on the length constraint quantity of the clause, wherein the deleted translation is the translation result of the word segments at the end of the clause, as many as the length constraint quantity;
or,
deleting, based on the length constraint quantity of each clause, the word segments at the end of the machine translation of the clause, as many as the length constraint quantity.
Optionally, the process of matching the constrained machine translation of each clause with the machine translation of the next adjacent clause by the translation constraint loss determining unit to obtain the translation constraint loss of the machine translation of the text to be translated may include:
judging whether the constrained machine translation of each clause exists in the machine translation of the next adjacent clause; if yes, determining that the clause constraint is successful, otherwise, determining that the clause constraint is failed;
and determining the translation constraint loss of the machine translation of the text to be translated based on the constraint success and failure conditions of each clause in the text to be translated.
Optionally, the process of determining, by the scoring unit, the machine translation score of the text to be translated based on the translation constraint loss and the reference translation may include:
determining a translation constraint loss coefficient of the machine translation of the text to be translated based on the translation constraint loss and the number of clauses contained in the reference translation;
matching the machine translation of the text to be translated with the reference translation to determine matching accuracy;
and punishing the matching accuracy by using the translation constraint loss coefficient to obtain a machine translation score.
Optionally, the process of matching the machine translation of the text to be translated with the reference translation by the scoring unit to determine the matching accuracy may include:
and determining the matching accuracy of the n-gram grammar of the machine translation of the text to be translated and the reference translation.
The machine translation effect evaluating device provided by the embodiment of the application can be applied to machine translation effect evaluating equipment, such as a terminal: mobile phones, computers, etc. Optionally, fig. 4 is a block diagram illustrating a hardware structure of a machine translation effect evaluation apparatus, and referring to fig. 4, the hardware structure of the machine translation effect evaluation apparatus may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;
in the embodiment of the application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete mutual communication through the communication bus 4;
the processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention;
the memory 3 may include a high-speed RAM, and may further include a non-volatile memory, such as at least one disk memory;
wherein the memory stores a program and the processor can call the program stored in the memory, the program for:
acquiring a reference translation of a text to be translated and a machine translation of each clause contained in the text to be translated, wherein the machine translation of each clause is a translation result of a machine translation tool on the clause and a clause before the clause;
performing length constraint on the machine translation of each clause to obtain a constrained machine translation, wherein compared with the machine translation, the constrained machine translation deletes a plurality of translations at the tail end;
matching the constrained machine translation of each clause with the machine translation of the next adjacent clause to obtain the translation constraint loss of the machine translation of the text to be translated;
determining a machine translation score for the text to be translated based on the translation constraint loss and the reference translation.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
Embodiments of the present application further provide a storage medium, where a program suitable for execution by a processor may be stored, where the program is configured to:
acquiring a reference translation of a text to be translated and a machine translation of each clause contained in the text to be translated, wherein the machine translation of each clause is a translation result of a machine translation tool on the clause and a clause before the clause;
performing length constraint on the machine translation of each clause to obtain a constrained machine translation, wherein compared with the machine translation, the constrained machine translation deletes a plurality of translations at the tail end;
matching the constrained machine translation of each clause with the machine translation of the next adjacent clause to obtain the translation constraint loss of the machine translation of the text to be translated;
determining a machine translation score for the text to be translated based on the translation constraint loss and the reference translation.
Optionally, for the refined and extended functions of the program, reference may be made to the description above.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, the embodiments may be combined as needed, and the same and similar parts may be referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (13)

1. A method for evaluating machine translation effect is characterized by comprising the following steps:
acquiring a reference translation of a text to be translated and a machine translation of each clause contained in the text to be translated, wherein the machine translation of each clause is a translation result of a machine translation tool on the clause and a clause before the clause;
performing length constraint on the machine translation of each clause to obtain a constrained machine translation, wherein compared with the machine translation, the constrained machine translation deletes a plurality of translations at the tail end;
matching the constrained machine translation of each clause with the machine translation of the next adjacent clause to obtain the translation constraint loss of the machine translation of the text to be translated;
determining a machine translation score for the text to be translated based on the translation constraint loss and the reference translation.
2. The method of claim 1, wherein before the obtaining of the reference translation of the text to be translated and the machine translation of each clause included in the text to be translated, the method further comprises:
and carrying out clause division on the text to be translated to obtain each clause contained in the text to be translated.
3. The method of claim 2, wherein the clause division of the text to be translated comprises:
dividing the text to be translated into a plurality of clauses by taking punctuations contained in the text to be translated as clause dividing boundaries;
or,
traversing the text to be translated and judging whether a punctuation mark is encountered before the traversed text reaches the set length; if so, dividing the text before the punctuation mark into a clause and continuing to traverse the text to be translated; if not, dividing the traversed text of the set length into a clause when the set length is reached and continuing to traverse the text to be translated.
4. The method of claim 1, wherein said length-constraining the machine translation of each clause comprises:
determining the length constraint quantity of each clause, wherein the length constraint quantity is used for indicating the translation length required to be deleted at the end of the machine translation when length constraint is carried out;
and length constraint is carried out on the machine translation of each clause based on the length constraint quantity of each clause.
5. The method of claim 4, wherein determining the length constraint for each clause comprises:
and determining the length constraint quantity corresponding to the division mode according to the division mode of each clause to obtain the length constraint quantity of each clause.
6. The method of claim 5,
if the clauses are obtained by dividing according to punctuations, the corresponding length constraint quantity is a first set length;
if the clauses are obtained by dividing after the traversal text reaches the set length, the corresponding length constraint quantity is a second set length, and the second set length is larger than the first set length.
7. The method of claim 4, wherein the length-constraining the machine translation of each clause based on the amount of length constraint for each clause comprises:
deleting a translation of some length from the end of the machine translation of each clause based on the length constraint quantity of the clause, wherein the deleted translation is the translation result of the word segments at the end of the clause, as many as the length constraint quantity;
or,
deleting, based on the length constraint quantity of each clause, the word segments at the end of the machine translation of the clause, as many as the length constraint quantity.
8. The method of claim 1, wherein matching the constrained machine translation of each clause with the machine translation of the next adjacent clause to obtain a translation constraint loss of the machine translation of the text to be translated comprises:
judging whether the constrained machine translation of each clause exists in the machine translation of the next adjacent clause; if yes, determining that the clause constraint is successful, otherwise, determining that the clause constraint is failed;
and determining the translation constraint loss of the machine translation of the text to be translated based on the constraint success and failure conditions of each clause in the text to be translated.
9. The method of claim 1, wherein determining a machine translation score for the text to be translated based on the translation constraint loss and the reference translation comprises:
determining a translation constraint loss coefficient of the machine translation of the text to be translated based on the translation constraint loss and the number of clauses contained in the reference translation;
matching the machine translation of the text to be translated with the reference translation to determine matching accuracy;
and punishing the matching accuracy by using the translation constraint loss coefficient to obtain a machine translation score.
10. The method of claim 9, wherein matching the machine translation of the text to be translated with the reference translation to determine a matching accuracy comprises:
and determining the matching accuracy of the n-gram grammar of the machine translation of the text to be translated and the reference translation.
11. A machine translation effect evaluating apparatus, comprising:
the translation acquiring unit is used for acquiring a reference translation of a text to be translated and a machine translation of each clause contained in the text to be translated, wherein the machine translation of each clause is a translation result of the clause and a previous clause by a machine translation tool;
the constraint processing unit is used for carrying out length constraint on the machine translation of each clause to obtain a constrained machine translation, wherein, compared with the machine translation, the constrained machine translation has a plurality of translations at the tail deleted;
a translation constraint loss determining unit, configured to match the constrained machine translation of each clause with the machine translation of the next adjacent clause, to obtain a translation constraint loss of the machine translation of the text to be translated;
and the scoring unit is used for determining the machine translation score of the text to be translated based on the translation constraint loss and the reference translation.
12. A machine translation effect evaluating apparatus, comprising: a memory and a processor;
the memory is used for storing programs;
the processor is used for executing the program to realize the steps of the machine translation effect evaluation method according to any one of claims 1 to 10.
13. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method for evaluating machine translation effects according to any of claims 1-10.
CN202010965988.0A 2020-09-15 2020-09-15 Machine translation effect evaluation method, device, equipment and storage medium Active CN112052693B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010965988.0A CN112052693B (en) 2020-09-15 2020-09-15 Machine translation effect evaluation method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010965988.0A CN112052693B (en) 2020-09-15 2020-09-15 Machine translation effect evaluation method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112052693A true CN112052693A (en) 2020-12-08
CN112052693B CN112052693B (en) 2024-07-05

Family

ID=73602793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010965988.0A Active CN112052693B (en) 2020-09-15 2020-09-15 Machine translation effect evaluation method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112052693B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1641631A (en) * 2004-01-13 2005-07-20 中国科学院计算技术研究所 Machine translation automatic evaluating method and system thereof
JP2008084191A (en) * 2006-09-28 2008-04-10 Toshiba Corp Mechanical translation device, mechanical translation method and mechanical translation program
CN104133812A (en) * 2014-07-17 2014-11-05 北京信息科技大学 User-query-intention-oriented Chinese sentence similarity hierarchical calculation method and user-query-intention-oriented Chinese sentence similarity hierarchical calculation device
CN107133223A (en) * 2017-04-20 2017-09-05 南京大学 Machine translation optimization method that automatically explores more reference translation information
CN107818082A (en) * 2017-09-25 2018-03-20 沈阳航空航天大学 Semantic role recognition method combining phrase structure trees
CN109344408A (en) * 2018-08-24 2019-02-15 腾讯科技(深圳)有限公司 Translation detection method, device and electronic equipment
CN109977424A (en) * 2017-12-27 2019-07-05 北京搜狗科技发展有限公司 Training method and device for a machine translation model
CN110472256A (en) * 2019-08-20 2019-11-19 南京题麦壳斯信息科技有限公司 Chapter-based MT engine evaluation and selection method and system
CN111126078A (en) * 2019-12-19 2020-05-08 北京百度网讯科技有限公司 Translation method and device
CN111597778A (en) * 2020-04-15 2020-08-28 哈尔滨工业大学 Method and system for automatically optimizing machine translation based on self-supervision
CN111611811A (en) * 2020-05-25 2020-09-01 腾讯科技(深圳)有限公司 Translation method, translation device, electronic equipment and computer readable storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
SELJAN S ET AL: "Human Evaluation of Online Machine Translation Services for English/Russian-Croatian", NEW CONTRIBUTIONS IN INFORMATION SYSTEMS AND TECHNOLOGIES *
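孙连恒, 杨莹, 姚天顺: "OpenE: An automatic machine translation evaluation method based on n-gram co-occurrence", Journal of Chinese Information Processing, no. 02 *
宋鼎新; 黄德根: "A Chinese-English statistical machine translation method incorporating syntactic phrases", Journal of Chinese Computer Systems, no. 10, 15 October 2017 (2017-10-15) *
李茂西 et al.: "Research on an IHMM-based synonym matching method for automatic evaluation of machine translation output", Journal of Chinese Information Processing, vol. 30, no. 4 *
胡永华: "Research on automatic evaluation techniques for English translation quality", China Master's Theses Full-text Database (Electronic Journal), Information Science and Technology, vol. 2010, no. 8 *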

Also Published As

Publication number Publication date
CN112052693B (en) 2024-07-05

Similar Documents

Publication Publication Date Title
JP4301515B2 (en) Text display method, information processing apparatus, information processing system, and program
CN107391486B (en) Method for identifying new words in field based on statistical information and sequence labels
EP3866028A2 (en) Method and apparatus for constructing quality evaluation model, device and storage medium
CN111079412A (en) Text error correction method and device
CN111563384B (en) Evaluation object identification method and device for E-commerce products and storage medium
CN109033244B (en) Search result ordering method and device
CN108304377B (en) Extraction method of long-tail words and related device
CN110334356A (en) Article quality determination method, article screening method and corresponding device
US9881000B1 (en) Avoiding sentiment model overfitting in a machine language model
WO2019201024A1 (en) Method, apparatus and device for updating model parameter, and storage medium
CN113204979B (en) Model training method and device, electronic equipment and storage medium
CN108536702B (en) Method and device for determining related entities and computing equipment
CN111737961B (en) Method and device for generating story, computer equipment and medium
CN113033204A (en) Information entity extraction method and device, electronic equipment and storage medium
CN105243053B (en) Method and device for extracting key sentences from a document
JP7040155B2 (en) Information processing equipment, information processing methods and programs
CN111950267B (en) Text triplet extraction method and device, electronic equipment and storage medium
CN110909532B (en) User name matching method and device, computer equipment and storage medium
CN112052693A (en) Method, device and equipment for evaluating machine translation effect and storage medium
Tschuggnall et al. Reduce & attribute: Two-step authorship attribution for large-scale problems
US20160283446A1 (en) Input assistance device, input assistance method and storage medium
CN113378555B (en) Intelligent association method for individual stocks and related products
CN115511672A (en) Method for evaluating mental calculation ability of children
CN115048908A (en) Method and device for generating text directory
CN110827794B (en) Method and device for evaluating quality of voice recognition intermediate result

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant