CN109190129A

CN109190129A - Multilingual translation quality evaluation engine based on a near-synonym knowledge graph

Info

Publication number: CN109190129A
Application number: CN201810997778.2A
Authority: CN
Inventors: 何恩培; 李靖
Original assignee: Expressive Language Networking Polytron Technologies Inc
Current assignee: Expressive Language Networking Polytron Technologies Inc
Priority date: 2018-08-31
Filing date: 2018-08-31
Publication date: 2019-01-11

Abstract

The present invention proposes a multilingual translation quality evaluation system based on a near-synonym knowledge graph. The evaluation system includes a source-language input component, an analysis component, an evaluation component, and a knowledge base component. The knowledge base component includes a near-synonym knowledge graph knowledge base that can be continuously trained and updated. Based on the knowledge base component, the analysis component analyzes the test result that a test taker provides for the corpus to be translated, which is entered through the source-language input component, and passes the analysis result to the evaluation component to obtain the test taker's evaluation score. The invention also includes a corresponding evaluation method.

Description

Multilingual translation quality evaluation engine based on a near-synonym knowledge graph

Technical field

The present invention relates to translation quality evaluation, and more particularly to a multilingual translation quality evaluation method and system based on a near-synonym knowledge graph.

Background art

At present, a person's written translation ability is evaluated mainly by scoring various kinds of tests, for example examinations that combine subjective and objective questions. For objective questions, a candidate scores simply by selecting the correct option, so this part of the evaluation can be completed by machine counting. For subjective questions, different translators give different translations of the same item, and the question setter usually provides only a limited number of model answers. Simple machine counting therefore easily produces misjudgments, because the translations given by different translators may all differ from one another and from the model answers and yet all be correct.

In such cases manual marking is usually introduced. When the number of papers is very large, however, marking every paper by hand significantly increases the workload and the cost of testing.

In real-world settings, whenever translators are trained or examined there is a step in which their translation quality is evaluated. This step is carried out by a supervising translation teacher, which is very labor-intensive and also highly subjective.

Summary of the invention

To solve the problems of existing translation ability assessment and translation quality assessment, the invention proposes translation quality evaluation based on a near-synonym knowledge graph. A multilingual near-synonym knowledge graph serves as the underlying knowledge base, and on top of it an evaluation machine (EM) assesses the translation quality of a translator.

In a first aspect of the invention, a multilingual translation quality evaluation system based on a near-synonym knowledge graph is proposed. The evaluation system includes a source-language input component, an analysis component, an evaluation component, and a knowledge base component. The knowledge base component includes a near-synonym knowledge graph knowledge base that can be continuously trained and updated. Based on the knowledge base component, the analysis component analyzes the test result that a test taker provides for the corpus to be translated entered through the source-language input component, and inputs the analysis result into the evaluation component to obtain the test taker's evaluation score.

The knowledge graph knowledge base includes similarity distance scores for near-synonyms and related words. The evaluation component performs fitted scoring based on the knowledge graph knowledge base and the model answers to obtain the test taker's translation quality score.

In a specific implementation, an automatically updated multilingual near-synonym knowledge graph is maintained persistently in advance. This knowledge graph includes similarity distance scores for near-synonyms and related words. The knowledge base component constructs the distance metric between near-synonyms and related words on the basis of word2vec.

When a translator receives a corpus to translate, referred to as the source-language corpus, and translates it into the target language, several model answers for the source corpus are also prepared. These answers are prepared in advance and independently by teachers, which ensures that the translations are correct. In other words, the model answers are translation results prepared in advance for the corpus to be translated.

Preferably, there are multiple model answers, each from an independent source.

The analysis component then segments and filters the test result to obtain the key evaluation words.

Specifically, the analysis component segments the translator's translation into words and then filters them. The words that pass the filter are the key evaluation words, for example words that play a central semantic role and words that a given professional field requires to be translated accurately.
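The segmentation and filtering step can be sketched as follows. This is a minimal illustration rather than the patented implementation: the stop-word list, the domain glossary, and the function names `segment` and `key_evaluation_words` are all hypothetical, and a real system would use a proper segmenter for the target language.

```python
import re

# Hypothetical stop-word list and domain glossary; the text specifies neither.
STOP_WORDS = {"the", "a", "an", "of", "to", "is", "it", "i", "so", "what", "for", "not"}
DOMAIN_TERMS = {"helicopter", "aircraft"}


def segment(text):
    """Lowercase the text and split on non-letter characters.

    This stands in for a real word segmenter.
    """
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def key_evaluation_words(text, stop_words=STOP_WORDS, domain_terms=DOMAIN_TERMS):
    """Return the filtered tokens in order of first appearance, without duplicates.

    A token is kept if it is a domain term or if it is not a stop word.
    """
    seen = set()
    keys = []
    for tok in segment(text):
        if tok in seen:
            continue
        if tok in domain_terms or tok not in stop_words:
            seen.add(tok)
            keys.append(tok)
    return keys
```

Applied to the fragment "I look for what I miss, I know not what it is", the sketch keeps only the content words and discards the function words.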

Preferably, the system further includes a knowledge base update engine, which updates the knowledge graph knowledge base based on the output of the analysis component and/or the evaluation component. The result is a near-synonym knowledge graph knowledge base that can be updated continuously. Using it as the basis for evaluation ensures that the vocabulary keeps pace with the development of the language and does not fall behind. In addition, because the knowledge base constructs the distance metric between near-synonyms and related words on the basis of word2vec, it provides a graded measure of the relationship between words. This overcomes the limitation of earlier near-synonym resources, which record only whether a relationship exists, and makes the method more fine-grained.

It is worth noting that updating the knowledge graph knowledge base based on the output of the analysis component and/or the evaluation component includes both active and passive updates. An active update decides whether to update the corpus and the knowledge base according to the evaluation score that is output. A passive update can instead be performed periodically from Internet corpora.

Unlike the prior art, the invention can also be configured so that, when the output of the analysis component and/or the evaluation component meets a predetermined condition, an update prompted by manual feedback is performed. This point is explained in detail in the description of the embodiments. This mechanism avoids the misjudgments that purely automatic analysis and machine assessment may produce.

In a second aspect of the invention, a multilingual translation quality evaluation method based on a near-synonym knowledge graph is proposed. The method includes a source-language input step, a target-language analysis step, and a target-language evaluation step.

In the source-language input step, a corpus to be translated is input, and the person under test provides a test result in the target language based on that corpus.

In the target-language analysis step, the test result is analyzed to give the key evaluation words.

In the target-language evaluation step, the translation quality of the person under test is evaluated based on the key evaluation words.

The target-language analysis step segments and filters the test result based on the near-synonym knowledge graph knowledge base that can be continuously trained and updated, and obtains the key evaluation words based on the distance metric between near-synonyms and related words.

The target-language evaluation step performs fitted scoring based on the knowledge graph knowledge base and the model answers to obtain the translation quality score of the person under test.

The model answers come from several different translation engines and/or different human translators.

Preferably, the method further includes a feedback update step: when the translation quality score of the person under test meets a set condition, the knowledge graph knowledge base is updated.

It should be pointed out that the knowledge base update engine and the feedback update step of the invention avoid the misjudgments that purely automatic analysis and machine assessment may produce. This is explained in detail in the description of the embodiments.

Brief description of the drawings

Fig. 1 is an architecture diagram of the translation quality evaluation of the invention.

Fig. 2 is a flowchart of the translation quality evaluation method of the invention.

Fig. 3 is a core architecture diagram of the evaluation component (EM) in the translation quality evaluation of the invention.

Fig. 4 is a relationship diagram of a near-synonym knowledge base.

Fig. 5 is an example of calculating the word mover's distance.

Detailed description of the embodiments

Fig. 1 shows a specific architecture of the multilingual translation quality evaluation system of the invention. The evaluation system includes a source-language input, an analysis module, an evaluation component (EM), and a knowledge base component (a continuously accumulated corpus that cyclically updates the near-synonym knowledge base). The knowledge base component includes a near-synonym knowledge graph knowledge base that can be continuously trained and updated. Based on the knowledge base component, the analysis module analyzes the test result that a test taker provides for the corpus to be translated entered through the source-language input module, and inputs the analysis result into the evaluation component to obtain the test taker's evaluation score.

In a concrete implementation, the evaluation component is an evaluation machine (EM) that obtains a translation quality score through a model, working from the output of the analysis module. The EM is fitted by machine learning and is a model trained on a large corpus. The model ensures that the evaluation is fair and removes the subjectivity of evaluation by an individual assessor.

Fig. 2 is a flowchart of the translation quality evaluation method of the invention, which includes a source-language input step, a target-language analysis step, and a target-language evaluation step.

In a specific implementation, the above process can be summarized as follows:

1) An automatically updated multilingual near-synonym knowledge graph is maintained. This knowledge graph includes similarity distance scores for near-synonyms and related words.

2) When a translator receives a corpus to translate, referred to as the source corpus, and translates it into the target language, several model answers for the source corpus are also prepared. These answers are prepared in advance and independently by teachers, which ensures that the translations are correct.

3) The analysis module segments the translator's translation into words and then filters them. The words that pass the filter are the key evaluation words, for example words that play a central semantic role and words that a given professional field requires to be translated accurately.

4) The result of the analysis module is fed into the EM. The EM performs fitted scoring on the joint basis of the near-synonym knowledge graph and the model answers to obtain the translator's translation quality score.

As another innovation of the invention, the method includes a feedback update step: when the translation quality score of the person under test meets a set condition, the knowledge graph knowledge base is updated. This update is not limited to the conventional corpus update. Outstanding translation results can also be saved, which avoids the misjudgments that machine analysis would otherwise cause.

In the course of long-term quality evaluation work, the inventors found that a computer-based analysis component cannot always be accurate. An automatic evaluation machine (EM) based on machine learning is mechanical by nature, and the results it gives cannot reflect literary grace. Language translation, by contrast, has unlimited potential. In literature in particular, a test taker may well produce an exquisite translation that contains none of the key evaluation words found in any model answer. In that case the score may be zero, which is a misjudgment.

A simple example is as follows.

Suppose the corpus to be translated is:

(T1) I look for what I miss, I know not what it is, I feel so sad, so drear, so lonely, without cheer.

Its model answers (glossed here in English) might be:

(A1) I look for what I miss; I do not know what it is; I feel so sad, so gloomy, so lonely, without cheer.

(A2) I look for what I have missed; what it is, I do not know; I feel deeply hurt, so dull, so lonely, without cheer.

A highly skilled translator (the person under test), however, gives the following test result:

(A3) Seeking and searching, cold and desolate, wretched and sorrowful.

For this more elegant answer, no matter how the analysis component segments it, filters it, or performs near-synonym analysis on it, the key evaluation words obtained cannot match those of the model answers. Blindly following the machine engine would then necessarily lead to a misjudgment. This is where the feedback update step takes effect: the manual feedback module recognizes that the result is a better one and saves it into the EM module and the knowledge graph knowledge base. Unlike the prior art, which considers only the case of a high score, in the invention the set condition that triggers a feedback update can be that the test taker's score is high (above a certain threshold) or that the score is clearly low (even 0).
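The two-sided trigger condition can be sketched as below. This is only an illustration under stated assumptions: the class name `FeedbackQueue`, the threshold values 0.2 and 0.9, and the idea of collecting flagged translations in a list for a human reviewer are all hypothetical; the text says only that both high and clearly low scores can trigger a feedback update.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackQueue:
    """Collects translations whose score is unusually low or unusually high."""

    low: float = 0.2    # hypothetical lower threshold
    high: float = 0.9   # hypothetical upper threshold
    pending: list = field(default_factory=list)

    def submit(self, translation, score):
        """Queue the translation for manual review if its score is extreme.

        Returns True when the translation was flagged.
        """
        flagged = score <= self.low or score >= self.high
        if flagged:
            self.pending.append((translation, score))
        return flagged
```

With these assumed thresholds, answer (A3) scored at 0 would be queued for a reviewer, who could then decide to store it as an additional model answer.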

Fig. 3 is a core architecture diagram of the evaluation component (EM) in a specific translation quality evaluation of the invention. The value output by the logistic regression function F is the translator's translation quality score.

First, comparing Fig. 3 with Fig. 1 shows that the logistic regression function in Fig. 3 can be equivalent to the combination of the analysis module and the EM module in Fig. 1.

Alternatively, the invention may be designed so that the translator's answer in Fig. 3 is the effective matching input already processed by the analysis module of Fig. 1, which is then fed into the logistic regression function F to obtain the score.

The implementation process and the related concepts are described in detail below.

Near-synonym knowledge base. This knowledge base is organized by the following relations: ellipses are word nodes, and the line between two word nodes carries their relatedness measure. For example, the relatedness measure between "aircraft" and "helicopter" is 0.762. See Fig. 4 for details.
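One way to hold such a graph in memory is an undirected edge map keyed by word pairs. The sketch below is an assumption-laden illustration: the class `NearSynonymGraph` and its methods are hypothetical, and converting relatedness to a distance as `1 - relatedness` is a choice made here for illustration, not something the text specifies.

```python
class NearSynonymGraph:
    """Undirected graph of words whose edges carry a relatedness score in [0, 1]."""

    def __init__(self):
        self._edges = {}

    def add(self, a, b, relatedness):
        """Store the relatedness of an unordered word pair."""
        self._edges[frozenset((a, b))] = relatedness

    def relatedness(self, a, b, default=0.0):
        """Look up relatedness; identical words are fully related."""
        if a == b:
            return 1.0
        return self._edges.get(frozenset((a, b)), default)

    def distance(self, a, b):
        """Assumed conversion: distance = 1 - relatedness."""
        return 1.0 - self.relatedness(a, b)
```

For example, after `g.add("aircraft", "helicopter", 0.762)`, the lookup is symmetric, so `g.relatedness("helicopter", "aircraft")` returns the same value, and unrelated pairs fall back to the default.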

Sentence similarity evaluation module based on word mover's distance. The module is implemented as follows:

1. The near-synonym knowledge base, trained on a large corpus, is mainly used to adjust the distance metric between words and to ensure that the distances are accurate.

2. At the same time, a fitting function, denoted F, is trained using logistic regression as the machine learning method.

3. The similarity distance (based on word mover's distance) between the translator's answer and each model answer is computed, and the distance for each model answer is fed into the fitting function F. F outputs a value, which is the translator's translation quality score. In the calculation formula, the distance term for two words i and j is their distance in the near-synonym knowledge graph, from which the score can be calculated.
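Step 3 can be sketched as a logistic function applied to the distances. This is a minimal sketch under several assumptions: it treats the distance to each model answer as one input feature, and the weights and bias shown in the usage note are made-up values standing in for parameters that would be learned in step 2. The function names are hypothetical.

```python
import math


def logistic(z):
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def quality_score(distances, weights, bias):
    """Apply a fitted logistic model F to the per-answer distances.

    distances: one similarity distance per model answer.
    weights, bias: parameters assumed to come from prior training.
    """
    if len(distances) != len(weights):
        raise ValueError("need exactly one weight per model answer")
    z = bias + sum(w * d for w, d in zip(weights, distances))
    return logistic(z)
```

With negative weights, for instance `quality_score([0.1, 0.2], [-3.0, -3.0], 2.0)`, a translation that is close to the model answers scores higher than one that is far from them.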

Word mover's distance

Word mover's distance builds on word2vec and is a method for calculating the similarity between two documents. The basic idea is to take the Euclidean distance between the word vectors of every pair of words drawn from the two documents and then form a weighted sum, roughly of the form $\sum_{i,j} T_{ij}\, c(i,j)$, where $c(i,j)$ is the Euclidean distance between the word vectors of words i and j (here the distance in the near-synonym knowledge graph is used instead). The key is how the weights $T_{ij}$ are calculated; see Fig. 5 for the calculation.

Fig. 5 shows two documents (document 1 and document 2). After stop words are removed, each document has only four words left, and the similarity between the two documents is computed from these words. Suppose the word 'Obama' has a weight of 0.5 in document 1 (the weight can be computed simply from word frequency or TF-IDF). Because 'Obama' is very similar to 'president', a large share of that weight, say 0.4, can be moved from 'Obama' to 'president'. The other words in document 2 are farther from 'Obama' and therefore receive smaller weights. The constraint is that the weights moved from a word i in document 1 to the words in document 2 must sum to the weight of word i in document 1; that is, 'Obama' distributes its own weight (0.5) among the words of document 2. Likewise, the weights that a word j in document 2 receives from the words in document 1 must sum to the weight of word j in document 2.
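The two flow constraints just described can be written compactly as a transport problem. The notation is the usual one for word mover's distance and is an assumption here, since the source text does not reproduce its formulas: $d_i$ is the weight of word $i$ in document 1, $d'_j$ the weight of word $j$ in document 2, $T_{ij}$ the weight moved from word $i$ to word $j$, and $c(i,j)$ the distance between the two words.

$$
\min_{T \ge 0} \; \sum_{i=1}^{n} \sum_{j=1}^{m} T_{ij}\, c(i,j)
\quad \text{subject to} \quad
\sum_{j=1}^{m} T_{ij} = d_i \;\; \forall i,
\qquad
\sum_{i=1}^{n} T_{ij} = d'_j \;\; \forall j .
$$

In the example, 'Obama' has $d_i = 0.5$, of which $T_{ij} = 0.4$ goes to 'president' and the remaining $0.1$ is spread over the other words of document 2.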

Minimizing this cost, by way of a lower bound, gives the shortest total distance needed to move all the words in document a onto the words in document b, which represents the similarity between the two documents. After optimization, the computational complexity of the word mover's distance algorithm can reach O(p^2), where p is the number of distinct words in a document.
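A lower bound with quadratic cost can be sketched as follows. This uses the relaxed word mover's distance, in which one of the two flow constraints is dropped so that each word simply sends all of its weight to its nearest word in the other document. Whether this is the exact optimization the text has in mind is an assumption. The function names and the toy 2-D vectors are hypothetical, and plain Euclidean distance is used where the invention would substitute knowledge graph distances.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_sided_bound(weights_a, weights_b, vectors):
    """Move each word of A entirely to its nearest word in B.

    Dropping the incoming-weight constraint on B makes this a lower
    bound on the full word mover's distance.
    """
    total = 0.0
    for word, weight in weights_a.items():
        nearest = min(euclidean(vectors[word], vectors[other]) for other in weights_b)
        total += weight * nearest
    return total


def relaxed_wmd(weights_a, weights_b, vectors):
    """Tighter lower bound: the larger of the two one-sided bounds."""
    return max(
        one_sided_bound(weights_a, weights_b, vectors),
        one_sided_bound(weights_b, weights_a, vectors),
    )


# Toy example with made-up 2-D "word vectors".
vectors = {
    "obama": (0.0, 0.0),
    "president": (0.0, 1.0),
    "speaks": (3.0, 0.0),
    "greets": (3.0, 1.0),
}
doc_a = {"obama": 0.5, "speaks": 0.5}
doc_b = {"president": 0.5, "greets": 0.5}
bound = relaxed_wmd(doc_a, doc_b, vectors)
```

Each one-sided pass compares every word of one document with every word of the other, so the cost grows quadratically with the number of distinct words.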

In summary, the innovations of the invention include at least the following:

1) A near-synonym knowledge graph knowledge base that can be continuously trained and updated.

2) A translation analysis module whose output is supplied to the EM.

3) An evaluation machine (EM) that obtains the translation quality score through a model, working from the output of the analysis module.

4) Both conventional high-quality translation results and unconventional, exquisite translation results are taken into account, so that purely mechanical machine evaluation does not produce misjudgments.

Claims

1. A multilingual translation quality evaluation system based on a near-synonym knowledge graph, the evaluation system comprising a source-language input component, an analysis component, an evaluation component, and a knowledge base component; the knowledge base component comprises a near-synonym knowledge graph knowledge base that can be continuously trained and updated; based on the knowledge base component, the analysis component analyzes the test result that a test taker provides for the corpus to be translated entered through the source-language input component, and inputs the analysis result into the evaluation component to obtain the test taker's evaluation score;

characterized in that:

2. The system of claim 1, wherein the analysis component segments and filters the test result to obtain the key evaluation words.

3. The system of claim 1 or 2, wherein the knowledge base component constructs the distance metric between near-synonyms and related words on the basis of word2vec.

4. The system of claim 1, wherein the model answers are translation results prepared in advance for the corpus to be translated.

5. The system of claim 1 or 4, wherein there are multiple model answers, each from an independent source.

6. The system of claim 1, further comprising a knowledge base update engine that updates the knowledge graph knowledge base based on the output of the analysis component and/or the evaluation component.

7. A multilingual translation quality evaluation method based on a near-synonym knowledge graph, the method comprising a source-language input step, a target-language analysis step, and a target-language evaluation step;

in the target-language analysis step, the test result is analyzed to give the key evaluation words;

characterized in that: the target-language analysis step segments and filters the test result based on the near-synonym knowledge graph knowledge base that can be continuously trained and updated, and obtains the key evaluation words based on the distance metric between near-synonyms and related words.

8. The method of claim 7, wherein the target-language evaluation step performs fitted scoring based on the knowledge graph knowledge base and the model answers to obtain the translation quality score of the person under test.

9. The method of claim 8, wherein the model answers come from several different translation engines and/or different human translators.

10. The method of any one of claims 7-9, further comprising a feedback update step: when the translation quality score of the person under test meets a set condition, the knowledge graph knowledge base is updated.