CN109710948A - Machine translation engine recommendation method and device - Google Patents


Info

Publication number
CN109710948A
Authority
CN
China
Prior art keywords
translation
original text
score
machine translation
sample
Prior art date
Legal status
Pending
Application number
CN201811426193.1A
Other languages
Chinese (zh)
Inventor
宋安琪
Current Assignee
Language Network (Wuhan) Information Technology Co Ltd
Original Assignee
Language Network (Wuhan) Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Language Network (Wuhan) Information Technology Co Ltd
Priority to CN201811426193.1A
Publication of CN109710948A


Abstract

Embodiments of the present invention provide a machine translation (MT) engine recommendation method and device. The method comprises: inputting a source text to be translated by a translator into multiple different MT engines to obtain multiple machine translations; inputting the source text and the multiple machine translations into a pre-trained MT engine evaluation model to obtain the score that the model outputs for each machine translation; and recommending the highest-scoring machine translation to the translator. The MT engine evaluation model is obtained by training on source-text samples, the corresponding multiple machine-translation samples, and predetermined scores of those machine-translation samples. By evaluating multiple machine translations of a document to be translated, the embodiments recommend a better machine translation to the translator, improving the translator's translation quality and efficiency.

Description

Machine translation engine recommendation method and device
Technical field
Embodiments of the present invention relate to the field of natural language processing, and more particularly to a machine translation engine recommendation method and device.
Background art
With the development of artificial intelligence, the quality of machine translation keeps improving, and post-editing of machine translation output has become a new trend in translators' work. However, the mainstream MT engines do not translate best in every domain; some other MT engines have their own strengths in particular domains.
Many popular MT engines are currently available, and the engine a translator chooses is an important factor in the quality of the translator's work. It is therefore particularly important to provide a method that recommends a suitable MT engine to the translator according to the source text to be translated, so as to improve translation quality.
Summary of the invention
Embodiments of the present invention provide a machine translation engine recommendation method and device that overcome the above problem or at least partially solve it.
In a first aspect, an embodiment of the present invention provides a machine translation engine recommendation method, comprising:
inputting a source text to be translated by a translator into multiple different MT engines for translation, to obtain multiple machine translations;
inputting the source text and the multiple machine translations into a pre-trained MT engine evaluation model, to obtain the scores of the multiple machine translations output by the model; and
recommending the highest-scoring machine translation to the translator;
wherein the MT engine evaluation model is obtained by training on source-text samples, the corresponding multiple machine-translation samples, and predetermined scores of those machine-translation samples.
In a second aspect, an embodiment of the present invention provides a machine translation engine recommendation device, comprising:
a translation module, configured to input a source text to be translated by a translator into multiple different MT engines for translation, to obtain multiple machine translations;
a prediction module, configured to input the source text and the multiple machine translations into a pre-trained MT engine evaluation model, to obtain the scores of the multiple machine translations output by the model; and
a recommendation module, configured to recommend the highest-scoring machine translation to the translator;
wherein the MT engine evaluation model is obtained by training on source-text samples, the corresponding multiple machine-translation samples, and predetermined scores of those machine-translation samples.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the MT engine recommendation method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the MT engine recommendation method provided in the first aspect.
The MT engine recommendation method and device provided by the embodiments of the present invention build an evaluation model by learning from a large number of human reference translations and machine translations. By evaluating multiple machine translations of a document to be translated, they recommend a better machine translation to the translator, improving the translator's translation quality and efficiency.
Brief description of the drawings
To explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of the MT engine recommendation method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of training the MT engine evaluation model provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of the MT engine recommendation device provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flow diagram of the MT engine recommendation method provided by an embodiment of the present invention. As shown, the method comprises:
Step 100: input the source text to be translated by the translator into multiple different MT engines for translation, to obtain multiple machine translations.
Existing MT engines include Google, Baidu and NetEase Youdao. The source text to be translated is input into the different MT engines to obtain multiple machine translations.
Step 101: input the source text and the multiple machine translations into the pre-trained MT engine evaluation model, to obtain the scores of the multiple machine translations output by the model.
Specifically, the input of the MT engine evaluation model is the source text to be translated and the multiple machine translations obtained from it; the output is a score for each of those machine translations.
The MT engine evaluation model predicts a score for each machine translation from the source text and the corresponding multiple machine translations.
The MT engine evaluation model is obtained by training on source-text samples, the corresponding multiple machine-translation samples, and predetermined scores of those machine-translation samples.
Note that the score of a machine translation reflects its similarity to the reference translation of the source text.
Step 102: recommend the highest-scoring machine translation to the translator.
Specifically, the scores of the multiple machine translations are compared. The highest score indicates that the corresponding machine translation is the most similar to the reference translation, i.e., of the highest quality. That machine translation is recommended to the translator, who post-edits it to produce the final translation, improving the translator's translation quality and efficiency.
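Steps 100 to 102 can be sketched as a small pipeline. This is a minimal illustration, not the patented implementation: the engines are assumed to be plain callables (real engines such as Google, Baidu or NetEase Youdao would be reached through their own APIs, which are not modeled here), and the trained evaluation model is assumed to be a hypothetical `scorer` callable that returns one score per candidate.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical engine interface: maps a source text to a machine translation.
Engine = Callable[[str], str]
# Hypothetical evaluation-model interface: (source, candidates) -> one score per candidate.
Scorer = Callable[[str, List[str]], List[float]]


def recommend(source: str, engines: Dict[str, Engine], scorer: Scorer) -> Tuple[str, str, float]:
    """Translate with every engine, score all outputs, return (engine, translation, score) of the best."""
    names = list(engines)
    # Step 100: send the same source text to each engine.
    candidates = [engines[name](source) for name in names]
    # Step 101: the evaluation model scores all candidates for this source.
    scores = scorer(source, candidates)
    if len(scores) != len(candidates):
        raise ValueError("scorer must return one score per candidate")
    # Step 102: pick the highest-scoring translation (first one wins on ties).
    best = max(range(len(candidates)), key=lambda k: scores[k])
    return names[best], candidates[best], scores[best]


if __name__ == "__main__":
    # Stub engines and fixed scores, for illustration only.
    demo_engines = {"engine_a": lambda s: s.lower(), "engine_b": lambda s: s.title()}
    demo_scorer = lambda src, cands: [0.4, 0.9]
    print(recommend("machine translation", demo_engines, demo_scorer))
    # prints ('engine_b', 'Machine Translation', 0.9)
```

Scoring all candidates in one call mirrors the description, where the model receives the source text together with all machine translations rather than one pair at a time.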
The MT engine recommendation method provided by the embodiment of the present invention builds an evaluation model by learning from a large number of human reference translations and machine translations. By evaluating multiple machine translations of a document to be translated, it recommends a better machine translation to the translator, improving the translator's translation quality and efficiency.
Fig. 2 is a flow diagram of training the MT engine evaluation model provided by an embodiment of the present invention. Building on the above embodiment, the MT engine evaluation model is trained as follows:
Step 200: obtain source-text samples and their corresponding reference translations from a bilingual corpus, and call different MT engines to obtain multiple machine-translation samples for each source-text sample.
Specifically, the reference translation of a source-text sample is produced by professional translators; existing bilingual corpora provide a large number of source texts with corresponding reference translations. Each source text is input into multiple different MT engines to obtain multiple machine translations.
Step 201: calculate the scores of the multiple machine-translation samples of each source-text sample against its reference translation.
Specifically, the score of a machine-translation sample measures its similarity to the professional human translation: the higher the score, the closer the machine translation is to the human translation and the higher its quality. The score can therefore be obtained by calculating the similarity between the machine translation and the reference translation.
After the scores of the machine-translation samples are calculated, a sample dataset is built in which each sample takes the source text and its multiple machine translations as input and the scores of those machine translations as the target output.
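Steps 200 and 201 can be sketched as a dataset builder. This is a minimal sketch under stated assumptions: `corpus` yields (source, reference) pairs from a bilingual corpus, `engines` are hypothetical callables, and `score_fn(candidate, reference)` stands for whatever similarity score is used (for example formula (1) below). Engine order is fixed so that target positions line up across samples.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Sample:
    source: str            # model input: the source-text sample
    candidates: List[str]  # model input: one machine translation per engine
    targets: List[float]   # training target: one score per machine translation


def build_dataset(
    corpus: Iterable[Tuple[str, str]],          # (source, reference) pairs
    engines: Dict[str, Callable[[str], str]],   # hypothetical engine callables
    score_fn: Callable[[str, str], float],      # hypothetical score(candidate, reference)
) -> List[Sample]:
    names = sorted(engines)  # fixed engine order across all samples
    dataset: List[Sample] = []
    for source, reference in corpus:
        # Step 200: translate the source sample with every engine.
        candidates = [engines[name](source) for name in names]
        # Step 201: score each machine translation against the human reference.
        targets = [score_fn(c, reference) for c in candidates]
        dataset.append(Sample(source, candidates, targets))
    return dataset
```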
Step 202: build a deep learning network model; input the source-text samples and their multiple machine-translation samples into the model for training; calculate a loss function from the scores of the machine-translation samples output by the model and the scores calculated in step 201; update the parameters of the model by the backpropagation algorithm until a preset training termination condition is met; and save the model parameters at the end of training to obtain the MT engine evaluation model.
Specifically, a deep learning network model is built on the obtained sample set and its parameters are initialized. The source-text samples and their machine-translation samples serve as the model input, and the calculated scores of the machine translations serve as the target output. During training, a loss function is calculated from the predicted scores output by the model and the target scores, and the model parameters are updated by backpropagation until the preset training termination condition is met; training then ends and yields the MT engine evaluation model.
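The description does not disclose the network architecture, how text is turned into model input, the loss function, or the termination condition. As a minimal sketch only, assuming each (source, candidate) pair has already been converted into a numeric feature vector by some hypothetical featurizer, a one-layer linear scorer trained with mean squared error and gradient descent shows the same loop shape: forward pass, loss against the calculated target scores, gradient step, stop on a preset condition. It is a stand-in for the deep network, not a reproduction of it.

```python
from typing import List, Sequence, Tuple


def train_linear_scorer(
    features: Sequence[Sequence[float]],  # hypothetical feature vectors, one per (source, candidate)
    targets: Sequence[float],             # target scores calculated in step 201
    lr: float = 0.1,
    max_epochs: int = 1000,
    tol: float = 1e-6,
) -> Tuple[List[float], float, float]:
    """Fit score = w.x + b by gradient descent on MSE; return (w, b, last computed loss)."""
    dim = len(features[0])
    w = [0.0] * dim  # initialize parameters
    b = 0.0
    m = len(features)
    loss = float("inf")
    for _ in range(max_epochs):
        # Forward pass: predicted score for every sample.
        preds = [sum(wi * xi for wi, xi in zip(w, x)) + b for x in features]
        errors = [p - t for p, t in zip(preds, targets)]
        loss = sum(e * e for e in errors) / m  # mean squared error
        if loss < tol:  # preset termination condition (assumed)
            break
        # Backward pass: gradients of the MSE with respect to w and b.
        grad_w = [2.0 / m * sum(e * x[k] for e, x in zip(errors, features)) for k in range(dim)]
        grad_b = 2.0 / m * sum(errors)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b, loss
```

In practice a deep learning framework would compute the gradients automatically; the hand-written gradients here only keep the sketch dependency-free.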
Building on the above embodiment, the step of calculating the scores of the multiple machine-translation samples against the reference translation of the source-text sample is specifically:
According to the reference translation of the source-text sample, the score of each of its machine-translation samples is calculated with the following formula:
Score=(BLEU+Similarity+PosScore)/3 (1)
where BLEU is the BLEU score between the machine translation and the reference translation, Similarity is the edit-distance similarity between them, and PosScore is the content-word similarity between them.
Specifically, BLEU (bilingual evaluation understudy) is a common metric for evaluating machine translation output, but its accuracy is limited. The embodiments of the present invention therefore combine BLEU with the string edit-distance similarity and the content-word similarity to jointly measure the quality of a machine translation.
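Formula (1) is an unweighted mean of three metrics. The description does not say which BLEU variant is used, so the sketch below assumes single-reference sentence-level BLEU (up to 4-grams, clipped counts, brevity penalty) with add-one smoothing for n > 1, on pre-tokenized input (for Chinese text the tokens would come from a word segmenter). Both the variant and the smoothing are assumptions.

```python
import math
from collections import Counter
from typing import List


def _ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Single-reference sentence BLEU; add-one smoothing for n > 1 is an assumption."""
    if not candidate:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        cand = _ngrams(candidate, n)
        ref = _ngrams(reference, n)
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())  # clipped matches
        total = sum(cand.values())
        if n == 1:
            if overlap == 0:
                return 0.0  # no unigram overlap at all
            p = overlap / total
        else:
            p = (overlap + 1) / (total + 1)  # add-one smoothing
        log_prec += math.log(p) / max_n  # geometric mean of precisions
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)  # brevity penalty
    return bp * math.exp(log_prec)


def combined_score(bleu: float, similarity: float, pos_score: float) -> float:
    """Formula (1): Score = (BLEU + Similarity + PosScore) / 3."""
    return (bleu + similarity + pos_score) / 3
```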
The edit-distance similarity is calculated as:
Similarity = (Max(x, y) - Levenshtein(x, y)) / Max(x, y)    (2)
In formula (2), x is the translator's reference translation, y is the machine translation, Max(x, y) is the greater of the lengths of x and y, and Levenshtein(x, y) is the Levenshtein distance between x and y.
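Formula (2) can be computed with the standard dynamic-programming Levenshtein distance. This sketch assumes character-level comparison, which suits Chinese text; treating two empty strings as fully similar is an assumption the description does not cover.

```python
def levenshtein(x: str, y: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        curr = [i]
        for j, cy in enumerate(y, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cx
                curr[j - 1] + 1,           # insert cy
                prev[j - 1] + (cx != cy),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def edit_similarity(x: str, y: str) -> float:
    """Formula (2): (Max(x, y) - Levenshtein(x, y)) / Max(x, y), Max being the longer length."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (longest - levenshtein(x, y)) / longest
```

For example, `levenshtein("kitten", "sitting")` is 3, so the similarity is (7 - 3) / 7.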
The content-word similarity is calculated as:
$$\mathrm{PosScore}=\sum_{i=0}^{3}\alpha_i\cdot\frac{1}{count_i}\sum_{j=0}^{n}\mathrm{sim}(word_j),\qquad n=count_i-1 \tag{3}$$
In formula (3), i = 0, 1, 2, 3 denote nouns, verbs, adjectives and adverbs respectively, with α_0 = 0.3, α_1 = 0.3, α_2 = 0.25 and α_3 = 0.15; count_i is the number of words of part of speech i in the reference translation; word_j is a word of that part of speech in the reference translation, with n = count_i - 1; and sim(word_j) is the similarity between the word in the machine translation and the word of the same part of speech in the reference translation. If the reference translation contains no words of a given part of speech, the averaged term (1/count_i)·Σ sim(word_j) for that part of speech is taken as 1.
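PosScore, as described here, is a weighted sum over four parts of speech of the mean per-word similarity, with a part of speech absent from the reference contributing 1. The description does not say how sim(word_j) is computed or how reference words are paired with machine-translation words, so this sketch assumes those per-word similarities are already available, grouped by a hypothetical part-of-speech label.

```python
from typing import Dict, List

# alpha_0..alpha_3 from the text: noun, verb, adjective, adverb.
POS_WEIGHTS: Dict[str, float] = {"noun": 0.3, "verb": 0.3, "adj": 0.25, "adv": 0.15}


def pos_score(word_sims: Dict[str, List[float]]) -> float:
    """word_sims maps a POS label to sim(word_j) for each reference word of that POS (assumed precomputed)."""
    total = 0.0
    for pos, alpha in POS_WEIGHTS.items():
        sims = word_sims.get(pos, [])
        # count_i == 0: the averaged term is taken as 1, as the text specifies.
        mean = sum(sims) / len(sims) if sims else 1.0
        total += alpha * mean
    return total
```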
The calculation of the content-word similarity PosScore is illustrated below:
Source text: The time-of-flight camera calculates the distance of the object based on the measured time.
Reference translation (Chinese in the original; English gloss): the time-of-flight camera calculates the object's distance based on the measured time.
Machine translation (Chinese in the original; English gloss): the time-of-flight camera calculates the object's distance according to the time of measurement.
Word segmentation of the reference translation: flight (verb), time (noun), camera (noun), based on (preposition), measured (adjective), time (noun), "to" (preposition), calculate (verb), object (noun), possessive particle (auxiliary), distance (noun)
Word segmentation of the machine translation: flight (verb), time (noun), camera (noun), according to (preposition), measurement (adjective), time (noun), calculate (verb), object (noun), possessive particle (auxiliary), distance (noun)
The noun score is (1 + 0.67 + 1 + 1)/5 = 0.734, the verb score is 1, the adjective score is 0.75, and the adverb score is 1 (the reference translation contains no adverbs).
PosScore is therefore 0.3 × 0.734 + 0.3 × 1 + 0.25 × 0.75 + 0.15 × 1 ≈ 0.86.
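The final arithmetic of the worked example can be checked directly from the per-part-of-speech scores stated in the text; no per-word similarities are recomputed here.

```python
alphas = [0.3, 0.3, 0.25, 0.15]       # noun, verb, adjective, adverb weights from formula (3)
per_pos = [3.67 / 5, 1.0, 0.75, 1.0]  # noun = (1 + 0.67 + 1 + 1)/5 as stated; adverb absent -> 1
example_pos_score = sum(a * s for a, s in zip(alphas, per_pos))
print(round(example_pos_score, 2))    # prints 0.86 (unrounded: 0.8577)
```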
Scores of the machine-translation samples calculated with formula (1) are more accurate than those obtained from BLEU alone.
Building on the above embodiment, after the step of recommending the highest-scoring machine translation to the translator, the method further comprises:
recalculating the score of the machine translation recommended to the translator, using the final translation confirmed by the translator as the reference translation; and
calculating a loss function from the recalculated score and the score output by the MT engine evaluation model, and updating the MT engine evaluation model by backpropagation.
Specifically, after the highest-scoring machine translation is recommended, the translator edits it and settles on a final translation. This final translation is a reference translation confirmed by a professional translator, so formula (1) can be used with it to recalculate the score of the recommended machine translation. A loss function is then calculated from the recalculated score and the score that the MT engine evaluation model output for that machine translation, and the model is updated by backpropagation.
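The online update can be sketched as a single gradient step. This assumes, hypothetically, the same kind of linear scorer over a feature vector `x` of the recommended (source, translation) pair, with `target` being the formula (1) score recomputed against the translator's final translation; the squared-error loss and learning rate are assumptions, since the description names only "a loss function" and backpropagation.

```python
from typing import List, Tuple


def online_update(
    w: List[float], b: float, x: List[float], target: float, lr: float = 0.05
) -> Tuple[List[float], float, float]:
    """One gradient step on (predicted score - recomputed score)^2; returns (w, b, loss before the step)."""
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b  # score the model originally gave
    err = pred - target
    loss = err * err
    # Gradient of err^2 is 2 * err * x for the weights and 2 * err for the bias.
    new_w = [wi - lr * 2 * err * xi for wi, xi in zip(w, x)]
    new_b = b - lr * 2 * err
    return new_w, new_b, loss
```

Applying one step per confirmed translation lets the model drift toward translators' actual preferences over time, which is the behavior the description attributes to the update step.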
The MT engine recommendation method provided by the embodiment of the present invention can thus keep updating the MT engine evaluation model by continually learning from translators' final translations.
Fig. 3 is a structural diagram of the MT engine recommendation device provided by an embodiment of the present invention, comprising a translation module 310, a prediction module 320 and a recommendation module 330, wherein:
the translation module 310 is configured to input the source text to be translated by a translator into multiple different MT engines for translation, to obtain multiple machine translations.
Specifically, existing MT engines include Google, Baidu and NetEase Youdao; the translation module 310 inputs the source text to be translated into the different MT engines to obtain multiple machine translations.
The prediction module 320 is configured to input the source text and the multiple machine translations into the pre-trained MT engine evaluation model, to obtain the scores of the multiple machine translations output by the model.
Specifically, the input of the MT engine evaluation model is the source text to be translated and the multiple machine translations obtained from it; the output is a score for each of those machine translations.
The MT engine evaluation model predicts a score for each machine translation from the source text and the corresponding multiple machine translations.
The MT engine evaluation model is obtained by training on source-text samples, the corresponding multiple machine-translation samples, and predetermined scores of those machine-translation samples.
Note that the score of a machine translation reflects its similarity to the reference translation of the source text.
The recommendation module 330 is configured to recommend the highest-scoring machine translation to the translator.
Specifically, the recommendation module 330 compares the scores of the multiple machine translations. The highest score indicates that the corresponding machine translation is the most similar to the reference translation, i.e., of the highest quality. That machine translation is recommended to the translator, who post-edits it to produce the final translation, improving the translator's translation quality and efficiency.
The MT engine recommendation device provided by the embodiment of the present invention builds an evaluation model by learning from a large number of human reference translations and machine translations. By evaluating multiple machine translations of a document to be translated, it recommends a better machine translation to the translator, improving the translator's translation quality and efficiency.
Building on the above embodiment, the device further comprises a training module, which is specifically configured to:
obtain source-text samples and their corresponding reference translations from a bilingual corpus, and call different MT engines to obtain multiple machine-translation samples for each source-text sample;
calculate the scores of the multiple machine-translation samples of each source-text sample against its reference translation; and
build a deep learning network model; input the source-text samples and their multiple machine-translation samples into the model for training; calculate a loss function from the scores of the machine-translation samples output by the model and the calculated scores; update the parameters of the model by the backpropagation algorithm until a preset training termination condition is met; and save the model parameters at the end of training to obtain the MT engine evaluation model.
Specifically, the reference translation of a source-text sample is produced by professional translators; existing bilingual corpora provide a large number of source texts with corresponding reference translations. Each source text is input into multiple different MT engines to obtain multiple machine translations.
The score of a machine-translation sample measures its similarity to the professional human translation: the higher the score, the closer the machine translation is to the human translation and the higher its quality. The score can therefore be obtained by calculating the similarity between the machine translation and the reference translation.
After the scores of the machine-translation samples are calculated, a sample dataset is built in which each sample takes the source text and its multiple machine translations as input and the scores of those machine translations as the target output.
The training module builds a deep learning network model on the obtained sample set and initializes its parameters. The source-text samples and their machine-translation samples serve as the model input, and the calculated scores of the machine translations serve as the target output. During training, a loss function is calculated from the predicted scores output by the model and the target scores, and the model parameters are updated by backpropagation until the preset training termination condition is met; training then ends and yields the MT engine evaluation model.
Building on the above embodiment, the training module is specifically configured to:
calculate, according to the reference translation of the source-text sample, the score of each of its machine-translation samples with the following formula:
Score=(BLEU+Similarity+PosScore)/3 (1)
where BLEU is the BLEU score between the machine translation and the reference translation, Similarity is the edit-distance similarity between them, and PosScore is the content-word similarity between them.
Specifically, BLEU (bilingual evaluation understudy) is a common metric for evaluating machine translation output, but its accuracy is limited. The embodiments of the present invention therefore combine BLEU with the string edit-distance similarity and the content-word similarity to jointly measure the quality of a machine translation.
The edit-distance similarity is calculated as:
Similarity = (Max(x, y) - Levenshtein(x, y)) / Max(x, y)    (2)
In formula (2), x is the translator's reference translation, y is the machine translation, Max(x, y) is the greater of the lengths of x and y, and Levenshtein(x, y) is the Levenshtein distance between x and y.
The content-word similarity is calculated as:
$$\mathrm{PosScore}=\sum_{i=0}^{3}\alpha_i\cdot\frac{1}{count_i}\sum_{j=0}^{n}\mathrm{sim}(word_j),\qquad n=count_i-1 \tag{3}$$
In formula (3), i = 0, 1, 2, 3 denote nouns, verbs, adjectives and adverbs respectively, with α_0 = 0.3, α_1 = 0.3, α_2 = 0.25 and α_3 = 0.15; count_i is the number of words of part of speech i in the reference translation; word_j is a word of that part of speech in the reference translation, with n = count_i - 1; and sim(word_j) is the similarity between the word in the machine translation and the word of the same part of speech in the reference translation. If the reference translation contains no words of a given part of speech, the averaged term (1/count_i)·Σ sim(word_j) for that part of speech is taken as 1.
Building on the above embodiments, the device further comprises an update module, which is specifically configured to:
recalculate the score of the machine translation recommended to the translator, using the final translation confirmed by the translator as the reference translation; and
calculate a loss function from the recalculated score and the score output by the MT engine evaluation model, and update the MT engine evaluation model by backpropagation.
Specifically, after the highest-scoring machine translation is recommended, the translator edits it and settles on a final translation. This final translation is a reference translation confirmed by a professional translator, so the update module can use formula (1) with it to recalculate the score of the recommended machine translation, calculate a loss function from the recalculated score and the score that the MT engine evaluation model output for that machine translation, and update the model by backpropagation.
Fig. 4 is the entity structure schematic diagram of electronic equipment provided in an embodiment of the present invention, as shown in figure 4, the electronic equipment It may include: processor (processor) 410,420, memory communication interface (Communications Interface) (memory) 430 and communication bus 440, wherein processor 410, communication interface 420, memory 430 pass through communication bus 440 Complete mutual communication.Processor 410 can call the meter that is stored on memory 430 and can run on the processor 410 Calculation machine program, to execute the MT engine recommended method of the various embodiments described above offer, for example, by what is translated to interpreter Original text inputs multiple and different MT engines and is translated, and obtains multiple machine translation translations;It described will be translated to interpreter Original text and the multiple machine translation translation be input in preparatory trained MT engine assessment models, obtain institute State the score of the multiple machine translation translation of MT engine assessment models output;The machine translation of highest scoring is translated Text recommends the interpreter;Wherein, the MT engine assessment models are based on original text sample and corresponding multiple machines What the score of translation translation sample and predetermined the multiple machine translation translation sample obtained after being trained.
In addition, the logical order in above-mentioned memory 430 can be realized by way of SFU software functional unit and conduct Independent product when selling or using, can store in a computer readable storage medium.Based on this understanding, originally The technical solution of the inventive embodiments substantially part of the part that contributes to existing technology or the technical solution in other words It can be embodied in the form of software products, which is stored in a storage medium, including several fingers It enables and using so that a computer equipment (can be personal computer, server or the network equipment etc.) executes the present invention respectively The all or part of the steps of a embodiment the method.And storage medium above-mentioned includes: USB flash disk, mobile hard disk, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), magnetic or disk Etc. the various media that can store program code.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium storing a computer program that, when executed by a processor, implements the MT engine recommendation method provided by the embodiments above, for example: inputting the source text to be translated by a translator into multiple different MT engines to obtain multiple machine translations; inputting the source text and the multiple machine translations into a pre-trained MT engine assessment model to obtain the score that the model outputs for each machine translation; and recommending the highest-scoring machine translation to the translator. The MT engine assessment model is obtained by training on source text samples, the corresponding multiple machine translation samples, and predetermined scores for those machine translation samples.
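The training referred to above fits the assessment model to predetermined scores of machine translation samples; claim 2 below specifies a deep network updated by back-propagation. The sketch here is a deliberately small stand-in, not that network: it assumes a linear model over three hypothetical length-based features (`features` is illustrative only and far weaker than learned representations), trained by full-batch gradient descent on mean squared error, which is the same compute-loss-then-update loop in its simplest form. `online_update` mirrors the post-recommendation update of claim 4 as a single gradient step toward a recalculated score.

```python
from typing import List, Sequence


def features(source: str, candidate: str) -> List[float]:
    """Hypothetical toy features: bias, length ratio, distance of the ratio from 1."""
    ratio = len(candidate.split()) / max(len(source.split()), 1)
    return [1.0, ratio, abs(1.0 - ratio)]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    # Linear stand-in for the assessment network's output score.
    return sum(wi * xi for wi, xi in zip(w, x))


def train(X: List[List[float]], y: List[float], lr: float = 0.1,
          max_epochs: int = 2000, tol: float = 1e-9) -> List[float]:
    """Full-batch gradient descent on mean squared error vs. label scores."""
    w = [0.0] * len(X[0])
    prev_loss = float("inf")
    for _ in range(max_epochs):
        grad = [0.0] * len(w)
        loss = 0.0
        for x, target in zip(X, y):
            err = predict(w, x) - target
            loss += err * err
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi
        loss /= len(X)
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad)]
        # Assumed termination condition: the loss has stopped improving.
        if prev_loss - loss < tol:
            break
        prev_loss = loss
    return w


def online_update(w: List[float], x: List[float], new_score: float,
                  lr: float = 0.1) -> List[float]:
    """One gradient step toward a score recalculated against the translator's final version."""
    err = predict(w, x) - new_score
    return [wi - lr * 2.0 * err * xi for wi, xi in zip(w, x)]
```

In a real system the label scores would come from a reference-based metric and the linear model would be replaced by the deep network the claims describe; only the loss-and-update structure carries over.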
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected as needed to achieve the objectives of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the description of the embodiments above, those skilled in the art can clearly understand that each embodiment may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the technical solution above, in essence, or the part contributing to the prior art, may be embodied as a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the embodiments above are merely intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of their technical features, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An MT engine recommendation method, characterized by comprising:
Inputting the source text to be translated by a translator into multiple different MT engines for translation, to obtain multiple machine translations;
Inputting the source text to be translated by the translator and the multiple machine translations into a pre-trained MT engine assessment model, to obtain the score output by the MT engine assessment model for each of the multiple machine translations;
Recommending the highest-scoring machine translation to the translator;
Wherein the MT engine assessment model is obtained by training on source text samples, the corresponding multiple machine translation samples, and predetermined scores for the multiple machine translation samples.
2. The method according to claim 1, wherein the MT engine assessment model is trained as follows:
Obtaining source text samples and their corresponding reference translations from a bilingual corpus, and calling different MT engines to obtain multiple machine translation samples for each source text sample;
Calculating, according to the reference translation corresponding to each source text sample, the scores of the multiple machine translation samples corresponding to that source text sample;
Constructing a deep learning network model; inputting the source text samples and their corresponding machine translation samples into the deep learning network model for training; calculating a loss function from the scores of the machine translation samples output by the model and the calculated scores of those samples; updating the parameters of the deep learning network model by a back-propagation algorithm until a preset training termination condition is met; and, when training ends, saving the parameters of the deep learning network model to obtain the MT engine assessment model.
3. The method according to claim 2, wherein the step of calculating, according to the reference translation corresponding to the source text sample, the scores of the multiple machine translation samples corresponding to the source text sample specifically comprises:
Calculating, according to the reference translation corresponding to the source text sample, the score of each machine translation sample corresponding to the source text sample using the following formula:
Score = (BLEU + Similarity + PosScore) / 3,
Where BLEU denotes the BLEU score between the machine translation and the reference translation, Similarity denotes the edit-distance similarity between the machine translation and the reference translation, and PosScore denotes the content-word similarity between the machine translation and the reference translation.
4. The method according to claim 1, further comprising, after the step of recommending the highest-scoring machine translation to the translator:
Using the translation finally confirmed by the translator as the reference translation, recalculating the score of the machine translation recommended to the translator;
Calculating a loss function from the recalculated score and the score output by the MT engine assessment model, and updating the MT engine assessment model by a back-propagation algorithm.
5. An MT engine recommendation apparatus, characterized by comprising:
A translation module, configured to input the source text to be translated by a translator into multiple different MT engines for translation, to obtain multiple machine translations;
A prediction module, configured to input the source text to be translated by the translator and the multiple machine translations into a pre-trained MT engine assessment model, to obtain the score output by the MT engine assessment model for each of the multiple machine translations;
A recommendation module, configured to recommend the highest-scoring machine translation to the translator;
Wherein the MT engine assessment model is obtained by training on source text samples, the corresponding multiple machine translation samples, and predetermined scores for the multiple machine translation samples.
6. The apparatus according to claim 5, further comprising a training module, the training module being specifically configured to:
Obtain source text samples and their corresponding reference translations from a bilingual corpus, and call different MT engines to obtain multiple machine translation samples for each source text sample;
Calculate, according to the reference translation corresponding to each source text sample, the scores of the multiple machine translation samples corresponding to that source text sample;
Construct a deep learning network model; input the source text samples and their corresponding machine translation samples into the deep learning network model for training; calculate a loss function from the scores of the machine translation samples output by the model and the calculated scores of those samples; update the parameters of the deep learning network model by a back-propagation algorithm until a preset training termination condition is met; and, when training ends, save the parameters of the deep learning network model to obtain the MT engine assessment model.
7. The apparatus according to claim 5, wherein the training module is specifically configured to:
Calculate, according to the reference translation corresponding to the source text sample, the score of each machine translation sample corresponding to the source text sample using the following formula:
Score = (BLEU + Similarity + PosScore) / 3,
Where BLEU denotes the BLEU score between the machine translation and the reference translation, Similarity denotes the edit-distance similarity between the machine translation and the reference translation, and PosScore denotes the content-word similarity between the machine translation and the reference translation.
8. The apparatus according to claim 5, further comprising an update module, the update module being specifically configured to:
Use the translation finally confirmed by the translator as the reference translation to recalculate the score of the machine translation recommended to the translator;
Calculate a loss function from the recalculated score and the score output by the MT engine assessment model, and update the MT engine assessment model by a back-propagation algorithm.
9. An electronic device, characterized by comprising:
At least one processor; and
At least one memory communicatively connected to the processor, wherein:
The memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the method according to any one of claims 1 to 4.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause a computer to perform the method according to any one of claims 1 to 4.
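The label score of claims 3 and 7, Score = (BLEU + Similarity + PosScore) / 3, can be sketched as below. The claims name the three components but not how each is computed, so every choice here is an assumption: whitespace tokenization, sentence-level BLEU up to 4-grams with add-one smoothing, token-level Levenshtein similarity normalized by the longer sequence, and a content-word F1 in which a small hypothetical `FUNCTION_WORDS` list stands in for a real part-of-speech tagger.

```python
import math
from collections import Counter
from typing import List

# Hypothetical function-word list standing in for a POS tagger; the claims
# do not specify how content words are identified.
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "on", "and", "is", "are"}


def tokenize(text: str) -> List[str]:
    # Assumed whitespace tokenization (Chinese would need a segmenter).
    return text.lower().split()


def ngram_counts(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: List[str], ref: List[str], max_n: int = 4) -> float:
    """Sentence BLEU with add-one smoothed precisions (orders capped at len(hyp))."""
    if not hyp or not ref:
        return 0.0
    orders = min(max_n, len(hyp))
    log_prec = 0.0
    for n in range(1, orders + 1):
        h, r = ngram_counts(hyp, n), ngram_counts(ref, n)
        matches = sum(min(c, r[g]) for g, c in h.items())  # clipped matches
        log_prec += math.log((matches + 1) / (len(hyp) - n + 2))
    brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(log_prec / orders)


def levenshtein(a: List[str], b: List[str]) -> int:
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(hyp: List[str], ref: List[str]) -> float:
    longest = max(len(hyp), len(ref))
    return 1.0 if longest == 0 else 1.0 - levenshtein(hyp, ref) / longest


def content_word_similarity(hyp: List[str], ref: List[str]) -> float:
    # F1 over content-word multisets (stand-in for PosScore).
    h = Counter(t for t in hyp if t not in FUNCTION_WORDS)
    r = Counter(t for t in ref if t not in FUNCTION_WORDS)
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    p, rec = overlap / sum(h.values()), overlap / sum(r.values())
    return 2 * p * rec / (p + rec)


def label_score(mt: str, reference: str) -> float:
    hyp, ref = tokenize(mt), tokenize(reference)
    return (sentence_bleu(hyp, ref) + edit_similarity(hyp, ref)
            + content_word_similarity(hyp, ref)) / 3
```

The equal one-third weighting of the three terms is the only detail taken directly from the claims; each component could be swapped for a stronger implementation without changing the overall formula.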
CN201811426193.1A 2018-11-27 2018-11-27 MT engine recommended method and device Pending CN109710948A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811426193.1A CN109710948A (en) 2018-11-27 2018-11-27 MT engine recommended method and device

Publications (1)

Publication Number Publication Date
CN109710948A (en) 2019-05-03

Family

ID=66255191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811426193.1A Pending CN109710948A (en) 2018-11-27 2018-11-27 MT engine recommended method and device

Country Status (1)

Country Link
CN (1) CN109710948A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110029300A1 (en) * 2009-07-28 2011-02-03 Daniel Marcu Translating Documents Based On Content
CN103678285A (en) * 2012-08-31 2014-03-26 富士通株式会社 Machine translation method and machine translation system
CN104731777A (en) * 2015-03-31 2015-06-24 网易有道信息技术(北京)有限公司 Translation evaluation method and device
US20170132217A1 (en) * 2015-11-06 2017-05-11 Samsung Electronics Co., Ltd. Apparatus and method for evaluating quality of automatic translation and for constructing distributed representation model
CN107133223A (en) * 2017-04-20 2017-09-05 南京大学 A kind of machine translation optimization method for exploring more reference translation information automatically

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
D LIU: "Source-Language Features and Maximum Correlation Training for Machine Translation Evaluation", 《PROCEEDINGS OF THE CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 *
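J GIMÉNEZ: "Heterogeneous Automatic MT Evaluation Through Non-Parametric Metric Combination", 《PROCEEDINGS OF THE THIRD INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING》 *
WU HUANQIN (吴焕钦): "Training Machine Translation Quality Estimation Models Based on Pseudo Data", 《Journal of Peking University (Natural Science Edition)》 *
LI LIANGYOU (李良友) et al.: "A Survey of Automatic Evaluation of Machine Translation", 《Journal of Chinese Information Processing》 *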

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175335B (en) * 2019-05-08 2023-05-09 北京百度网讯科技有限公司 Translation model training method and device
CN110175335A (en) * 2019-05-08 2019-08-27 北京百度网讯科技有限公司 The training method and device of translation model
CN110532574A (en) * 2019-08-20 2019-12-03 语联网(武汉)信息技术有限公司 MT engine selection method and device
CN110543643A (en) * 2019-08-21 2019-12-06 语联网(武汉)信息技术有限公司 Training method and device of text translation model
CN110543643B (en) * 2019-08-21 2022-11-11 语联网(武汉)信息技术有限公司 Training method and device of text translation model
CN110502762A (en) * 2019-08-27 2019-11-26 北京金山数字娱乐科技有限公司 A kind of transcription platform and its management method
CN110674871A (en) * 2019-09-24 2020-01-10 北京中科凡语科技有限公司 Translation-oriented automatic scoring method and automatic scoring system
CN110674871B (en) * 2019-09-24 2023-04-07 北京中科凡语科技有限公司 Translation-oriented automatic scoring method and automatic scoring system
CN110717340A (en) * 2019-09-29 2020-01-21 百度在线网络技术(北京)有限公司 Recommendation method and device, electronic equipment and storage medium
CN110717340B (en) * 2019-09-29 2023-11-21 百度在线网络技术(北京)有限公司 Recommendation method, recommendation device, electronic equipment and storage medium
CN112749316A (en) * 2019-10-29 2021-05-04 阿里巴巴集团控股有限公司 Translation quality determination method and device, storage medium and processor
CN111144134A (en) * 2019-11-27 2020-05-12 语联网(武汉)信息技术有限公司 Translation engine automatic evaluation system based on OpenKiwi
CN111160048A (en) * 2019-11-27 2020-05-15 语联网(武汉)信息技术有限公司 Translation engine optimization system and method based on cluster evolution
CN111046676A (en) * 2019-11-27 2020-04-21 语联网(武汉)信息技术有限公司 GMM-based machine-turning engine testing method and translation toolkit
CN110991194A (en) * 2019-11-27 2020-04-10 语联网(武汉)信息技术有限公司 Engine optimization method based on OpenKiwi evolution and translation system
CN110991193A (en) * 2019-11-27 2020-04-10 语联网(武汉)信息技术有限公司 Translation matrix model selection system based on OpenKiwi
CN111814493A (en) * 2020-04-21 2020-10-23 北京嘀嘀无限科技发展有限公司 Machine translation method, device, electronic equipment and storage medium
CN111626066A (en) * 2020-05-27 2020-09-04 辛钧意 Paragraph translation system and method based on big data
CN111626066B (en) * 2020-05-27 2021-04-13 重庆六花网络科技有限公司 Paragraph translation system and method based on big data
CN111680526A (en) * 2020-06-09 2020-09-18 语联网(武汉)信息技术有限公司 Human-computer interaction translation system and method based on reverse translation result comparison
CN111680526B (en) * 2020-06-09 2023-09-08 语联网(武汉)信息技术有限公司 Man-machine interactive translation system and method based on comparison of reverse translation results
US11580314B2 (en) 2020-06-23 2023-02-14 Beijing Bytedance Network Technology Co., Ltd. Document translation method and apparatus, storage medium, and electronic device
CN111666776B (en) * 2020-06-23 2021-07-23 北京字节跳动网络技术有限公司 Document translation method and device, storage medium and electronic equipment
CN111666776A (en) * 2020-06-23 2020-09-15 北京字节跳动网络技术有限公司 Document translation method and device, storage medium and electronic equipment
CN111797639A (en) * 2020-06-28 2020-10-20 语联网(武汉)信息技术有限公司 Machine translation quality evaluation method and system
CN111753559A (en) * 2020-06-28 2020-10-09 语联网(武汉)信息技术有限公司 Large-scale translation corpus task processing system under multi-source input mode
CN111753559B (en) * 2020-06-28 2024-02-23 语联网(武汉)信息技术有限公司 Large-scale translation corpus task processing system in multi-source input mode
CN111797639B (en) * 2020-06-28 2024-03-26 语联网(武汉)信息技术有限公司 Machine translation quality assessment method and system

Similar Documents

Publication Publication Date Title
CN109710948A (en) MT engine recommended method and device
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN109670191B (en) Calibration optimization method and device for machine translation and electronic equipment
KR20200094627A (en) Method, apparatus, device and medium for determining text relevance
CN109299280B (en) Short text clustering analysis method and device and terminal equipment
CN107704503A (en) User's keyword extracting device, method and computer-readable recording medium
CN111310440B (en) Text error correction method, device and system
CN110457708B (en) Vocabulary mining method and device based on artificial intelligence, server and storage medium
CN105893410A (en) Keyword extraction method and apparatus
CN108875074A (en) Based on answer selection method, device and the electronic equipment for intersecting attention neural network
KR20170055970A (en) Computer-implemented identification of related items
CN111046154A (en) Information retrieval method, information retrieval device, information retrieval medium and electronic equipment
CN112579727B (en) Document content extraction method and device, electronic equipment and storage medium
JP7430820B2 (en) Sorting model training method and device, electronic equipment, computer readable storage medium, computer program
CN110874528B (en) Text similarity obtaining method and device
CN110532575A (en) Text interpretation method and device
CN111144120A (en) Training sentence acquisition method and device, storage medium and electronic equipment
CN111563384A (en) Evaluation object identification method and device for E-commerce products and storage medium
CN110210028A (en) For domain feature words extracting method, device, equipment and the medium of speech translation text
JP2018025874A (en) Text analyzer and program
CN109117475B (en) Text rewriting method and related equipment
CN110263127A (en) Text search method and device is carried out based on user query word
CN115860006A (en) Aspect level emotion prediction method and device based on semantic syntax
CN112463989A (en) Knowledge graph-based information acquisition method and system
WO2023029354A1 (en) Text information extraction method and apparatus, and storage medium and computer device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190503