CN109710948A - Machine translation engine recommendation method and device - Google Patents
- Publication number
- CN109710948A (application number CN201811426193.1A)
- Authority
- CN
- China
- Prior art keywords
- translation
- original text
- score
- machine translation
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
Embodiments of the present invention provide a machine translation (MT) engine recommendation method and device. The method comprises: inputting a source text to be translated by a translator into multiple different MT engines to obtain multiple machine translations; inputting the source text and the multiple machine translations into a pre-trained MT engine evaluation model, and obtaining the scores that the model outputs for the machine translations; and recommending the highest-scoring machine translation to the translator. The MT engine evaluation model is obtained by training on source text samples, the corresponding multiple machine translation samples, and predetermined scores of those machine translation samples. By evaluating multiple machine translations of the document to be translated, embodiments of the present invention recommend the better machine translation to the translator, thereby improving the translator's translation quality and efficiency.
Description
Technical field
Embodiments of the present invention relate to the field of natural language processing, and more particularly to a machine translation (MT) engine recommendation method and device.
Background technique
With the development of artificial intelligence, the quality of machine translation keeps improving, and post-editing of machine translation output has become a new trend in translators' work. However, mainstream MT engines do not produce the best translation in every domain, and some other MT engines have their own strengths in particular domains.
Several popular MT engines are currently available, and the engine a translator selects is an important factor affecting the quality of the translator's work. It is therefore particularly important to provide a method that recommends a suitable MT engine to the translator according to the source text to be translated, so as to improve translation quality.
Summary of the invention
Embodiments of the present invention provide a machine translation engine recommendation method and device that overcome the above problem or at least partially solve it.
In a first aspect, embodiments of the present invention provide a machine translation engine recommendation method, comprising:
inputting a source text to be translated by a translator into multiple different MT engines for translation to obtain multiple machine translations;
inputting the source text to be translated and the multiple machine translations into a pre-trained MT engine evaluation model, and obtaining the scores of the multiple machine translations output by the MT engine evaluation model;
recommending the highest-scoring machine translation to the translator;
wherein the MT engine evaluation model is obtained by training on source text samples, the corresponding multiple machine translation samples, and the predetermined scores of the multiple machine translation samples.
In a second aspect, embodiments of the present invention provide a machine translation engine recommendation device, comprising:
a translation module, configured to input a source text to be translated by a translator into multiple different MT engines for translation to obtain multiple machine translations;
a prediction module, configured to input the source text to be translated and the multiple machine translations into a pre-trained MT engine evaluation model, and to obtain the scores of the multiple machine translations output by the MT engine evaluation model;
a recommendation module, configured to recommend the highest-scoring machine translation to the translator;
wherein the MT engine evaluation model is obtained by training on source text samples, the corresponding multiple machine translation samples, and the predetermined scores of the multiple machine translation samples.
In a third aspect, embodiments of the present invention provide an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the machine translation engine recommendation method provided in the first aspect.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the machine translation engine recommendation method provided in the first aspect.
The machine translation engine recommendation method and device provided by embodiments of the present invention build an evaluation model by learning from a large number of human reference translations and machine translations. By evaluating multiple machine translations of the document to be translated, they recommend the better machine translation to the translator, thereby improving the translator's translation quality and efficiency.
Detailed description of the invention
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow diagram of the machine translation engine recommendation method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of training the MT engine evaluation model provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of the machine translation engine recommendation device provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of the physical structure of the electronic device provided by an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flow diagram of the machine translation engine recommendation method provided by an embodiment of the present invention. As shown in the figure, the method includes:
Step 100: input the source text to be translated by a translator into multiple different MT engines for translation to obtain multiple machine translations.
Existing MT engines include Google, Baidu, and NetEase Youdao. The source text to be translated is input into the different MT engines to obtain multiple machine translations.
Step 101: input the source text to be translated and the multiple machine translations into a pre-trained MT engine evaluation model, and obtain the scores of the multiple machine translations output by the MT engine evaluation model.
Specifically, the input of the MT engine evaluation model is the source text to be translated together with the multiple machine translations obtained, and its output is the scores of those machine translations.
The MT engine evaluation model predicts the score of each machine translation from the source text and the corresponding multiple machine translations.
The MT engine evaluation model is obtained by training on source text samples, the corresponding multiple machine translation samples, and the predetermined scores of the multiple machine translation samples.
It is worth noting that the score of a machine translation reflects its similarity to the reference translation of the source text.
Step 102: recommend the highest-scoring machine translation to the translator.
Specifically, the scores of the multiple machine translations are compared. The highest score indicates the highest similarity between the machine translation and the reference translation, that is, the highest-scoring machine translation has the highest quality. The highest-scoring machine translation is recommended to the translator, who post-edits it to produce the final translation draft, thereby improving the translator's translation quality and efficiency.
The machine translation engine recommendation method provided by embodiments of the present invention builds an evaluation model by learning from a large number of human reference translations and machine translations. By evaluating multiple machine translations of the document to be translated, it recommends the better machine translation to the translator, thereby improving the translator's translation quality and efficiency.
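Steps 100 to 102 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Engine` and `EvaluationModel` types, the `recommend` function, and the two toy engines are hypothetical names introduced here, and the stand-in model returns fixed scores rather than predictions from a trained network.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an engine maps a source text to a translation; the
# evaluation model maps (source, candidate translations) to one score each.
Engine = Callable[[str], str]
EvaluationModel = Callable[[str, List[str]], List[float]]


def recommend(source: str, engines: Dict[str, Engine],
              model: EvaluationModel) -> Tuple[str, str, float]:
    """Return (engine name, translation, score) for the best-scoring output."""
    if not engines:
        raise ValueError("at least one MT engine is required")
    names = list(engines)
    # Step 100: translate the source text with every engine.
    candidates = [engines[name](source) for name in names]
    # Step 101: score all candidates with the pre-trained evaluation model.
    scores = model(source, candidates)
    # Step 102: pick the highest-scoring machine translation.
    best = max(range(len(names)), key=lambda k: scores[k])
    return names[best], candidates[best], scores[best]


# Stand-ins for real engines and a real trained model (illustration only).
engines = {
    "engine_a": lambda text: text.upper(),
    "engine_b": lambda text: text[::-1],
}
toy_model = lambda source, candidates: [0.4, 0.7]

print(recommend("hello", engines, toy_model))
```

Returning the engine name alongside the translation keeps the recommendation traceable to the engine that produced it.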
Fig. 2 is a flow diagram of training the MT engine evaluation model provided by an embodiment of the present invention. Based on the above embodiment, the MT engine evaluation model is obtained by training as follows:
Step 200: obtain source text samples and their corresponding reference translations from a bilingual corpus, and call different MT engines to obtain multiple machine translation samples for each source text sample.
Specifically, the reference translation of a source text sample is a translation produced by a professional translator. A large number of source texts and corresponding reference translations can be obtained from existing bilingual corpora. Each source text is input into multiple different MT engines for translation to obtain multiple machine translations.
Step 201: calculate the scores of the multiple machine translation samples of each source text sample according to the reference translation of that source text sample.
Specifically, the score of a machine translation sample measures the similarity between the machine translation and the professional human translation. A higher score indicates that the machine translation is closer to the professional human translation and therefore of higher quality. The score of a machine translation can thus be obtained by calculating the similarity between the machine translation and the reference translation.
After the scores of the multiple machine translation samples are calculated, a sample data set is built in which each sample takes the source text and its corresponding multiple machine translations as input and the scores of those machine translations as the target output.
Step 202: build a deep learning network model, input the source text samples and their corresponding multiple machine translation samples into the model for training, calculate a loss function from the scores the model outputs for the machine translation samples and the calculated scores of those samples, and update the parameters of the deep learning network model by the back-propagation algorithm until a preset training termination condition is met; save the parameters of the deep learning network model at the end of training to obtain the MT engine evaluation model.
Specifically, a deep learning network model is built on the obtained sample set and its parameters are initialized. The source text samples and the corresponding multiple machine translation samples are used as the model input, and the calculated scores of the machine translations are used as the target output. The deep learning network is then trained: a loss function is calculated from the predicted scores that the model outputs for the machine translation samples and the target scores of those machine translations, and the model parameters are updated by the back-propagation algorithm until the preset training termination condition is met. When training ends, the MT engine evaluation model is obtained.
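The loop in step 202 (predict, compute the loss against the target scores, update the parameters, stop on a termination condition) can be sketched as follows. The sketch makes several assumptions that are not in the source: a single linear layer stands in for the deep learning network, so back-propagation reduces to one gradient expression; the loss is mean squared error; each (source text, machine translation) pair is already encoded as a hypothetical numeric feature vector; and the termination condition is a loss threshold or an epoch limit.

```python
def predict(weights, bias, features):
    """Predicted score of one (source, translation) pair from its features."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def mse(weights, bias, samples):
    """Mean squared error between predicted and target scores."""
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in samples) / len(samples)


def train(samples, lr=0.1, max_epochs=500, tol=1e-6):
    """Gradient descent on MSE; stops at max_epochs or when the loss < tol."""
    n_features = len(samples[0][0])
    weights, bias = [0.0] * n_features, 0.0      # parameter initialization
    for _ in range(max_epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in samples:
            err = predict(weights, bias, x) - y  # prediction minus target score
            for k in range(n_features):
                grad_w[k] += 2 * err * x[k] / len(samples)
            grad_b += 2 * err / len(samples)
        # Parameter update (the role back-propagation plays in a deep network).
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
        if mse(weights, bias, samples) < tol:    # assumed termination condition
            break
    return weights, bias


# Hypothetical feature vectors paired with target scores from formula (1).
samples = [([0.2, 0.9], 0.8), ([0.7, 0.1], 0.3), ([0.5, 0.5], 0.55)]
initial_loss = mse([0.0, 0.0], 0.0, samples)
weights, bias = train(samples)
final_loss = mse(weights, bias, samples)
print(initial_loss, final_loss)
```

A real evaluation model would replace `predict` with a neural network over encoded text, but the training loop keeps the same shape.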
Based on the above embodiment, the step of calculating the scores of the multiple machine translation samples of the source text sample according to its reference translation is specifically:
According to the reference translation of the source text sample, calculate the scores of the multiple machine translation samples of the source text sample using the following formula:
Score = (BLEU + Similarity + PosScore) / 3 (1)
where BLEU denotes the BLEU score between the machine translation and the reference translation, Similarity denotes the edit-distance similarity between the machine translation and the reference translation, and PosScore denotes the content-word similarity between the machine translation and the reference translation.
Specifically, BLEU (bilingual evaluation understudy) is a commonly used metric for measuring machine translation quality. However, BLEU alone is not sufficiently accurate. Therefore, in embodiments of the present invention, BLEU is combined with string edit-distance similarity and content-word similarity to jointly measure the quality of a machine translation.
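Formula (1) is a plain average of the three component scores. A minimal sketch, assuming each component has already been computed and normalized to the range 0 to 1 (the source does not state the BLEU scale, so that range check is an assumption, and `combined_score` is a name introduced here):

```python
def combined_score(bleu: float, similarity: float, pos_score: float) -> float:
    """Formula (1): unweighted mean of BLEU, Similarity, and PosScore."""
    components = (("bleu", bleu), ("similarity", similarity), ("pos_score", pos_score))
    for name, value in components:
        # Assumption: every component is normalized to [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (bleu + similarity + pos_score) / 3


print(round(combined_score(0.6, 0.9, 0.86), 4))
```

Because the three components carry equal weight, a low BLEU score can be partly offset by high edit-distance and content-word similarity.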
The edit-distance similarity is calculated as:
Similarity = (Max(x, y) - Levenshtein(x, y)) / Max(x, y) (2)
In formula (2), x denotes the translator's reference translation, y denotes the machine translation, Max(x, y) denotes the greater of the lengths of x and y, and Levenshtein(x, y) denotes the Levenshtein distance between x and y.
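Formula (2) can be illustrated with a character-level Levenshtein distance computed by dynamic programming. The function names are introduced here for illustration, and returning 1.0 for two empty strings is an assumption, since the formula is undefined when both lengths are zero.

```python
def levenshtein(x: str, y: str) -> int:
    """Character-level Levenshtein distance between x and y."""
    previous = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, cx in enumerate(x, start=1):
        current = [i]
        for j, cy in enumerate(y, start=1):
            current.append(min(
                previous[j] + 1,               # delete cx
                current[j - 1] + 1,            # insert cy
                previous[j - 1] + (cx != cy),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def edit_similarity(x: str, y: str) -> float:
    """Formula (2): (Max(x, y) - Levenshtein(x, y)) / Max(x, y)."""
    longest = max(len(x), len(y))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (longest - levenshtein(x, y)) / longest


print(levenshtein("kitten", "sitting"), round(edit_similarity("kitten", "sitting"), 4))
```

Dividing by the longer length keeps the similarity between 0 and 1, because the Levenshtein distance never exceeds that length.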
The content-word similarity is calculated as:
PosScore = Σ(i=0..3) αi * [ Σ(j=0..n) sim(word_j) / count_i ] (3)
In formula (3), i = 0, 1, 2, 3 denote nouns, verbs, adjectives, and adverbs respectively; α0 = 0.3, α1 = 0.3, α2 = 0.25, and α3 = 0.15; count_i denotes the number of words of part of speech i in the reference translation; word_j denotes a word of that part of speech in the reference translation; n = count_i - 1; and sim(word_j) denotes the similarity between that word and the words of the same part of speech in the machine translation. If the reference translation contains no words of a given part of speech (count_i is zero), the bracketed term for that part of speech is taken as 1.
The calculation of the content-word similarity PosScore is illustrated by the following example:
Source text: The time-of-flight camera calculates the distance of the object based on the measured time.
Reference translation: the time-of-flight camera calculates the distance of the object based on the measured time.
Machine translation: the time-of-flight camera calculates the distance of the object according to the measured time.
Word segmentation of the reference translation: flight (verb), time (noun), camera (noun), based on (preposition), measured (adjective), time (noun), to (preposition), calculate (verb), object (noun), of (auxiliary word), distance (noun).
Word segmentation of the machine translation: flight (verb), time (noun), camera (noun), according to (preposition), measured (adjective), time (noun), calculate (verb), object (noun), of (auxiliary word), distance (noun).
The noun score works out to (1+0.67+1+1)/5=0.734, the verb score is 1, the adjective score is 0.75, and the adverb score is 1, because the reference translation contains no adverbs.
PosScore is therefore 0.3*0.734+0.3*1+0.25*0.75+0.15*1=0.86.
Calculating the scores of the multiple machine translation samples with formula (1) gives a more accurate result than using BLEU alone.
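The weighted sum in the PosScore example can be checked with a short sketch. The weights are those given in the text. The function names are introduced here, and the per-word similarities are taken as inputs because the source does not specify how sim(word_j) is computed.

```python
# Part-of-speech weights from the text (nouns, verbs, adjectives, adverbs).
WEIGHTS = {"noun": 0.3, "verb": 0.3, "adjective": 0.25, "adverb": 0.15}


def pos_average(sims):
    """Mean similarity for one part of speech; 1 if the reference has none."""
    return sum(sims) / len(sims) if sims else 1.0


def pos_score(sims_by_pos):
    """Weighted sum of per-part-of-speech averages.

    sims_by_pos maps a part of speech to the list of sim(word_j) values,
    one per reference-translation word of that part of speech.
    """
    return sum(w * pos_average(sims_by_pos.get(pos, [])) for pos, w in WEIGHTS.items())


def pos_score_from_averages(averages):
    """Same weighted sum when the per-part-of-speech averages are known."""
    return sum(w * averages[pos] for pos, w in WEIGHTS.items())


# Per-part-of-speech scores quoted in the worked example.
example = {"noun": 0.734, "verb": 1.0, "adjective": 0.75, "adverb": 1.0}
print(round(pos_score_from_averages(example), 2))
```

With the example's scores the weighted sum is 0.8577, which rounds to the 0.86 stated in the text.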
Based on the above embodiment, after the step of recommending the highest-scoring machine translation to the translator, the method further includes:
recalculating the score of the machine translation recommended to the translator, using the translation draft finally confirmed by the translator as the reference translation;
calculating a loss function from the recalculated score and the score output by the MT engine evaluation model, and updating the MT engine evaluation model by the back-propagation algorithm.
Specifically, after the highest-scoring machine translation is recommended to the translator, the translator edits the recommended machine translation and determines the final translation draft, which serves as a reference translation confirmed by a professional human translator. With this draft as the reference, formula (1) can be used to recalculate the score of the machine translation recommended to the translator. A loss function is then calculated from the recalculated score and the score that the MT engine evaluation model output for that machine translation, and the MT engine evaluation model is updated by the back-propagation algorithm.
The machine translation engine recommendation method provided by embodiments of the present invention can keep updating the MT engine evaluation model by continually learning from translators' final translation drafts.
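The update described above amounts to one more training step on a new sample whose target is the recalculated score. The sketch below makes assumptions that are not in the source: a linear model stands in for the deep network, the loss is squared error, and the (source text, recommended translation) pair is represented by a hypothetical feature vector.

```python
def predict(weights, bias, features):
    """Score the stand-in evaluation model assigns to one translation."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def online_update(weights, bias, features, recalculated_score, lr=0.1):
    """One gradient step on the squared error against the recalculated score."""
    error = predict(weights, bias, features) - recalculated_score
    new_weights = [w - lr * 2 * error * x for w, x in zip(weights, features)]
    new_bias = bias - lr * 2 * error
    return new_weights, new_bias


# Hypothetical current parameters and features of the recommended translation.
weights, bias = [0.0, 0.625], 0.2375
features = [0.4, 0.6]
# Assumed score from formula (1), with the translator's final draft as reference.
recalculated = 0.9

before = abs(predict(weights, bias, features) - recalculated)
weights, bias = online_update(weights, bias, features, recalculated)
after = abs(predict(weights, bias, features) - recalculated)
print(before, after)
```

Each confirmed draft moves the model's prediction for that translation closer to the score derived from the translator's own final text.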
Fig. 3 is a structural diagram of the machine translation engine recommendation device provided by an embodiment of the present invention. The device comprises a translation module 310, a prediction module 320, and a recommendation module 330, wherein:
The translation module 310 is configured to input the source text to be translated by a translator into multiple different MT engines for translation to obtain multiple machine translations.
Specifically, existing MT engines include Google, Baidu, and NetEase Youdao. The translation module 310 inputs the source text to be translated into the different MT engines to obtain multiple machine translations.
The prediction module 320 is configured to input the source text to be translated and the multiple machine translations into a pre-trained MT engine evaluation model, and to obtain the scores of the multiple machine translations output by the MT engine evaluation model.
Specifically, the input of the MT engine evaluation model is the source text to be translated together with the multiple machine translations obtained, and its output is the scores of those machine translations.
The MT engine evaluation model predicts the score of each machine translation from the source text and the corresponding multiple machine translations.
The MT engine evaluation model is obtained by training on source text samples, the corresponding multiple machine translation samples, and the predetermined scores of the multiple machine translation samples.
It is worth noting that the score of a machine translation reflects its similarity to the reference translation of the source text.
The recommendation module 330 is configured to recommend the highest-scoring machine translation to the translator.
Specifically, the recommendation module 330 compares the scores of the multiple machine translations. The highest score indicates the highest similarity between the machine translation and the reference translation, that is, the highest-scoring machine translation has the highest quality. The highest-scoring machine translation is recommended to the translator, who post-edits it to produce the final translation draft, thereby improving the translator's translation quality and efficiency.
The machine translation engine recommendation device provided by embodiments of the present invention builds an evaluation model by learning from a large number of human reference translations and machine translations. By evaluating multiple machine translations of the document to be translated, it recommends the better machine translation to the translator, thereby improving the translator's translation quality and efficiency.
Based on the above embodiment, the device further includes a training module, which is specifically configured to:
obtain source text samples and their corresponding reference translations from a bilingual corpus, and call different MT engines to obtain multiple machine translation samples for each source text sample;
calculate the scores of the multiple machine translation samples of each source text sample according to the reference translation of that source text sample;
build a deep learning network model, input the source text samples and their corresponding multiple machine translation samples into the model for training, calculate a loss function from the scores the model outputs for the machine translation samples and the calculated scores of those samples, and update the parameters of the deep learning network model by the back-propagation algorithm until a preset training termination condition is met; and save the parameters of the deep learning network model at the end of training to obtain the MT engine evaluation model.
Specifically, the reference translation of a source text sample is a translation produced by a professional translator. A large number of source texts and corresponding reference translations can be obtained from existing bilingual corpora. Each source text is input into multiple different MT engines for translation to obtain multiple machine translations.
The score of a machine translation sample measures the similarity between the machine translation and the professional human translation. A higher score indicates that the machine translation is closer to the professional human translation and therefore of higher quality. The score of a machine translation can thus be obtained by calculating the similarity between the machine translation and the reference translation.
After the scores of the multiple machine translation samples are calculated, a sample data set is built in which each sample takes the source text and its corresponding multiple machine translations as input and the scores of those machine translations as the target output.
The training module builds a deep learning network model on the obtained sample set and initializes its parameters. The source text samples and the corresponding multiple machine translation samples are used as the model input, and the calculated scores of the machine translations are used as the target output. The deep learning network is then trained: a loss function is calculated from the predicted scores that the model outputs for the machine translation samples and the target scores of those machine translations, and the model parameters are updated by the back-propagation algorithm until the preset training termination condition is met. When training ends, the MT engine evaluation model is obtained.
Based on the above embodiment, the training module is specifically configured to:
calculate, according to the reference translation of the source text sample, the scores of the multiple machine translation samples of the source text sample using the following formula:
Score = (BLEU + Similarity + PosScore) / 3 (1)
where BLEU denotes the BLEU score between the machine translation and the reference translation, Similarity denotes the edit-distance similarity between the machine translation and the reference translation, and PosScore denotes the content-word similarity between the machine translation and the reference translation.
Specifically, BLEU (bilingual evaluation understudy) is a commonly used metric for measuring machine translation quality. However, BLEU alone is not sufficiently accurate. Therefore, in embodiments of the present invention, BLEU is combined with string edit-distance similarity and content-word similarity to jointly measure the quality of a machine translation.
The edit-distance similarity is calculated as:
Similarity = (Max(x, y) - Levenshtein(x, y)) / Max(x, y) (2)
In formula (2), x denotes the translator's reference translation, y denotes the machine translation, Max(x, y) denotes the greater of the lengths of x and y, and Levenshtein(x, y) denotes the Levenshtein distance between x and y.
The content-word similarity is calculated as:
PosScore = Σ(i=0..3) αi * [ Σ(j=0..n) sim(word_j) / count_i ] (3)
In formula (3), i = 0, 1, 2, 3 denote nouns, verbs, adjectives, and adverbs respectively; α0 = 0.3, α1 = 0.3, α2 = 0.25, and α3 = 0.15; count_i denotes the number of words of part of speech i in the reference translation; word_j denotes a word of that part of speech in the reference translation; n = count_i - 1; and sim(word_j) denotes the similarity between that word and the words of the same part of speech in the machine translation. If the reference translation contains no words of a given part of speech (count_i is zero), the bracketed term for that part of speech is taken as 1.
Based on the above embodiments, the device further includes an update module, which is specifically configured to:
recalculate the score of the machine translation recommended to the translator, using the translation draft finally confirmed by the translator as the reference translation;
calculate a loss function from the recalculated score and the score output by the MT engine evaluation model, and update the MT engine evaluation model by the back-propagation algorithm.
Specifically, after the highest-scoring machine translation is recommended to the translator, the translator edits the recommended machine translation and determines the final translation draft, which serves as a reference translation confirmed by a professional human translator. With this draft as the reference, the update module can use formula (1) to recalculate the score of the machine translation recommended to the translator. It then calculates a loss function from the recalculated score and the score that the MT engine evaluation model output for that machine translation, and updates the MT engine evaluation model by the back-propagation algorithm.
Fig. 4 is a schematic diagram of the physical structure of an electronic device provided by an embodiment of the present invention. As shown in Fig. 4, the electronic device may include a processor 410, a communications interface 420, a memory 430 and a communication bus 440, wherein the processor 410, the communications interface 420 and the memory 430 communicate with one another through the communication bus 440. The processor 410 can call a computer program that is stored in the memory 430 and can run on the processor 410 to execute the MT engine recommendation method provided by the above embodiments, for example: inputting the original text to be translated by a translator into multiple different MT engines for translation to obtain multiple machine translations; inputting the original text to be translated and the multiple machine translations into a pre-trained MT engine assessment model to obtain the scores of the multiple machine translations output by the MT engine assessment model; and recommending the highest-scoring machine translation to the translator; wherein the MT engine assessment model is obtained by training on original text samples, the corresponding multiple machine translation samples, and the predetermined scores of the multiple machine translation samples.
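The three steps of the method (translate with several engines, score each output, recommend the best) can be sketched as follows. This is only an illustration: the function name `recommend`, the dictionary of engine callables and the `score_fn` callable are hypothetical, and the toy engines and scorer stand in for real MT engines and the trained assessment model.

```python
def recommend(source, engines, score_fn):
    """Return (engine name, translation, all scores) for the best candidate.

    engines:  dict mapping an engine name to a callable source -> translation
    score_fn: callable (source, translation) -> score, standing in for the
              pre-trained assessment model
    """
    # Step 1: translate the original text with every engine.
    candidates = {name: translate(source) for name, translate in engines.items()}
    # Step 2: score every machine translation against the original text.
    scores = {name: score_fn(source, text) for name, text in candidates.items()}
    # Step 3: recommend the highest-scoring machine translation.
    best = max(scores, key=scores.get)
    return best, candidates[best], scores


# Toy stand-ins, used only to exercise the control flow.
toy_engines = {
    "engine_a": lambda s: s.upper(),
    "engine_b": lambda s: s[::-1],
}
toy_score = lambda src, mt: 1.0 if mt == src.upper() else 0.2

best_name, best_text, all_scores = recommend("hi", toy_engines, toy_score)
```

Returning all scores alongside the winner keeps the other candidates available, for example for logging or for the later update step.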
In addition, the logic instructions in the memory 430 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the embodiments of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
An embodiment of the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the MT engine recommendation method provided by the above embodiments, for example: inputting the original text to be translated by a translator into multiple different MT engines for translation to obtain multiple machine translations; inputting the original text to be translated and the multiple machine translations into a pre-trained MT engine assessment model to obtain the scores of the multiple machine translations output by the MT engine assessment model; and recommending the highest-scoring machine translation to the translator; wherein the MT engine assessment model is obtained by training on original text samples, the corresponding multiple machine translation samples, and the predetermined scores of the multiple machine translation samples.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment may be implemented by means of software plus a necessary general-purpose hardware platform, and of course may also be implemented by hardware. Based on this understanding, the above technical solution in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. An MT engine recommendation method, characterized by comprising:
inputting an original text to be translated by a translator into multiple different MT engines for translation to obtain multiple machine translations;
inputting the original text to be translated and the multiple machine translations into a pre-trained MT engine assessment model to obtain scores of the multiple machine translations output by the MT engine assessment model;
recommending the highest-scoring machine translation to the translator;
wherein the MT engine assessment model is obtained by training on original text samples, the corresponding multiple machine translation samples, and predetermined scores of the multiple machine translation samples.
2. The method according to claim 1, characterized in that the MT engine assessment model is trained by the following method:
obtaining an original text sample and the reference translation corresponding to the original text sample from a bilingual corpus, and calling different MT engines to obtain multiple machine translation samples corresponding to the original text sample;
calculating scores of the multiple machine translation samples corresponding to the original text sample according to the reference translation corresponding to the original text sample;
constructing a deep learning network model, inputting the original text sample and the multiple machine translation samples corresponding to the original text sample into the deep learning network model for training, calculating a loss function from the scores of the multiple machine translation samples output by the model and the calculated scores of the multiple machine translation samples, and updating the parameters of the deep learning network model by the back-propagation algorithm until a preset training termination condition is met; and saving the parameters of the deep learning network model at the end of training to obtain the MT engine assessment model.
3. The method according to claim 2, characterized in that the step of calculating the scores of the multiple machine translation samples corresponding to the original text sample according to the reference translation corresponding to the original text sample is specifically:
according to the reference translation corresponding to the original text sample, calculating the scores of the multiple machine translation samples corresponding to the original text sample using the following formula:
Score = (BLEU + Similarity + PosScore) / 3,
wherein BLEU denotes the BLEU score between the machine translation and the reference translation, Similarity denotes the edit-distance similarity between the machine translation and the reference translation, and PosScore denotes the content-word similarity between the machine translation and the reference translation.
4. The method according to claim 1, characterized in that, after the step of recommending the highest-scoring machine translation to the translator, the method further comprises:
recalculating the score of the machine translation recommended to the translator, using the final translation confirmed by the translator as the reference translation;
calculating a loss function from the recalculated score and the score output by the MT engine assessment model, and updating the MT engine assessment model by the back-propagation algorithm.
5. An MT engine recommendation device, characterized by comprising:
a translation module, configured to input an original text to be translated by a translator into multiple different MT engines for translation to obtain multiple machine translations;
a prediction module, configured to input the original text to be translated and the multiple machine translations into a pre-trained MT engine assessment model to obtain scores of the multiple machine translations output by the MT engine assessment model;
a recommendation module, configured to recommend the highest-scoring machine translation to the translator;
wherein the MT engine assessment model is obtained by training on original text samples, the corresponding multiple machine translation samples, and predetermined scores of the multiple machine translation samples.
6. The device according to claim 5, characterized by further comprising a training module, the training module being specifically configured to:
obtain an original text sample and the reference translation corresponding to the original text sample from a bilingual corpus, and call different MT engines to obtain multiple machine translation samples corresponding to the original text sample;
calculate scores of the multiple machine translation samples corresponding to the original text sample according to the reference translation corresponding to the original text sample;
construct a deep learning network model, input the original text sample and the multiple machine translation samples corresponding to the original text sample into the deep learning network model for training, calculate a loss function from the scores of the multiple machine translation samples output by the model and the calculated scores of the multiple machine translation samples, and update the parameters of the deep learning network model by the back-propagation algorithm until a preset training termination condition is met; and save the parameters of the deep learning network model at the end of training to obtain the MT engine assessment model.
7. The device according to claim 5, characterized in that the training module is specifically configured to:
according to the reference translation corresponding to the original text sample, calculate the scores of the multiple machine translation samples corresponding to the original text sample using the following formula:
Score = (BLEU + Similarity + PosScore) / 3,
wherein BLEU denotes the BLEU score between the machine translation and the reference translation, Similarity denotes the edit-distance similarity between the machine translation and the reference translation, and PosScore denotes the content-word similarity between the machine translation and the reference translation.
8. The device according to claim 5, characterized by further comprising an update module, the update module being specifically configured to:
recalculate the score of the machine translation recommended to the translator, using the final translation confirmed by the translator as the reference translation;
calculate a loss function from the recalculated score and the score output by the MT engine assessment model, and update the MT engine assessment model by the back-propagation algorithm.
9. An electronic device, characterized by comprising:
at least one processor; and
at least one memory communicatively connected to the processor, wherein:
the memory stores program instructions executable by the processor, and the processor, by calling the program instructions, is able to execute the method according to any one of claims 1 to 4.
10. A non-transitory computer-readable storage medium, characterized in that the non-transitory computer-readable storage medium stores computer instructions that cause a computer to execute the method according to any one of claims 1 to 4.
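The scoring formula recited in claims 3 and 7, Score = (BLEU + Similarity + PosScore) / 3, can be illustrated with a short sketch. The claims name the three components but do not fix how each is normalized, so the example assumes that all three lie in [0, 1] and that the edit-distance similarity is defined as 1 minus the Levenshtein distance divided by the longer string length; this definition and all function names are assumptions. The BLEU and content-word similarity values are taken as given inputs rather than computed.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(machine_translation, reference):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(machine_translation), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(machine_translation, reference) / longest


def combined_score(bleu, similarity, pos_score):
    """Score = (BLEU + Similarity + PosScore) / 3, as recited in the claims."""
    return (bleu + similarity + pos_score) / 3


similarity = edit_similarity("the cat sat", "the cat sits")
score = combined_score(bleu=0.6, similarity=similarity, pos_score=0.5)
```

Because the formula is a plain average, each component contributes equally, so a candidate that is weak on one measure can still rank first if it is strong on the other two.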
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811426193.1A CN109710948A (en) | 2018-11-27 | 2018-11-27 | MT engine recommended method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811426193.1A CN109710948A (en) | 2018-11-27 | 2018-11-27 | MT engine recommended method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109710948A true CN109710948A (en) | 2019-05-03 |
Family
ID=66255191
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811426193.1A Pending CN109710948A (en) | 2018-11-27 | 2018-11-27 | MT engine recommended method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109710948A (en) |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190503 |