CN108804428A - Method, system and related apparatus for correcting term mistranslation in a translation - Google Patents

Method, system and related apparatus for correcting term mistranslation in a translation

Publication number
CN108804428A
CN108804428A CN201810600694.0A CN201810600694A CN108804428A CN 108804428 A CN108804428 A CN 108804428A CN 201810600694 A CN201810600694 A CN 201810600694A CN 108804428 A CN108804428 A CN 108804428A
Authority
CN
China
Prior art keywords
translation
text
retroversion
former
cypher
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810600694.0A
Other languages
Chinese (zh)
Inventor
洪宇
刘梦眙
姚建民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201810600694.0A
Publication of CN108804428A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/40 Processing or translation of natural language
    • G06F40/58 Use of machine translation, e.g. for multi-lingual retrieval, for server-side translation for client devices or for real-time translation
    • G06F40/55 Rule-based translation

Abstract

This application discloses a method for correcting term mistranslation in a translation. The method obtains the original translated text of a target term in a first translation, obtains candidate translations corresponding to the target term from a training set, and sets all candidate translations as pseudo translated texts. Each pseudo translated text replaces the original translated text in the first translation, yielding N second translations, and a back-translation operation on the first translation and all second translations yields N+1 back-translated texts. The source text is compared with all back-translated texts to determine the translation accuracy of the first translation, and the original translated text of the target term is corrected according to that accuracy. The method can correct mistranslated domain terms in machine translation without relying on a large amount of in-domain resources. The application also discloses a system for correcting term mistranslation in a translation, a computer-readable storage medium, and a device for correcting term mistranslation in a translation, which have the same beneficial effects.

Description

Method, system and related apparatus for correcting term mistranslation in a translation
Technical field
The present invention relates to the field of machine translation, and in particular to a method and a system for correcting term mistranslation in a translation, a computer-readable storage medium, and a device for correcting term mistranslation in a translation.
Background
Machine translation refers to technology that uses computing equipment such as computers to translate an original text in one natural language (the source language) into a translation in another natural language (the target language). Because the translation is performed by a machine, a large volume of text can be translated in a relatively short time compared with human translation.
However, when machine translation is applied to text containing many domain-specific technical terms, errors are common: the training corpus of a general-purpose machine translation system either lacks translations of the domain terms or contains them so rarely that their translation probability is low. To address this problem, the prior art corrects machine-translated terms as follows: each word in the output translation is treated as an object to be classified; lexical features, syntactic features and the like are constructed; a suitable classification model, such as a maximum entropy classifier, a random forest or a bidirectional LSTM, labels each word as correct or incorrect; and a term judged to be mistranslated is then corrected. This approach, however, depends on a large amount of in-domain resources. When the text comes from an unknown domain, scarce language resources limit the generality of such methods.
Therefore, how to correct mistranslated domain terms in machine translation without relying on a large amount of in-domain resources is a technical problem that those skilled in the art currently need to solve.
Summary of the invention
The purpose of this application is to provide a method and a system for correcting term mistranslation in a translation, a computer-readable storage medium, and a device for correcting term mistranslation in a translation, which can correct mistranslated domain terms in machine translation without relying on a large amount of in-domain resources.
To solve the above technical problem, this application provides a method for correcting term mistranslation in a translation, the method comprising:
obtaining the original translated text of a target term in a first translation, obtaining candidate translations corresponding to the target term from a training set, and setting all candidate translations as pseudo translated texts, wherein the first translation is obtained by translating a source text;
replacing the original translated text in the first translation with each pseudo translated text in turn to obtain N second translations, and performing a back-translation operation on the first translation and all second translations to obtain N+1 back-translated texts;
comparing the source text with all back-translated texts to determine the translation accuracy of the first translation, and correcting the original translated text of the target term according to the translation accuracy.
Optionally, obtaining candidate translations corresponding to the target term from the training set and setting all candidate translations as pseudo translated texts comprises:
computing the hash value of the target term, partitioning the phrase table built from the training set into hash blocks, looking up the candidate translations corresponding to the target term, and setting all candidate translations as the pseudo translated texts.
Optionally, obtaining the original translated text of the target term in the first translation comprises:
obtaining the word alignment information of the first translation, and determining the original translated text corresponding to the target term according to the word alignment information.
Optionally, comparing the source text with all back-translated texts to determine the translation accuracy of the first translation, and correcting the original translated text of the target term according to the translation accuracy, comprises:
computing the language model probability score of every back-translated text using a language model, and determining the maximum language model probability score;
mapping each back-translated text to a first feature vector and the source text to a second feature vector, setting the cosine distance between the first feature vector of each back-translated text and the second feature vector as the semantic similarity between that back-translated text and the source text, and determining the maximum semantic similarity;
judging whether the difference between the language model probability score of the back-translated text of the first translation and the maximum language model probability score is less than or equal to a preset value, to obtain a first judgment result;
judging whether the semantic similarity between the back-translated text of the first translation and the source text is the maximum semantic similarity, to obtain a second judgment result;
judging whether the first judgment result and the second judgment result are both negative;
if both are negative, determining that the original translated text is mistranslated, and correcting the original translated text of the target term.
Optionally, correcting the original translated text of the target term comprises:
replacing the first translation with the second translation corresponding to the maximum semantic similarity.
Optionally, correcting the original translated text of the target term comprises:
representing all back-translated texts and the source text as dependency trees, and computing the minimum edit cost of converting the dependency tree of each back-translated text into the dependency tree of the source text;
selecting, as the optimal translation, the second translation corresponding to the back-translated text with the smallest minimum edit cost;
or selecting, as the optimal translation, the second translation corresponding to the back-translated text with the largest difference between the semantic similarity and the minimum edit cost;
replacing the first translation with the optimal translation.
This application also provides a system for correcting term mistranslation in a translation, the system comprising:
a synonym acquisition module, configured to obtain the original translated text of a target term in a first translation, obtain candidate translations corresponding to the target term from a training set, and set all candidate translations as pseudo translated texts, wherein the first translation is obtained by translating a source text;
a decoding module, configured to replace the original translated text in the first translation with each pseudo translated text in turn to obtain N second translations, and to perform a back-translation operation on the first translation and all second translations to obtain N+1 back-translated texts;
a correction module, configured to compare the source text with all back-translated texts to determine the translation accuracy of the first translation, and to correct the original translated text of the target term according to the translation accuracy.
Optionally, the synonym acquisition module comprises:
an original translated text determination unit, configured to obtain the word alignment information of the first translation and determine the original translated text corresponding to the target term according to the word alignment information;
a pseudo translated text determination unit, configured to compute the hash value of the target term, partition the phrase table built from the training set into hash blocks, look up the candidate translations corresponding to the target term, and set all candidate translations as the pseudo translated texts.
This application also provides a computer-readable storage medium on which a computer program is stored; when executed, the computer program implements the steps of the above method for correcting term mistranslation in a translation.
This application also provides a device for correcting term mistranslation in a translation, comprising a memory and a processor; a computer program is stored in the memory, and the processor implements the steps of the above method for correcting term mistranslation in a translation when it calls the computer program in the memory.
The present invention provides a method for correcting term mistranslation in a translation, comprising: obtaining the original translated text of a target term in a first translation, obtaining candidate translations corresponding to the target term from a training set, and setting all candidate translations as pseudo translated texts, wherein the first translation is obtained by translating a source text; replacing the original translated text in the first translation with each pseudo translated text in turn to obtain N second translations, and performing a back-translation operation on the first translation and all second translations to obtain N+1 back-translated texts; and comparing the source text with all back-translated texts to determine the translation accuracy of the first translation, and correcting the original translated text of the target term according to the translation accuracy.
Translation converts a source text in a first language into a translated version in a second language that expresses the same meaning. Applying the reverse translation operation to the second-language version yields a text in the first language, called the back-translated text, and this reverse translation process is called back-translation. If the second-language version contains no mistranslation, the back-translated text remains highly consistent with the source text. Furthermore, a word in one language may have several candidate translations with different meanings in another language, and back-translating a candidate translation on its own cannot determine its accuracy, because even a candidate translation that is wrong for the source text may back-translate to the original word. The candidate translation can therefore be placed in a complete sentence with context and back-translated to obtain a back-translated text. Comparing the back-translated text with the source text evaluates the translation accuracy of a given word in the translated version, and the candidate translation whose back-translated text has the highest translation accuracy is selected as the correct translation. On this basis, the present invention converts the identification of term mistranslation into a comparison between back-translated texts and the source text: pseudo translated texts corresponding to the target term are looked up and substituted to obtain N second translations; the first translation and the second translations are back-translated to obtain multiple back-translated texts; the source text is compared with all back-translated texts to determine the translation accuracy of the first translation; and the original translated text of the target term is corrected according to the translation accuracy. This scheme can correct mistranslated domain terms in machine translation without relying on a large amount of in-domain resources. The application also provides a system for correcting term mistranslation in a translation, a computer-readable storage medium, and a device for correcting term mistranslation in a translation, which have the above beneficial effects and are not described again here.
Description of the drawings
To illustrate the embodiments of this application more clearly, the drawings needed in the embodiments are briefly introduced below. The drawings described below show only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for correcting term mistranslation in a translation provided by an embodiment of this application;
Fig. 2 is a flow chart of another method for correcting term mistranslation in a translation provided by an embodiment of this application;
Fig. 3 is a structural schematic diagram of a system for correcting term mistranslation in a translation provided by an embodiment of this application.
Detailed description
To make the purpose, technical solutions and advantages of the embodiments of this application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art from the embodiments in this application without creative effort fall within the protection scope of this application.
Referring to Fig. 1, Fig. 1 is a flow chart of a method for correcting term mistranslation in a translation provided by an embodiment of this application.
The specific steps may include:
S101: Obtain the original translated text of the target term in the first translation, obtain candidate translations corresponding to the target term from the training set, and set all candidate translations as pseudo translated texts;
This embodiment assumes that the source text in a first language has already been translated into a first translation in a second language, and describes the process of correcting mistranslated terms in that translation. The purpose of this step is to obtain the other candidate translations of the target term in the first translation, namely the pseudo translated texts. In this embodiment the source text, the back-translated texts and the target term are in the first language, while the original translated text, the pseudo translated texts, the first translation and the second translations are in the second language.
There are many ways to obtain the original translated text of the target term in the first translation. For example, word alignment information can be used to compare the positions of the target term in the source text and in the first translation; alternatively, the original translated text can be determined by comparing the rarity of all words in the first translation. The original translated text mentioned in this step is the word in the second language that has the same meaning as the target term in the first language; it may be a single word or a combination of several words.
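The word-alignment route can be sketched as follows. This is a minimal illustration, assuming the alignment is available as (source index, target index) pairs and the target term is given as a token span in the source text; the function name, the span convention and the placeholder tokens are hypothetical and do not come from the application.

```python
from typing import List, Tuple

def original_translated_text(
    target_tokens: List[str],
    alignment: List[Tuple[int, int]],
    term_span: Tuple[int, int],
) -> str:
    """Return the tokens of the first translation aligned to the source term.

    term_span is a half-open [start, end) range of source token indices.
    """
    start, end = term_span
    # Collect every target position aligned to a source token inside the term.
    aligned = sorted({t for s, t in alignment if start <= s < end})
    return " ".join(target_tokens[i] for i in aligned)

# Placeholder tokens stand in for real second-language words (assumption).
source = ["the", "bus", "network", "is", "suitable"]
target = ["T_the", "T_bus", "T_network", "T_suitable"]
alignment = [(0, 0), (1, 1), (2, 2), (4, 3)]

# The term "bus network" occupies source positions 1 and 2.
print(original_translated_text(target, alignment, (1, 3)))
```

The result may span one or several target tokens, matching the remark that the original translated text can be a single word or a combination of words.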
It is well known that a word in one language usually has several translations in another language. The point of obtaining pseudo translated texts in this embodiment is that the relative accuracy of a translation can then be evaluated: among all translated texts in a language that correspond to the target term, the one most semantically consistent with the source text can be selected. For example, the following may occur during translation: A is translated as a1, but A also has two other translations, a2 and a3, and in the source text a2 best expresses the meaning of A, so a1 can be replaced with a2. In this step, obtaining candidate translations corresponding to the target term from the training set and setting all candidate translations as pseudo translated texts may specifically be: computing the hash value of the target term, partitioning the phrase table built from the training set into hash blocks, looking up the candidate translations corresponding to the target term, and setting all candidate translations as the pseudo translated texts.
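The hash-block lookup might look like the sketch below. It assumes the phrase table is a list of (term, translation) pairs, that an MD5 digest modulo a fixed block count selects the block, and that the pseudo translated texts are the candidates other than the original translated text; the block count, function names and sample entries are illustrative assumptions, not details given in the application.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_BLOCKS = 8  # assumed number of hash blocks

def block_id(term: str, num_blocks: int = NUM_BLOCKS) -> int:
    """Map a term to a block index using a stable hash."""
    digest = hashlib.md5(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_blocks

def build_blocks(phrase_table: List[Tuple[str, str]]) -> List[Dict[str, List[str]]]:
    """Partition the phrase table so that each term lives in exactly one block."""
    blocks = [defaultdict(list) for _ in range(NUM_BLOCKS)]
    for term, translation in phrase_table:
        bucket = blocks[block_id(term)][term]
        if translation not in bucket:
            bucket.append(translation)
    return blocks

def pseudo_translated_texts(blocks, term: str, original: str) -> List[str]:
    """Look up candidates in the term's block, skipping the original translation."""
    candidates = blocks[block_id(term)].get(term, [])
    return [c for c in candidates if c != original]

# Illustrative entries; the translations are English stand-ins (assumption).
phrase_table = [
    ("bus network", "bus network (original rendering)"),
    ("bus network", "public traffic network"),
    ("ring network", "ring network (rendering)"),
]
blocks = build_blocks(phrase_table)
print(pseudo_translated_texts(blocks, "bus network", "bus network (original rendering)"))
```

Only the block selected by the term's hash is searched, so lookup cost depends on the size of one block, not of the whole phrase table.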
A more specific example illustrates the process of this embodiment:
Source text: When you want to connect to some computer time, bus network more suitable.
First translation: When you want to connect some computers, bus network is more suitable.
The target term is "bus network", and the original translated text is the rendering of "bus network" in the first translation.
Another translation of the target term determined from the training set: public traffic network (that is, the pseudo translated text).
Replacing the original translated text with the pseudo translated text gives the second translation: When you want to connect some computers, public traffic network is more suitable.
The first translation and the second translation are then back-translated:
The back-translated text of the first translation is: When you want to connect to a computer, the bus network is more appropriate.
The back-translated text of the second translation is: When you want to connect to some computer time, public transportation network more suitable.
Comparison with the source text shows that the back-translated text of the first translation is more similar to the source text, so the first translation containing the original translated text "bus network" is correct.
It should be noted that this embodiment assumes that a training set of terms and their translations exists, so that pseudo translated texts with the same meaning as the original translated text can be looked up in the training set. The source text can be taken from a test set built in advance. The test set may be built as follows: Chinese and English abstracts and keywords are obtained with a web crawler; sentence samples containing the keywords are selected from the abstracts; and a test set containing all sentence samples is built. Specifically, a web crawler can obtain Chinese and English abstracts and keywords from journals. First, the abstracts are split into sentences by detecting sentence boundaries. Second, for each Chinese keyword, the sentence of the Chinese abstract in which the keyword occurs is located, and the corresponding keyword is then searched for in the English sentence with the corresponding index. The corresponding index may be extended by a window of two sentences before and after, because most authors do not translate sentence by sentence when rewriting a Chinese abstract as an English abstract. The sentence pairs obtained in this way can be regarded as translations of each other, and the keywords in the sentence pairs can be regarded as terms.
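The sentence-pair extraction with a window of two can be sketched as below. The sketch assumes the abstracts are already split into sentences and the keywords are already paired across languages; it takes the first matching English sentence inside the window, which is one possible reading of the description. The function name and the sample sentences are hypothetical.

```python
from typing import List, Tuple

def extract_pairs(
    zh_sents: List[str],
    en_sents: List[str],
    keyword_pairs: List[Tuple[str, str]],
    window: int = 2,
) -> List[Tuple[str, str, str, str]]:
    """Pair each Chinese sentence containing a keyword with a nearby English
    sentence containing the English keyword (index i-window .. i+window)."""
    pairs = []
    for zh_kw, en_kw in keyword_pairs:
        for i, zh in enumerate(zh_sents):
            if zh_kw not in zh:
                continue
            lo = max(0, i - window)
            hi = min(len(en_sents), i + window + 1)
            for j in range(lo, hi):
                if en_kw.lower() in en_sents[j].lower():
                    pairs.append((zh, en_sents[j], zh_kw, en_kw))
                    break  # keep the first match inside the window
    return pairs

# Illustrative abstracts (assumed data, not from the application).
zh = ["本文研究总线网络。", "实验结果表明方法有效。"]
en = [
    "Background is given first.",
    "This paper studies the bus network.",
    "Results show that the method works.",
]
for pair in extract_pairs(zh, en, [("总线网络", "bus network")]):
    print(pair)
```

The window absorbs the index drift that arises when the English abstract is not a sentence-by-sentence translation of the Chinese one.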
S102: Replace the original translated text in the first translation with each pseudo translated text in turn to obtain N second translations, and perform a back-translation operation on the first translation and all second translations to obtain N+1 back-translated texts;
If only the back-translated text obtained from the original translated text were compared with the source text, the comparison would lack a reference against which to judge the result, so the quality of the back-translated text could not be evaluated accurately. Therefore, building on S101, this step replaces the original translated text with each pseudo translated text to obtain N second translations, where N is a positive integer equal to the number of pseudo translated texts. Note that although back-translation is the reverse process of translation, it does not reverse the translation result; even the back-translated text of the first translation is therefore not necessarily identical to the source text.
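The substitution and back-translation in S102 can be sketched as follows. The back-translation engine is passed in as a callable because the application does not name one; the identity stub used here is purely a stand-in, and the function names are hypothetical.

```python
from typing import Callable, List

def second_translations(first: str, original: str, pseudo_texts: List[str]) -> List[str]:
    """Build one second translation per pseudo translated text."""
    return [first.replace(original, pseudo, 1) for pseudo in pseudo_texts]

def back_translate_all(
    first: str,
    seconds: List[str],
    back_translate: Callable[[str], str],
) -> List[str]:
    """Back-translate the first translation and all N second translations.

    Index 0 of the result belongs to the first translation.
    """
    return [back_translate(text) for text in [first] + seconds]

def stub_back_translate(text: str) -> str:
    # Stand-in for a real machine translation system (assumption).
    return text

first = "When you want to connect some computers, bus network is more suitable."
seconds = second_translations(first, "bus network", ["public traffic network"])
back = back_translate_all(first, seconds, stub_back_translate)
print(len(seconds), len(back))  # N and N+1
```

Keeping the first translation at index 0 makes it easy for later steps to compare its scores against those of the N alternatives.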
S103: Compare the source text with all back-translated texts to determine the translation accuracy of the first translation, and correct the original translated text of the target term according to the translation accuracy.
It can be understood that a language is a complex symbol system in which vocabulary is organized by grammar; it comprises a phonetic system, a lexical system and a grammatical system, and within one language all sentences follow the same grammatical rules and use the same lexical system. Therefore, once all first and second translations have been back-translated in S102, comparing the source text with a back-translated text evaluates the translation accuracy of the first or second translation corresponding to that back-translated text.
Determining the translation accuracy of the first translation and correcting the original translated text according to it comprises, by default, the following steps:
Step 1: Compare the source text with all back-translated texts to determine the translation accuracy of the first translation;
Step 2: Judge whether the translation accuracy meets a preset standard; if so, end the flow; if not, go to Step 3;
Step 3: Correct the original translated text of the target term.
Translation converts a source text in a first language into a translated version in a second language that expresses the same meaning. Applying the reverse translation operation to the second-language version yields a text in the first language, called the back-translated text, and this reverse translation process is called back-translation. If the second-language version contains no mistranslation, the back-translated text remains highly consistent with the source text. Furthermore, a word in one language may have several candidate translations with different meanings in another language, and back-translating a candidate translation on its own cannot determine its accuracy, because even a candidate translation that is wrong for the source text may back-translate to the original word. The candidate translation can therefore be placed in a complete sentence with context and back-translated to obtain a back-translated text. Comparing the back-translated text with the source text evaluates the translation accuracy of a given word in the translated version, and the candidate translation whose back-translated text has the highest translation accuracy is selected as the correct translation. On this basis, this embodiment converts the identification of term mistranslation into a comparison between back-translated texts and the source text: pseudo translated texts corresponding to the target term are looked up and substituted to obtain N second translations; the first translation and the second translations are back-translated to obtain multiple back-translated texts; the source text is compared with all back-translated texts to determine the translation accuracy of the first translation; and the original translated text of the target term is corrected according to the translation accuracy. This embodiment can correct mistranslated domain terms in machine translation without relying on a large amount of in-domain resources.
Referring to Fig. 2, Fig. 2 is a flow chart of another method for correcting term mistranslation in a translation provided by an embodiment of this application. This embodiment further explains the operation of S103 in the previous embodiment; the other steps are essentially the same as in the previous embodiment, to which reference may be made, and are not described again here.
The specific steps may include:
S201: Obtain the word alignment information of the first translation, and determine the original translated text corresponding to the target term according to the word alignment information;
S202: Compute the hash value of the target term, partition the phrase table built from the training set into hash blocks, look up the candidate translations corresponding to the target term, and set all candidate translations as the pseudo translated texts;
S203: Replace the original translated text in the first translation with each pseudo translated text in turn to obtain N second translations, and perform a back-translation operation on the first translation and all second translations to obtain N+1 back-translated texts;
S204: Compute the language model probability score of every back-translated text using a language model, and determine the maximum language model probability score;
Specifically, an n-gram language model is trained on a large-scale monolingual English corpus. For each back-translated text whose tokenization is $w_1, w_2, \ldots, w_n$, the language model probability score $\log p(w_1, w_2, \ldots, w_n)$ is computed, where $p$ is the likelihood that the sentence is normal language. Under the n-gram assumption each word depends only on the preceding $m$ words:
$$\log p(w_1, w_2, \ldots, w_n) = \sum_{i=1}^{n} \log p(w_i \mid w_{i-m}, \ldots, w_{i-1})$$
When a 5-gram language model is used, $m$ is 4.
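A toy version of the n-gram score can be sketched as below. It assumes a count-based model with add-one smoothing and sentence-boundary padding; the application does not specify a smoothing method or a toolkit, so the class and its smoothing choice are illustrative assumptions only.

```python
import math
from collections import Counter
from typing import List, Sequence

class NGramLM:
    """Count-based n-gram model with add-one smoothing (assumed choice)."""

    def __init__(self, corpus: List[List[str]], n: int = 5):
        self.n = n
        self.context_counts = Counter()
        self.ngram_counts = Counter()
        self.vocab = {"</s>"}
        for sent in corpus:
            tokens = ["<s>"] * (n - 1) + sent + ["</s>"]
            self.vocab.update(sent)
            for i in range(n - 1, len(tokens)):
                context = tuple(tokens[i - n + 1:i])  # the preceding m = n-1 words
                self.context_counts[context] += 1
                self.ngram_counts[context + (tokens[i],)] += 1

    def log_prob(self, sent: Sequence[str]) -> float:
        """Sum of log p(w_i | previous n-1 words) over the padded sentence."""
        tokens = ["<s>"] * (self.n - 1) + list(sent) + ["</s>"]
        vocab_size = len(self.vocab) + 1  # extra slot for unseen words
        total = 0.0
        for i in range(self.n - 1, len(tokens)):
            context = tuple(tokens[i - self.n + 1:i])
            numerator = self.ngram_counts[context + (tokens[i],)] + 1
            denominator = self.context_counts[context] + vocab_size
            total += math.log(numerator / denominator)
        return total

corpus = [["the", "bus", "network", "is", "suitable"]] * 3
lm = NGramLM(corpus, n=5)
scores = [
    lm.log_prob("the bus network is suitable".split()),
    lm.log_prob("the public traffic network is suitable".split()),
]
print(scores, max(scores))
```

With n = 5 each conditional probability looks at the four preceding words, which corresponds to m = 4 in the description.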
S205: Map each back-translated text to a first feature vector and the source text to a second feature vector, set the cosine distance between the first feature vector of each back-translated text and the second feature vector as the semantic similarity between that back-translated text and the source text, and determine the maximum semantic similarity;
The process in S205 is described in detail as follows. Given three consecutive sentences $(s_{i-1}, s_i, s_{i+1})$, consider the $t$-th word of sentence $s_i$ and its word vector $x_t$. The following formulas give the encoding process of sentence $s_i$.
$r_t = \sigma(W_r x_t + U_r h_{t-1})$
$z_t = \sigma(W_z x_t + U_z h_{t-1})$
$z_t$ and $r_t$ are the update gate and the reset gate, respectively. The update gate controls how much of the previous state is carried into the current state: the larger its value, the more of the previous state is carried over. The reset gate controls how much of the previous state is ignored: the smaller its value, the more is ignored. $h$ denotes the hidden state at each time step, the candidate state is the current memory content, and $\odot$ is the Hadamard product, that is, the element-wise product of matrices.
The decoder is a neural language model conditioned on the encoder output $h_i$. Its computation is similar to encoding, except that matrices $C_z$, $C_r$ and $C$ are introduced to bias the computation of the update gate, the reset gate and the hidden state with the sentence vector. Two decoders are needed, decoding the surrounding sentences $s_{i-1}$ and $s_{i+1}$ respectively. Taking sentence $s_{i+1}$ as an example, the decoder maintains a hidden state at each time $t$, from which the decoding process of $s_{i+1}$ is computed.
The decoder assigns a probability to the $t$-th word given the preceding words and the encoder output.
Given the tuple $(s_{i-1}, s_i, s_{i+1})$, the objective function, conditioned on the hidden state of the current sentence, optimizes the sum of the log probabilities of the previous sentence and the next sentence; the total loss function is the sum of the objective function over all training samples.
After the pre-trained model is obtained, the back-translated text and the source text are each fed into the model, and the hidden state of the last word of a sentence is taken to represent the whole sentence. The cosine similarity of the two sentence vectors is then computed; the larger the value, the closer the two sentences are in meaning.
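The similarity step can be sketched as follows, assuming the sentence vectors have already been produced by the pre-trained encoder. The vectors below are made-up numbers standing in for encoder hidden states, and the function name is hypothetical.

```python
import math
from typing import List, Sequence

def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical sentence vectors (assumption: real ones come from the encoder).
source_vec = [0.2, 0.7, 0.1]
back_translation_vecs: List[List[float]] = [
    [0.1, 0.9, 0.0],   # back-translated text of the first translation
    [0.8, 0.1, 0.3],   # back-translated text of a second translation
]

similarities = [cosine_similarity(vec, source_vec) for vec in back_translation_vecs]
print(similarities, max(similarities))
```

The maximum of this list is the semantic similarity maximum used in the later judgment steps.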
S206: Judge whether the difference between the language model probability score of the retroversion text corresponding to the first translation and the maximum language model probability score is less than or equal to a preset value, obtain a first judgment result, and proceed to S208;
S207: Judge whether the semantic similarity between the retroversion text corresponding to the first translation and the source text is the maximum semantic similarity, obtain a second judgment result, and proceed to S208;
S208: Judge whether the first judgment result and the second judgment result are both negative; if both are negative, proceed to S209; otherwise, end the flow.
Specifically, the condition for entering S209 is as follows: the difference between the language model probability score of the retroversion text corresponding to the first translation and the maximum language model probability score is greater than the preset value (as a preferred option, the preset value may be 0.015), and the semantic similarity between the retroversion text corresponding to the first translation and the source text is not the maximum semantic similarity. If this condition is met, the original translated text of the target term in the first translation is a mistranslation and is corrected.
S209: Determine that the original translated text is a translation error, and correct the original translated text of the target term.
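The decision logic of S206 to S208 can be sketched as follows. This is an illustrative sketch rather than the patented implementation: the function name is hypothetical, and it assumes that index 0 of each list holds the value for the retroversion text of the first translation while the remaining entries belong to the second translations.

```python
def is_mistranslated(lm_scores, similarities, threshold=0.015):
    """Return True when both judgment results are negative (enter S209).

    lm_scores[0] and similarities[0] belong to the retroversion text of the
    first translation; the remaining entries belong to the second translations.
    """
    # S206: is the first translation's score within `threshold` of the maximum?
    first_result = (max(lm_scores) - lm_scores[0]) <= threshold
    # S207: does the first translation have the maximum semantic similarity?
    second_result = similarities[0] == max(similarities)
    # S208: a mistranslation is flagged only if both results are negative.
    return (not first_result) and (not second_result)


# Hypothetical scores for one first translation and two second translations.
flagged = is_mistranslated([0.10, 0.13, 0.11], [0.80, 0.92, 0.85])  # True
```

The default threshold of 0.015 is the preferred preset value mentioned in the text; any other value is a tuning choice.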
This step is performed once it has been determined that the original translated text is a mistranslation, and it corrects the original translated text. There are many ways to correct the original translated text of the target term; three preferred correcting methods are given below as examples:
Correcting method one: Replace the first translation with the second translation corresponding to the maximum semantic similarity.
Correcting method two: Represent all the retroversion texts and the source text as dependency trees, and calculate the minimum edit cost of converting the dependency tree of each retroversion text into the dependency tree of the source text; select the second translation corresponding to the retroversion text with the smallest minimum edit cost as the optimal translation, and replace the first translation with the optimal translation.
Correcting method three: Represent all the retroversion texts and the source text as dependency trees, and calculate the minimum edit cost of converting the dependency tree of each retroversion text into the dependency tree of the source text; select the second translation corresponding to the retroversion text with the largest difference between the semantic similarity and the minimum edit cost as the optimal translation, and replace the first translation with the optimal translation.
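The three correcting methods above differ only in the selection criterion, which can be sketched as follows. This is a minimal illustration assuming the semantic similarity and the minimum edit cost of each second translation have already been computed; the `Candidate` type and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    translation: str   # a second translation
    similarity: float  # semantic similarity of its retroversion text to the source
    edit_cost: float   # minimum tree edit cost of its retroversion text to the source


def correct_method_one(candidates):
    """Pick the second translation with the maximum semantic similarity."""
    return max(candidates, key=lambda c: c.similarity).translation


def correct_method_two(candidates):
    """Pick the second translation with the smallest minimum edit cost."""
    return min(candidates, key=lambda c: c.edit_cost).translation


def correct_method_three(candidates):
    """Pick the second translation maximizing similarity minus edit cost."""
    return max(candidates, key=lambda c: c.similarity - c.edit_cost).translation


# Hypothetical candidates.
cands = [
    Candidate("translation A", 0.92, 6.0),
    Candidate("translation B", 0.88, 3.0),
    Candidate("translation C", 0.90, 5.0),
]
```

Because the similarity lies in a much narrower range than the edit cost in this sketch, method three is dominated by the edit cost unless the two quantities are rescaled; the text does not state whether any rescaling is applied.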
It should be noted that the minimum edit cost of converting the dependency tree of each retroversion text into the dependency tree of the source text, as mentioned above, is calculated as follows:
Specifically, the Stanford natural language processing tools are used to parse the dependency trees of the retroversion text and the source text. In a dependency tree, each node is represented by three fields: the word root, the part-of-speech tag, and the dependency relation with its parent node. The basic edit operations are redefined into nine types. The first six (INS_LEAF, INS_SUBTREE, INS, DEL_LEAF, DEL_SUBTREE, DEL) insert or delete a leaf node, a whole subtree, or a node that is neither a leaf node nor part of a whole subtree. The last three (REN_POS, REN_DEP, REN_POS_DEP) rename the part-of-speech tag, the dependency relation, or both.
Initially, the cost of each basic edit operation is set to 1.0, so that inserting or deleting an entire node costs 3 (all three fields are inserted or deleted). Renaming the part-of-speech tag or the relation type is allowed if and only if the source node and the target node have the same word root. Two nodes are said to be aligned by the tree edit model if they are identical and therefore do not appear in the edit script, or if they are renamed because they share the same word root. In addition, the renaming cost of stop words is changed to 2.5, regardless of whether the two stop words have the same part-of-speech tag or relation type. The reason is that stop words tend to have fixed part-of-speech tags and dependency relations, so they can be aligned at a lower cost than renaming content words.
If a term output by the translation system in the translation is wrong, then after the translation is back-translated, the syntactic structure of the retroversion text is likely to differ from that of the source text. On this basis, the tree edit distance between the dependency trees of the retroversion text and the source text is calculated with the above method; the smaller the distance, the higher the syntactic similarity of the two sentences.
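The cost scheme described above can be sketched as follows. This covers only the per-operation costs, not the tree edit distance search itself. The `Node` type, the function names and the stop word list are hypothetical, and charging one unit per renamed field (so that REN_POS_DEP costs 2.0) is an assumption that the text does not state explicitly.

```python
from typing import NamedTuple, Optional


class Node(NamedTuple):
    lemma: str  # the "word root" field
    pos: str    # part-of-speech tag
    dep: str    # dependency relation with the parent node


BASE_COST = 1.0               # cost of each basic edit operation
STOPWORD_RENAME_COST = 2.5    # modified renaming cost for stop words
STOPWORDS = {"the", "a", "of", "in"}  # hypothetical stop word list


def insert_or_delete_cost() -> float:
    """Inserting or deleting a whole node touches all three fields."""
    return 3 * BASE_COST


def rename_cost(src: Node, dst: Node, stopwords=STOPWORDS) -> Optional[float]:
    """Return the renaming cost, or None when renaming is not allowed."""
    if src.lemma != dst.lemma:
        return None  # renaming requires identical word roots
    if src.lemma in stopwords:
        # Applies whether or not the tags and relations already match.
        return STOPWORD_RENAME_COST
    changed_fields = (src.pos != dst.pos) + (src.dep != dst.dep)
    return changed_fields * BASE_COST  # 0.0 means the nodes are identical
```

A full implementation would plug these costs into a tree edit distance algorithm to obtain the minimum edit cost between two dependency trees.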
To verify the effectiveness of the domain term mistranslation identification method proposed herein, the accuracy PR is selected as the evaluation metric, defined as follows:
$$PR = \frac{\#\ \text{of correctly translated terms}}{\text{Total}\ \#\ \text{of terms}}$$
In the above formula, the numerator "# of correctly translated terms" is the number of terms that are translated correctly in all source texts (all texts in the test set), and the denominator "Total # of terms" is the total number of terms in all source texts.
Referring to Fig. 3, Fig. 3 is a structural schematic diagram of a system for correcting term mistranslation in a translation provided by an embodiment of the present application.
The correcting system may include:
A synonym acquisition module 100, configured to obtain the original translated text of a target term in a first translation, obtain candidate translations corresponding to the target term from a training set, and set all the candidate translations as pseudo translated texts; wherein the first translation is obtained by translating a source text;
A decoding module 200, configured to replace the original translated text in the first translation with each of the pseudo translated texts respectively to obtain N second translations, and to perform a back-translation operation on the first translation and all the second translations to obtain N+1 retroversion texts;
A correction module 300, configured to compare the source text with all the retroversion texts to determine the translation accuracy of the first translation, and to correct the original translated text of the target term according to the translation accuracy.
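The substitution and back-translation performed by the decoding module can be sketched as follows. This is a simplified illustration: the function names are hypothetical, the back-translation system is passed in as a callable because no particular system is specified, and the original translated text is located by plain string search rather than by word alignment.

```python
def build_second_translations(first_translation, original_text, pseudo_texts):
    """Replace the original translated text with each pseudo translated text."""
    return [
        first_translation.replace(original_text, pseudo, 1)  # first occurrence only
        for pseudo in pseudo_texts
    ]


def back_translate_all(first_translation, second_translations, back_translate):
    """Back-translate the first translation and all N second translations."""
    return [back_translate(t) for t in [first_translation, *second_translations]]


# Hypothetical example with a stub standing in for a real translation system.
first = "the bank of the river is steep"
seconds = build_second_translations(first, "bank", ["shore", "embankment"])
retroversions = back_translate_all(first, seconds, back_translate=str.upper)
```

With N pseudo translated texts the result contains N+1 retroversion texts, the first of which corresponds to the first translation.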
Further, the synonym acquisition module 100 includes:
An original translated text determination unit, configured to obtain the word alignment information of the first translation and determine, according to the word alignment information, the original translated text in which the target term is located.
A pseudo translated text determination unit, configured to calculate the hash value of the target term, partition the phrase table built from the training set into hash blocks, look up the candidate translations corresponding to the target term, and set all the candidate translations as the pseudo translated texts.
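The hash-partitioned lookup described above can be sketched as follows. This is an illustration under assumptions: the use of SHA-256, the number of blocks, the in-memory dictionaries and all function names are hypothetical choices, since the text does not specify how the phrase table is hashed or stored.

```python
import hashlib


def block_index(term, num_blocks):
    """Map a term to a block by hashing its UTF-8 bytes."""
    digest = hashlib.sha256(term.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_blocks


def partition_phrase_table(phrase_pairs, num_blocks):
    """Split (source term, target text) pairs into hash blocks."""
    blocks = [dict() for _ in range(num_blocks)]
    for source_term, target_text in phrase_pairs:
        block = blocks[block_index(source_term, num_blocks)]
        block.setdefault(source_term, []).append(target_text)
    return blocks


def lookup_candidates(blocks, term):
    """Search only the block that the term hashes to."""
    return blocks[block_index(term, len(blocks))].get(term, [])


# Hypothetical phrase pairs extracted from a training set.
pairs = [("鼠标", "mouse"), ("鼠标", "mice"), ("驱动", "driver")]
blocks = partition_phrase_table(pairs, num_blocks=4)
candidates = lookup_candidates(blocks, "鼠标")
```

Only one block has to be searched per term, which is the point of partitioning a large phrase table by hash value.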
Further, the correction module 300 includes:
A language model probability score calculation unit, configured to calculate the language model probability scores of all the retroversion texts using a language model, and determine the maximum language model probability score;
A semantic similarity calculation unit, configured to map each retroversion text to a first feature vector, map the source text to a second feature vector, set the cosine distance between the first feature vector corresponding to each retroversion text and the second feature vector as the semantic similarity between that retroversion text and the source text, and determine the maximum semantic similarity;
A first evaluation unit, configured to judge whether the difference between the language model probability score of the retroversion text corresponding to the first translation and the maximum language model probability score is less than or equal to a preset value, and obtain a first judgment result;
A second evaluation unit, configured to judge whether the semantic similarity between the retroversion text corresponding to the first translation and the source text is the maximum semantic similarity, and obtain a second judgment result.
A judging unit, configured to judge whether the first judgment result and the second judgment result are both negative;
A text correction unit, configured to determine, when the first judgment result and the second judgment result are both negative, that the original translated text is a translation error, and correct the original translated text of the target term.
Further, the text correction unit includes:
A first correction subunit, configured to replace the first translation with the second translation corresponding to the maximum semantic similarity.
An edit cost calculation subunit, configured to represent all the retroversion texts and the source text as dependency trees, and calculate the minimum edit cost of converting the dependency tree of each retroversion text into the dependency tree of the source text;
Or, a second correction subunit, configured to select the second translation corresponding to the retroversion text with the smallest minimum edit cost as the optimal translation, and replace the first translation with the optimal translation.
Or, a second correction subunit, configured to select the second translation corresponding to the retroversion text with the largest difference between the semantic similarity and the minimum edit cost as the optimal translation, and replace the first translation with the optimal translation.
Since the embodiments of the system part correspond to the embodiments of the method part, for the embodiments of the system part please refer to the description of the embodiments of the method part, which is not repeated here.
The present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed, the steps provided by the above embodiments can be implemented. The storage medium may include various media capable of storing program code, such as a USB flash disk, a mobile hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
The present application also provides a device for correcting term mistranslation in a translation, which may include a memory and a processor, the memory storing a computer program; when the processor calls the computer program in the memory, the steps provided by the above embodiments can be implemented. Of course, the device for correcting term mistranslation in a translation may also include components such as various network interfaces and a power supply.
The embodiments in the specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant details can be found in the description of the method part. It should be pointed out that those of ordinary skill in the art may make several improvements and modifications to the present application without departing from its principles, and these improvements and modifications also fall within the scope of protection of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements that are not explicitly listed, or elements inherent to the process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.

Claims (10)

1. A method for correcting term mistranslation in a translation, characterized by comprising:
Obtaining the original translated text of a target term in a first translation, obtaining candidate translations corresponding to the target term from a training set, and setting all the candidate translations as pseudo translated texts; wherein the first translation is obtained by translating a source text;
Replacing the original translated text in the first translation with each of the pseudo translated texts respectively to obtain N second translations, and performing a back-translation operation on the first translation and all the second translations to obtain N+1 retroversion texts;
Comparing the source text with all the retroversion texts to determine the translation accuracy of the first translation, and correcting the original translated text of the target term according to the translation accuracy.
2. The correcting method according to claim 1, characterized in that obtaining candidate translations corresponding to the target term from the training set and setting all the candidate translations as pseudo translated texts comprises:
Calculating the hash value of the target term, partitioning the phrase table built from the training set into hash blocks, looking up the candidate translations corresponding to the target term, and setting all the candidate translations as the pseudo translated texts.
3. The correcting method according to claim 1, characterized in that obtaining the original translated text of the target term in the first translation comprises:
Obtaining the word alignment information of the first translation, and determining, according to the word alignment information, the original translated text in which the target term is located.
4. The correcting method according to claim 1, characterized in that comparing the source text with all the retroversion texts to determine the translation accuracy of the retroversion text, and correcting the original translated text of the target term according to the translation accuracy comprises:
Calculating the language model probability scores of all the retroversion texts using a language model, and determining the maximum language model probability score;
Mapping each retroversion text to a first feature vector, mapping the source text to a second feature vector, setting the cosine distance between the first feature vector corresponding to each retroversion text and the second feature vector as the semantic similarity between that retroversion text and the source text, and determining the maximum semantic similarity;
Judging whether the difference between the language model probability score of the retroversion text corresponding to the first translation and the maximum language model probability score is less than or equal to a preset value, and obtaining a first judgment result;
Judging whether the semantic similarity between the retroversion text corresponding to the first translation and the source text is the maximum semantic similarity, and obtaining a second judgment result;
Judging whether the first judgment result and the second judgment result are both negative;
If both are negative, determining that the original translated text is a translation error, and correcting the original translated text of the target term.
5. The correcting method according to claim 4, characterized in that correcting the original translated text of the target term comprises:
Replacing the first translation with the second translation corresponding to the maximum semantic similarity.
6. The correcting method according to claim 4, characterized in that correcting the original translated text of the target term comprises:
Representing all the retroversion texts and the source text as dependency trees, and calculating the minimum edit cost of converting the dependency tree of each retroversion text into the dependency tree of the source text;
Selecting the second translation corresponding to the retroversion text with the smallest minimum edit cost as the optimal translation;
Or, selecting the second translation corresponding to the retroversion text with the largest difference between the semantic similarity and the minimum edit cost as the optimal translation;
Replacing the first translation with the optimal translation.
7. A system for correcting term mistranslation in a translation, characterized by comprising:
A synonym acquisition module, configured to obtain the original translated text of a target term in a first translation, obtain candidate translations corresponding to the target term from a training set, and set all the candidate translations as pseudo translated texts; wherein the first translation is obtained by translating a source text;
A decoding module, configured to replace the original translated text in the first translation with each of the pseudo translated texts respectively to obtain N second translations, and to perform a back-translation operation on the first translation and all the second translations to obtain N+1 retroversion texts;
A correction module, configured to compare the source text with all the retroversion texts to determine the translation accuracy of the first translation, and to correct the original translated text of the target term according to the translation accuracy.
8. The correcting system according to claim 7, characterized in that the synonym acquisition module comprises:
An original translated text determination unit, configured to obtain the word alignment information of the first translation and determine, according to the word alignment information, the original translated text in which the target term is located;
A pseudo translated text determination unit, configured to calculate the hash value of the target term, partition the phrase table built from the training set into hash blocks, look up the candidate translations corresponding to the target term, and set all the candidate translations as the pseudo translated texts.
9. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method for correcting term mistranslation in a translation according to any one of claims 1 to 6 are implemented.
10. A device for correcting term mistranslation in a translation, characterized by comprising:
A memory, configured to store a computer program;
A processor, configured to implement, when executing the computer program, the steps of the method for correcting term mistranslation in a translation according to any one of claims 1 to 6.
CN201810600694.0A 2018-06-12 2018-06-12 Correcting method, system and the relevant apparatus of term mistranslation in a kind of translation Pending CN108804428A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810600694.0A CN108804428A (en) 2018-06-12 2018-06-12 Correcting method, system and the relevant apparatus of term mistranslation in a kind of translation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810600694.0A CN108804428A (en) 2018-06-12 2018-06-12 Correcting method, system and the relevant apparatus of term mistranslation in a kind of translation

Publications (1)

Publication Number Publication Date
CN108804428A true CN108804428A (en) 2018-11-13

Family

ID=64085477

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810600694.0A Pending CN108804428A (en) 2018-06-12 2018-06-12 Correcting method, system and the relevant apparatus of term mistranslation in a kind of translation

Country Status (1)

Country Link
CN (1) CN108804428A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582977A (en) * 2018-11-20 2019-04-05 科大讯飞股份有限公司 A kind of interactive text interpretation method and device
CN109858042A (en) * 2018-11-20 2019-06-07 科大讯飞股份有限公司 A kind of determination method and device of translation quality
CN110807338A (en) * 2019-11-08 2020-02-18 北京中献电子技术开发有限公司 English-Chinese machine translation term consistency self-correcting system and method
CN111191468A (en) * 2019-12-17 2020-05-22 语联网(武汉)信息技术有限公司 Term replacement method and device
CN111385612A (en) * 2018-12-28 2020-07-07 深圳Tcl数字技术有限公司 Television playing method based on hearing-impaired people, smart television and storage medium
CN111597826A (en) * 2020-05-15 2020-08-28 苏州七星天专利运营管理有限责任公司 Method for processing terms in auxiliary translation
CN111652006A (en) * 2020-06-09 2020-09-11 北京中科凡语科技有限公司 Computer-aided translation method and device
CN111680524A (en) * 2020-06-09 2020-09-18 语联网(武汉)信息技术有限公司 Human-machine feedback translation method and system based on reverse matrix analysis
CN111680526A (en) * 2020-06-09 2020-09-18 语联网(武汉)信息技术有限公司 Human-computer interaction translation system and method based on reverse translation result comparison
CN111898387A (en) * 2019-05-06 2020-11-06 阿里巴巴集团控股有限公司 Translation method and device, storage medium and computer equipment
CN111916050A (en) * 2020-08-03 2020-11-10 北京字节跳动网络技术有限公司 Speech synthesis method, speech synthesis device, storage medium and electronic equipment
CN112084301A (en) * 2020-08-11 2020-12-15 网易有道信息技术(北京)有限公司 Training method and device of text correction model and text correction method and device
CN112528683A (en) * 2020-12-23 2021-03-19 深圳市爱科云通科技有限公司 Text translation correction method, device, system, server and readable storage medium
CN112667208A (en) * 2020-12-22 2021-04-16 深圳壹账通智能科技有限公司 Translation error recognition method and device, computer equipment and readable storage medium
CN112765968A (en) * 2021-01-05 2021-05-07 网易有道信息技术(北京)有限公司 Grammar error correction method and training method and product for grammar error correction model
WO2021092730A1 (en) * 2019-11-11 2021-05-20 深圳市欢太科技有限公司 Digest generation method and apparatus, electronic device, and storage medium
CN113051937A (en) * 2021-03-19 2021-06-29 北京大米科技有限公司 Machine error correction method, device, electronic equipment and readable storage medium
CN115455981A (en) * 2022-11-11 2022-12-09 合肥智能语音创新发展有限公司 Semantic understanding method, device, equipment and storage medium for multi-language sentences
CN115688703A (en) * 2022-10-31 2023-02-03 国网山东省电力公司烟台供电公司 Specific field text error correction method, storage medium and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140288915A1 (en) * 2013-03-19 2014-09-25 Educational Testing Service Round-Trip Translation for Automated Grammatical Error Correction
CN106202059A (en) * 2015-05-25 2016-12-07 松下电器(美国)知识产权公司 Machine translation method and machine translation apparatus

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MENGYI LIU ET AL.: "Terminology Translation Error Identification and Correction", 《CHINESE NATION CONFERENCE ON SOCIAL MEDIA PROCESSING》 *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858042A (en) * 2018-11-20 2019-06-07 科大讯飞股份有限公司 A kind of determination method and device of translation quality
CN109582977A (en) * 2018-11-20 2019-04-05 科大讯飞股份有限公司 A kind of interactive text interpretation method and device
CN109582977B (en) * 2018-11-20 2023-06-02 科大讯飞股份有限公司 Interactive text translation method and device
CN109858042B (en) * 2018-11-20 2024-02-20 科大讯飞股份有限公司 Translation quality determining method and device
CN111385612A (en) * 2018-12-28 2020-07-07 深圳Tcl数字技术有限公司 Television playing method based on hearing-impaired people, smart television and storage medium
CN111898387A (en) * 2019-05-06 2020-11-06 阿里巴巴集团控股有限公司 Translation method and device, storage medium and computer equipment
CN110807338A (en) * 2019-11-08 2020-02-18 北京中献电子技术开发有限公司 English-Chinese machine translation term consistency self-correcting system and method
CN110807338B (en) * 2019-11-08 2022-03-04 北京中献电子技术开发有限公司 English-Chinese machine translation term consistency self-correcting system and method
WO2021092730A1 (en) * 2019-11-11 2021-05-20 深圳市欢太科技有限公司 Digest generation method and apparatus, electronic device, and storage medium
CN111191468B (en) * 2019-12-17 2023-08-25 语联网(武汉)信息技术有限公司 Term replacement method and device
CN111191468A (en) * 2019-12-17 2020-05-22 语联网(武汉)信息技术有限公司 Term replacement method and device
CN111597826B (en) * 2020-05-15 2021-10-01 苏州七星天专利运营管理有限责任公司 Method for processing terms in auxiliary translation
CN111597826A (en) * 2020-05-15 2020-08-28 苏州七星天专利运营管理有限责任公司 Method for processing terms in auxiliary translation
CN111680526A (en) * 2020-06-09 2020-09-18 语联网(武汉)信息技术有限公司 Human-computer interaction translation system and method based on reverse translation result comparison
CN111652006B (en) * 2020-06-09 2021-02-09 北京中科凡语科技有限公司 Computer-aided translation method and device
CN111680524A (en) * 2020-06-09 2020-09-18 语联网(武汉)信息技术有限公司 Human-machine feedback translation method and system based on reverse matrix analysis
CN111652006A (en) * 2020-06-09 2020-09-11 北京中科凡语科技有限公司 Computer-aided translation method and device
CN111680526B (en) * 2020-06-09 2023-09-08 语联网(武汉)信息技术有限公司 Man-machine interactive translation system and method based on comparison of reverse translation results
CN111916050A (en) * 2020-08-03 2020-11-10 北京字节跳动网络技术有限公司 Speech synthesis method, speech synthesis device, storage medium and electronic equipment
CN112084301B (en) * 2020-08-11 2023-12-15 网易有道信息技术(北京)有限公司 Training method and device for text correction model, text correction method and device
CN112084301A (en) * 2020-08-11 2020-12-15 网易有道信息技术(北京)有限公司 Training method and device of text correction model and text correction method and device
CN112667208A (en) * 2020-12-22 2021-04-16 深圳壹账通智能科技有限公司 Translation error recognition method and device, computer equipment and readable storage medium
CN112528683A (en) * 2020-12-23 2021-03-19 深圳市爱科云通科技有限公司 Text translation correction method, device, system, server and readable storage medium
CN112765968A (en) * 2021-01-05 2021-05-07 网易有道信息技术(北京)有限公司 Grammar error correction method and training method and product for grammar error correction model
CN113051937A (en) * 2021-03-19 2021-06-29 北京大米科技有限公司 Machine error correction method, device, electronic equipment and readable storage medium
CN115688703A (en) * 2022-10-31 2023-02-03 国网山东省电力公司烟台供电公司 Specific field text error correction method, storage medium and device
CN115688703B (en) * 2022-10-31 2024-03-12 国网山东省电力公司烟台供电公司 Text error correction method, storage medium and device in specific field
CN115455981A (en) * 2022-11-11 2022-12-09 合肥智能语音创新发展有限公司 Semantic understanding method, device, equipment and storage medium for multi-language sentences
CN115455981B (en) * 2022-11-11 2024-03-19 合肥智能语音创新发展有限公司 Semantic understanding method, device and equipment for multilingual sentences and storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181113